TidyTuesday Ramen analysis

Sep 2

Reviewing data

The data set is called The Ramen Rater, and it covers ramen ratings from many different countries. Now, let us take a top-level look at the data set. We have over 3,000 observations across six variables. The stars variable is the rating. There is also a variable called style, which describes the kind of container the ramen comes in.

glimpse(ramen_ratings)

## Observations: 3,180
## Variables: 6
## $ review_number <dbl> 3180, 3179, 3178, 3177, 3176, 3175, 3174, 3173, ...
## $ brand         <chr> "Yum Yum", "Nagatanien", "Acecook", "Maison de C...
## $ variety       <chr> "Tem Tem Tom Yum Moo Deng", "tom Yum Kung Rice V...
## $ style         <chr> "Cup", "Pack", "Cup", "Cup", "Tray", "Cup", "Pac...
## $ country       <chr> "Thailand", "Japan", "Japan", "France", "Japan",...
## $ stars         <dbl> 3.75, 2.00, 2.50, 3.75, 5.00, 3.50, 3.75, 5.00, ...

unique(ramen_ratings$style)

## [1] "Cup"        "Pack"       "Tray"       "Bowl"       "Box"       
## [6] "Restaurant" "Can"        "Bar"        NA

We can see that the rating scale runs from 0 to 5 stars. The stars column also has only 14 NA's. Given how large the data set is, I will remove them from the rest of the analysis.

summary(ramen_ratings$stars)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   3.250   3.750   3.688   4.500   5.000      14
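A minimal sketch of two ways to drop the missing ratings, using a made-up four-row table (toy_ratings is hypothetical, not the real data). na.omit() drops a row if any column is NA, while filter(!is.na(stars)) only looks at stars, so the two can give different row counts when style is missing as well.

```r
library(dplyr)

# Hypothetical toy data standing in for ramen_ratings
toy_ratings <- tibble(
  country = c("Japan", "Japan", "Thailand", "Thailand"),
  style   = c("Cup", NA, "Pack", "Bowl"),
  stars   = c(4.5, 3.0, NA, 5.0)
)

# Drops every row that has an NA in any column (style or stars here)
dropped_any <- toy_ratings %>% na.omit()

# Drops only the rows where the rating itself is missing
dropped_stars <- toy_ratings %>% filter(!is.na(stars))

nrow(dropped_any)
nrow(dropped_stars)
```

Which one is appropriate depends on whether a review with a rating but no style should still count toward a country's average.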

The Question to ask:

I want to know which country has the best instant ramen that I can buy at the store. Let's first look at the ratings by country, along with a count of how many reviews were submitted. One caveat: the code below groups by both country and stars, so each row is a country-and-rating pair, average_rating simply repeats stars, and country_count is the number of reviews at that rating. Australia, for example, appears at 5 stars on the strength of only four reviews, so I will focus on the countries with the most reviews.

ramen_ratings %>%
  na.omit() %>%
  group_by(country, stars) %>%
  summarise(average_rating = mean(stars), country_count = n()) %>%
  arrange(desc(average_rating)) %>%
  head(15)

## # A tibble: 15 x 4
## # Groups:   country [15]
##    country   stars average_rating country_count
##    <chr>     <dbl>          <dbl>         <int>
##  1 Australia     5              5             4
##  2 Brazil        5              5             1
##  3 Cambodia      5              5             2
##  4 Canada        5              5             2
##  5 China         5              5            18
##  6 France        5              5             1
##  7 Germany       5              5             1
##  8 Hong Kong     5              5            27
##  9 India         5              5             2
## 10 Indonesia     5              5            35
## 11 Japan         5              5           105
## 12 Malaysia      5              5            66
## 13 Mexico        5              5             2
## 14 Myanmar       5              5             4
## 15 Nepal         5              5             1
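Because the table above is grouped by stars as well as country, it lists how many 5-star reviews each country has rather than a per-country average. A sketch of grouping by country alone, on made-up data (toy_ratings, its values, and the column name review_count are hypothetical), might look like this:

```r
library(dplyr)

# Hypothetical toy data standing in for ramen_ratings
toy_ratings <- tibble(
  country = c("Australia", "Australia", "Japan", "Japan", "Japan", "Nepal"),
  stars   = c(5, 4, 4, 5, 3, NA)
)

country_summary <- toy_ratings %>%
  filter(!is.na(stars)) %>%          # drop reviews with no rating
  group_by(country) %>%              # one group per country, not per rating
  summarise(
    average_rating = mean(stars),    # mean over all of the country's reviews
    review_count   = n()             # number of rated reviews per country
  ) %>%
  arrange(desc(average_rating))

country_summary
```

Keeping review_count next to average_rating makes it easy to spot averages that rest on only a handful of reviews.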

Looking at the counts below, the number of reviews drops off sharply after Vietnam (112 reviews, versus 69 for the UK). With that in mind, I will use the top eleven countries.

ramen_country_count <- ramen_ratings %>%
  group_by(country) %>%
  summarise(country_count = n()) %>%
  arrange(desc(country_count))

head(ramen_country_count, 15)

## # A tibble: 15 x 2
##    country       country_count
##    <chr>                 <int>
##  1 Japan                   532
##  2 United States           382
##  3 South Korea             357
##  4 Taiwan                  330
##  5 China                   207
##  6 Thailand                205
##  7 Malaysia                182
##  8 Hong Kong               155
##  9 Indonesia               150
## 10 Singapore               134
## 11 Vietnam                 112
## 12 UK                       69
## 13 Philippines              49
## 14 Canada                   48
## 15 India                    41
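One way to turn that cut-off into code is to pull the names of the most-reviewed countries and filter on them. This is a sketch on made-up counts; top_n_countries is a hypothetical helper, and with the real data n would be 11.

```r
library(dplyr)

# Hypothetical helper: names of the n most-reviewed countries
top_n_countries <- function(ratings, n) {
  ratings %>%
    count(country, name = "country_count", sort = TRUE) %>%  # reviews per country, largest first
    head(n) %>%
    pull(country)
}

# Hypothetical toy data standing in for ramen_ratings
toy_ratings <- tibble(
  country = c(rep("Japan", 5), rep("Taiwan", 3), rep("UK", 1)),
  stars   = c(4, 5, 3, 4, 4, 3, 4, 5, 2)
)

top_countries <- top_n_countries(toy_ratings, 2)

# Keep only the reviews from the selected countries
toy_top <- toy_ratings %>% filter(country %in% top_countries)
```

Deriving the list from the counts, rather than typing the country names by hand, keeps the filter in sync if the data is updated.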

Let's see what a histogram of the data looks like, with stars on the x-axis, in a grid of countries and styles. Wow, most of the reviews are concentrated in the Bowl, Cup, and Pack styles. Let's go ahead and filter out all of the other categories.

## Warning: Removed 14 rows containing non-finite values (stat_bin).

[Plot: histograms of stars in a grid of country by style, all styles]

## Warning: Removed 14 rows containing non-finite values (stat_bin).

[Plot: histograms of stars in a grid of country by style, Bowl, Cup, and Pack only]
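The code behind these plots is not shown, so the following is only a guess at its shape: keep the chosen countries and the Bowl, Cup, and Pack styles, then draw a histogram of stars faceted by country and style. The toy data, the two-country list, and the binwidth are assumptions; only the name ramen_rating_top_11_BPC comes from the code later in the post.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data standing in for ramen_ratings
toy_ratings <- tibble(
  country = c("Japan", "Japan", "Malaysia", "Malaysia", "Japan", "UK"),
  style   = c("Bowl", "Cup", "Pack", "Bowl", "Tray", "Pack"),
  stars   = c(4.0, 3.5, 5.0, 4.25, 3.0, 2.0)
)

# Assumption: the real analysis would list the eleven most-reviewed countries
top_countries <- c("Japan", "Malaysia")

# Keep the selected countries and the three most common styles
ramen_rating_top_11_BPC <- toy_ratings %>%
  filter(
    country %in% top_countries,
    style %in% c("Bowl", "Pack", "Cup")
  )

# Histogram of stars, one panel per country (rows) and style (columns)
ramen_hist <- ggplot(ramen_rating_top_11_BPC, aes(x = stars)) +
  geom_histogram(binwidth = 0.5) +
  facet_grid(country ~ style)
```

Printing ramen_hist draws the grid. Filtering out rows with missing stars beforehand would avoid the stat_bin warning about non-finite values.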

It looks like we have a good set of data points to see which country has the best ramen, and even which style of ramen is best for each country.

Final Thoughts

We can see that Malaysia has the highest average rating for both Bowl (4.33) and Pack (4.18) ramen, with Singapore a close second at 4.16 for both styles. Next time you find yourself looking for an instant ramen snack, I would recommend grabbing something that comes from Malaysia or Singapore.

ramen_rating_top_11_BPC %>%
  na.omit() %>%
  group_by(country, style) %>%
  summarise('average rating' = mean(stars)) %>%
  arrange(desc(`average rating`))

## # A tibble: 32 x 3
## # Groups:   country [11]
##    country     style `average rating`
##    <chr>       <chr>            <dbl>
##  1 Malaysia    Bowl              4.33
##  2 Malaysia    Pack              4.18
##  3 Singapore   Bowl              4.16
##  4 Singapore   Pack              4.16
##  5 Indonesia   Pack              4.12
##  6 Japan       Bowl              4.01
##  7 Hong Kong   Cup               3.99
##  8 Indonesia   Cup               3.94
##  9 South Korea Pack              3.90
## 10 Singapore   Cup               3.90
## # ... with 22 more rows
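To read the best style for each country straight off a table like this, one option is to keep only the top row per country. A sketch on made-up averages (toy_avg and its numbers are hypothetical, and slice_max() assumes dplyr 1.0.0 or later):

```r
library(dplyr)

# Hypothetical per-country, per-style averages (not the real results)
toy_avg <- tibble(
  country        = c("Malaysia", "Malaysia", "Japan", "Japan"),
  style          = c("Bowl", "Pack", "Bowl", "Cup"),
  average_rating = c(4.6, 4.1, 3.9, 3.2)
)

# Keep the highest-rated style within each country
best_style <- toy_avg %>%
  group_by(country) %>%
  slice_max(average_rating, n = 1) %>%
  ungroup()

best_style
```

By default slice_max() keeps ties, so a country whose top two styles have the same average would return both rows.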

Patrick Ayers
