TidyTuesday Ramen analysis
Reviewing data
The data set is called The Ramen Rater, and it covers ramen ratings from many different countries. Now, let us take a top-level look at the data set. We have over 3,000 observations across six variables. The stars variable is the rating. There is also a variable called style, which describes the kind of container the ramen comes in.
glimpse(ramen_ratings)
## Observations: 3,180
## Variables: 6
## $ review_number <dbl> 3180, 3179, 3178, 3177, 3176, 3175, 3174, 3173, ...
## $ brand <chr> "Yum Yum", "Nagatanien", "Acecook", "Maison de C...
## $ variety <chr> "Tem Tem Tom Yum Moo Deng", "tom Yum Kung Rice V...
## $ style <chr> "Cup", "Pack", "Cup", "Cup", "Tray", "Cup", "Pac...
## $ country <chr> "Thailand", "Japan", "Japan", "France", "Japan",...
## $ stars <dbl> 3.75, 2.00, 2.50, 3.75, 5.00, 3.50, 3.75, 5.00, ...
unique(ramen_ratings$style)
## [1] "Cup" "Pack" "Tray" "Bowl" "Box"
## [6] "Restaurant" "Can" "Bar" NA
We can see that the ratings run from 0 to 5 stars. There are also only 14 NAs in the stars variable. Given how large the data set is, I will remove them from the rest of the analysis.
summary(ramen_ratings$stars)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 3.250 3.750 3.688 4.500 5.000 14
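There is more than one way to drop those missing ratings, and the choice affects which rows survive. The sketch below uses a small made-up tibble (`toy_ratings` and its values are hypothetical, not the real data) to contrast `na.omit()`, which drops a row when any column is missing, with `filter(!is.na(stars))`, which only drops rows that have no rating.

```r
library(dplyr)

# Hypothetical toy data standing in for ramen_ratings
toy_ratings <- tibble(
  brand = c("A", "B", "C", "D"),
  style = c("Cup", NA, "Pack", "Bowl"),
  stars = c(3.5, 4.0, NA, 5.0)
)

# na.omit() drops rows with an NA in ANY column (here brands B and C)
all_complete <- toy_ratings %>% na.omit()

# Filtering on stars keeps brand B, whose style is missing but rating is present
rated_only <- toy_ratings %>% filter(!is.na(stars))

nrow(all_complete)
nrow(rated_only)
```

Since style also contains an NA in the real data, the two approaches may not return the same number of rows there either.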
The Question to ask:
I want to know which country has the best instant ramen that I can buy at the store. Let's first take a look at the average rating by country, along with a count of how many ratings were submitted for each. Some countries, such as Australia, show a 5-star rating but have only four reviews behind it, so I will look at the countries with the most reviews.
ramen_ratings %>% na.omit() %>% group_by(country,stars) %>%
summarise('average_rating'= mean(stars),'country_count'= length(country)) %>%
arrange(desc(average_rating)) %>% head(15)
## # A tibble: 15 x 4
## # Groups: country [15]
## country stars average_rating country_count
## <chr> <dbl> <dbl> <int>
## 1 Australia 5 5 4
## 2 Brazil 5 5 1
## 3 Cambodia 5 5 2
## 4 Canada 5 5 2
## 5 China 5 5 18
## 6 France 5 5 1
## 7 Germany 5 5 1
## 8 Hong Kong 5 5 27
## 9 India 5 5 2
## 10 Indonesia 5 5 35
## 11 Japan 5 5 105
## 12 Malaysia 5 5 66
## 13 Mexico 5 5 2
## 14 Myanmar 5 5 4
## 15 Nepal 5 5 1
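Because the pipeline above groups by both `country` and `stars`, each row's `average_rating` is simply equal to `stars`, and `country_count` counts the reviews at that exact rating. A per-country mean needs grouping by `country` alone. A minimal sketch of that variant, on made-up data (`toy_ratings`, its values, and `country_summary` are hypothetical):

```r
library(dplyr)

# Hypothetical toy data standing in for ramen_ratings
toy_ratings <- tibble(
  country = c("Japan", "Japan", "Japan", "Australia", "Australia"),
  stars   = c(5, 4, 3, 5, 2)
)

country_summary <- toy_ratings %>%
  filter(!is.na(stars)) %>%        # drop unrated rows before averaging
  group_by(country) %>%            # one group per country, not per rating
  summarise(
    average_rating = mean(stars),  # mean over all of a country's reviews
    country_count  = n()           # number of reviews for that country
  ) %>%
  arrange(desc(average_rating))

country_summary
```

With this grouping, a country with a handful of perfect scores and a few poor ones no longer appears as a 5-star country.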
Looking at the number of reviews per country, there is a clear dip after Vietnam (112 reviews, against 69 for the UK). With that in mind, I will use the first eleven countries.
ramen_country_count <- ramen_ratings %>%
group_by(country) %>%
summarise('country_count' = length(country)) %>%
arrange(desc(country_count))
head(ramen_country_count,15)
## # A tibble: 15 x 2
## country country_count
## <chr> <int>
## 1 Japan 532
## 2 United States 382
## 3 South Korea 357
## 4 Taiwan 330
## 5 China 207
## 6 Thailand 205
## 7 Malaysia 182
## 8 Hong Kong 155
## 9 Indonesia 150
## 10 Singapore 134
## 11 Vietnam 112
## 12 UK 69
## 13 Philippines 49
## 14 Canada 48
## 15 India 41
Let's see what a histogram of the data looks like, with stars on the x-axis, in a grid of country by style. Wow, most of the reviews are concentrated in the bowl, cup, and pack styles. Let's go ahead and filter out all of the other categories.
## Warning: Removed 14 rows containing non-finite values (stat_bin).
## Warning: Removed 14 rows containing non-finite values (stat_bin).
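The filtering step itself is not shown, although the final summary reads from a data frame named `ramen_rating_top_11_BPC`. One way such a filter might be written, assuming the top countries come from a review count and the styles are matched by name. The toy data, `top_n_countries`, `top_countries`, and `toy_filtered` are hypothetical, and the cutoff is 2 rather than 11 so that it fits the small example:

```r
library(dplyr)

# Hypothetical toy data standing in for ramen_ratings
toy_ratings <- tibble(
  country = c("Japan", "Japan", "Japan", "Taiwan", "Taiwan", "UK"),
  style   = c("Cup", "Box", "Pack", "Bowl", "Can", "Cup"),
  stars   = c(4, 3, 5, 3.5, 2, 4)
)

top_n_countries <- 2  # the post uses the first eleven countries

# Countries ranked by number of reviews, keeping the most reviewed ones
top_countries <- toy_ratings %>%
  count(country, sort = TRUE) %>%
  head(top_n_countries) %>%
  pull(country)

# Keep only the top countries and the three common styles
toy_filtered <- toy_ratings %>%
  filter(
    country %in% top_countries,
    style %in% c("Bowl", "Cup", "Pack")
  )

toy_filtered
```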
It looks like we have a good set of data points to see which country has the best ramen, and we can even see which style of ramen rates best for each country.
Final Thoughts
We can see that Malaysia has the highest average rating for both Bowl (4.33) and Pack (4.18) ramen, with Singapore coming in a close second (4.16 for both styles). Next time you find yourself looking for an instant ramen snack, I would recommend grabbing something that comes from Malaysia or Singapore.
ramen_rating_top_11_BPC %>% na.omit() %>%
group_by(country, style) %>%
summarise('average rating' = mean(stars)) %>%
arrange(desc(`average rating`))
## # A tibble: 32 x 3
## # Groups: country [11]
## country style `average rating`
## <chr> <chr> <dbl>
## 1 Malaysia Bowl 4.33
## 2 Malaysia Pack 4.18
## 3 Singapore Bowl 4.16
## 4 Singapore Pack 4.16
## 5 Indonesia Pack 4.12
## 6 Japan Bowl 4.01
## 7 Hong Kong Cup 3.99
## 8 Indonesia Cup 3.94
## 9 South Korea Pack 3.90
## 10 Singapore Cup 3.90
## # ... with 22 more rows