Olympics 2022
by Amy Townend, Sam Simmons, Dylan Williams
Introduction:
Throughout history, and throughout our own lives as viewers, the Olympics have had a significant social and political impact. They have unified people and created opportunities, from including women in sports to giving athletes a platform to take a stand on racial issues, and much more. It is important to learn about the Olympics, and there is no better way to do so than through data visualizations.
Winter Olympics:
The 2022 Winter Olympics, hosted in Beijing, included 91 countries and 2,871 athletes. These were the 24th Winter Olympics, and their average U.S. audience of 11.4 million viewers was the lowest on record. The Olympics have always been a time for countries to come together amid political and economic conflicts and celebrate their athletes; the only cancellations came during the World Wars, in 1916, 1940, and 1944. Being chosen to host the Olympics is a great honor.

Countries spend millions to ensure their athletes are ready to perform at their best. Sports have always been a test of natural skill, but as technology has advanced, so has the advantage that money can buy. Richer countries have the disposable income to spend on the best coaches, equipment, and even training locations. This is especially important for the Winter Olympics, because winter sports require even more funding to train for. In the famous movie Cool Runnings, a group of Jamaican track runners band together to become the first Jamaican bobsled team at the Olympics. The movie has a lighthearted, comedic feel, but it reflects a real economic and climate advantage present in the Winter Olympics. The climate in Jamaica is hot and windy: over the course of the year, the temperature typically varies from 73°F to 89°F and is rarely below 71°F or above 91°F. Russia, by comparison, receives snowfall about six months of the year and can face temperatures below -40°F. This climate gives Russian athletes a built-in advantage over athletes from warmer countries such as Jamaica. For the Winter Olympics, we therefore expect climate to greatly affect athletes' ability to train, and in turn a country's medal count.
Choosing Data:
When our group was formed, we were all interested in a project involving sports. The Winter Olympics had just taken place, and we thought they would make an insightful dataset for collaboration and coding. For our project, we decided to investigate the effect of climate and economic status on medal counts at the 2022 Winter Olympics. We chose these variables because we expected them to provide insight into medal counts, and we were eager to learn from our dataset and collaborate on insightful data visualizations.

Our medal data came from a Kaggle dataset that was updated daily with the latest medal counts. For climate data, we used another Kaggle dataset of average temperatures per country. To judge how economic status might affect a country's ability to field strong athletes, we downloaded GDP data from the World Bank. To compare the number of qualified athletes with each country's total population, we also collected population figures from 2020, the most recent data we could get. To run our visualizations and regressions, we merged these datasets into a single summary data frame. This is a very preliminary study of correlations between these variables; a thorough analysis would need much more data and more variables to avoid omitted variable bias.

We spent a fair amount of time cleaning the datasets, and we learned that data cleaning is essential: it makes the rest of the work more productive and ensures the quality of the information our project relies on. For the country dataset, we kept only the 22 countries with the highest medal counts, giving us a smaller sample so we could examine all of the available data for each of them instead of looking into every country.
For the temperature dataset, we only wanted the average over a few consecutive years, not every single yearly temperature.
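The cleaning and merging steps described above can be sketched as follows. Our project worked in R; this is only a minimal Python sketch using the standard library, assuming hypothetical field names (`country`, `year`, `avg_temp`, `medals`) and made-up rows rather than our actual Kaggle and World Bank data. It averages each country's temperature over a chosen set of years, keeps the top countries by medal count, and joins the sources on country name.

```python
from statistics import mean

# Hypothetical stand-ins for the three sources; not our real data.
medal_rows = [
    {"country": "A", "medals": 30},
    {"country": "B", "medals": 12},
    {"country": "C", "medals": 3},
]
temp_rows = [  # one row per country per year, in °C
    {"country": "A", "year": 2010, "avg_temp": -2.0},
    {"country": "A", "year": 2011, "avg_temp": -1.0},
    {"country": "B", "year": 2010, "avg_temp": 8.0},
    {"country": "B", "year": 2011, "avg_temp": 10.0},
    {"country": "C", "year": 2010, "avg_temp": 25.0},
]
gdp_by_country = {"A": 4.0e11, "B": 2.0e12, "C": 1.5e10}  # hypothetical GDP


def top_n(rows, n):
    """Keep the n rows with the highest medal counts."""
    return sorted(rows, key=lambda r: r["medals"], reverse=True)[:n]


def average_temperature(rows, years):
    """Average each country's temperature over the chosen years only."""
    readings = {}
    for r in rows:
        if r["year"] in years:
            readings.setdefault(r["country"], []).append(r["avg_temp"])
    return {country: mean(values) for country, values in readings.items()}


def build_summary(medals, temps, gdp, n):
    """Inner-join the sources on country name for the top n medal winners."""
    summary = []
    for r in top_n(medals, n):
        c = r["country"]
        if c in temps and c in gdp:  # drop countries missing from any source
            summary.append({"country": c, "medals": r["medals"],
                            "avg_temp": temps[c], "gdp": gdp[c]})
    return summary


summary = build_summary(medal_rows,
                        average_temperature(temp_rows, {2010, 2011}),
                        gdp_by_country, n=2)
for row in summary:
    print(row)
```

Because this is an inner join, a country whose name is spelled differently across sources (for example, "Czech Republic" in one and "Czechia" in another) is silently dropped, so it is worth checking the number of rows after merging.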
Methods:
To start our research, we computed basic statistics to get an initial understanding of our dataset and where our analysis could begin. One of the visualizations we wanted to build was a stacked bar plot displaying the total number of medals per country and the share of those medals that were gold, silver, and bronze. This gave us a first look at which countries had the highest and lowest medal counts. As the graph shows, Norway and the Russian Olympic Committee have the highest medal counts, 37 and 32 respectively. The countries with the lowest medal counts are Poland, the Czech Republic, and Belarus, with 1, 2, and 2 medals in total. The stacked bar plot is helpful because it shows the types of medals each country received, not just the total. If we ranked countries with a points system that gave each medal type a different value, the countries at the top of the ranking could change significantly.
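The percentages in the stacked bar plot and the alternative points ranking can both be computed directly from each country's gold, silver, and bronze counts. A minimal sketch, assuming made-up tallies for three hypothetical countries and a hypothetical 3-2-1 points scheme (gold = 3, silver = 2, bronze = 1); it is not the scoring used by any official medal table.

```python
# Hypothetical (gold, silver, bronze) tallies; not real 2022 results.
tallies = {
    "Country X": (2, 3, 10),
    "Country Y": (8, 3, 2),
    "Country Z": (5, 5, 4),
}


def medal_shares(gold, silver, bronze):
    """Percent of a country's total medals that are gold, silver, and bronze."""
    total = gold + silver + bronze
    return tuple(round(100 * m / total, 1) for m in (gold, silver, bronze))


def rank(tallies, weights):
    """Order countries by a weighted sum of (gold, silver, bronze)."""
    def score(counts):
        return sum(w * m for w, m in zip(weights, counts))
    return sorted(tallies, key=lambda c: score(tallies[c]), reverse=True)


by_total = rank(tallies, (1, 1, 1))   # plain medal count
by_points = rank(tallies, (3, 2, 1))  # hypothetical 3-2-1 points scheme
print(by_total)    # ['Country X', 'Country Z', 'Country Y']
print(by_points)   # ['Country Y', 'Country Z', 'Country X']
print(medal_shares(*tallies["Country Y"]))  # (61.5, 23.1, 15.4)
```

With these made-up numbers, Country X leads on total medals but falls to last under 3-2-1 points, which is the kind of reordering a points system can produce.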
In order to test our hypothesis about the impact of temperature, we decided to use a scatter plot to see if there is any correlation between average temperature and total number of medals.
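A scatter plot is often paired with a correlation coefficient that puts a number on the pattern. This minimal sketch computes Pearson's r by hand; the temperature and medal values are hypothetical, not our data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical points: average winter temperature (°C) and total medals.
avg_temp_c = [-5.0, 0.0, 5.0, 10.0, 20.0]
total_medals = [30, 25, 12, 8, 2]
r = pearson_r(avg_temp_c, total_medals)
print(round(r, 2))  # -0.96
```

A strongly negative r would be consistent with colder countries winning more medals, but with 22 countries and no other controls it cannot establish that temperature causes the difference.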
We wanted to explore the relationship between a country's gold medal count and several variables we had already used in our analysis, so we ran a linear regression model and obtained an R^2 of 1. Taken at face value, this means the model explains all of the variance in our dependent variable, the number of gold medals. This surprised us, since our research suggests that many variables could affect medal count. As an experienced regression user, I was surprised to see a value of exactly 1, because it is highly unlikely with real data. I wondered whether we had hit a maximum number of variables in RStudio, but an R^2 of exactly 1 more commonly means the model is saturated (it estimates as many coefficients as there are observations; with only 22 countries, a categorical variable such as country alone would be enough) or that one of the predictors is derived from the outcome itself. We could rerun the regression in another program such as Stata to check it. Adding variables can never lower R^2, so a rising R^2 alone would not show that a new variable improves the model; we would need the adjusted R^2, which penalizes additional predictors, to show that.
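The quantities discussed above can be computed directly. This minimal sketch uses the standard definitions R^2 = 1 - SS_res / SS_tot and adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. The gold medal counts and the R^2 of 0.6 are hypothetical. It shows that R^2 is exactly 1 whenever the fitted values equal the observations, as they do in a saturated model, and that adjusted R^2 is undefined once p reaches n - 1 (21 predictors for 22 countries).

```python
def r_squared(y, fitted):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((obs - fit) ** 2 for obs, fit in zip(y, fitted))
    ss_tot = sum((obs - mean_y) ** 2 for obs in y)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p predictors plus an intercept."""
    if n - p - 1 <= 0:
        raise ValueError("saturated model: adjusted R^2 needs n > p + 1")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


gold = [16, 12, 8, 4, 1]  # hypothetical gold medal counts

# A saturated model reproduces every observation exactly, so R^2 is 1.
print(r_squared(gold, gold))  # 1.0

# Predicting the mean for everyone explains nothing, so R^2 is 0.
mean_only = [sum(gold) / len(gold)] * len(gold)
print(r_squared(gold, mean_only))  # 0.0

# With 22 countries, 3 predictors and R^2 = 0.6, the penalty lowers the value.
print(round(adjusted_r_squared(0.6, n=22, p=3), 3))  # 0.533

# With 21 predictors for 22 countries there are no degrees of freedom left.
try:
    adjusted_r_squared(1.0, n=22, p=21)
except ValueError as err:
    print(err)
```

In R, `summary()` of an `lm` fit reports both values, so comparing them as variables are added is a quick check on whether each addition is worth its cost.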
For another data visualization, we graphed population against the number of athletes per country. Comparing these two variables shows whether larger countries send larger teams. Most of the points sit close to the y-axis because a large outlier on the right stretches the x-axis: China's population of 1,439,323,776 is far above that of the other countries. Another notable takeaway from the graph is that Norway has the greatest number of Olympic medals, 37, despite its comparatively small population.
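Two common ways to keep one very large country from squeezing every other point against the axis are to plot population on a logarithmic scale (in ggplot2, `scale_x_log10()`) or to scale team size by population. This minimal sketch computes both; only the population of 1,439,323,776 comes from the text above, and the other populations and all athlete counts are hypothetical.

```python
from math import log10

CHINA_POPULATION = 1_439_323_776  # figure quoted in the text above

# Hypothetical (population, athletes) pairs; athlete counts are made up.
countries = {
    "Large country": (CHINA_POPULATION, 180),
    "Country P": (5_000_000, 80),
    "Country Q": (60_000_000, 150),
}


def athletes_per_million(athletes, population):
    """Team size scaled by population, so countries of any size compare."""
    return athletes / (population / 1_000_000)


for name, (population, athletes) in countries.items():
    print(f"{name}: log10(population) = {log10(population):.2f}, "
          f"athletes per million = {athletes_per_million(athletes, population):.2f}")
```

On a log scale the large country sits about 2.5 units from the smallest one instead of hundreds of times farther away, and the per-million figure shows the small country fielding a much larger team relative to its size.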
Another data visualization we created to test our hypothesis that temperature affects medal count is a heat map. It shows countries on the y-axis and groupings of average winter temperature on the x-axis, and the deeper the fill color, the higher the country's medal count. Many of the countries with colder temperatures have high medal counts, with a few outliers. One outlier to consider is the United States: the country is so large that it spans very different climates, so its average winter temperature includes warm places such as California and Florida.
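Grouping temperatures for the heat map amounts to assigning each country to a temperature bin. This minimal sketch uses Python's `bisect` module; the bin edges, labels, and rows are hypothetical, since the exact groupings used in our heat map are not stated. In R, base `cut()` does the same job (with right-closed intervals by default, whereas these bins are left-closed).

```python
from bisect import bisect_right

# Hypothetical bin edges in °C; the groupings used in our heat map may differ.
EDGES = [-10, 0, 10]
LABELS = ["< -10", "[-10, 0)", "[0, 10)", ">= 10"]


def temp_group(temp_c):
    """Label of the half-open bin [lower, upper) containing temp_c."""
    return LABELS[bisect_right(EDGES, temp_c)]


# Hypothetical (country, average winter temperature in °C, total medals).
rows = [("A", -12.0, 20), ("B", -3.0, 15), ("C", 4.0, 6), ("D", 18.0, 1)]

# One heat-map cell per country: (country, temperature group, fill value).
cells = [(country, temp_group(t), medals) for country, t, medals in rows]
for cell in cells:
    print(cell)
```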
Having only a few countries in our dataset allows us to take a deeper dive into medal counts. One of our visualizations combines three variables: total number of medals won, country, and number of athletes. Country is on the x-axis and the number of athletes is on the y-axis. Since the total number of medals won is numeric, ggplot created a legend with a continuous color scale for it: a higher medal count (closer to 30) is shown in a light shade of blue, and a lower count in a darker navy blue. The takeaway from this visualization is that countries with more athletes tend to have won more medals in total.
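The takeaway above is a claim about ordering (bigger teams, more medals), which Spearman's rank correlation measures directly: it is the Pearson correlation of the ranks, so it detects any monotonic relationship, not just a straight line. This is a swapped-in technique we did not use in the project; the sketch below implements it by hand with average ranks for ties, and the athlete and medal numbers are hypothetical. In R, `cor(x, y, method = "spearman")` computes the same quantity.

```python
from math import sqrt


def ranks(values):
    """1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of positions i..j, made 1-based
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def spearman_rho(xs, ys):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson_r(ranks(xs), ranks(ys))


# Hypothetical team sizes and total medals for five countries.
athletes = [200, 150, 90, 60, 30]
medals = [25, 27, 10, 4, 2]
rho = spearman_rho(athletes, medals)
print(round(rho, 2))  # 0.9
```

A rho near 1 supports "more athletes, more medals" as a ranking statement, but like any correlation it does not show that sending more athletes is what produces the medals.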
Further Research
This analysis could help inform how a country develops its athletes in the future. It examines the relationship between GDP, population, temperature, and medal counts, and our results are a first step toward examining the extent to which those variables play a role in a country's Olympic success. These variables might also be proxies for others: for example, low temperatures may indicate a strong winter sports culture, and high GDP might reflect the amount of funding a sport receives. Given more time, we could analyze more countries (possibly all of them) and look into other variables such as gender, height, income, and education, which would allow a deeper analysis. The short time frame of this project limited the scope of our dataset.
Presentation
Our presentation can be found here.
Data
- Sarkhel, Arjun Prasad 2022, “2022 Winter Olympics Beijing”, Kaggle, viewed 19 February 2022, https://www.kaggle.com/datasets/arjunprasadsarkhel/2022-winter-olympics-beijing.
- World Bank 2015, “GDP (constant 2015 US$)”, World Development Indicators, The World Bank Group, https://data.worldbank.org/indicator/NY.GDP.MKTP.KD.
- Worldometer 2022, “Countries in the world by population (2022)”, https://www.worldometers.info/world-population/population-by-country/.
- Freedom House 2020, “Global Freedom Scores”, Freedom House, https://freedomhouse.org/countries/freedom-world/scores.
- Chavan, Akshay 2020, “Average Temperature per country per year”, Kaggle, viewed 22 February 2022, https://www.kaggle.com/code/akshaychavan/average-temperature-per-country-per-year/data.
References
- Background Image: https://olympics.com/ioc/olympic-rings
- Image 1: https://www.disneyplus.com/movies/cool-runnings/1zyvW8wIgqET
- Image 2: https://www.sportingnews.com/us/olympics/news/olympics-medal-count-2022-gold-silver-bronze/q7x3klukq471udtxhrsawd5w