I’m a huge fan of MOOCs (Massive Open Online Courses). I am always on the hunt for something new to learn to increase my knowledge and productivity; and because I run a blog, MOOCs provide fodder for me to share what I learn.
I recently took the Data Visualization class offered by the University of Illinois at Urbana-Champaign on Coursera. The class is offered as part of the Data Mining specialty of six courses that when taken together can lead to graduate credit in its online Master of Computer Science Degree in Data Science.
Ok enough with the brochure items. For the first assignment I constructed a visualization based upon temperature information from NASA’s Goddard Institute for Space Studies (GISS).
In order to understand the data, you have to understand why temperature anomalies are used as opposed to raw absolute temperature measurements. It is important to note that the temperatures shown in my visualization are not absolute temperatures but rather temperature anomalies.
Here’s an explanation from NOAA:
“In climate change studies, temperature anomalies are more important than absolute temperature. A temperature anomaly is the difference from an average, or baseline, temperature. The baseline temperature is typically computed by averaging 30 or more years of temperature data. A positive anomaly indicates the observed temperature was warmer than the baseline, while a negative anomaly indicates the observed temperature was cooler than the baseline.”
Interpreting the Visualization
The course leaves it up to the learner to decide which visualization tool to use in order to display the temperature change information. Although I have experience with multiple visualization programs like Qlikview and Power BI, Tableau is my tool of choice. I didn’t just create a static visualization, I created an interactive dashboard that you can reference by clicking below.
From a data perspective, I believe the numbers in the file that the course provides is a bit different than the one I am linked to here but you can see the format of the data that needs to be pivoted in order to make an appropriate line graph.
All of the data in this set illustrates that temperature anomalies are increasing from the corresponding 1951-1980 mean temperatures as years progress. Every line graph of readings from meteorological stations shows an upward trend in temperature deviation readings. The distribution bins illustrate that the higher temperature deviations occur in more recent years. The recency of years is indicated by the intensity of the color red.
Let’s break down the visualization:
Top Section Distribution Charts:
- There are three sub-sections representing global, northern hemisphere and southern hemisphere temperature deviations
- The x axis represents temperature deviations in bins of 10 degrees
- The y axis is a count of the number of years that fall between the binned temperature ranges
- For example, if 10 years have a recorded temperature anomaly between 60 and 69 degrees, then the x axis would be 60 and the y axis would be 10
- Each 10 degree bin is comprised of the various years that correspond to a respective temperature anomaly range
- For example in the picture above, the year 1880 (as designated by the tooltip) had a temperature anomaly that was 19 degrees lower than the 30 year average. This is why the corresponding box for the year 1880 is not intensely colored.
- Additionally, the -19 degree anomaly is located in the -10 degree bin (which contains anomalies from -10 to -19 degrees)
- These aspects are more clearly illustrated when interacting with the Tableau Public dashboard
- The intensity of the color of red indicates the recency of the year; for example year 1880 would be represented as white while year 2014 would be indicated by a deep red color
Bottom Section Line Graph Chart:
- The y axis represents the temperature deviation from the corresponding 1951-1980 mean temperatures
- Each line represents the temperature deviation at a specific geographic location during the 1880-2014 period
- The x axis represents the year of the temperature reading
In the above picture I strip out the majority of lines leaving only the global deviation line. Climate science deniers may want to look away as the data clearly shows that global temperatures are rising.
All in all I thought it was a decent class covering very theoretical issues regarding data visualization. Practicality is exclusively covered in the exercises as the class does not provide any instruction on how to use any of the tools required to complete the class. I understand the reason as this is not a “How to Use a Software Tool” class.
I’d define the exercises as “BYOE” (i.e., bring your own expertise). The class forces you to do your own research in regards to visualization tool instruction. This is especially true regarding the second exercise which requires you to learn how to visualize graphs and nodes. I had to learn how to use a program called Gephi in order to produce a network map of the cities in my favorite board game named Pandemic. The lines between the city nodes are the paths that one can travel within the game.
If you’re looking for more practicality and data visualization best practices as opposed to hardcore computer science topics take a look at the Coursera specialization from UC Davis called “Visualization with Tableau”.
In case you were wondering I received at 96% grade in the UIUC course.
My final rating for the class is 3 stars out 5; worth a look.