In this tutorial we will learn how to create scatter plots and customize our graphs. We will also use a new data set. The last two articles looked at population over time, but that data set has population as its only measurable value, and a scatter plot needs at least two varying values.
Preliminary Steps
First we’ll import the packages needed for data analysis. Then we will obtain the pharmaceutical drug spending data set from datahub.io and save it on our machine. Make sure to take note of where the csv file is saved.
import matplotlib.pyplot as plt
import pandas as pd
Now we’ll read in the csv file. Denoting the index column as 0 (the first column) will make it easier to obtain data for specific countries.
spendData = pd.read_csv('../Downloads/pharm_data_csv.csv',
                        index_col = 0)
spendData.head()
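To see what index_col = 0 does without the downloaded file, here is a minimal sketch that reads a tiny in-memory CSV instead. The rows are made-up numbers, and the name of the first column (LOCATION) is an assumption about the file's layout, so check it against your own copy.

```python
import io
import pandas as pd

# Hypothetical three-row sample mimicking the layout of the spending CSV.
# The values are invented for illustration and LOCATION is an assumed name
# for the first column.
sample_csv = io.StringIO(
    "LOCATION,TIME,PC_HEALTHXP,PC_GDP,USD_CAP,TOTAL_SPEND\n"
    "AUS,2004,15.0,1.2,400.0,8000.0\n"
    "AUS,2005,14.5,1.2,410.0,8400.0\n"
    "CAN,2005,17.0,1.6,590.0,19000.0\n"
)

# index_col=0 turns the first column into the row index
sample = pd.read_csv(sample_csv, index_col=0)

print(sample.index.name)   # name of the column that became the index
print(sample.head())       # note the parentheses: head is a method
```

Because the country code is now the index, rows can later be selected by label rather than by filtering on a column.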
To get a quick look at the statistical distributions of the data, we’ll do the following:
spendData.describe()
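As a rough illustration of what describe() returns, the sketch below runs it on a small DataFrame of made-up values (not the real spending data) and reads a single statistic back out of the summary.

```python
import pandas as pd

# Made-up values standing in for two numeric columns of the data set
sample = pd.DataFrame({
    "PC_HEALTHXP": [15.0, 14.5, 17.0, 16.0],
    "USD_CAP": [400.0, 410.0, 590.0, 610.0],
})

summary = sample.describe()

# The summary has one row per statistic:
# count, mean, std, min, 25%, 50%, 75%, max
print(summary)

# A single statistic can be read back with .loc[row, column]
mean_usd = summary.loc["mean", "USD_CAP"]
print(mean_usd)
```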
Now we’re all set to conduct data analysis!
Australia Pharmaceutical Spending
Using .loc, we can obtain the data specifically related to Australia’s spending.
aus = spendData.loc['AUS']
aus
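One detail worth knowing about label lookups: .loc returns a DataFrame when the label appears on several rows, but a Series when it appears only once. The sketch below uses a small made-up frame to show the difference, along with the list form that always yields a DataFrame.

```python
import pandas as pd

# Made-up frame with a repeated index label (AUS) and a unique one (CAN)
df = pd.DataFrame(
    {"TIME": [2004, 2005, 2005], "USD_CAP": [400.0, 410.0, 590.0]},
    index=["AUS", "AUS", "CAN"],
)

aus_rows = df.loc["AUS"]     # label occurs twice -> DataFrame with two rows
can_row = df.loc["CAN"]      # label occurs once  -> Series
can_rows = df.loc[["CAN"]]   # list of labels     -> always a DataFrame

print(type(aus_rows).__name__, aus_rows.shape)
print(type(can_row).__name__)
print(type(can_rows).__name__, can_rows.shape)
```

Scatter plots need a DataFrame with separate x and y columns, so the list form is the safer choice when a country might have only one row.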
We'll use a scatter plot to show the correlation between % of Health Spending and Spending in US GDP per capita for Australia. These columns are denoted as PC_HEALTHXP and USD_CAP respectively.
aus.plot(kind = 'scatter', x = 'USD_CAP', y = 'PC_HEALTHXP')
xlab = 'Spending in US GDP per capita'
ylab = '% of Health spending'
title = 'Australia Pharmaceutical Spending'
plt.xlabel(xlab)
plt.ylabel(ylab)
plt.title(title)
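The plt.xlabel, plt.ylabel and plt.title calls act on whichever plot is currently active. An alternative, sketched below with made-up numbers, is to keep the Axes object that DataFrame.plot returns and label it directly. The Agg backend line is only there so the sketch runs as a script without a display; leave it out in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, for running as a script
import pandas as pd

# Made-up values standing in for Australia's rows
df = pd.DataFrame({
    "USD_CAP": [300.0, 350.0, 420.0, 500.0],
    "PC_HEALTHXP": [14.0, 14.6, 15.1, 15.4],
})

# DataFrame.plot returns the matplotlib Axes it drew on
ax = df.plot(kind="scatter", x="USD_CAP", y="PC_HEALTHXP")

# Label that specific Axes instead of the "current" plot
ax.set_xlabel("Spending in US GDP per capita")
ax.set_ylabel("% of Health spending")
ax.set_title("Australia Pharmaceutical Spending")
```

Holding on to ax is helpful once a notebook or script contains more than one figure, since each set of labels is tied to a known plot.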
This is one way of creating a scatter plot: call DataFrame.plot and pass in kind = 'scatter'. We can include a little more information by letting the size of each data point on the graph represent Total Spending.
aus.plot(kind = 'scatter', x = 'USD_CAP', y = 'PC_HEALTHXP',
         s = aus['TOTAL_SPEND']/100)
xlab = 'Spending in US GDP per capita'
ylab = '% of Health spending'
title = 'Australia Pharmaceutical Spending'
plt.xlabel(xlab)
plt.ylabel(ylab)
plt.title(title)
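The s argument is handed to Matplotlib, which interprets marker sizes in points squared, so dividing by 100 is just one scaling that happens to give readable markers. As an alternative sketch, the values can be rescaled into a chosen size range. The data and the names min_size and max_size below are hypothetical and only serve the illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, for running as a script
import pandas as pd

# Made-up values standing in for one country's rows
df = pd.DataFrame({
    "USD_CAP": [300.0, 350.0, 420.0, 500.0],
    "PC_HEALTHXP": [14.0, 14.6, 15.1, 15.4],
    "TOTAL_SPEND": [5000.0, 7000.0, 9000.0, 12000.0],
})

# Hypothetical size range for the markers, in points squared
min_size, max_size = 20.0, 400.0

# Min-max rescale TOTAL_SPEND to 0..1, then stretch to the size range
spend = df["TOTAL_SPEND"]
scaled = (spend - spend.min()) / (spend.max() - spend.min())
sizes = min_size + scaled * (max_size - min_size)

ax = df.plot(kind="scatter", x="USD_CAP", y="PC_HEALTHXP", s=sizes)
```

Rescaling like this keeps the smallest point visible and the largest from swamping the plot, at the cost of marker area no longer being directly proportional to spending.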
Plotting the % of Health spending against the year, we see a similar trend. The year is represented by TIME in the DataFrame.
aus.plot(kind = 'scatter', x = 'TIME', y = 'PC_HEALTHXP',
         s = aus['TOTAL_SPEND']/100)
xlab = 'Year'
ylab = '% of Health spending'
title = 'Australia Pharmaceutical Spending'
plt.xlabel(xlab)
plt.ylabel(ylab)
plt.title(title)
Pharmaceutical Spending in 2005
Now we’ll compare the spending of all countries in 2005. To obtain all the data, we’ll do the following:
spend2005 = spendData.loc[(spendData.TIME == 2005)]
spend2005
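The expression inside .loc is a boolean mask: a Series of True/False values, one per row, and only the True rows are kept. The sketch below shows the mask on a small made-up frame, together with DataFrame.query as an equivalent spelling.

```python
import pandas as pd

# Made-up frame with a few country/year rows
df = pd.DataFrame(
    {
        "TIME": [2004, 2005, 2005, 2006],
        "USD_CAP": [400.0, 410.0, 590.0, 610.0],
    },
    index=["AUS", "AUS", "CAN", "CAN"],
)

# Comparing a column with a value yields one boolean per row
mask = df["TIME"] == 2005
print(mask.tolist())

# .loc keeps only the rows where the mask is True
rows_2005 = df.loc[mask]

# query expresses the same filter as a string
same_rows = df.query("TIME == 2005")

print(rows_2005)
```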
Similar to what we did with Australia, we’ll look at the correlation between % of Health Spending and Spending in US GDP per capita for all countries in 2005. The size of each point corresponds to Total Spending. This time we'll use DataFrame.plot.scatter, which is equivalent to passing kind = 'scatter' to DataFrame.plot. We will add a colormap to this plot later on.
spend2005.plot.scatter(x = 'USD_CAP', y = 'PC_HEALTHXP',
                       s = spend2005['TOTAL_SPEND']/100)
xlab = 'Spending in US GDP per capita'
ylab = '% of Health spending'
title = 'Pharmaceutical Spending in 2005'
plt.xlabel(xlab)
plt.ylabel(ylab)
plt.title(title)
The graph shows much more variation and is no longer linear. To go a step further, we can add another piece of data from the table, the percentage of GDP, denoted as PC_GDP.
spend2005.plot.scatter('USD_CAP', 'PC_HEALTHXP',
                       s = spend2005['TOTAL_SPEND']/100, c = 'PC_GDP',
                       colormap = 'viridis')
xlab = 'Spending in US GDP per capita'
ylab = '% of Health spending'
title = 'Pharmaceutical Spending in 2005'
plt.xlabel(xlab)
plt.ylabel(ylab)
plt.title(title)
This graph adds a colormap to denote the percentage of GDP. We get a good view of how all these values appear compared against each other.
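The colormap name is just a string, so other Matplotlib colormaps can be swapped in the same way. Here is a minimal sketch on made-up data that uses 'plasma' instead of 'viridis' and then reads the colormap and the colored values back from the plotted points as a sanity check.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, for running as a script
import pandas as pd

# Made-up values standing in for a few countries in a single year
df = pd.DataFrame({
    "USD_CAP": [300.0, 450.0, 600.0, 900.0],
    "PC_HEALTHXP": [12.0, 15.0, 18.0, 11.0],
    "PC_GDP": [1.0, 1.4, 1.8, 2.0],
})

# c names the column that drives the color; colormap names the palette
ax = df.plot.scatter(x="USD_CAP", y="PC_HEALTHXP",
                     c="PC_GDP", colormap="plasma")

# The scatter points are the first collection on the Axes
points = ax.collections[0]
print(points.get_cmap().name)   # name of the colormap in use
print(points.get_array())       # the PC_GDP values mapped to colors
```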
The expected code execution is located here for reference.
Closing
In this tutorial we learned to create scatter plots, closing out the series. We’ve conducted basic data analysis on countries around the world and looked at both population trends and pharmaceutical drug spending. Congrats on making it to the end of the series. You've learned a lot!