Neha

Posted on Mar 4, 2022

Data Analysis in Python using Jupyter Notebook - Part 2

#python #tutorial #cnc2022 #learninpublic

In this tutorial we will expand on our knowledge and learn how to plot multiple lines on the same graph, streamline our code, and create bar graphs.

Preliminary Steps

If you have not worked through Part 1, obtain the population data set and run the following so that the code in this tutorial works. Adjust the path to match where you saved the file:

import matplotlib.pyplot as plt
import pandas as pd
popData = pd.read_csv('../Downloads/population_csv.csv')

Explore the data

To further understand our data set and know the unique countries/regions in the data, we'll use the following line of code. This will give us a table with only one country/region per row so we can see all the places listed.

popData.drop_duplicates(subset = "Country Name")

We can see there are 263 unique parts of the world listed after running the above code.
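If only the count is needed, the distinct names can be counted directly. The sketch below is a minimal illustration: `sample` is a hypothetical miniature of the table with made-up values, not the real data set, and it only reuses the column names from this tutorial.

```python
import pandas as pd

# Hypothetical miniature of the population table; the values are made up.
sample = pd.DataFrame({
    "Country Name": ["Spain", "Spain", "Vietnam", "Zimbabwe", "Zimbabwe"],
    "Country Code": ["ESP", "ESP", "VNM", "ZWE", "ZWE"],
    "Year": [1960, 1961, 1960, 1960, 1961],
    "Value": [100, 110, 200, 300, 310],
})

# One row per country/region (the first occurrence of each name is kept).
unique_rows = sample.drop_duplicates(subset="Country Name")

# Just the number of distinct names, without building the table.
unique_count = sample["Country Name"].nunique()

print(unique_count)
```

On the real data, the same `nunique()` call on `popData['Country Name']` should agree with the number of rows that `drop_duplicates` returns.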

Plotting Multiple Lines on the Same Plot

Now we'll try plotting multiple countries on the same graph to see how their populations grow relative to each other. We can filter the countries we want to plot using the 'Country Name' field.

pop = popData.loc[(popData['Country Name'] == 'Zimbabwe') | 
(popData['Country Name'] == 'Vietnam') | 
(popData['Country Name'] == 'Spain')]
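The same kind of filter can also be written with `isin`, which may be easier to read as the list of countries grows. This is a sketch on a hypothetical `sample` frame with made-up values; the names `wanted` and `subset` are illustrative.

```python
import pandas as pd

# Hypothetical miniature of the population table; the values are made up.
sample = pd.DataFrame({
    "Country Name": ["Spain", "France", "Vietnam", "Zimbabwe", "Spain"],
    "Year": [1960, 1960, 1960, 1960, 1961],
    "Value": [100, 150, 200, 300, 110],
})

# Keep only the rows whose country name appears in the list.
wanted = ["Zimbabwe", "Vietnam", "Spain"]
subset = sample.loc[sample["Country Name"].isin(wanted)]

print(subset)
```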

I initially struggled to figure out how to put these three countries on the same plot. After some research, I learned that there is one line of code using the Seaborn package that accomplishes this. Next, we’ll add some labels to the graph to make it clear.

import seaborn as sns
sns.lineplot(data = pop, x = 'Year', y = 'Value', hue = 'Country Name')
ylab = 'Population'
title = 'Population'
plt.ylabel(ylab)
plt.title(title)

The code above works, but there is a more streamlined way to obtain the data for specific countries: change the row index of the table. After running the line below, the country code is the index.

popData.index = popData['Country Code']

As you can see, the numerical index for each row has now been replaced with the Country Code.

popData.head()
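For comparison, pandas also provides `set_index`, which returns a new table instead of modifying the existing one. The sketch below assumes a hypothetical `sample` frame with made-up values; `drop=False` is used so that the 'Country Code' column is kept alongside the index, as it is with the assignment in this tutorial.

```python
import pandas as pd

# Hypothetical miniature of the population table; the values are made up.
sample = pd.DataFrame({
    "Country Name": ["Spain", "Vietnam", "Zimbabwe"],
    "Country Code": ["ESP", "VNM", "ZWE"],
    "Year": [1960, 1960, 1960],
    "Value": [100, 200, 300],
})

# drop=False keeps 'Country Code' as a regular column as well as the index.
indexed = sample.set_index("Country Code", drop=False)

# head() returns the first rows (five by default); the original frame
# still has its numerical index.
print(indexed.head())
print(sample.head())
```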

Now using .loc with the Country Code, the data for a specific country or region can be obtained.

popData.loc['ZWE']

To avoid the redundancy of the country code getting listed twice, the following code removes the display of the index.

zwe = popData.loc['ZWE']
print(zwe.to_string(index = False))

Now we can continue to plot the data for Zimbabwe just as we did in Part 1 of this series.

zwe.plot('Year', 'Value')
ylab = 'Population'
title = 'Zimbabwe Population'
plt.ylabel(ylab)
plt.title(title)

Writing zwe = popData.loc['ZWE'] is much more streamlined than zwe = popData.loc[popData['Country Name'] == 'Zimbabwe']. Before, the code had to check whether the field ‘Country Name’ was equal to Zimbabwe. With the current code, it only needs to check whether the index value is ‘ZWE’. Now we can use this same idea to rewrite our code for plotting Spain, Zimbabwe, and Vietnam on the same graph!

pop = popData.loc[['ZWE', 'VNM', 'ESP']]
sns.lineplot(data = pop, x = 'Year', y = 'Value', hue = 'Country Name')
ylab = 'Population'
title = 'Population'
plt.ylabel(ylab)
plt.title(title)

As we can see, we’re able to get a line plot with all three countries just like we did before.
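To check that selecting by a list of index labels picks out the same rows as filtering on the country name, a small comparison can help. This is only a sketch on a hypothetical `sample` frame with made-up values; `by_mask`, `by_label`, and `same_rows` are illustrative names.

```python
import pandas as pd

# Hypothetical miniature of the population table; the values are made up.
sample = pd.DataFrame({
    "Country Name": ["Spain", "France", "Vietnam", "Zimbabwe", "Zimbabwe"],
    "Country Code": ["ESP", "FRA", "VNM", "ZWE", "ZWE"],
    "Year": [1960, 1960, 1960, 1960, 1961],
    "Value": [100, 150, 200, 300, 310],
})
indexed = sample.set_index("Country Code", drop=False)

# Selection by comparing names against a list.
names = ["Zimbabwe", "Vietnam", "Spain"]
by_mask = sample.loc[sample["Country Name"].isin(names)]

# Selection by index labels.
by_label = indexed.loc[["ZWE", "VNM", "ESP"]]

# The two results hold the same values, though the row order can differ.
same_rows = sorted(by_mask["Value"]) == sorted(by_label["Value"])
print(same_rows)
```

Passing a list of labels, even a list with a single code, returns a table rather than a single row, which keeps later plotting code uniform.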

Filtering by Population Value

Now we are going to display a bar graph of the countries/regions whose population was over 1 billion in 1985. The filter below selects that data.

highPop = popData.loc[(popData.Value > 1000000000) & 
(popData.Year == 1985)]
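The two conditions are combined element-wise with `&`, and each comparison sits in its own parentheses because `&` binds more tightly than `>` and `==` in Python. One way to make this easier to read is to name each condition first. This is a sketch on a hypothetical `sample` frame; the region names and values are made up.

```python
import pandas as pd

# Hypothetical data; region names and values are made up for illustration.
sample = pd.DataFrame({
    "Country Name": ["Region A", "Region A", "Region B", "Region C"],
    "Year": [1984, 1985, 1985, 1985],
    "Value": [1_200_000_000, 1_250_000_000, 900_000_000, 2_000_000_000],
})

threshold = 1_000_000_000

# Each condition is a boolean Series with one entry per row.
over_threshold = sample["Value"] > threshold
in_1985 = sample["Year"] == 1985

# A row is kept only when both conditions are True for it.
high = sample.loc[over_threshold & in_1985]

print(high)
```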

To plot a bar graph, we can set the ‘kind’ parameter to "bar".

highPop.plot('Country Name', 'Value', kind = "bar")
ylab = 'Population'
title = 'Population > 1B'
plt.ylabel(ylab)
plt.title(title)

We get a bar graph with all the countries/regions that satisfy our criteria.

To filter the years where Zimbabwe's population is over 10,000,000, we can do the following:

zwePop = popData.loc[(popData['Country Name'] == 'Zimbabwe') &
(popData.Value > 10000000)] 
zwePop.plot('Year', 'Value', kind = "bar")
ylab = 'Population'
title = 'Zimbabwe Population'
plt.ylabel(ylab)
plt.title(title)

The expected code execution is located here for reference.

Up Next

Now you have learned a little more about plots. We'll continue building upon what we've learned here in the next installment!
