Excel and other spreadsheets are ubiquitous in business because they let you create simple graphics. If you want to take your reports and presentations to the next level, consider learning Python to create powerful visualizations that will help you stand out from the crowd.
Initial setup
Creating the Python toolbox
To make plots in Python, you’ll have to set up an environment with the right packages. These include NumPy, pandas, Seaborn, and Matplotlib.
While they’re not libraries, IPython and Jupyter will make your life manipulating data with Python much easier. IPython enhances interactive Python, while Jupyter Notebooks make it easier to keep track of your work and share it with others.
It’s best to install these tools with a tool that can manage your Python environment. My favorite these days is Pixi. You can install it by following the instructions on the Pixi website for your system’s terminal: macOS, Linux, or PowerShell on Windows.
Once you’ve installed Pixi, you can create the environment. Because you’ll likely want these tools at your fingertips, you can install them globally:
pixi global install --environment graphics --expose jupyter --expose ipython jupyter numpy pandas seaborn matplotlib ipython
This will install Jupyter, IPython, NumPy, pandas, Seaborn, and Matplotlib into an environment named graphics. Since Matplotlib is a dependency of Seaborn, it would likely have been installed anyway, but we want to make sure it’s available, since we’ll issue some commands directly to Matplotlib. The --expose options tell Pixi to put the jupyter and ipython executables on your PATH so you can run them from any terminal.
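To confirm the environment is usable, a quick check like the one below can help. It’s a minimal sketch that uses only the standard library. The helper name check_packages is made up for this example, and it assumes you run it with the Python from the environment you just created (for example, inside ipython).

```python
from importlib.util import find_spec

def check_packages(names):
    """Return a dict mapping each package name to True if it can be imported."""
    return {name: find_spec(name) is not None for name in names}

# The plotting packages requested in the pixi command above
wanted = ["numpy", "pandas", "seaborn", "matplotlib"]
for name, found in check_packages(wanted).items():
    print(f"{name}: {'ok' if found else 'missing'}")
```

If any package shows up as missing, you’re probably running a different Python than the one Pixi installed.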
Line charts
Plotting a time series
You can create line charts with Seaborn. First, start IPython or Jupyter and import the libraries you just installed:
import numpy as np
import pandas as pd
import seaborn as sns
sns.set_theme()
import matplotlib.pyplot as plt
The first three lines import NumPy, pandas, and Seaborn using shortened names. The call to sns.set_theme() applies Seaborn’s default theme, and the last line imports Matplotlib’s pyplot interface as plt.
We can demonstrate a line chart that represents a time series using one of Seaborn’s built-in datasets. The flights dataset records monthly airline passenger numbers from 1949 through 1960.
First, we’ll load the dataset:
flights = sns.load_dataset('flights')
This creates a pandas DataFrame. We can examine its first few rows with the head() method:
flights.head()
We can then create a line chart with the year on the x-axis and the number of passengers on the y-axis:
sns.lineplot(x='year',y='passengers',data=flights)
The chart will appear in another window.
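Because the flights data has one row per month, each year has twelve passenger values. When an x value has several observations, lineplot draws their mean by default and shades a confidence interval around it. The sketch below shows the same per-year averaging in pandas. The small DataFrame is invented for illustration and only mimics the column names of the real dataset, and yearly_mean is a hypothetical helper name.

```python
import pandas as pd

# Made-up sample with the same column names as the flights dataset
sample = pd.DataFrame({
    "year": [1949, 1949, 1949, 1950, 1950, 1950],
    "month": ["Jan", "Feb", "Mar", "Jan", "Feb", "Mar"],
    "passengers": [100, 110, 120, 130, 140, 150],
})

def yearly_mean(df):
    """Average the passenger counts for each year."""
    return df.groupby("year")["passengers"].mean()

print(yearly_mean(sample))
```

On the real data, yearly_mean(flights) should give the values the plotted line passes through.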
Bar charts
You can also create bar charts. We’ll call up another dataset, from a waiter in a New York City restaurant who recorded the amount on the bill along with the tip.
tips = sns.load_dataset('tips')
We’ll create a bar chart that shows the average total bill for each day in the dataset:
sns.catplot(x='day',y='total_bill',kind='bar',data=tips)
Scatterplot/regression
Finding trends in your data
Another useful kind of plot is the scatterplot, often paired with a regression line that shows the trend. Since these are so common in statistics, data science, and business, they’re easy to create with Seaborn.
We’ll create a scatterplot of the tip vs. the total bill, with the total bill on the x-axis and the tip on the y-axis. The tip will be the dependent variable and the total bill will be the independent variable. We want to see if there’s a relationship between the total bill and the tip.
sns.relplot(x='total_bill',y='tip',data=tips)
You might notice that the tip seems to rise along with the total bill. You could draw a straight line that goes up from the left to the right. In other words, there appears to be a positive linear relationship between the two. The larger the bill, the bigger the tip is.
We can also draw that straight line, or the linear regression line, through this scatterplot. The code is very similar to the code we used for the scatterplot.
sns.regplot(x='total_bill',y='tip',data=tips)
Note that Seaborn won’t give you the values of the regression equation, such as the slope and the intercept of the classic y = mx + b form you might remember from high school algebra. You’ll need another library, such as SciPy or statsmodels, to get those values.
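NumPy, which is already in the environment, is one more option besides SciPy and statsmodels when all you need is the slope and the intercept. The sketch below fits a straight line with np.polyfit. The helper name fit_line and the numbers are made up for illustration and are not values from the tips dataset.

```python
import numpy as np

def fit_line(x, y):
    """Fit y = m*x + b by least squares and return (m, b)."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope, intercept

# Made-up bills and tips, not values from the tips dataset
bills = np.array([10.0, 20.0, 30.0, 40.0])
tip_amounts = np.array([2.0, 3.5, 5.0, 6.5])

m, b = fit_line(bills, tip_amounts)
print(f"tip = {m:.2f} * total_bill + {b:.2f}")
```

With the real data, the call would be fit_line(tips['total_bill'], tips['tip']). This gives only the coefficients. SciPy or statsmodels is the better choice if you also want statistics such as p-values.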
Adjusting titles and saving
Get your charts ready for prime time
While the plots look good on their own, they’re not quite ready for inserting into your reports or presentations. You’ll likely need to change the labels on the axes so that readers can understand them without extra explanation.
You’ll make calls not to Seaborn but to the underlying Matplotlib library. Suppose we wanted to publish the tip vs. bill regression plot. Here’s how we would clean it up if we wanted to put it in a report. We would use the code for the regression plot earlier but use Matplotlib to add a title and modify the labels on the axes.
sns.regplot(x='total_bill',y='tip',data=tips)
plt.title("Tip vs. total bill in a New York City restaurant")
plt.xlabel("Total bill (USD)")
plt.ylabel("Tip (USD)")
plt.show()
This will add a title and get rid of the underscore in the x-axis label, as well as clarify that the amounts are in US dollars.
To save your plot, you can click the Save icon in the window that’s popped up, or you can use this command:
plt.savefig('/path/to/file')
You can then drop this into your document or presentation. Matplotlib supports many popular formats, including PNG.
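Matplotlib chooses the output format from the file extension, so a path ending in .png produces a PNG. The sketch below wraps savefig in a small helper that also sets the resolution and trims extra whitespace. The helper name save_chart, the file name, and the plotted numbers are all made up for this example.

```python
import os
import tempfile

import matplotlib.pyplot as plt

def save_chart(fig, path, dpi=300):
    """Save a figure; Matplotlib picks the format from the file extension."""
    fig.savefig(path, dpi=dpi, bbox_inches="tight")
    return path

# A throwaway figure with made-up numbers, standing in for the regression plot
fig, ax = plt.subplots()
ax.plot([10, 20, 30], [2, 3.5, 5])
ax.set(title="Tip vs. total bill", xlabel="Total bill (USD)", ylabel="Tip (USD)")

# Hypothetical output location in the system's temporary directory
out_path = os.path.join(tempfile.gettempdir(), "tip_vs_bill.png")
save_chart(fig, out_path)
plt.close(fig)
```

With a Seaborn plot, you can pass plt.gcf(), which returns the current figure, as the first argument.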
It’s easy to create great-looking charts in Python
It might take some time to learn Python, but with a little effort, you can create publication-quality plots that can liven up your next presentation or report. Shouldn’t you put as much effort into the visualizations as you do into making sure the words and numbers are correct?
