
Prettifying pandas DataFrames

Blog: Think Data Analytics Blog

Did you know that we can prettify pandas DataFrames by accessing the .style attribute? Here’s an example where we styled a DataFrame such that it resembles a heatmap:

Image by author | Correlation matrix before and after styling

After styling, positive and negative correlations, as well as their strength, are much easier to see. Colour-coding makes the DataFrame easier to interpret and analyse. In this post, I will show 4 useful ways to prettify your DataFrame.


0. Data 📦

We will use the penguins dataset for this post. Let’s import libraries and the data:

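import numpy as np
import pandas as pd
from seaborn import load_dataset

pd.options.display.precision = 2

# Load sample data and shorten the column names.
# Older copies of the dataset name the bill columns 'culmen_*',
# newer ones 'bill_*'; keys that are absent are simply ignored.
columns = {'culmen_length_mm': 'length',
           'culmen_depth_mm': 'depth',
           'bill_length_mm': 'length',
           'bill_depth_mm': 'depth',
           'flipper_length_mm': 'flipper',
           'body_mass_g': 'mass'}
df = load_dataset('penguins').rename(columns=columns)
df.head()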

When loading the data, column names are renamed for brevity.

1. Prettifying DataFrames ✨

In order to style DataFrames, we need to access the .style attribute which returns a Styler object:

type(df.style)

This Styler object creates an HTML table which can be further styled using CSS. In the sections to come, we will use the Styler object’s built-in methods as well as a little bit of CSS syntax to customise the formatting. We don’t need to know CSS in depth to style DataFrames, as we will make only a few CSS references. For that, cheatsheets like this can help us get the basics.

In the following sections, we will be chaining multiple methods one after another. This makes the code very long. To format the code in a more readable way, we will break the long code over a few lines and use () to wrap the code.

1.1. Gradients 🌈

Let’s start this section by looking at how the previous heatmap was created. We will use the .background_gradient() method to create a heatmap for the correlation matrix.

# Keep only the numeric columns before computing correlations
correlation_matrix = df.select_dtypes('number').corr()
(correlation_matrix.style
    .background_gradient(cmap='seismic_r', axis=None))

Adding a background gradient takes only one extra line of code. By passing axis=None, the colour gradient is applied across the entire table rather than within each column or row. The name of the desired colour palette is passed to the cmap parameter, which accepts any Matplotlib colourmap. Here’s a useful tip for colourmaps: if you ever need to flip the colour scale, adding the _r suffix to the colourmap name will do the trick. For instance, if we used 'seismic' instead of 'seismic_r', negative correlations would have been blue and positive correlations would have been red.

The previous example doesn’t look identical to the example shown at the beginning of this post. It needs a few more customisations to look the same:

(correlation_matrix.style
    .background_gradient(cmap='seismic_r', axis=None)
    .set_properties(**{'text-align': 'center', 'padding': '12px'})
    .set_caption('CORRELATION MATRIX'))

We center-aligned the values ({'text-align': 'center'}) and increased the row height ({'padding': '12px'}) with .set_properties(). Then, we added a caption above the table with .set_caption(). In this example, we have applied colour gradients to the background. We can also apply colour gradients to the text with .text_gradient():

(correlation_matrix.style
    .text_gradient(cmap='seismic_r', axis=None))

If useful, we can chain both types of gradients as well:

(correlation_matrix.style
    .background_gradient(cmap='YlGn', axis=None)
    .text_gradient(cmap='YlGn_r', axis=None))

Before we wrap up this section, I want to show one more useful example. Let’s imagine we had a simple confusion matrix:

# Create made-up predictions
df['predicted'] = df['species']
df.loc[140:160, 'predicted'] = 'Gentoo'
df.loc[210:250, 'predicted'] = 'Adelie'

# Create confusion matrix
confusion_matrix = pd.crosstab(df['species'], df['predicted'])
confusion_matrix

We can give it a bit of a makeover to make it more useful and pretty:

(confusion_matrix.style
    .background_gradient('Greys')
    .set_caption('CONFUSION MATRIX')
    .set_properties(**{'text-align': 'center',
                       'padding': '12px',
                       'width': '80px'})
    .set_table_styles([{'selector': 'th.col_heading',
                        'props': 'text-align: center'},
                       {'selector': 'caption',
                        'props': [('text-align', 'center'),
                                  ('font-size', '11pt'),
                                  ('font-weight', 'bold')]}]))

This looks pretty, useful and minimalistic. Don’t you love the look of this confusion matrix?


Since we have already familiarised ourselves with the first 5 lines of the code in the previous examples, let’s understand what the remaining code is doing:
◼ .set_properties(**{'width': '80px'}): to increase column width
◼ .set_table_styles([{'selector': 'th.col_heading', 'props': 'text-align: center'}]): to center-align the column headers
◼ .set_table_styles([{'selector': 'caption', 'props': [('text-align', 'center'), ('font-size', '11pt'), ('font-weight', 'bold')]}]): to center-align the caption, increase its font size and make it bold.

1.2. Colour bars 📊

Now, let’s see how to add data bars to the DataFrame. We will first create a pivot table, then use .bar() to create data bars:

# Create a pivot table with missing data
pivot = df.pivot_table(values='mass', index=['species', 'island'], columns='sex')
pivot.iloc[-2, 0] = np.nan

# Style
pivot.style.bar(color='aquamarine')

This can be styled further just like in the previous examples:

(pivot.style
    .bar(color='aquamarine')
    .set_properties(padding='8px', width='50px'))

Previously we got familiar with this format: .set_properties(**{'padding': '8px', 'width': '50px'}). The code above shows an alternative way to pass your arguments to .set_properties(). Note that CSS lengths need a unit, hence '50px' rather than '50'.

If you have positive and negative values, you can format the data as follows by passing two colours (color=['salmon', 'lightgreen']) and aligning the bars in the middle (align='mid'):

# Style on toy data
(pd.DataFrame({'feature': ['a', 'b', 'c', 'd', 'e', 'f'],
               'coefficient': [30, 10, 1, -5, -10, -20]}).style
    .bar(color=['salmon', 'lightgreen'], align='mid')
    .set_properties(**{'text-align': 'center'})
    .set_table_styles([{'selector': 'th.col_heading',
                        'props': 'text-align: center'}]))

Here, we also made sure to center align the column headers and the values.

1.3. Highlights 🔆

There are times when highlighting values based on conditions can be useful. In this section, we will learn about a few functions to highlight special values.

Firstly, we can highlight minimum values from each column like this:

pivot.style.highlight_min(color='pink')

There’s an equivalent function for maximum values:

pivot.style.highlight_max(color='lightgreen')

We can chain these highlight functions together like this:

(pivot.style
    .highlight_min(color='pink')
    .highlight_max(color='lightgreen'))

There is also a function for highlighting missing values. Let’s add it to the previous code snippet:

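(pivot.style
    .highlight_min(color='pink')
    .highlight_max(color='lightgreen')
    # The colour is passed positionally: the keyword was null_color in
    # older pandas and is color from pandas 2.0 onwards
    .highlight_null('grey'))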

These built-in functions are quite easy to use, aren’t they? Let’s look at two more functions before wrapping up this section. We can highlight values within a range like below:

pivot.style.highlight_between(left=3500, right=4500, color='gold')

We can also highlight quantiles:

pivot.style.highlight_quantile(q_left=0.7, axis=None,
                               color='#4ADBC8')

Here, we’ve highlighted the top 30%.

We have used a few different colours so far. If you are wondering what other colour names you could use, check out this resource for colour names. As shown in the example above, you can also use hexadecimal colours which will give you access to a wider range of options (over 16 million colours!). Here’s my favourite resource to explore hexadecimal colour code.

1.4. Custom colour-code 🎨

In this last section, we will look at a few other ways to colour-code DataFrames using custom functions. We will use the following two methods to apply our custom styling functions:
◼ .applymap(): elementwise
◼ .apply(): column/row/tablewise

Elementwise application: .applymap()

Let’s create a small numerical dataset by taking the first 8 rows of the numerical columns. We will use a lambda function to colour values above 190 blue and the rest grey:

df_num = df.select_dtypes('number').head(8)
(df_num.style
    .applymap(lambda x: f"color: {'blue' if x>190 else 'grey'}"))

Let’s look at another example:

green = 'background-color: lightgreen'
pink = 'background-color: pink; color: white'
(df_num.style
    .applymap(lambda value: green if value>190 else pink))

We can convert the lambda function into a regular function and pass it to .applymap():

def highlight_190(value):
    green = 'background-color: lightgreen'
    pink = 'background-color: pink; color: white'
    return green if value > 190 else pink

df_num.style.applymap(highlight_190)

Row/Column/Tablewise application: .apply()

Let’s see how we could do the same formatting using .apply():

def highlight_190(series):
    green = 'background-color: lightgreen'
    pink = 'background-color: pink; color: white'
    return [green if value > 190 else pink for value in series]

df_num.style.apply(highlight_190)

We can also chain them just like the previous functions:

(df_num.style
    .apply(highlight_190)
    .applymap(lambda value: 'opacity: 40%' if value<30
              else None))

It’s useful to know how to use both .apply() and .applymap(). Here’s an example where we can use .apply() but not .applymap():

def highlight_above_median(series):
    is_above = series>series.median()
    above = 'background-color: lightgreen'
    below = 'background-color: grey; color: white'
    return [above if value else below for value in is_above]

df_num.style.apply(highlight_above_median)

We find the median value of each column and highlight values higher than the median in green and the rest in grey. We can also style an entire row based on conditions with .apply():

def highlight(data):
    n = len(data)
    if data['sex'] == 'Male':
        return n*['background-color: lightblue']
    elif data['sex'] == 'Female':
        return n*['background-color: lightpink']
    else:
        return n*['']

# pandas >= 1.4; on older versions use .hide_index() instead
df.head(6).style.apply(highlight, axis=1).hide(axis='index')

Here, we have hidden the DataFrame’s index with .hide(axis='index') for a cleaner look (on pandas older than 1.4, the equivalent is .hide_index()). If needed, you can hide columns in the same way with .hide(axis='columns') (.hide_columns() on older versions).

Lastly, most of the functions we looked at in this post take optional arguments to customise styling. The following two arguments are common and quite useful to know:
◼ axis to choose the axis to operate along: columns, rows or the entire table
◼ subset to select a subset of columns to style.


Hope you enjoyed learning about useful ways to prettify your DataFrames by colour-coding them. Styled DataFrames can help you explore and analyse the data more easily and make your analysis more interpretable and attractive. If you are keen to learn more about styling, check out this useful documentation by pandas.

Original Source

The post Prettifying pandas DataFrames appeared first on Big Data, Data Analytics, IOT, Software Testing, Blockchain, Data Lake – Submit Your Guest Post.
