Seaborn for data visualization | A Beginner's Guide to Seaborn



This article was published as part of the Data Science Blogathon.


A step-by-step guide to get started with Seaborn!!

If matplotlib "tries to make easy things easy and hard things possible", seaborn tries to make a well-defined set of hard things easy too.

Seaborn's strengths:

Seaborn's greatest strength is the diversity of its plotting functions. It lets us make complicated plots even in a single line of code!!

In this tutorial, we will use three libraries to do the job: Matplotlib, Seaborn, Pandas. If you are a complete beginner in Python, I suggest you start and get a little familiar with Matplotlib and Pandas.

If you follow this tutorial closely, you will be able to create beautiful plots with these three libraries. You can then use my code as a template for your future visualization tasks.

Let's start our Seaborn journey with the famous Pokémon dataset. Before starting, I highly recommend that you write your own base code for each plot and experiment with it.

You can find the Pokémon dataset on Kaggle. However, to make your journey easier, I have shortened and cleaned this version of the dataset.

You can download the data set here:

My lifesaver: I would like to mention a resource that always saves me when I am stuck.

Let's start now:

We will start with importing the necessary libraries:

#importing libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

Read the CSV file

data = pd.read_csv("Pokemon.csv", encoding="unicode_escape")

I worked around the UTF-8 codec error by specifying a different encoding in the read_csv() call.
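As a rough sketch of another way to handle the codec error, the helper below tries several encodings in turn. The function name read_csv_with_fallback, the list of encodings, and the tiny sample file are all assumptions made for illustration; they are not part of the original code or the real Pokemon.csv.

```python
import os
import tempfile

import pandas as pd


def read_csv_with_fallback(path, encodings=("utf-8", "latin-1")):
    """Return the DataFrame from the first encoding that decodes the file."""
    last_error = None
    for encoding in encodings:
        try:
            return pd.read_csv(path, encoding=encoding)
        except UnicodeDecodeError as error:
            last_error = error  # remember the failure and try the next encoding
    raise last_error


# Demo on a tiny made-up file saved as Latin-1 (not the real Pokemon.csv)
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sample.csv")
    with open(path, "wb") as handle:
        handle.write("Name,HP\nFlabébé,44\n".encode("latin-1"))
    sample = read_csv_with_fallback(path)

print(sample)
```

The UTF-8 attempt fails on the accented characters, so the helper falls back to Latin-1 and the name is read intact.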

Our data looks like this ....




The column names do not clearly convey their purpose. It is important to get to know the dataset before working on it.

Here is the simplified description of the dataset for you.

This dataset includes 150 Pokémon. It is about the Pokémon games (NOT Pokémon cards or Pokémon Go).

The dataset has 150 rows and 13 columns.

Description of the columns:

# #: ID for each Pokémon
# Name: name of each Pokémon
# Type 1: each Pokémon has a type, which determines its weakness / resistance to attacks
# Type 2: some Pokémon are dual-type and have a second type
# Total: sum of all the stats that come after this, a general guide to how strong a Pokémon is
# HP: hit points, or health, defines how much damage a Pokémon can take before fainting
# Attack: the base modifier for normal attacks (for instance, Scratch, Punch)
# Defense: the base damage resistance against normal attacks
# Sp. Atk: special attack, the base modifier for special attacks (for instance, Fire Blast, Bubble Beam)
# Sp. Def: the base damage resistance against special attacks
# Speed: determines which Pokémon attacks first in each round
# Stage: generation number
# Legendary: True if it is a legendary Pokémon, False otherwise

I have renamed the columns so that our plots are easier to read and for my own clarity. Although this is optional, I highly recommend it to eliminate any chance of confusion.

data.rename(columns={"#": "No.",
                     "Type 1": "Pokemon_Type",
                     "Type 2": "PokemonType2",
                     "Total": "Sum of Attack",
                     "HP": "Hit Points",
                     "Attack": "Attack Strength",
                     "Defense": "Defensive Strength",
                     "Sp. Atk": "Special Attack Strength",
                     "Sp. Def": "Special Defense Strength",
                     "Stage": "Generation"},
            inplace=True)
data.head()

My output now looks like this:


Let's start the visualization with a simple one, the distribution plot.

Distribution plots:

A distribution plot shows the distribution and range of a set of numeric values plotted against a dimension. Histograms let you plot the distributions of numeric variables.

I could have used data.hist(figsize=(12,10), bins=20), but not all columns in this dataset hold numeric values (data.hist only draws the numeric ones), so I plot individual distribution plots instead.

sns.distplot(x=data["Sum of Attack"], color="Orange", kde=True, rug=True);

Distribution plot output: Pokémon attack sum

Seaborn's distplot function plots a histogram with a density curve. We can remove the density curve with kde=False and show the rug with rug=True (both are booleans, not strings). Note that distplot has been deprecated since Seaborn 0.11 in favor of histplot and displot.

There are many alternative ways to plot a histogram in Python:

sns.histplot(x=data["Sum of Attack"], color="Green");
Output: Sum of Pokémon attacks

Another way is to use plt.hist():

plt.hist(x=data["Sum of Attack"],color="Red",bins=20);
Output: histogram, Matplotlib

So there are many ways to plot distributions. The functions pyplot.hist, seaborn.countplot and seaborn.displot all act as wrappers around a matplotlib bar plot and can be used if plotting such a bar plot manually is considered too cumbersome.

  • For discrete variables, seaborn.countplot is more convenient.
  • For continuous variables, pyplot.hist or seaborn.distplot is used.

Joint distribution plots:

Joint distribution plots combine information from scatter plots and histograms to give us detailed information about bivariate distributions.

sns.jointplot(x=data["Sum of Attack"], y=data["Defensive Strength"], color="Red");

Output: Jointplot

Density plots:

Density plots show the joint distribution of two variables.

sns.kdeplot(x=data["Sum of Attack"], y=data["Defensive Strength"])
Output: Density plot

Bar plot

Bar plots help us visualize the distributions of categorical variables. A countplot is a type of bar plot.
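A minimal sketch of a countplot, assuming a column named "Pokemon_Type" as in the renamed dataset. The six rows below are made up for illustration and are not taken from the real data.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up primary types, only to illustrate the call
toy = pd.DataFrame(
    {"Pokemon_Type": ["Water", "Fire", "Water", "Grass", "Water", "Fire"]}
)

# The same counts that the bars represent, computed with pandas
type_counts = toy["Pokemon_Type"].value_counts()

fig, ax = plt.subplots()
# One bar per category, with the bar height equal to the number of rows
sns.countplot(data=toy, x="Pokemon_Type", ax=ax)
```

With the real DataFrame, the same call would be sns.countplot(data=data, x="Pokemon_Type").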

Output: Bar plot

Heat map

A heat map helps us visualize matrix-like data as hot and cold spots. Here the matrix is the correlation between the numeric columns, and the color of each cell encodes how strongly two columns are correlated.

# numeric_only=True (pandas 1.5+) skips text columns such as Name
sns.heatmap(data.corr(numeric_only=True));
# Rotate x-labels with the help of matplotlib
Output: Heat map
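A small sketch of a correlation heat map with rotated x-labels. The rows below are made up, and the choices of annot=True, the "coolwarm" colormap and a 45-degree rotation are assumptions for illustration rather than the settings used for the figure above.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up stats, only to illustrate the call
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E"],
    "Attack Strength": [49, 62, 82, 52, 64],
    "Defensive Strength": [49, 63, 83, 43, 58],
    "Speed": [45, 60, 80, 65, 80],
})

# Keep only numeric columns before computing the correlation matrix
corr = toy.select_dtypes(include="number").corr()

fig, ax = plt.subplots(figsize=(6, 5))
# Fix the color scale to [-1, 1] so colors are comparable across plots
sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
# Rotate the x-axis labels so long column names do not overlap
ax.tick_params(axis="x", labelrotation=45)
```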

Scatter plot:

A scatter plot (also known as a scatter chart or scatter graph) uses points to represent values for two different numeric variables. The position of each point on the horizontal and vertical axes indicates the values for an individual data point.

Scatter plots are used to observe relationships between variables.

I have compared the attack and defense statistics of our Pokémon with the help of scatter diagrams.
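The call behind this comparison is not shown here, so the following is only a sketch of what it might look like, assuming lmplot (the function used in the later snippets) with its default settings. The eight rows are made up for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up attack and defense values, only to illustrate the call
toy = pd.DataFrame({
    "Attack Strength": [49, 62, 82, 52, 64, 84, 48, 63],
    "Defensive Strength": [49, 63, 83, 43, 58, 78, 65, 80],
})

# lmplot draws the points and, by default, a fitted regression line
grid = sns.lmplot(data=toy, x="Attack Strength", y="Defensive Strength")
first_ax = grid.axes.flat[0]  # the single Axes inside the returned FacetGrid
```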

Output: scatter plot

I used Seaborn's lmplot function, which fits and plots a regression line, so we see a diagonal line (regression line) here by default. Recent Seaborn versions also provide a dedicated sns.scatterplot function.

Fortunately, seaborn helps us modify the plot:

  • fit_reg=False removes the regression line
  • hue="Generation" colors the points by the value of a third variable, which lets us express a third dimension of information using color.

Here I use the Pokémon generation as the third variable!!

# Tweaking the scatter plot
sns.lmplot(x="Attack Strength", y="Defensive Strength", data=data,
           fit_reg=False,      # Removing the regression line
           hue="Generation");  # Coloring by Pokémon generation
Output: adjusted scatter plot

Most of the points fall in the 40-120 range, so I will adjust the axis limits with the help of matplotlib:

sns.lmplot(x="Attack Strength", y="Defensive Strength", data=data,
           fit_reg=False,      # Removing the regression line
           hue="Generation");  # Coloring by Pokémon generation
plt.ylim(20, 130);

Now we can see a better and more focused plot!!

Output: better scatter plot

Box plot

A box plot is used to represent groups of numerical data through their quartiles.

Box plots may also have lines (whiskers) extending from the boxes, indicating variability outside the upper and lower quartiles, hence the terms box-and-whisker plot and box-and-whisker diagram.
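To make the quartiles concrete, here is a small worked example with pandas. The ten values are made up, and the 1.5 × IQR rule shown is the default whisker convention in matplotlib (which Seaborn builds on); points beyond the fences are the outliers that showfliers=False hides.

```python
import pandas as pd

# Made-up attack values, only to illustrate the arithmetic
attack = pd.Series([5, 20, 30, 45, 50, 55, 60, 65, 80, 130],
                   name="Attack Strength")

# The box spans the first to the third quartile, with a line at the median
q1, median, q3 = attack.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1  # interquartile range, the height of the box

# Whiskers reach the most extreme points inside these fences
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Anything outside the fences is drawn as an individual outlier point
outliers = attack[(attack < lower_fence) | (attack > upper_fence)]

print(q1, median, q3)     # 33.75 52.5 63.75
print(outliers.tolist())  # [130]
```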

We can drop the "Sum of Attack" column since we have the individual stats. We can also drop the "Generation" and "Legendary" columns because they are not combat stats.

plt.figure(figsize=(15,7));
# Pre-format DataFrame
stats_data = data.drop(["Sum of Attack", "Generation", "Legendary"], axis=1);
# New boxplot using stats_data
sns.boxplot(data=stats_data,
            showfliers=False);  # Removing outliers
sns.set_style("whitegrid")
Output: Box plot

Remember to set the figure size before plotting the graph.

Violin plots

Now I will draw a violin plot.

Violin plots are an alternative to box plots. They show the distribution (through the thickness of the violin) instead of only the summary statistics.

Here I have shown the distribution of Attack by the Pokémon's primary type.

sns.violinplot(x=data.Pokemon_Type, y=data["Attack Strength"]);
Output: Violin plot

As you can see, Dragon types tend to have higher attack stats than Ghost types, but they also have a greater variation.

Now, Pokémon fans may find something quite jarring in that plot: The colors are absurd. Why is the Grass type pink or the Water type orange?? We must fix this immediately!!

Fortunately, Seaborn allows us to set custom color palettes. We can simply create an ordered Python list of hexadecimal color values.

I have used Bulbapedia to create a new color palette.

# Using Bulbapedia to create a new color palette
pkmn_type_colors = ['#78C850',  # Grass
                    '#F08030',  # Fire
                    '#6890F0',  # Water
                    '#A8B820',  # Bug
                    '#A8A878',  # Normal
                    '#A040A0',  # Poison
                    '#F8D030',  # Electric
                    '#E0C068',  # Ground
                    '#EE99AC',  # Fairy
                    '#C03028',  # Fighting
                    '#F85888',  # Psychic
                    '#B8A038',  # Rock
                    '#705898',  # Ghost
                    '#98D8D8',  # Ice
                    '#7038F8',  # Dragon
                   ]

Updating the violin plot with the Pokémon type colors:

sns.violinplot(x=data.Pokemon_Type,
               y=data["Attack Strength"],
               palette=pkmn_type_colors);
Output: Better violin plot 🙂
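An ordered list only gives the right colors if its order matches the order in which the types appear on the axis. As a hedged alternative, Seaborn also accepts a dictionary that maps each category to a color. The sketch below uses made-up rows and only three types; the name type_colors is an assumption for illustration, and newer Seaborn versions may warn that palette should be combined with hue.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Type-to-color mapping, so the order of the categories no longer matters
type_colors = {"Grass": "#78C850", "Fire": "#F08030", "Water": "#6890F0"}

# Made-up rows, only to illustrate the call
toy = pd.DataFrame({
    "Pokemon_Type": ["Water"] * 4 + ["Fire"] * 4 + ["Grass"] * 4,
    "Attack Strength": [48, 63, 83, 65, 52, 64, 84, 70, 49, 62, 82, 55],
})

fig, ax = plt.subplots()
# Each violin picks its color by looking up its type in the dictionary
sns.violinplot(data=toy, x="Pokemon_Type", y="Attack Strength",
               palette=type_colors, ax=ax)
```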


As you have seen, violin plots are great for visualizing distributions.

However, since we only have 150 Pokémon in our dataset, we may want to simply show each point. That's where the swarm plot comes in. This visualization shows each point, while "stacking" those with similar values.

sns.swarmplot(x=data.Pokemon_Type, y=data["Attack Strength"], palette=pkmn_type_colors);
Swarmplot: Pokémon type vs. attack strength

This looks good, but for a better picture we can combine the two! After all, they show similar information.

Overlaying plots

plt.figure(figsize=(10,10))
sns.violinplot(x=data.Pokemon_Type, y=data["Attack Strength"],
               inner=None,  # Removing the bars inside the violins
               palette=pkmn_type_colors);
sns.swarmplot(x="Pokemon_Type",
              y="Attack Strength",
              data=data,
              color="black",  # Making points black
              alpha=0.5);
plt.title("Attacking Strength as per Pokemon's Type");
Output: Overlaid plots

Points to consider:

inner=None: removes the bars inside the violins

alpha=0.5: makes the dots slightly transparent. Remember that the alpha value must be a float, so do not put it in quotes

You can find the reference for Seaborn colors here:

Factor plots

Factor plots make it easy to separate plots by categorical classes.

# factorplot was renamed to catplot in Seaborn 0.9; on newer versions use sns.catplot
factplot = sns.factorplot(x="Pokemon_Type", y="Attack Strength", data=data,
                          hue="Generation", col="Generation", kind="swarm");
factplot.set_xticklabels(rotation=-45)
Factorplot: for separate categorical classes

Quick notes:

  • plt.xticks(rotation=-45) doesn't work here because it only rotates the labels of the last plot
  • Use set_xticklabels on the returned grid instead
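On Seaborn 0.9 and later, the same idea can be sketched with catplot, the renamed version of factorplot. The rows below are made up for illustration; only the column names follow the renamed dataset.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up rows covering two generations, only to illustrate the call
toy = pd.DataFrame({
    "Pokemon_Type": ["Water", "Fire", "Grass", "Water",
                     "Fire", "Grass", "Water", "Fire"],
    "Attack Strength": [48, 52, 49, 63, 64, 62, 83, 84],
    "Generation": [1, 1, 1, 1, 2, 2, 2, 2],
})

# One swarm plot per generation, laid out side by side
grid = sns.catplot(data=toy, x="Pokemon_Type", y="Attack Strength",
                   hue="Generation", col="Generation", kind="swarm")

# Rotates the tick labels on every facet, not only the last one
grid.set_xticklabels(rotation=-45)
```

Because catplot returns a FacetGrid, methods called on grid apply to all of its subplots, which is why set_xticklabels works where plt.xticks does not.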