Effective Data Visualization Techniques in Data Science with Python


This article was published as part of the Data Science Blogathon

Data visualization

Data visualization techniques involve generating graphical or pictorial representations of data, in a form that helps you understand the information in a given data set. The aim of visualization is to identify the patterns, trends, correlations and outliers in a data set.

[Figure: Data visualization]

Benefits of data visualization

  • Patterns in business operations: Data visualization techniques help us identify patterns in business operations. Once the problem statement is understood, those patterns point to solutions that can be applied to eliminate one or more of the underlying problems.
  • Identify business trends and engage with data: These techniques help us identify market trends by collecting data on daily business activities and preparing trend reports, which track how the company influences the market. This helps us understand the competition and the customers, and supports a long-term perspective.
  • Storytelling and decision making: Telling a story with the available data is a niche skill in business communication, and it plays a vital role in the data science domain in particular. Good visualizations strengthen that story and help achieve the objectives of the business problem.
  • Understand current business information and set goals: Businesses can understand their KPI information, set tangible goals and plan their business strategy, and then use the data to optimize ongoing activities.
  • Operational and performance analysis, for example increasing the productivity of a manufacturing unit: visualizing the KPIs that represent productivity trends makes those trends clear and guides efforts to improve plant productivity.
[Figure: Benefits of data visualization]

Data visualization in data science

Data visualization is one of the most important parts of data science, and it plays an equally important role in data analysis. In this article we discuss it in detail with the help of Python packages, and show how it helps throughout the data science process. This is an interesting topic for all data scientists and data analysts.

I. Line graph

The line graph is the simplest data visualization in Python, and it is available in Matplotlib.

Line charts represent the relationship between two variables, X and Y, plotted on their respective axes. Let's look at some samples.

Sample #1
# importing the required libraries
import matplotlib.pyplot as plt
import numpy as np
#simple array
x = np.array([1, 2, 3, 4])
# generating y values
y = x * 2
plt.plot(x, y)
plt.show()
Sample #2
x = np.array([1, 2, 3, 4])
y = np.array([2, 4, 6, 8])
plt.plot(x, y)
plt.xlabel("Time in Hrs") 
plt.ylabel("Distance in Km") 
plt.title("Time Vs Distance") 
plt.show()
[Figure: Line chart]

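In this example the relationship between X and Y is linear (y = 2x), which is why the image above shows a straight line. A line chart itself does not imply linearity: it simply connects consecutive points, so any relationship can be drawn.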
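A line chart does not require a linear relationship; matplotlib simply joins consecutive points with straight segments. The sketch below (not part of the original samples) plots the hypothetical relationship y = x² on the same kind of array, so the line visibly bends.

```python
import matplotlib.pyplot as plt
import numpy as np

# Same style of input as Sample #1, but with a quadratic relationship (illustrative only)
x = np.array([1, 2, 3, 4])
y = x ** 2

fig, ax = plt.subplots()
# plot() connects consecutive (x, y) points with straight segments
line, = ax.plot(x, y, marker="o")
ax.set_xlabel("x")
ax.set_ylabel("x squared")
ax.set_title("A non-linear line chart")
plt.show()
```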

II. Histogram

A histogram is a graphical representation of the distribution of a set of numerical data. It is a kind of bar graph in which the X axis represents ranges of values (bins) and the Y axis represents the frequency of values in each range.

For example, take a set of student marks grouped into the ranges shown below. From the chart we can read each range of marks and how many students fall into it.

from matplotlib import pyplot as plt
import numpy as np
fig, ax = plt.subplots(1, 1)
a = np.array([25, 42, 48, 55, 60, 62, 67, 70, 30, 38, 44, 50, 54, 58, 75, 78, 85, 88, 89, 28, 35, 90, 95])
ax.hist(a, bins=[20, 40, 60, 80, 100])
ax.set_title("Student's Score")
ax.set_xticks([0, 20, 40, 60, 80, 100])
ax.set_xlabel('Marks Scored')
ax.set_ylabel('No. of Students')
plt.show()

[Figure: Histogram]

Histogram characteristics

  • A histogram helps reveal unusual observations (outliers) in a given data set.
  • It is plotted on a numerical scale, with the values grouped into several bins.
  • The Y axis represents the number (or percentage) of occurrences in each bin.
  • The X axis represents the ranges of values over which the data is distributed.
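To see exactly what the bars of the student-score histogram count, the counts can be computed directly with NumPy's `np.histogram`, using the same marks and bin edges. This is a minimal sketch; the percentage column is an addition, showing how the Y axis can be read as a share of all students instead of a raw count.

```python
import numpy as np

# The same student marks and bin edges used in the histogram example
a = np.array([25, 42, 48, 55, 60, 62, 67, 70, 30, 38, 44, 50, 54, 58,
              75, 78, 85, 88, 89, 28, 35, 90, 95])
bins = [20, 40, 60, 80, 100]

# np.histogram returns the count per bin and the bin edges;
# each bin is half-open [left, right) except the last, which also includes 100
counts, edges = np.histogram(a, bins=bins)
print(counts)  # [5 7 6 5]

# Express each bin as a percentage of all students
percent = counts / counts.sum() * 100

for left, right, n, p in zip(edges[:-1], edges[1:], counts, percent):
    print(f"{left:g}-{right:g}: {n} students ({p:.1f}%)")
```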

Distplot: similar to a histogram, but with an additional feature: it overlays a kernel density estimate (KDE), a smooth curve that estimates the underlying distribution.

Joint plot: a combination of a scatter plot of two variables and the histogram of each variable.

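import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset('tips')
# distplot is deprecated in recent seaborn; histplot with kde=True draws the histogram plus the KDE curve
sns.histplot(df['total_bill'], kde=True, color="green", bins=20)
plt.show()
# scatter plot of total_bill against tip, with the histogram of each variable on the margins
sns.jointplot(x='total_bill', y='tip', data=df, color="green")
plt.show()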
[Figure: Distplot and joint plot]
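The KDE curve is built by placing a small Gaussian bump on every observation and averaging the bumps. The hand-written function below sketches that idea with NumPy only. Assumptions: the bill amounts are synthetic (drawn from a gamma distribution) rather than the `tips` data, the bandwidth follows Scott's rule of thumb, and `gaussian_kde_1d` is a hypothetical helper name; seaborn's own implementation and bandwidth choice may differ in detail.

```python
import matplotlib.pyplot as plt
import numpy as np


def gaussian_kde_1d(samples, grid, bandwidth=None):
    """Hypothetical helper: estimate a density on `grid` by averaging one Gaussian kernel per sample."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    if bandwidth is None:
        # Scott's rule of thumb: h = std * n^(-1/5)
        bandwidth = samples.std(ddof=1) * n ** (-1 / 5)
    # Standardized distance from every grid point to every sample: shape (len(grid), n)
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    # Average the kernels and rescale so the curve integrates to about 1
    return kernels.mean(axis=1) / bandwidth


# Synthetic bill amounts standing in for df['total_bill'] (assumption)
rng = np.random.default_rng(0)
bills = rng.gamma(shape=4.0, scale=5.0, size=200)

grid = np.linspace(bills.min() - 10, bills.max() + 10, 500)
density = gaussian_kde_1d(bills, grid)

# Histogram scaled as a density, with the KDE curve on top
plt.hist(bills, bins=20, density=True, color="green", alpha=0.4)
plt.plot(grid, density, color="green")
plt.show()
```

Because the histogram is drawn with `density=True`, both the bars and the curve are on the same scale, which is how the two can share one chart.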

III. Pie chart

The pie chart is a very familiar graph: a circular statistical chart that represents a data series. It is commonly used in business presentations to show orders, sales, profits, losses, etc. The circle is divided into portions, each representing one part of the same whole and distinguished by color. Each slice of the pie is called a wedge, and its size is proportional to its value.

Pie charts are widely used to show the composition of a whole, and they are well suited to categorical data.

from matplotlib import pyplot as plt
import numpy as np
Language = ['English', 'Spanish', 'Chinese',
        'Russian', 'Japanese', 'French']
data = [379, 480, 918, 154, 128, 77.2]
# Creating plot
fig = plt.figure(figsize =(10, 7))
plt.pie(data, labels = Language)
# show plot
plt.show()
[Figure: Pie chart]
import matplotlib.pyplot as plt
import numpy as np
y = np.array([35, 25, 25, 15])
mylabels = ["India", "UK", "UK", "Germany"]
# pull the first wedge out from the centre
myexplode = [0.2, 0, 0, 0]
plt.pie(y, labels=mylabels, explode=myexplode)
plt.show()
[Figure: Pie chart with an exploded wedge]

IV. Area plot

An area plot is very similar to a line chart, but the region under each line is filled with a different color. It is a simple way to represent the evolution of a numerical variable, and with stackplot several series are stacked on top of each other.

import matplotlib.pyplot as plt
days = [1, 2, 3, 4, 5]
raining = [7, 8, 6, 11, 7]
snow = [8, 5, 7, 8, 13]
# stack the two series: blue for raining, yellow for snow
plt.stackplot(days, raining, snow, colors=['b', 'y'], labels=['Raining', 'Snow'])
plt.xlabel('Days')
plt.ylabel('No. of Hours')
plt.title('Representation of Raining and Snow w.r.t. Days')
plt.legend()
plt.show()

[Figure: Area plot]
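`stackplot` draws each series on top of the previous one, so the top boundary of the chart is the running total of the series. As an illustration (not the article's code), the sketch below computes that running total with NumPy and draws the same two bands by hand with `fill_between`.

```python
import matplotlib.pyplot as plt
import numpy as np

days = [1, 2, 3, 4, 5]
raining = np.array([7, 8, 6, 11, 7])
snow = np.array([8, 5, 7, 8, 13])

# Stacking means the upper edge of the snow band is raining + snow
upper_edge = np.cumsum(np.vstack([raining, snow]), axis=0)[-1]
print(upper_edge)  # [15 13 13 19 20]

# The same picture built band by band
fig, ax = plt.subplots()
ax.fill_between(days, 0, raining, color='b', label='Raining')
ax.fill_between(days, raining, upper_edge, color='y', label='Snow')
ax.set_xlabel('Days')
ax.set_ylabel('No. of Hours')
ax.legend()
plt.show()
```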

V. Scatter plots

Scatter plots place data points on both axes (horizontal and vertical) to show how the two variables relate to each other. In Data Science / Machine Learning work, typically during exploratory data analysis (EDA), we need to analyze how the dependent and independent variables relate. The relationship can be positive or negative, or the points may be scattered across the graph with no clear pattern.

import matplotlib.pyplot as plt
# a loosely scattered data set
x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9]
y = [99, 86, 87, 88, 67, 86, 87, 78, 77, 85, 86, 56]
plt.scatter(x, y)
plt.show()
# a data set with a strong positive relationship
x = [5, 7, 8, 10, 14, 18, 22, 26]
y = [6, 8, 9, 12, 16, 20, 24, 28]
plt.scatter(x, y)
plt.show()

[Figure: Scatter plots]
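A scatter plot shows the direction of a relationship; Pearson's correlation coefficient r puts a number on it (+1 perfectly positive, -1 perfectly negative, near 0 no linear relationship). This sketch computes r with `np.corrcoef` for the two data sets in the scatter plot example.

```python
import numpy as np

# The two data sets from the scatter plot examples
x1 = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9]
y1 = [99, 86, 87, 88, 67, 86, 87, 78, 77, 85, 86, 56]
x2 = [5, 7, 8, 10, 14, 18, 22, 26]
y2 = [6, 8, 9, 12, 16, 20, 24, 28]

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is Pearson's r
r1 = np.corrcoef(x1, y1)[0, 1]
r2 = np.corrcoef(x2, y2)[0, 1]
print(f"first data set:  r = {r1:.2f}")
print(f"second data set: r = {r2:.2f}")
```

For these inputs the first data set gives an r close to zero and the second an r close to 1, matching what the two scatter plots show.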

VI. Hexbin plots

A hexbin plot groups pairs of numeric values into hexagonal bins and shades each bin by the number of points it contains. Hexbins improve on scatter plots for large data sets, where a scatter plot turns into a dense, blurry mass of overlapping points. It provides two rendering modes: 1. a coordinate list, 2. a geospatial object.

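import numpy as np
import matplotlib.pyplot as plt
x = np.random.normal(size=1000)
y = np.random.normal(size=1000)
# basic hexbin plot with 15 hexagons across the x range
plt.hexbin(x, y, gridsize=15)
plt.show()
# hide empty hexagons (mincnt=1), outline them in white and overlay the raw points
plt.hexbin(x, y, gridsize=15, mincnt=1, edgecolors="white")
plt.scatter(x, y, s=2, c="orange")
plt.show()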

[Figure: Hexbin plots]
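Each hexagon's shade encodes how many points fall inside it. `hexbin` returns a `PolyCollection` whose array holds those counts, which the sketch below reads back with `get_array()`. It uses a seeded random generator so the result is reproducible, and `mincnt=1` so that only non-empty hexagons are drawn; the seed and colormap are arbitrary choices.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=1000)
y = rng.normal(size=1000)

fig, ax = plt.subplots()
# The returned collection's array holds the number of points in each drawn hexagon
hb = ax.hexbin(x, y, gridsize=15, mincnt=1, cmap="viridis")
counts = hb.get_array()
fig.colorbar(hb, ax=ax, label="points per hexagon")
plt.show()

print("hexagons drawn:", counts.size)
print("points counted:", int(counts.sum()))  # every point lands in exactly one hexagon
```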

VII. Heat map

A heat map is one of my favorite visualization techniques. It represents a matrix of values, such as the correlations between a set of variables, using shades of color. Depending on the color map, darker or lighter tones represent higher values; the color bar beside the map gives the scale. A correlation heat map helps data scientists see how the target variable correlates with the independent variables in a data set. Features that are only weakly correlated with the target can be considered for removal before more detailed analysis, so the heat map helps during feature selection. The selected features are then grouped into X, the target into y, and the data is split into training and test sets.

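import seaborn as sn
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# 7x7 table of random values in [0, 1)
df = pd.DataFrame(np.random.random((7, 7)), columns=['a', 'b', 'c', 'd', 'e', 'f', 'g'])
sn.heatmap(df)
plt.show()
# the same heat map, annotated with each cell's value in a small font
sn.heatmap(df, annot=True, annot_kws={'size': 7})
plt.show()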
[Figure: Heat map]
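The heat map example colors a table of random numbers. For the feature-selection use described above, the heat map is usually drawn from a correlation matrix (`DataFrame.corr()`). The sketch below uses a hypothetical data set in which `target` is constructed to depend strongly on `f1`, negatively on `f2` and not at all on `f3`; the column names and coefficients are illustrative assumptions. A diverging colormap fixed to the range -1 to 1 makes negative and positive correlations easy to tell apart.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: target = 3*f1 - f2 + noise; f3 is unrelated
rng = np.random.default_rng(0)
n = 300
f1 = rng.normal(size=n)
f2 = rng.normal(size=n)
f3 = rng.normal(size=n)
target = 3 * f1 - 1 * f2 + rng.normal(scale=0.5, size=n)
df = pd.DataFrame({'f1': f1, 'f2': f2, 'f3': f3, 'target': target})

# Pairwise Pearson correlations: the matrix a correlation heat map shows
corr = df.corr()

# Diverging colormap over the full [-1, 1] range
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1)
plt.show()

# Rank features by the strength (absolute value) of their correlation with the target
ranking = corr['target'].drop('target').abs().sort_values(ascending=False)
print(ranking)
```

Ranking by absolute value keeps strongly negative features such as `f2` in consideration, since a negative correlation carries as much information as a positive one.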

VIII. Box plot

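A box plot is a type of graph often used in the data science life cycle, especially during exploratory data analysis (EDA). It represents the distribution of the data in terms of quartiles (percentiles): Q1 is the first quartile (25th percentile), Q2 is the second quartile (50th percentile, the median), Q3 is the third quartile (75th percentile) and Q4 is the fourth quartile, i.e. the largest value. The box spans Q1 to Q3. In matplotlib, the whiskers by default reach the most extreme data points within 1.5 × IQR (the interquartile range, Q3 - Q1) of the box, and any points beyond them are drawn individually as outliers.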

Using this graph, we can identify outliers quickly and easily, which makes it one of the most effective plots for that purpose. After the outliers have been handled, the data set can be put through statistical tests and adjusted for more detailed analysis.

import matplotlib.pyplot as plt
import numpy as np
np.random.seed(10)
# four normal samples of 200 points with different means and spreads
one = np.random.normal(100, 10, 200)
two = np.random.normal(80, 30, 200)
three = np.random.normal(90, 20, 200)
four = np.random.normal(70, 25, 200)
to_plot = [one, two, three, four]
fig = plt.figure(1, figsize=(9, 6))
ax = fig.add_subplot()
bp = ax.boxplot(to_plot)
fig.savefig('boxplot.png', bbox_inches="tight")
plt.show()
[Figure: Box plot]
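The outliers a box plot shows come from a simple rule: anything more than 1.5 × IQR below Q1 or above Q3 is drawn as an individual point (matplotlib's default `whis=1.5`). The sketch below applies that rule by hand to a small illustrative sample and cross-checks it against `matplotlib.cbook.boxplot_stats`, the helper that computes the statistics matplotlib's `boxplot` draws.

```python
import numpy as np
from matplotlib import cbook

# Small illustrative sample with one obvious outlier
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

# Quartiles and interquartile range
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

# Fences at 1.5 x IQR beyond the box
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = data[(data < lower_fence) | (data > upper_fence)]
print(q1, median, q3, iqr)       # 3.25 5.5 7.75 4.5
print(lower_fence, upper_fence)  # -3.5 14.5
print(outliers)                  # [100]

# Cross-check against the statistics matplotlib computes for boxplot()
stats = cbook.boxplot_stats(data)[0]
print(stats['q1'], stats['q3'], stats['fliers'])
```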

IX. Pair plot

A pair plot is another important graph in the data science life cycle during EDA. It shows how the features relate to each other as a grid of small plots, one for every pair of features along the X and Y axes, so you can see whether each pair is positively or negatively correlated. We can then focus on the pairs with strong relationships for more detailed analysis. This is similar to the heat map, but here the relationships can be seen with the naked eye, which is what makes it special. Again, it is an excellent aid in the feature selection process.

import pandas as pd
import seaborn as sb
from matplotlib import pyplot as plt
df = sb.load_dataset('iris')
sb.set_style("ticks")
sb.pairplot(df, hue="species", diag_kind="kde", kind="scatter", palette="husl")
plt.show()
[Figure: Pair plot]


X. Bar chart

A bar chart (or bar graph) is a very familiar chart for presenting categorical data with rectangular bars, drawn either horizontally or vertically. It shows how much each category contributes to the given data set. At first glance, in the graph below, "America" has far more cars than "Europe" and "Asia". Observations like this shape our understanding of the data set and help us focus on the problem statement.

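import matplotlib.pyplot as plt
import seaborn as sns
# Assumption: df_cars is the Auto MPG data set; seaborn's copy stores origin as 'usa', 'europe', 'japan'
df_cars = sns.load_dataset('mpg')
fig, ax = plt.subplots(figsize=(5, 5))
# count the cars from each origin, in a fixed order so the labels below line up
sns.countplot(x='origin', data=df_cars, order=['usa', 'europe', 'japan'], ax=ax)
# replace the raw category names with readable region labels
ax.set_xticks([0, 1, 2], labels=['America', 'Europe', 'Asia'])
ax.set_title("Cars manufactured by Countries")
plt.show()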
[Figure: Bar chart]
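The same kind of chart can be drawn horizontally, which often reads better when category names are long. The sketch below counts categories with pandas `value_counts()` and draws them with matplotlib's `barh`; the origins listed are toy values for illustration, not the article's car data set.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy origins for a handful of cars (illustrative only)
origin = pd.Series(['America', 'Asia', 'America', 'Europe', 'America', 'Asia', 'America'])

# value_counts() gives one count per category, sorted from most to least frequent
counts = origin.value_counts()

fig, ax = plt.subplots(figsize=(5, 3))
# barh draws the bars horizontally, with the categories on the Y axis
bars = ax.barh(counts.index, counts.values)
ax.invert_yaxis()  # put the largest category at the top
ax.set_xlabel("No. of cars")
ax.set_title("Cars manufactured by Countries")
plt.show()
```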

Univariate, bivariate and multivariate analysis

Analysis in the data science process can be univariate, bivariate or multivariate:

  • Univariate: analyze one variable at a time.
  • Bivariate: compare two variables.
  • Multivariate: compare more than two variables.

Each of the graphs discussed since the beginning of this article fits one of these types of analysis. Look back through them, and you will see why these data visualization techniques matter.

Thank you for reading this article; I hope you find it useful. You will see its value when you implement a data science solution, both before model selection and after model evaluation, when predictions and results are compared, as shown in the reference pictures below.

[Figure: Reference pictures]

Thanks once again! We will be back with another interesting topic. Until then, goodbye! – Shanthababu

The media shown in this article is not the property of DataPeaker and is used at the author's discretion.
