Key Python Packages for Data Science

Introduction

Python libraries for data science and machine learning are especially interesting: they are easy to understand, you can apply them immediately, and they let you get a feel for the data and visualize the nature of the dataset.

Even complex algorithms can be implemented in two or three lines of code, because the major math concepts are already built into the packages.

This is different from, and more interesting than, the other programming libraries I have seen so far. I think this simplicity and robustness is the main reason why Python plays a vital role in the AI space! Once I realized this, I thoroughly understood and enjoyed it.

What is a package in Python? A package is a collection of Python modules bundled together. Once you import it in a notebook cell, you can start using its classes, methods, attributes, and so on. Before that, though, the package must be installed, and you must import it into your file or notebook.
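As a small sketch of what importing looks like in practice, the snippet below shows three common import styles. It uses only modules from Python's standard library, so it should run without installing anything; the variable names are just for illustration. The same pattern applies to third-party packages such as pandas.

```python
# Three common ways to bring a package (or part of it) into scope.
import math                      # import the whole module
import statistics as stats       # import under a shorter alias
from collections import Counter  # import a single name from a module

values = [2, 4, 4, 5, 7]

root = math.sqrt(16)             # call a function through the module name
average = stats.mean(values)     # call a function through the alias
counts = Counter(values)         # use the imported class directly

print(root)
print(average)
print(counts[4])                 # how many times the value 4 appears
```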

Let's look at the key Python packages for data science and machine learning.

  1. Pandas
  2. NumPy
  3. Scikit-learn
  4. Matplotlib
  5. Seaborn

Pandas

Pandas is used primarily for manipulating and operating on structured data. It offers powerful data processing capabilities; I have never seen such wonderful features in my IT journey. It provides high-performance, easy-to-use data structures and tools for analyzing data.

How do you install the Pandas library? It is very simple: run the following command in your Jupyter Notebook.

!pip install pandas

The Pandas library will install successfully! What's next? Play with the library.

The syntax to import Pandas into your notebook

import pandas as pd

Now your notebook is ready to use all the functions within pandas. Let's do a few things here.

Pandas has the following capabilities.

A. Series and DataFrame

The main components of pandas are the Series and the DataFrame. Let's take a quick look at them. A Series is a one-dimensional labeled array that behaves much like a dictionary, and a DataFrame is a collection of Series. We can build a data frame by combining Series; take a look at the following sample and you will understand better.

Code to create Series and a data frame

import pandas as pd

# Raw column values
Eno = [100, 101, 102, 103, 104, 105]
Empname = ['Raja', 'Babu', 'Kumar', 'Karthik', 'Rajesh', 'xxxxx']

# One Series per column
Eno_Series = pd.Series(Eno)
Empname_Series = pd.Series(Empname)

# Combine the Series into a DataFrame, keyed by column name
df = {'Eno': Eno_Series, 'Empname': Empname_Series}
employee = pd.DataFrame(df)
employee
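To make the dictionary comparison a little more concrete, here is a minimal sketch of a Series built from a dictionary and looked up by label. The salary and age figures are made up purely for illustration.

```python
import pandas as pd

# A Series built from a dictionary: the keys become the index labels.
salaries = pd.Series({"Raja": 5000, "Babu": 4200, "Kumar": 6100})  # made-up values

# Label-based access works much like a dictionary lookup.
print(salaries["Babu"])

# Putting several Series side by side yields a DataFrame;
# rows are aligned on the shared index labels.
ages = pd.Series({"Raja": 31, "Babu": 28, "Kumar": 35})  # made-up values
staff = pd.DataFrame({"salary": salaries, "age": ages})

print(staff.shape)
print(list(staff.columns))
```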

B. Load data into a data frame object

cereal_df = pd.read_csv("cereal.csv")
cereal_df.head(5)

C. Drop column from data frame object

cereal_df.drop(["type"], axis = 1, inplace = True)
cereal_df.head(5)

D. Select rows from data frame object

cereal_df_filtered = cereal_df[cereal_df['rating'] >= 68]
cereal_df_filtered.head()
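Row selection like this rests on a boolean mask. The sketch below shows the mask itself and how two conditions can be combined. Because cereal.csv is not included here, it uses a tiny stand-in table with invented names and numbers.

```python
import pandas as pd

# Small stand-in table; the names and numbers are invented for illustration.
cereals = pd.DataFrame({
    "name": ["Alpha", "Bravo", "Crunch", "Delta"],
    "rating": [72.5, 40.1, 68.0, 55.3],
    "shelf": [3, 1, 3, 2],
})

# A comparison on a column gives a boolean mask, one True/False per row.
high_rated = cereals["rating"] >= 68

# Masks combine with & (and) and | (or); each condition needs its own parentheses.
top_shelf_hits = cereals[(cereals["rating"] >= 68) & (cereals["shelf"] == 3)]

print(high_rated.tolist())
print(top_shelf_hits["name"].tolist())
```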

E. Group rows by a column in the data frame

cereal_df_groupby = cereal_df.groupby('shelf')
# print the first entry of each group
cereal_df_groupby.first()
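Grouping is usually followed by an aggregation. As a rough sketch, the snippet below computes the mean rating and the row count per shelf on a small invented table (again a stand-in, since cereal.csv is not included here).

```python
import pandas as pd

# Invented stand-in data for illustration.
cereals = pd.DataFrame({
    "name": ["Alpha", "Bravo", "Crunch", "Delta"],
    "rating": [70.0, 40.0, 60.0, 50.0],
    "shelf": [3, 1, 3, 1],
})

# Average rating within each shelf group.
mean_rating = cereals.groupby("shelf")["rating"].mean()
print(mean_rating)

# Number of rows in each shelf group.
counts = cereals.groupby("shelf").size()
print(counts)
```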

F. Extract a single value from the data frame

# return the value in row 0 of the 'name' column
result = cereal_df.loc[0,'name']
result
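For comparison, here is a small sketch of the related lookups: a single value by label, a whole row by label, and a row by position with iloc. The table and its letter index are invented for illustration.

```python
import pandas as pd

# Invented stand-in data with a custom (non-numeric) index.
cereals = pd.DataFrame(
    {"name": ["Alpha", "Bravo", "Crunch"], "rating": [70.0, 40.0, 60.0]},
    index=["a", "b", "c"],
)

single_value = cereals.loc["b", "name"]  # one cell, addressed by row and column label
whole_row = cereals.loc["b"]             # an entire row, returned as a Series
by_position = cereals.iloc[0]            # the first row, addressed by integer position

print(single_value)
print(whole_row["rating"])
print(by_position["name"])
```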

Up to now, we have discussed multiple functionalities in the pandas library. There are many more.

NumPy

NumPy is considered one of the most popular libraries for machine learning in Python. Its best and most important feature is its array interface and array manipulation.

Afraid of the math when implementing your data science or machine learning model? Do not worry: NumPy turns complex mathematical operations into very simple function calls. But remember to understand the requirements and use the package accordingly.

The syntax to import NumPy into your notebook

import numpy as np

Let's break down a few things here and see how NumPy works its magic with the given data.

A. Simple array formation using NumPy (1-D, 2-D, and 3-D)

import numpy as np

# 1-D array
arr1 = np.array([1, 2, 3, 4, 5])
print("1-D Array")
print(arr1)
print("===================")

# 2-D array
print("2-D Array")
arr2 = np.array([[1, 2, 3], [4, 5, 6]])
print(arr2)
print("===================")

# 3-D array
print("3-D Array")
arr3 = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(arr3)
print("===================")

Output

1-D Array
[1 2 3 4 5]
===================
2-D Array
[[1 2 3]
 [4 5 6]]
===================
3-D Array
[[[1 2 3]
  [4 5 6]]

 [[1 2 3]
  [4 5 6]]]
===================

B. Array slicing using NumPy

# Slicing in Python takes the elements in an index range: [start:end] runs from start up to, but not including, end; [start:end:step] also applies a step.
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print("Slicing at index 1 to 5")
print(arr[1:5])

Output

Slicing at index 1 to 5
[2 3 4 5]
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[4:])
Output
[5 6 7]

We also have negative slicing :). It is just as simple: write the range as [-x:-y], where negative indices count from the end of the array.

Why don't you try it on your own?
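If you want something to compare your own attempt against, here is a minimal sketch of negative slicing, plus one slice with a step. The variable names are just for illustration.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])

# From the third-from-last element up to, but not including, the last one.
middle_of_tail = arr[-3:-1]

# The last two elements.
last_two = arr[-2:]

# Every second element, using a step of 2.
every_second = arr[::2]

print(middle_of_tail)
print(last_two)
print(every_second)
```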

C. Matrix Shaping and Reshaping Using NumPy

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print("================================")
print("Shape of the array")
print(arr.shape)
print("================================")
Output
================================
Shape of the array
(2, 4)
================================
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
print("Before Reshape the array")
print(arr)
print("================================")
newarr = arr.reshape(4, 3)
print("After Reshape the array")
print(newarr)
print("================================")
Output
Before Reshape the array
[ 1  2  3  4  5  6  7  8  9 10 11 12]
================================
After Reshape the array
[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
================================
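Two details of reshape are worth a quick sketch: passing -1 lets NumPy work out one dimension by itself, and the total number of elements has to match the new shape. The variable names below are only illustrative.

```python
import numpy as np

arr = np.arange(1, 13)  # the values 1 through 12

# -1 asks NumPy to infer that dimension from the array's size.
two_rows = arr.reshape(2, -1)
print(two_rows.shape)

# The element count must match: 12 values cannot fill a 5 x 3 grid.
try:
    arr.reshape(5, 3)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)
```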

D. Array splitting using NumPy

arr = np.array([1, 2, 3, 4, 5, 6])
print("Splitting NumPy Arrays into 3 Arrays")
print("================================")
newarr = np.array_split(arr, 3)
print(newarr[0])
print(newarr[1])
print(newarr[2])
print("================================")
Output
Splitting NumPy Arrays into 3 Arrays
================================
[1 2]
[3 4]
[5 6]
================================
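What happens when the array does not divide evenly? As a small sketch, np.array_split still works and simply makes the earlier pieces one element longer, whereas np.split would raise a ValueError for an unequal division.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])  # 7 elements, not divisible by 3

# array_split tolerates uneven splits; the leading pieces get the extra elements.
parts = np.array_split(arr, 3)

for part in parts:
    print(part)
```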

E. Sorting arrays using NumPy

arr = np.array(['banana', 'cherry', 'apple'])
print("Sorting a NumPy array")
print("================================")
print(np.sort(arr))
print("================================")
Output
Sorting a NumPy array
================================
['apple' 'banana' 'cherry']
================================
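Sorting works the same way on numbers. The sketch below also shows a descending sort and np.argsort, which returns the positions that would sort the array. The scores are invented for illustration.

```python
import numpy as np

scores = np.array([42, 7, 19, 88])  # made-up values

# np.sort returns a sorted copy; the original array is left unchanged.
ascending = np.sort(scores)

# Reversing the sorted copy gives descending order.
descending = np.sort(scores)[::-1]

# argsort returns the indices that would put the array in ascending order.
order = np.argsort(scores)

print(ascending)
print(descending)
print(order)
```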

If you have started to play with data using NumPy, you will certainly need more and more time to understand the concepts. Everything is extremely well organized in this package, trust me!

Scikit-learn

The Scikit-learn library is one of the richest libraries in the Python family. It contains a large number of machine learning algorithms and other key performance-related utilities. Scikit-learn allows users to perform a wide range of machine learning tasks. It is built on top of the NumPy and SciPy libraries; this is an internal detail, but keep it in mind. Here are a few of the algorithm families it covers.

  1. Regression
  2. Classification
  3. Clustering
  4. Model selection
  5. Dimensionality reduction

The syntax to import Scikit-learn into your notebook

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
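As a rough sketch of how these two imports are typically used together, the snippet below splits a small synthetic dataset into training and test sets and fits a linear regression. The data follow y = 3x + 2 exactly and are invented purely for illustration, so the fitted slope and intercept should come out very close to 3 and 2.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data following y = 3x + 2 exactly (invented for illustration).
X = np.arange(20, dtype=float).reshape(-1, 1)  # 20 samples, 1 feature
y = 3 * X.ravel() + 2

# Hold out 25% of the samples for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the model on the training portion only.
model = LinearRegression()
model.fit(X_train, y_train)

print(model.coef_[0], model.intercept_)  # learned slope and intercept
print(model.score(X_test, y_test))       # R^2 on the held-out samples
```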

Python visualization packages

Matplotlib and Seaborn Libraries

Python provides 2D graphics functions through the Matplotlib library. It is very simple and easy to understand: you can produce a chart with one or two lines of code. 3D visualization is available too.

The syntax for importing Matplotlib and Seaborn into your notebook

import matplotlib.pyplot as plt

import seaborn as sns
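To give a feel for how little code a basic chart needs, here is a minimal sketch of a line plot with Matplotlib. The data points and labels are invented for illustration.

```python
import matplotlib.pyplot as plt

# Made-up data: the squares of 1 through 5.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

# Create a figure with a single set of axes and draw the line on it.
fig, ax = plt.subplots()
ax.plot(x, y, marker="o")

# Label the axes and give the chart a title.
ax.set_xlabel("x")
ax.set_ylabel("x squared")
ax.set_title("A simple line plot")

plt.show()
```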

You have probably worked with various charts in Excel spreadsheets and other BI tools. In Python, the visualization packages provide very high-quality charts and plots.

Matplotlib and Seaborn

Matplotlib is one of the main, basic visualization packages. It provides histograms (frequency distributions), bar graphs (univariate and bivariate plotting), scatter plots (relationships and groupings), and so on.

Seaborn is a rich, deluxe data visualization library. It provides a high-level interface for drawing attractive and informative statistical charts: box plots (data distribution with different quartiles), violin plots (data distribution and probability density), bar charts (comparisons between categorical features), and heat maps (feature mapping in terms of matrix representation). Word clouds (visual representations of text data) are usually drawn with the separate wordcloud package rather than with Seaborn itself.

Seaborn – Histogram

import seaborn as sb
from matplotlib import pyplot as plt
df = sb.load_dataset('iris')
# histplot replaces distplot, which is deprecated in recent Seaborn releases
sb.histplot(df['petal_length'], kde=False)
plt.show()

Seaborn – Box plot

df = sb.load_dataset('iris')
sb.boxplot(x = "species", y = "petal_length", data = df)
plt.show()

Seaborn – Violinplot

df = sb.load_dataset('tips')
sb.violinplot(x = "day", y = "total_bill", data=df)
plt.show()

So, all these libraries help us build a good model and play with the data!

But always remember: before using an individual package, understand what it is for and what it requires, then import it into your file or notebook and play with it.

I hope you now have a feel for, and some level of detail about, Python packages for data science. We will see more detailed concepts in the coming days! Thanks for your time!

The media shown in this article is not the property of DataPeaker and is used at the author's discretion.
