Introduction
Python libraries for data science and machine learning are especially interesting: they are easy to understand, you can apply them immediately, and they let you get a feel for the data and visualize the nature of a dataset.
Even complex algorithms can be implemented in two or three lines of code, because the major mathematical concepts are already implemented inside the packages.
Of course, this is different from, and more interesting than, the other programming libraries I have seen so far. I think this simplicity and robustness is the main reason Python plays such a vital role in the AI space! I studied these libraries thoroughly and enjoyed it.
What is a package in Python? A package is a collection of Python modules bundled together under one name. Before you can use its classes, functions, attributes and so on in your notebook cells, you need to install the package and then import it into your notebook.
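To make the module/package distinction concrete, here is a small sketch that uses only the standard library. It relies on the fact that a package is a module with a `__path__` attribute (so it can contain submodules), while a plain module has none. The helper name `is_package` is hypothetical, chosen for illustration.

```python
import importlib


def is_package(name: str) -> bool:
    """Return True if the importable name refers to a package (it has __path__)."""
    module = importlib.import_module(name)
    return hasattr(module, "__path__")


# json is a package (it contains json.decoder, json.encoder, ...);
# math is a single built-in module.
print(is_package("json"))  # True
print(is_package("math"))  # False

# Submodules of a package are reached with dotted names.
from json import decoder
print(decoder.__name__)    # json.decoder
```

Third-party packages such as pandas work the same way once they are installed with pip.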
Let's look at the key Python packages for data science and machine learning.
- Pandas
- NumPy
- Scikit-learn
- Matplotlib
- Seaborn
Pandas
Pandas is used primarily for manipulating and operating on structured (tabular) data. It offers powerful data processing capabilities; I had never seen such wonderful features in my IT journey. It provides high-performance, easy-to-use data structures and data analysis tools.
How do you install the Pandas library? It is very simple: run the following command in your Jupyter Notebook.
!pip install pandas
The Pandas library should install successfully. What's next? Let's play with this library.
The syntax to import Pandas into your notebook
import pandas as pd
Your notebook is now ready to use all the functions in pandas. Let's try a few things.
Pandas has the following capabilities.
A. Series and DataFrame
The main components of pandas are the Series and the DataFrame. A Series is a one-dimensional labeled array; it behaves much like a dictionary, mapping index labels to values. A DataFrame is a two-dimensional table whose columns are Series, so we can build a DataFrame by combining several Series. The following sample should make this clearer.
Code that creates Series and a DataFrame
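import pandas as pd
# Employee numbers and names as plain Python lists
Eno = [100, 101, 102, 103, 104, 105]
Empname = ['Raja', 'Babu', 'Kumar', 'Karthik', 'Rajesh', 'xxxxx']
# Convert each list into a Series
Eno_Series = pd.Series(Eno)
Empname_Series = pd.Series(Empname)
# A dict of Series becomes a DataFrame: the keys become column names
data = {'Eno': Eno_Series, 'Empname': Empname_Series}
employee = pd.DataFrame(data)
employee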
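The dictionary-like behaviour of a Series, and the way a DataFrame lines Series up by their index labels, can be sketched as follows. The salary figures and department names are hypothetical values for illustration only.

```python
import pandas as pd

# A Series can be built from a dict: the keys become the index labels.
salaries = pd.Series({"Raja": 50000, "Babu": 42000, "Kumar": 61000})
print(salaries["Babu"])      # label-based lookup, like a dict -> 42000

# Series combined into a DataFrame are aligned on their index labels;
# a label missing from one Series becomes NaN in that column.
eno = pd.Series([100, 101, 102], index=["Raja", "Babu", "Kumar"])
dept = pd.Series(["HR", "IT"], index=["Raja", "Kumar"])
employees = pd.DataFrame({"Eno": eno, "Dept": dept})
print(employees)
```

Alignment by label (rather than by position) is what keeps rows consistent when Series of different lengths are merged.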
B. Load data into a DataFrame object
cereal_df = pd.read_csv("cereal.csv")
cereal_df.head(5)
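If `cereal.csv` is not at hand, the same calls can be tried on an in-memory file. This sketch assumes a tiny hypothetical stand-in for the cereal data (the rows are made up); it shows that `read_csv` accepts any file-like object and infers column types.

```python
import io

import pandas as pd

# Hypothetical miniature stand-in for cereal.csv (not the real dataset).
csv_text = """name,type,rating,shelf
Cereal A,C,68.4,3
Cereal B,C,33.9,3
Cereal C,H,59.6,1
"""

# read_csv accepts any file-like object, not only a path on disk.
cereal_df = pd.read_csv(io.StringIO(csv_text))
print(cereal_df.head(5))   # head(n) returns at most the first n rows
print(cereal_df.dtypes)    # column types are inferred while parsing
```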
C. Drop a column from a DataFrame object
cereal_df.drop(["type"], axis = 1, inplace = True)
cereal_df.head(5)
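A sketch of the same drop without `inplace=True`, using hypothetical toy data with the cereal column names. Returning a new DataFrame keeps the original intact, which is often easier to reason about in a notebook where cells get re-run.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the cereal example.
cereal_df = pd.DataFrame({
    "name": ["Cereal A", "Cereal B"],
    "type": ["C", "H"],
    "rating": [68.4, 59.6],
})

# Without inplace=True, drop returns a new DataFrame and leaves the original untouched.
# columns=[...] is equivalent to passing the labels together with axis=1.
trimmed = cereal_df.drop(columns=["type"])
print(list(trimmed.columns))    # ['name', 'rating']
print(list(cereal_df.columns))  # ['name', 'type', 'rating']
```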
D. Select rows from a DataFrame object
cereal_df_filtered = cereal_df[cereal_df['rating'] >= 68]
cereal_df_filtered.head()
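The filter works because the comparison produces a boolean mask with one value per row. A sketch of combining two conditions, on hypothetical toy data with the cereal column names:

```python
import pandas as pd

# Hypothetical toy data; column names follow the cereal example.
cereal_df = pd.DataFrame({
    "name": ["Cereal A", "Cereal B", "Cereal C", "Cereal D"],
    "rating": [68.4, 33.9, 72.1, 93.7],
    "shelf": [3, 3, 1, 3],
})

# The comparison produces a boolean Series (a "mask") with one value per row.
mask = cereal_df["rating"] >= 68
# Combine masks with & (and) or | (or); each condition needs its own parentheses.
high_on_top_shelf = cereal_df[mask & (cereal_df["shelf"] == 3)]
print(high_on_top_shelf["name"].tolist())  # ['Cereal A', 'Cereal D']
```

Python's `and`/`or` keywords do not work element-wise on a Series, which is why the bitwise operators are used here.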
E. Group a DataFrame by a column
cereal_df_groupby = cereal_df.groupby('shelf')
# print the first entry of each group
cereal_df_groupby.first()
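Grouping is usually followed by an aggregation. A sketch on hypothetical toy data with the cereal column names, showing a per-group mean and group sizes next to `first()`:

```python
import pandas as pd

# Hypothetical toy data; column names follow the cereal example.
cereal_df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "rating": [60.0, 40.0, 70.0, 50.0],
    "shelf": [1, 1, 2, 2],
})

groups = cereal_df.groupby("shelf")
print(groups.first())             # first non-null value of each column, per group
print(groups["rating"].mean())    # one aggregated value per shelf
print(groups.size())              # number of rows in each group
```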
F. Extract a single value from the DataFrame
# return the value at row label 0, column 'name'
result = cereal_df.loc[0, 'name']
result
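`loc` selects by label, while `iloc` selects by position. With the default 0..n-1 index (as in the cereal example) the two coincide, so this sketch uses hypothetical toy data with a non-default index to show the difference.

```python
import pandas as pd

# Hypothetical toy data with a non-default index to show the difference.
cereal_df = pd.DataFrame(
    {"name": ["Cereal A", "Cereal B", "Cereal C"], "rating": [68.4, 33.9, 59.6]},
    index=[10, 20, 30],
)

print(cereal_df.loc[20, "name"])   # label-based: row labelled 20 -> Cereal B
print(cereal_df.iloc[0, 0])        # position-based: first row, first column -> Cereal A
row = cereal_df.loc[30]            # a whole row comes back as a Series
print(row["rating"])               # 59.6
```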
So far, we have covered several functionalities of the pandas library; there are many more.
NumPy
NumPy is one of the most popular Python libraries for numerical computing and underpins many of Python's machine learning libraries. Its most important features are its N-dimensional array object and fast array manipulation.
Afraid of math while implementing your data science or machine learning model? Don't worry: NumPy turns complex mathematical operations into simple function calls. But remember to understand your requirements and use the package accordingly.
The syntax to import NumPy into your notebook
import numpy as np
Let's look at a few examples of how NumPy works its magic on given data.
A. Simple array creation using NumPy (1-D, 2-D and 3-D)
import numpy as np
# 1-D array
arr1 = np.array([1, 2, 3, 4, 5])
print("1-D Array")
print(arr1)
print("===================")
# 2-D array
print("2-D Array")
arr2 = np.array([[1, 2, 3], [4, 5, 6]])
print(arr2)
print("===================")
# 3-D array
print("3-D Array")
arr3 = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(arr3)
print("===================")
Output
1-D Array
[1 2 3 4 5]
===================
2-D Array
[[1 2 3]
 [4 5 6]]
===================
3-D Array
[[[1 2 3]
  [4 5 6]]

 [[1 2 3]
  [4 5 6]]]
===================
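Every NumPy array also records how it is laid out. A short sketch inspecting the 3-D array from the example above:

```python
import numpy as np

arr3 = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])

print(arr3.ndim)    # 3 -> number of dimensions (axes)
print(arr3.shape)   # (2, 2, 3) -> size along each axis
print(arr3.size)    # 12 -> total number of elements
print(arr3.dtype)   # an integer dtype; the exact width (e.g. int64) depends on the platform
```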
B. Array Slicing using NumPy
# Slicing takes elements from index start up to, but not including, end: [start:end] or [start:end:step]
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print("Slicing at index 1 to 5")
print(arr[1:5])
Output
Slicing at index 1 to 5
[2 3 4 5]
arr = np.array([1, 2, 3, 4, 5, 6, 7])
# From index 4 to the end of the array
print(arr[4:])
Output
[5 6 7]
We also have negative slicing :). It's simple: use negative indices such as [-x:-y], which count from the end of the array.
Why don't you try it yourself?
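Here is one way to try it, on the same seven-element array as above; the stop index is still excluded, and a negative step walks the array backwards.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])

# Negative indices count from the end: -1 is the last element.
print(arr[-3:-1])   # [5 6]  (the stop index -1 is excluded)
print(arr[-3:])     # [5 6 7]
print(arr[::2])     # [1 3 5 7]  (step of 2)
print(arr[::-1])    # [7 6 5 4 3 2 1]  (a negative step reverses)
```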
C. Array Shape and Reshaping Using NumPy
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print("================================")
print("Shape of the array")
print(arr.shape)
print("================================")
Output
================================
Shape of the array
(2, 4)
================================
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
print("Before reshaping the array")
print(arr)
print("================================")
# 12 elements rearranged into 4 rows of 3 columns
newarr = arr.reshape(4, 3)
print("After reshaping the array")
print(newarr)
print("================================")
Output
Before reshaping the array
[ 1  2  3  4  5  6  7  8  9 10 11 12]
================================
After reshaping the array
[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
================================
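Two reshaping details that often come up: NumPy can infer one dimension if you pass -1, and the total element count must stay the same. A sketch using the same twelve values:

```python
import numpy as np

arr = np.arange(1, 13)          # same values as the example: 1..12

# -1 asks NumPy to infer that dimension from the total size (12 / 3 = 4).
newarr = arr.reshape(-1, 3)
print(newarr.shape)             # (4, 3)

# The total number of elements must stay the same, otherwise reshape raises ValueError.
try:
    arr.reshape(5, 3)
except ValueError as err:
    print("cannot reshape:", err)

# ravel() flattens the array back to 1-D.
print(newarr.ravel())           # [ 1  2  3  4  5  6  7  8  9 10 11 12]
```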
D. Splitting arrays using NumPy
arr = np.array([1, 2, 3, 4, 5, 6])
print("Splitting NumPy Arrays into 3 Arrays")
print("================================")
# array_split returns a list of sub-arrays
newarr = np.array_split(arr, 3)
print(newarr[0])
print(newarr[1])
print(newarr[2])
print("================================")
Output
Splitting NumPy Arrays into 3 Arrays
================================
[1 2]
[3 4]
[5 6]
================================
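The example splits six elements evenly. A sketch of what happens when the length is not divisible by the number of sections, and how `np.split` differs:

```python
import numpy as np

arr = np.arange(1, 8)            # 7 elements: [1 2 3 4 5 6 7]

# array_split allows sections that do not divide the length evenly;
# the first sections receive the extra elements.
parts = np.array_split(arr, 3)
print([p.tolist() for p in parts])   # [[1, 2, 3], [4, 5], [6, 7]]

# np.split requires an exact division and raises ValueError otherwise.
try:
    np.split(arr, 3)
except ValueError as err:
    print("np.split failed:", err)
```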
E. Sorting an array using NumPy
arr = np.array(['banana', 'cherry', 'apple'])
print("Sorting a NumPy array")
print("================================")
# np.sort returns a sorted copy; the original array is unchanged
print(np.sort(arr))
print("================================")
Output
Sorting a NumPy array
================================
['apple' 'banana' 'cherry']
================================
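For multi-dimensional arrays, `np.sort` takes an `axis` argument, and `np.argsort` returns the indices that would sort the data. A sketch with a small made-up 2-D array:

```python
import numpy as np

scores = np.array([[3, 1, 2],
                   [9, 7, 8]])

print(np.sort(scores, axis=1))     # sort each row: [[1 2 3] [7 8 9]]
print(np.sort(scores, axis=0))     # sort each column (already ordered here)
print(np.sort(scores, axis=None))  # flatten, then sort: [1 2 3 7 8 9]

fruits = np.array(['banana', 'cherry', 'apple'])
order = np.argsort(fruits)         # indices that would sort the array
print(order)                       # [2 0 1]
print(fruits[order])               # ['apple' 'banana' 'cherry']
```

`argsort` is handy when you need to reorder a second array (for example, labels) in the same order as the first.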
Once you start playing with data in NumPy, it certainly takes time to understand all the concepts, but everything in this package is extremely well organized. Trust me!
Scikit-learn
Scikit-learn is one of the richest libraries in the Python family: it contains a large number of machine learning algorithms along with utilities for tasks such as preprocessing and measuring model performance. It lets users perform a wide range of machine learning tasks. Internally it is built on top of the NumPy and SciPy libraries, so keep that in mind. Here are a few of the areas it covers:
- Regression
- Classification
- Clustering
- Model selection
- Dimensionality reduction
The syntax to import Scikit-learn classes and functions into your notebook
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
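These two imports are typically used together: split the data, fit on the training part, and evaluate on the held-out part. A minimal sketch, assuming synthetic noise-free data (y = 2x + 1) chosen only so the result is easy to check:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): y = 2x + 1 with no noise.
X = np.arange(20, dtype=float).reshape(-1, 1)   # scikit-learn expects a 2-D feature matrix
y = 2 * X.ravel() + 1

# Hold out 25% of the rows for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)

print(model.coef_, model.intercept_)     # close to [2.] and 1.0
print(model.score(X_test, y_test))       # R^2 on the held-out rows
```

With real data the score will be below 1.0; the point of the held-out set is to estimate how the model behaves on rows it has not seen.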
Python visualization packages
Matplotlib and Seaborn Libraries
Python provides 2D plotting through the Matplotlib library. It is simple and easy to understand: a basic chart often takes just one or two lines. 3D visualization is supported too.
The syntax for importing Matplotlib and Seaborn into your notebook
import matplotlib.pyplot as plt
import seaborn as sns
You have probably worked with various charts in Excel spreadsheets and other BI tools. Python's visualization packages also produce very high-quality charts and plots.
Matplotlib and Seaborn
Matplotlib is one of the main, foundational visualization packages. It provides histograms (frequency distributions), bar graphs (univariate and bivariate plotting), scatter plots (spotting groups and relationships), and more.
Seaborn is a rich, polished data visualization library built on Matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics: box plots (data distribution across quartiles), violin plots (data distribution and probability density), bar charts (comparisons between categorical features) and heat maps (matrix representation of features, e.g. a correlation matrix). Word clouds (visual representation of text data) are also common in data science notebooks, although they usually come from the separate wordcloud package rather than Seaborn.
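The three Matplotlib chart types just mentioned can be sketched side by side. The data here is synthetic (random normal values and made-up category counts), and the `matplotlib.use("Agg")` line only exists so the sketch also runs without a display; in a notebook you can drop it and call `plt.show()` at the end.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
values = rng.normal(size=200)          # synthetic data for illustration only

fig, (ax_hist, ax_bar, ax_scatter) = plt.subplots(1, 3, figsize=(12, 3.5))

ax_hist.hist(values, bins=20)                     # frequency distribution
ax_hist.set_title("Histogram")

ax_bar.bar(["A", "B", "C"], [5, 3, 7])            # compare categories
ax_bar.set_title("Bar graph")

ax_scatter.scatter(values[:-1], values[1:], s=8)  # relationship between two variables
ax_scatter.set_title("Scatter plot")

fig.tight_layout()
```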
Seaborn – Histogram
import seaborn as sb
from matplotlib import pyplot as plt
df = sb.load_dataset('iris')
# distplot is deprecated in recent seaborn releases; histplot draws the same histogram
sb.histplot(df['petal_length'], kde=False)
plt.show()
Seaborn – Box plot
df = sb.load_dataset('iris')
sb.boxplot(x="species", y="petal_length", data=df)
plt.show()
Seaborn – Violinplot
df = sb.load_dataset('tips')
sb.violinplot(x="day", y="total_bill", data=df)
plt.show()
All these libraries help us build good models and play with the data!
But always remember: before using an individual package, understand what you need it for, then import it into your notebook and play with it.
I hope you now have a feel for Python packages for data science and some detail about them. We will look at more detailed concepts in the coming days. Thanks for your time!
The media shown in this article is not the property of DataPeaker and is used at the author's discretion.