This article was published as part of the Data Science Blogathon.
What is machine learning?
Machine learning: Machine learning (ML) is a highly iterative process and ML Models are learned from past experiences and also to analyze historical data. What's more, ML models can identify patterns to make predictions about the future of the given data set.
WWhy is machine learning important?
Since 5V is dominating today's digital world (volume, variety, variance and value visibility), most industries are developing various models to analyze their presence and opportunities in the market, based on this result, they are delivering the best products. services to your clients on a large scale.
What are the main machine learning applications?
Machine learning (ML) is widely applicable in many industries and the implementation and improvement of their processes. Nowadays, ML has been used in multiple fields and industries without limits. The figure below represents the area where ML plays a vital role.
Where is machine learning in the AI space?
Just take a look at the Venn diagram, we could understand where ML is in the AI space and how it relates to other AI components.
How do we know the Jargons that fly around us, let's quickly see what exactly each component talks about.
How are data science and machine learning related?
Machine learning process, is the first step in the ML process to take data from multiple sources and followed by fine-tuned data processing, these data would be the source for ML algorithms based on the problem statement, like predictive models, rankings and others that are available in the ML world space. Let's discuss each process one by one here.
Machine learning – Etapas: We can divide the stages of the AA process into 5 as mentioned below in the flowchart.
- Data set
- Data negotiation
- Construction of the model
- Model evaluation
- Model deployment
Identification of business problems, before moving on to the previous stages. Then, we must be clear about the objective of the purpose of the implementation of the ML. Find the solution to the given problem / identified. we must collect the data and properly monitor the next steps.
Data set
The collection of data from different sources can be internal and / or external to meet requirements / business problems. Data can be in any format. CSV, XML.JSON, etc., here Big Data plays a vital role in making sure the correct data is in the expected format and structure.
Data negotiation and data processing: The main objective of this stage and approach are the following.
Data processing (EDA):
- Understand the given dataset and help clean up the given dataset.
- Gives you a better understanding of the characteristics and the relationships between them
- Extract essential variables and leave behind / remove non-essential variables.
- Handling missing values or human error.
- Identification of outliers.
- The EDA process would maximize insights from a data set.
Function engineering:
- Handling missing values in variables
- Convert categorical to numeric as most algorithms need numerical characteristics.
- Need to correct non-gaussian (normal). Linear models assume that the variables have a Gaussian distribution.
- Find outliers are present in the data, so we truncate the data above a threshold or transform the data by transforming records.
- Scale features. This is necessary to give the same importance to all the characteristics and not more to the one whose value is greater.
- Feature engineering is a costly and time-consuming process.
- Feature engineering can be a manual process, can be automated
Training and testing:
- The training data is used to ensure that the machine recognizes patterns in the data, cross-validation of the data is used to ensure better accuracy and
the efficiency of the algorithm used to train the machine. - The test data is used to see how well the machine can predict new responses based on your training.
- The train test division procedure is used to estimate the ML performance of algorithms when they are used to make predictions on data that is not
used to train the model.
Training
- Training data is the data set that the model trains on.
- Train data from which the model has learned experiences.
- Training sets are used to adjust and adjust your models.
Tests
- The test data is the data that is used to verify if the model has
learned well enough from the experiences you got on the train dataset. - Test sets
are data “invisible” to evaluate your models.
Train data: Train our machine learning algorithm
Test data: After training the model, test data is used to test your efficiency and model performance.
The purpose of the random state in the train test division: random state ensures that the divisions that you generate are reproducible. the random state that you provide is used as a seed for the random number generator. This ensures that the random the numbers are generated in the same order.
Data divided into training set / proof
- We used to divide a dataset into training data and test data in the machine learning space.
- The divided range is usually 20% al 80% between the testing and training stages of the given data set.
- A large amount of data would be spent to train your model
- The rest of the amount can be spent to evaluate your test model.
- But can't mix / reuse the same data for training and testing purposes
- If you evaluate your model with the same data that you used to train it, your model could be very over-tuned. Then the question arises whether the models can predict new data.
- Therefore, you should have separate test and training subsets of your dataset.
MODEL EVALUATION: Each model has its own model evaluation mythology, some of the best reviews are here.
- Evaluate regression Model.
- Sum of the squared error (SSE)
- Root mean square error (MSE)
- Root mean square error (RMSE)
- Mean absolute error (MUCH)
- Determination coefficient (R2)
- R2 adjusted
- Evaluate Classification Model.
- Confusion matrix.
- Accuracy score.
- AUC y ROC.
Deployment of a ML-model simply means integrating the finished model into a production environment and obtaining results to make business decisions.