Understand machine learning | What is machine learning?

Contents

This article was published as part of the Data Science Blogathon.

What is machine learning?

Machine learning: Machine learning (ML) is a highly iterative process and ML Models are learned from past experiences and also to analyze historical data. What's more, ML models can identify patterns to make predictions about the future of the given data set.

59049ml-as-7400131

WWhy is machine learning important?

Since 5V is dominating today's digital world (volume, variety, variance and value visibility), most industries are developing various models to analyze their presence and opportunities in the market, based on this result, they are delivering the best products. services to your clients on a large scale.

780945vs-5465725

What are the main machine learning applications?

Machine learning (ML) is widely applicable in many industries and the implementation and improvement of their processes. Nowadays, ML has been used in multiple fields and industries without limits. The figure below represents the area where ML plays a vital role.

11774ml-example-4007080

Where is machine learning in the AI ​​space?

Just take a look at the Venn diagram, we could understand where ML is in the AI ​​space and how it relates to other AI components.

How do we know the Jargons that fly around us, let's quickly see what exactly each component talks about.

14967ml-ai-6491059

How are data science and machine learning related?

49759dsvsml-3556395

Machine learning process, is the first step in the ML process to take data from multiple sources and followed by fine-tuned data processing, these data would be the source for ML algorithms based on the problem statement, like predictive models, rankings and others that are available in the ML world space. Let's discuss each process one by one here.

99744ml20process-2876383

Machine learning – Etapas: We can divide the stages of the AA process into 5 as mentioned below in the flowchart.

  1. Data set
  2. Data negotiation
  3. Construction of the model
  4. Model evaluation
  5. Model deployment

Identification of business problems, before moving on to the previous stages. Then, we must be clear about the objective of the purpose of the implementation of the ML. Find the solution to the given problem / identified. we must collect the data and properly monitor the next steps.

52606ml-stage-6440387

Data set

The collection of data from different sources can be internal and / or external to meet requirements / business problems. Data can be in any format. CSV, XML.JSON, etc., here Big Data plays a vital role in making sure the correct data is in the expected format and structure.

49138dss-1347276

Data negotiation and data processing: The main objective of this stage and approach are the following.

Data processing (EDA):

  1. Understand the given dataset and help clean up the given dataset.
  2. Gives you a better understanding of the characteristics and the relationships between them
  3. Extract essential variables and leave behind / remove non-essential variables.
  4. Handling missing values ​​or human error.
  5. Identification of outliers.
  6. The EDA process would maximize insights from a data set.

Function engineering:

  1. Handling missing values ​​in variables
  2. Convert categorical to numeric as most algorithms need numerical characteristics.
  3. Need to correct non-gaussian (normal). Linear models assume that the variables have a Gaussian distribution.
  4. Find outliers are present in the data, so we truncate the data above a threshold or transform the data by transforming records.
  5. Scale features. This is necessary to give the same importance to all the characteristics and not more to the one whose value is greater.
  6. Feature engineering is a costly and time-consuming process.
  7. Feature engineering can be a manual process, can be automated
15749eda-5935489

Training and testing:

  1. The training data is used to ensure that the machine recognizes patterns in the data, cross-validation of the data is used to ensure better accuracy and
    the efficiency of the algorithm used to train the machine.
  2. The test data is used to see how well the machine can predict new responses based on your training.
  3. The train test division procedure is used to estimate the ML performance of algorithms when they are used to make predictions on data that is not
    used to train the model.
14391tt-1378295

Training

  1. Training data is the data set that the model trains on.
  2. Train data from which the model has learned experiences.
  3. Training sets are used to adjust and adjust your models.

Tests

  1. The test data is the data that is used to verify if the model has
    learned well enough from the experiences you got on the train dataset.
  2. Test sets
    are data “invisible” to evaluate your models.

Train data: Train our machine learning algorithm
Test data: After training the model, test data is used to test your efficiency and model performance.

The purpose of the random state in the train test division: random state ensures that the divisions that you generate are reproducible. the random state that you provide is used as a seed for the random number generator. This ensures that the random the numbers are generated in the same order.

31320sampling-8586788

Data divided into training set / proof

  1. We used to divide a dataset into training data and test data in the machine learning space.
  2. The divided range is usually 20% al 80% between the testing and training stages of the given data set.
  3. A large amount of data would be spent to train your model
  4. The rest of the amount can be spent to evaluate your test model.
  5. But can't mix / reuse the same data for training and testing purposes
  6. If you evaluate your model with the same data that you used to train it, your model could be very over-tuned. Then the question arises whether the models can predict new data.
  7. Therefore, you should have separate test and training subsets of your dataset.

MODEL EVALUATION: Each model has its own model evaluation mythology, some of the best reviews are here.

  1. Evaluate regression Model.
    1. Sum of the squared error (SSE)
    2. Root mean square error (MSE)
    3. Root mean square error (RMSE)
    4. Mean absolute error (MUCH)
    5. Determination coefficient (R2)
    6. R2 adjusted
  2. Evaluate Classification Model.
    1. Confusion matrix.
    2. Accuracy score.
    3. AUC y ROC.

Deployment of a ML-model simply means integrating the finished model into a production environment and obtaining results to make business decisions.

31292ml-pd-8214249

Therefore, I hope you can understand the end-to-end process flow of machine learning and I think it would be helpful for you. Thanks for your time.

Subscribe to our Newsletter

We will not send you SPAM mail. We hate it as much as you.