Glossary of Common Machine Learning, Statistics and Data Science Terms

Word

Description

Machine learning

Machine learning refers to the techniques used to process large volumes of data intelligently (by developing algorithms) in order to derive actionable insights. In these techniques, we expect the algorithms to learn by themselves without being explicitly programmed.

Mahout

Mahout is an open source Apache project used to create scalable machine learning algorithms. It implements popular machine learning techniques such as recommendation, classification and clustering.

Mahout Features:

  • Mahout offers a framework to perform data mining tasks on large volumes of data
  • Mahout enables applications to analyze large data sets efficiently and quickly
  • It also offers distributed fitness function capabilities for evolutionary programming
  • It includes several MapReduce-enabled clustering implementations, such as k-means, fuzzy k-means, Dirichlet and Mean-Shift

MapReduce

Hadoop MapReduce is a software framework for easily writing applications that process large amounts of data (multi-terabyte data sets) in parallel on large clusters (hundreds of nodes) of commodity hardware in a reliable, fault-tolerant manner.

A MapReduce framework is generally made up of three operations:

  1. Map: Each worker node applies the map function to its local data and writes the output to temporary storage. A master node ensures that only one copy of the redundant input data is processed.
  2. Shuffle: Worker nodes redistribute data based on output keys (produced by the map function), so that all data belonging to a key is in the same worker node.
  3. Reduce: Worker nodes now process each group of output data, by key, in parallel.
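
The three operations above can be sketched in a single process. This is only an illustration of the data flow, not how Hadoop itself is implemented: the word-count task, the function names and the two input chunks are assumptions made for the example, and each chunk stands in for the local data of one worker node.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_phase(mapped_chunks):
    """Shuffle: gather all values that share a key into one group."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values of each key (here, by summing them)."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks):
    # Each chunk plays the role of the local data of one worker node.
    mapped = [map_phase(chunk) for chunk in chunks]
    return reduce_phase(shuffle_phase(mapped))


counts = word_count(["big data big", "data big insights"])
print(counts)
```

In a real cluster the map and reduce calls run on different machines, and the shuffle moves data over the network; here they are ordinary function calls.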

Market basket analysis

Market basket analysis (MBA) is a technique widely used by marketers to identify the combinations of products or services that customers frequently buy together. It is also called product association analysis. The association analysis is mostly carried out with an algorithm named the “Apriori algorithm”. The output of this analysis is a set of association rules, which marketers use to plan their recommendations.

When two or more products are purchased together, market basket analysis checks whether the purchase of one product increases the probability of buying the others. Marketers use this knowledge to bundle products or to design a cross-selling strategy for a customer.
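
As a rough illustration of that check, the sketch below computes support, confidence and lift for every pair of items by brute force. It is not the Apriori algorithm, which prunes the search for larger item sets. The function name and the four transactions are made up for the example. A lift above 1 means that buying the first item raises the probability of buying the second.

```python
from itertools import combinations


def association_metrics(transactions):
    """Return {(x, y): (support, confidence, lift)} for each rule x -> y."""
    n = len(transactions)
    baskets = [set(t) for t in transactions]
    items = sorted(set().union(*baskets))
    item_count = {item: sum(item in b for b in baskets) for item in items}

    rules = {}
    for a, b in combinations(items, 2):
        both = sum(a in basket and b in basket for basket in baskets)
        if both == 0:
            continue  # the pair never occurs together, so there is no rule
        for x, y in ((a, b), (b, a)):
            support = both / n                        # P(x and y)
            confidence = both / item_count[x]         # P(y | x)
            lift = confidence / (item_count[y] / n)   # P(y | x) / P(y)
            rules[(x, y)] = (support, confidence, lift)
    return rules


# Hypothetical shopping baskets, for illustration only.
transactions = [
    ["bread", "butter"],
    ["bread", "butter", "milk"],
    ["bread", "milk"],
    ["milk"],
]
rules = association_metrics(transactions)
support, confidence, lift = rules[("butter", "bread")]
print(support, confidence, round(lift, 2))
```

In this toy data every basket with butter also contains bread, so the rule butter -> bread has confidence 1.0 and a lift above 1.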

Market mix modeling

Market mix modeling is an analytical approach that uses historical information, such as point-of-sale data, to quantify the impact of individual marketing components on sales.

Suppose total sales are $100. This total can be broken down into sub-components, for example $60 base sales, $20 due to price, $18 due to distribution and $2 due to promotional activity. These numbers can be obtained using various logical methods, and each method can give a different breakdown. It is therefore very important to standardize the process of breaking total sales into these components. This formal technique is known as MMM, or market mix modeling.

Maximum likelihood estimate

Maximum likelihood estimation is a method for finding the parameter values that maximize the likelihood of the observed data. The resulting values are called maximum likelihood estimates (MLE).

Mean

For a data set, the mean is the average value of all the numbers. It can sometimes be used as a representation of the whole data set.

As an example, suppose you have the grades of the students in a class and are asked how well the class is performing. Listing the grade of every student would not answer the question. Instead, you can find the mean of the class, which is representative of the class performance.
To find the mean, add all the numbers and then divide by the number of items in the set.

As an example, if the numbers are 1, 2, 3, 4, 5, 6, 7, 8, 8, then the mean is 44/9 ≈ 4.89.
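
The same calculation can be checked in a few lines of Python. The variable names are arbitrary; `statistics.mean` is part of the Python standard library.

```python
from statistics import mean

numbers = [1, 2, 3, 4, 5, 6, 7, 8, 8]

# Add all the numbers, then divide by how many there are.
manual_mean = sum(numbers) / len(numbers)

# The standard library computes the same quantity.
library_mean = mean(numbers)

print(round(manual_mean, 2))  # 4.89
```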

Median

The median of a set of numbers is the middle value. When the count of numbers in the set is even, the median is the average of the two middle values. The median is used to measure central tendency.

To find the median of a set of numbers, follow the steps below:

  1. Arrange the numbers in ascending or descending order
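  2. Find the middle value: for odd n it is the value at position (n + 1) / 2; for even n, average the values at positions n / 2 and n / 2 + 1 (where n is the count of numbers in the set)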
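
A minimal sketch of these steps, covering both odd and even counts, could look like the following. The function name is our own choice; the two input lists are the data sets used in the mean and mode entries of this glossary.

```python
def median(values):
    """Return the middle value of a non-empty collection of numbers."""
    ordered = sorted(values)          # step 1: arrange in ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2                      # step 2: locate the middle
    if n % 2 == 1:
        return ordered[mid]           # odd count: the single middle value
    # even count: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([1, 2, 3, 4, 5, 6, 7, 8, 8]))     # 5
print(median([4, 5, 2, 8, 4, 7, 6, 4, 6, 3]))  # 4.5
```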

MIS

A management information system (MIS) is a computer system consisting of hardware and software that serves as the backbone of an organization's operations. An MIS gathers data from multiple online systems, analyzes the information and reports data to aid management decision making.

Objectives of MIS:

  • To drive decision making by providing accurate and up-to-date data on a range of organizational assets.
  • To correlate multiple data points in order to design strategies that drive operations.

ML-as-a-Service (MLaaS)

Machine learning as a service (MLaaS) is an array of services that provide machine learning tools as part of cloud computing services. These can include tools for data visualization, facial recognition, natural language processing, image recognition, predictive analytics and deep learning. Some of the top ML-as-a-Service providers are:

  • Microsoft Azure Machine Learning Studio
  • AWS Machine Learning
  • IBM Watson Machine Learning
  • Google Cloud Machine Learning Engine
  • BigML

Mode

The mode is the value that occurs most frequently in a population. It is a measure of central tendency, in other words a way to express, in a (usually) single number, important information about a random variable or a population.

The mode can be calculated through the following steps:

  • Count the number of times each value appears
  • Take the value that appears the most

Let's understand with an example:

Suppose we have a data set that has 10 data points, listed below:

4,5,2,8,4,7,6,4,6,3

So now we will calculate the number of times each value has appeared.

Value Count
2 1
3 1
4 3
5 1
6 2
7 1
8 1

We see that the value 4 is repeated the most, appearing 3 times. The mode of this data set is therefore 4.
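
The counting steps can be reproduced with `collections.Counter` from the Python standard library. This is one possible way to do it, using the same 10 data points. If several values tie for the highest count, `most_common(1)` returns only one of them.

```python
from collections import Counter

data = [4, 5, 2, 8, 4, 7, 6, 4, 6, 3]

# Step 1: count the number of times each value appears.
counts = Counter(data)

# Step 2: take the value that appears the most.
value, frequency = counts.most_common(1)[0]

print(value, frequency)  # 4 3
```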

Model selection

Model selection is the task of choosing a statistical model from a set of known models. Methods that can be used to select a model include:

  • Exploratory data analysis
  • Scientific methods

Some of the criteria for choosing the model may be:

  • Akaike information criterion (AIC)
  • Adjusted R²
  • Bayesian information criterion (BIC)
  • Likelihood ratio test
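
To make three of these criteria concrete, here is a small sketch for least-squares models. It assumes Gaussian errors, for which AIC and BIC can be written in terms of the residual sum of squares (RSS) up to an additive constant. The function names, the sample size and the two candidate models with their RSS values are hypothetical.

```python
import math


def aic(rss, n, k):
    """AIC for a least-squares fit: n observations, k estimated parameters."""
    return n * math.log(rss / n) + 2 * k


def bic(rss, n, k):
    """BIC penalizes each extra parameter by log(n) instead of 2."""
    return n * math.log(rss / n) + k * math.log(n)


def adjusted_r2(r2, n, p):
    """Adjusted R-squared for n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical candidates: name -> (residual sum of squares, parameter count).
n = 30
candidates = {"linear": (12.0, 2), "quadratic": (11.5, 3)}

scores = {name: aic(rss, n, k) for name, (rss, k) in candidates.items()}
best = min(scores, key=scores.get)   # lower AIC is better
print(best)
```

With these made-up numbers the quadratic model fits slightly better, but not by enough to justify its extra parameter, so the linear model has the lower AIC.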

Monte Carlo simulation

The idea behind Monte Carlo simulation is to use random samples of parameters or inputs to explore the behavior of a complex process. Monte Carlo simulations sample from a probability distribution for each variable to produce hundreds or thousands of possible outcomes. The results are analyzed to obtain the probabilities of different outcomes occurring.

Multi-class classification

Problems that have more than two classes in the target variable are called multi-class classification problems.

As an example, suppose the goal is to predict the quality of a product, which can be excellent, good, average, fair or poor. In this case the target variable has 5 classes, so it is a 5-class classification problem.
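
Before a model is trained, a target like this is usually encoded as integer class indexes. The sketch below assumes five quality labels spelled as in the code and a handful of made-up observations; the names `QUALITY_CLASSES` and `encode` are illustrative only.

```python
# Assumed label names for the five quality levels, ordered worst to best.
QUALITY_CLASSES = ["poor", "fair", "average", "good", "excellent"]
CLASS_INDEX = {name: index for index, name in enumerate(QUALITY_CLASSES)}


def encode(labels):
    """Map each text label to its integer class index."""
    return [CLASS_INDEX[label] for label in labels]


# Hypothetical observed targets.
observed = ["good", "poor", "excellent", "good", "average", "fair"]
encoded = encode(observed)
num_classes = len(set(observed))

print(encoded)      # [3, 0, 4, 3, 2, 1]
print(num_classes)  # 5
```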

Multivariate analysis

Multivariate analysis is the process of comparing and analyzing the dependence of multiple variables on one another.

As an example, we can perform a bivariate analysis on a pair of continuous features and look for a relationship between them.
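
One common way to quantify such a relationship between two continuous features is the Pearson correlation coefficient. The sketch below implements it with the standard library only. The feature names and values are invented for the example, and correlation is just one of several possible measures.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant feature")
    return cov / (spread_x * spread_y)


# Hypothetical continuous features.
hours_studied = [1.0, 2.0, 3.0, 4.0, 5.0]
exam_score = [52.0, 55.0, 61.0, 64.0, 70.0]

r = pearson(hours_studied, exam_score)
print(round(r, 2))
```

A value close to 1 or -1 indicates a strong linear relationship; a value near 0 indicates little or no linear relationship.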

Multivariate regression

Multivariate, as the word suggests, refers to 'multiple dependent variables'. A regression model designed to deal with multiple dependent variables is called a multivariate regression model.

Consider an example: given a set of details about a student's interests, previous scores by subject, etc., you want to predict the GPA for every semester (GPA1, GPA2, …). This problem can be addressed with multivariate regression, since there is more than one dependent variable.
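
As a simplified sketch of this idea, the code below predicts two dependent variables from a single predictor. It relies on the fact that, for ordinary least squares, the coefficients of a multivariate linear regression equal those obtained by fitting each dependent variable separately. The single predictor, the function names and all the numbers are assumptions made for the illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_multivariate(xs, targets):
    """Fit one line per dependent variable, all sharing the same predictor."""
    return {name: fit_line(xs, ys) for name, ys in targets.items()}


def predict(model, x):
    """Predict every dependent variable for a single predictor value."""
    return {name: slope * x + intercept
            for name, (slope, intercept) in model.items()}


# Hypothetical data: one predictor and two dependent variables.
previous_score = [60.0, 70.0, 80.0, 90.0]
targets = {
    "GPA1": [2.4, 2.8, 3.2, 3.6],
    "GPA2": [2.5, 2.8, 3.1, 3.4],
}

model = fit_multivariate(previous_score, targets)
prediction = predict(model, 75.0)
print({name: round(value, 2) for name, value in prediction.items()})
```

A fuller treatment would use several predictors and also model the correlation between the errors of the dependent variables, which this sketch ignores.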
