Discrete probability distributions | Types of probability distributions

This article was published as part of the Data Science Blogathon.

Introduction

Today, let's talk about one of the fundamental concepts of statistics: probability distributions. They help to better understand the data and act as a basis for understanding more statistical concepts, such as confidence intervals and hypothesis tests.

Informal definition

Let X be a random variable with more than one possible outcome. If we repeat the experiment many times and plot each possible outcome on the x-axis against its probability on the y-axis, we obtain a graph called the probability distribution (PD). The height of the graph at a given outcome is the probability of that outcome.

[Figure: example plot of a binomial distribution]

Types of probability distributions

There are two types of distributions according to the type of data generated by the experiments.

1. Discrete probability distributions

These distributions model the probabilities of random variables that can only take discrete values as outcomes. For instance, the possible values for the random variable X representing the number of heads that can occur when a coin is tossed twice are the set {0, 1, 2}, and not just any value from 0 to 2 such as 0.1 or 1.6.

Examples: Bernoulli, binomial, negative binomial, hypergeometric, etc.
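The two-toss coin example above can be checked by brute force. The following is a minimal sketch, assuming a fair coin; the helper name `heads_distribution` is made up for this illustration. It lists every equally likely sequence of tosses and counts how many of them contain each number of heads.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def heads_distribution(num_tosses):
    """Return {number_of_heads: probability} for a fair coin (assumed)."""
    # Every sequence of H/T of the given length is equally likely.
    outcomes = list(product("HT", repeat=num_tosses))
    # Count how many sequences contain 0, 1, 2, ... heads.
    counts = Counter(seq.count("H") for seq in outcomes)
    return {
        heads: Fraction(count, len(outcomes))
        for heads, count in sorted(counts.items())
    }


dist = heads_distribution(2)
for heads, prob in dist.items():
    print(f"P(X = {heads}) = {prob}")
```

Only the values 0, 1 and 2 appear as keys, which is what makes the variable discrete.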

2. Continuous probability distributions

These distributions model the probabilities of random variables that can take any value within a continuous range. For instance, the random variable X that represents the weight of citizens in a city can take any value, such as 34.5, 47.7, etc.

Examples: Normal, Student's t, chi-squared, exponential, etc.

Terminologies

Each PD provides us with additional information about the behavior of the data involved. Each PD is given by a probability function that generalizes the probabilities of the results.

With this, we can estimate the probability of a particular outcome (discrete) or the probability that the outcome falls within a particular range of values (continuous). The function is called the probability mass function (PMF) for discrete distributions and the probability density function (PDF) for continuous distributions. A PMF sums to one over all possible outcomes, and a PDF integrates to one over its entire domain.

Cumulative distribution function

The PMF gives the probability of a particular outcome, while the cumulative distribution function (CDF) gives the probability of seeing a result less than or equal to a particular value of the random variable. CDFs show how much probability has accumulated up to a certain point. For instance, if P(X = 5) is the probability that the number of heads in a series of coin tosses is exactly 5, then P(X <= 5) is the cumulative probability of obtaining anywhere from 0 to 5 heads.

Cumulative distribution functions are also used to calculate p-values as part of hypothesis testing.
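To make the difference between P(X = 5) and P(X <= 5) concrete, here is a small sketch that builds a CDF as the running total of a PMF. It assumes a fair coin tossed 10 times; the number of tosses is not given in the text and is chosen only for illustration.

```python
from fractions import Fraction
from itertools import accumulate
from math import comb

NUM_TOSSES = 10  # assumed for illustration; the article does not fix this

# PMF: comb(n, x) counts the toss sequences with exactly x heads,
# out of 2**n equally likely sequences of a fair coin.
pmf = [Fraction(comb(NUM_TOSSES, x), 2**NUM_TOSSES) for x in range(NUM_TOSSES + 1)]

# CDF: running total of the PMF, so cdf[x] = P(X <= x).
cdf = list(accumulate(pmf))

print("P(X = 5)   =", float(pmf[5]))
print("P(X <= 5)  =", float(cdf[5]))
print("P(X <= 10) =", float(cdf[-1]))  # all of the probability has accumulated
```

The last CDF value is always one, because by then every possible outcome has been included.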

Discrete probability distributions

There are many discrete probability distributions for different scenarios, and this post covers the ones listed below. Of these, the binomial and Poisson distributions are the most widely discussed.

1. Bernoulli distribution

This distribution arises when we perform an experiment once and it has only two possible outcomes: success and failure. Trials of this type are called Bernoulli trials, and they form the basis of many of the distributions discussed below. Let p be the probability of success, so that 1 – p is the probability of failure.

The PMF is given as

$$P(X = x) = p^{x} (1 - p)^{1 - x}, \quad x \in \{0, 1\}$$

An example of this would be flipping a coin once. Here p is the probability of getting a head and 1 – p is the probability of getting a tail. Note that success and failure are subjective, and we define them based on context.
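The single coin flip can be written as a tiny function. This is a minimal sketch, assuming success is encoded as 1 and failure as 0; the function name `bernoulli_pmf` and the biased-coin value p = 0.3 are illustrative choices, not from the article.

```python
def bernoulli_pmf(x, p):
    """P(X = x) for one Bernoulli trial with success probability p."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    if x == 1:  # success
        return p
    if x == 0:  # failure
        return 1 - p
    return 0.0  # any other value is impossible


p_heads = 0.3  # assumed biased coin, for illustration only
print("P(head) =", bernoulli_pmf(1, p_heads))
print("P(tail) =", bernoulli_pmf(0, p_heads))
```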

2. Binomial distribution

This distribution models the number of successes in a fixed number n of independent Bernoulli trials. Let p be the probability that a trial is a success, which implies that 1 – p is the probability that it is a failure. Repeating the trial n times and recording the probability of each possible number of successes gives us the binomial distribution.

The most common example given for the binomial distribution is tossing a coin n times and calculating the probability of getting a particular number of heads. More real-world examples include the number of successful sales calls for a company or the number of patients for whom a drug works.

The PMF is given as,

$$P(X = x) = \binom{n}{x} p^{x} (1 - p)^{n - x}$$

where p is the probability of success, n is the number of attempts and x is the number of times we get a success.
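As a rough illustration, the standard binomial PMF, C(n, x) p^x (1 - p)^(n - x), can be computed with Python's standard library. The function name `binomial_pmf` and the choice of 10 fair coin tosses are assumptions made for this example.

```python
from math import comb


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    # comb(n, x) counts the ways to place x successes among n trials.
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 10, 0.5  # assumed: 10 tosses of a fair coin
for x in range(n + 1):
    print(f"P(X = {x:2d}) = {binomial_pmf(x, n, p):.4f}")
```

Summing the printed values over x = 0, ..., n gives one, as expected of any PMF.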

3. Hypergeometric distribution

Consider the case of drawing red marbles from a box of marbles of different colors. Drawing a red marble is a success and drawing any other color is a failure. Each marble that is drawn is not returned to the box, so every draw changes the probability of getting a red marble on the next draw. The hypergeometric distribution models the probability of x successes in n draws where each draw is made without replacement. This is different from the binomial distribution, where the probability of success remains constant across trials.

The PMF is given as,

$$P(X = x) = \frac{\binom{k}{x} \binom{N - k}{n - x}}{\binom{N}{n}}$$

where N is the size of the population, k is the number of possible successes in the population (for example, the red marbles in the box), n is the number of trials and x is the desired number of successes.
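The marble example can be sketched with the standard hypergeometric PMF, C(k, x) C(N - k, n - x) / C(N, n). The box contents used here (20 marbles, 5 of them red, 4 draws) are made-up numbers for illustration, and `hypergeometric_pmf` is a hypothetical helper name.

```python
from math import comb


def hypergeometric_pmf(x, N, k, n):
    """P(exactly x successes in n draws without replacement).

    N: population size, k: successes in the population, n: number of draws.
    """
    # x is impossible if it needs more successes or more failures than exist.
    if x < max(0, n - (N - k)) or x > min(n, k):
        return 0.0
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)


# Assumed box: 20 marbles, 5 red; draw 4 without replacement.
N, k, n = 20, 5, 4
for x in range(n + 1):
    print(f"P({x} red marbles) = {hypergeometric_pmf(x, N, k, n):.4f}")
```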

4. Negative binomial distribution

Sometimes we want to know how many Bernoulli trials we need to perform to get a particular result. The desired result is specified in advance and we continue the experiment until it is achieved. Consider the example of rolling a die. Our desired result, defined as a success, is rolling a 4, and we want to keep rolling until we have seen it three times. The negative binomial distribution gives the probability of the number of failures (rolls of any number other than 4) that occur before we see the third success.

The PMF is given as,

$$P(X = k) = \binom{k + r - 1}{k} p^{r} (1 - p)^{k}$$

where p is the probability of success, k is the number of observed failures and r is the desired number of successes until the experiment is stopped.

As in the binomial distribution, the probability across trials remains constant and each trial is independent of the other.
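The die example can be sketched as follows, using the failures-before-the-r-th-success form of the negative binomial PMF, C(k + r - 1, k) p^r (1 - p)^k. Other references count total trials instead, so treat this parameterization as an assumption; the function name `negative_binomial_pmf` is also illustrative.

```python
from math import comb


def negative_binomial_pmf(k, r, p):
    """P(exactly k failures occur before the r-th success)."""
    # The last trial is the r-th success; the k failures can fall
    # anywhere among the k + r - 1 trials that come before it.
    return comb(k + r - 1, k) * p**r * (1 - p) ** k


p_four = 1 / 6  # probability of rolling a 4 with a fair die
r = 3           # stop at the third 4
for k in range(6):
    print(f"P({k} failures before the third 4) = {negative_binomial_pmf(k, r, p_four):.5f}")
```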

5. Geometric distribution

This is a special case of the negative binomial distribution in which the desired number of successes is 1. It measures the number of failures we get before the first success. Using the same example as in the previous section, we would like to know the number of failures we see before we roll the first 4.

$$P(X = k) = (1 - p)^{k} p$$

where p is the probability of success and k is the number of failures. Here, r = 1.
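Continuing the die example, the probability of seeing k failures before the first 4 is (1 - p)^k p. A minimal sketch, assuming a fair die; `geometric_pmf` is a hypothetical helper name.

```python
def geometric_pmf(k, p):
    """P(exactly k failures occur before the first success)."""
    # k failures in a row, each with probability 1 - p, then one success.
    return (1 - p) ** k * p


p_four = 1 / 6  # probability of rolling a 4 with a fair die
for k in range(6):
    print(f"P({k} failures before the first 4) = {geometric_pmf(k, p_four):.4f}")
```

The probabilities shrink by a constant factor of 1 - p with each additional failure, which is where the distribution gets its name.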

6. Poisson distribution

This distribution describes the number of events that occur in a fixed interval of time or space. An example helps clarify this. Consider the number of calls a customer service center receives per hour. We can estimate the average number of calls per hour, but we cannot determine the exact number of calls or the exact time at which each call arrives. Each occurrence of an event is independent of the other occurrences.

The PMF is given as,

$$P(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}$$

where λ is the average number of times the event occurs in the given interval, x is the number of occurrences whose probability we want and e is Euler's number.
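For the call-center example, the Poisson PMF, λ^x e^(-λ) / x!, can be evaluated directly. The average of 4 calls per hour is an invented figure used only for illustration, and `poisson_pmf` is a hypothetical helper name.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(exactly x events in an interval with average rate lam)."""
    return lam**x * exp(-lam) / factorial(x)


average_calls_per_hour = 4  # assumed value, not from the article
for calls in range(9):
    prob = poisson_pmf(calls, average_calls_per_hour)
    print(f"P({calls} calls in an hour) = {prob:.4f}")
```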

7. Multinomial distribution

In the previous distributions, there are only two possible outcomes: success and failure. The multinomial distribution, in contrast, describes random variables with more than two possible outcomes. It is sometimes loosely called a categorical distribution, since each possible outcome is treated as a separate category, although strictly the categorical distribution is the special case of a single trial (n = 1). Consider the scenario of playing a game n times. The multinomial distribution helps us determine the combined probability that player 1 wins x1 times, player 2 wins x2 times, and so on, up to player k winning xk times.

The PMF is given as,

$$P(X_1 = x_1, \ldots, X_k = x_k) = \frac{n!}{x_1! \cdots x_k!} \, p_1^{x_1} \cdots p_k^{x_k}, \quad x_1 + \cdots + x_k = n$$

where n is the number of trials and p1, …, pk denote the probabilities of the outcomes whose counts are x1, …, xk respectively.
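The game example can be sketched with the standard multinomial PMF, n! / (x1! ... xk!) multiplied by p1^x1 ... pk^xk. The three players, their win probabilities and the 10 games below are invented for illustration, and `multinomial_pmf` is a hypothetical helper name.

```python
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """P(outcome i occurs counts[i] times), with n = sum(counts) trials."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    n = sum(counts)
    # Multinomial coefficient n! / (x1! * x2! * ... * xk!), kept as an integer.
    coefficient = factorial(n)
    for x in counts:
        coefficient //= factorial(x)
    return coefficient * prod(p**x for p, x in zip(probs, counts))


# Assumed: 10 games, win probabilities 0.5, 0.3 and 0.2 for three players.
wins = [5, 3, 2]
win_probs = [0.5, 0.3, 0.2]
print("P(wins = 5, 3, 2) =", multinomial_pmf(wins, win_probs))
```

With only two outcomes, the same function gives the binomial probabilities, since the binomial distribution is the k = 2 case.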

In this post, we have defined probability distributions and briefly discussed several discrete probability distributions. Let me know your thoughts on the article in the comment section below.

About me

I am a former software engineer working on the transition to data science. I am a Master's student in Data Science. Feel free to connect with me at https://www.linkedin.com/in/priyanka-madiraju/
