Hypothesis testing | Hypothesis Testing for Data Science Beginners

This article was published as part of the Data Science Blogathon.

Introduction

Before getting into the test itself, let's look at what a hypothesis is:

"A hypothesis is a proposed explanation for an unexplained occurrence that does not fit into current theory."

Hypothesis testing is a procedure in which an analyst tests an assumption about a population parameter. The methodology the analyst uses depends on the nature of the data and the purpose of the analysis.

Hypothesis testing

Steps to perform the hypothesis test:

Define the null and alternative hypotheses
Examine the data and check assumptions
Calculate the test statistic
Determine the corresponding p-value
Make a decision about the null hypothesis

Let's work through these steps with an example.

Problem: Among Italian adults aged 18 to 30 living in Italy, do men have a significantly higher mean body mass index (BMI) than women?

Here the population is Italian adults (18-30) in Italy, and the parameter of interest is the difference in mean body mass index (BMI) between men and women.

Step 1: Define the hypotheses

Null: There is no difference in the mean BMI
H₀: μ₁ = μ₂ [μ₁ represents the population mean BMI for males and μ₂ represents the population mean BMI for females]
Here H₀ says they are equal to each other
Alternative: There is a significant difference in the mean BMI
Hₐ: μ₁ ≠ μ₂ [μ₁ represents the population mean BMI for males and μ₂ represents the population mean BMI for females]
Here Hₐ says they are not equal to each other
Significance level = 5%

Step 2: Examine the data and verify assumptions

In this step, the data is filtered to include only Italian adults between 18 and 30 years old. We then compute summary statistics for males and females separately: the mean, the minimum, the maximum, the standard deviation, and the sample size.
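
As a minimal sketch of these summary calculations, the snippet below uses Python's standard `statistics` module. The BMI values and the names `bmi_male`, `bmi_female`, and `summarize` are invented for illustration; they are not the data behind this article.

```python
import statistics

# Hypothetical BMI values, invented for illustration only
bmi_male = [24.1, 27.3, 22.8, 30.2, 25.6, 26.4, 23.9, 28.7]
bmi_female = [23.5, 26.1, 21.9, 29.4, 24.8, 27.0, 22.6, 25.3]


def summarize(sample):
    """Return the descriptive statistics used in this step."""
    return {
        "n": len(sample),
        "mean": statistics.mean(sample),
        "min": min(sample),
        "max": max(sample),
        "sd": statistics.stdev(sample),  # sample standard deviation (n - 1)
    }


for label, sample in (("male", bmi_male), ("female", bmi_female)):
    print(label, summarize(sample))
```

Looking at the two standard deviations side by side is also a quick, informal way to judge whether an equal-variance assumption looks plausible.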

Some of the assumptions that we must verify are the following:

Samples are considered simple random samples
The samples are independent of each other
Both response populations are approximately normal or the sample sizes are large enough.

Step 3: Calculate the test statistic

The test statistic measures how far our sample statistic is from the hypothesized population parameter, in terms of estimated standard errors.

t = (best estimate - null value) / estimated standard error
The best estimate is the difference between the male and female sample means
The null value is the hypothesized difference under the null hypothesis, here 0
The estimated standard error for two means depends on the approach we use
The two approaches are the pooled approach and the unpooled approach
The pooled approach assumes that the variances of the two populations are equal
The unpooled approach drops the assumption of equal variances
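
The calculation can be sketched in Python as follows, covering both the equal-variance (pooled) and the unequal-variance (unpooled, often called Welch) standard error. The function name `two_sample_statistic` and its parameters are assumptions made for this illustration. The Welch-Satterthwaite degrees of freedom in the unpooled branch are a standard approximation that the article itself does not spell out.

```python
import math
import statistics


def two_sample_statistic(x1, x2, pooled=True, null_value=0.0):
    """Return (test statistic, estimated standard error, degrees of freedom)."""
    n1, n2 = len(x1), len(x2)
    v1, v2 = statistics.variance(x1), statistics.variance(x2)  # sample variances
    best_estimate = statistics.mean(x1) - statistics.mean(x2)
    if pooled:
        # Pooled: assume both populations share one variance
        sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
        df = n1 + n2 - 2
    else:
        # Unpooled: no equal-variance assumption
        se = math.sqrt(v1 / n1 + v2 / n2)
        # Welch-Satterthwaite approximation for the degrees of freedom
        df = (v1 / n1 + v2 / n2) ** 2 / (
            (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
        )
    return (best_estimate - null_value) / se, se, df
```

When the two samples have the same size and the same sample variance, both branches give the same standard error and the same degrees of freedom; they diverge as the variances or sample sizes become unequal.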

Step 4: Determine the p-value

The p-value is the probability, assuming the null hypothesis is true, of observing a test statistic equal to the one we computed or more extreme.

We calculate this probability using the t distribution with df = n₁ + n₂ - 2 degrees of freedom.

We need to check both sides because the alternative hypothesis is two-sided: it states only that the means are not equal. Thus, we have to check both the upper and lower tails of the distribution.
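
To make the two-tailed calculation concrete, here is a minimal sketch that needs only Python's standard library. Because the standard library has no Student's t distribution, the sketch integrates the t density numerically with Simpson's rule; the function names `t_pdf` and `two_sided_p_value` and the `steps` parameter are assumptions made for this illustration. In practice a statistics library would normally do this step.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p_value(t_stat, df, steps=2000):
    """Approximate P(|T| >= |t_stat|); steps must be an even number."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps
    # Simpson's rule on [0, |t_stat|]: weights 1, 4, 2, 4, ..., 4, 1
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # area between 0 and |t_stat|
    # By symmetry, the two tails together hold 1 - 2 * central_area
    return max(0.0, 1.0 - 2.0 * central_area)
```

The sign of the test statistic does not matter here: a value of -1.3 and a value of 1.3 give the same two-sided p-value, which is exactly what checking both tails means.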

The distribution chart looks as shown below with its corresponding sample size and degrees of freedom:

Distribution curve

In the graph above, we can see both the positive value of our test statistic and, mirrored on the other side of zero, its negative value. If the difference in the population mean BMI between men and women were really zero, that is, if the null hypothesis were true, it would be quite probable to observe a difference in sample means as large as the one behind our test statistic, or one more extreme. There is almost a 20 percent chance of seeing such a result. Because this p-value is so large, we do not reject the null hypothesis.

Step 5: Make a decision

If the p-value is greater than the significance level, there is only weak evidence against the null hypothesis. Therefore, we do not reject the null hypothesis.
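
The decision rule can be written as a small helper. This is a sketch only: the function name `decide` is hypothetical, and the default `alpha=0.05` simply mirrors the 5% significance level chosen in the example.

```python
def decide(p_value, alpha=0.05):
    """Compare the p-value with the significance level alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if p_value > alpha:
        # Weak evidence against the null hypothesis
        return "fail to reject the null hypothesis"
    return "reject the null hypothesis"


# A p-value of roughly 20%, as in the BMI example
print(decide(0.20))  # prints: fail to reject the null hypothesis
```

Note that failing to reject the null hypothesis is not the same as proving it true; it only means the data do not provide strong evidence against it.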

In summary, hypothesis tests are used to test theories about a parameter of interest. Here, that parameter is the difference in the population means. The basic steps are as follows. First, we define our hypotheses. Next, we examine our data, check our assumptions, and calculate our test statistic. With this test statistic, we determine the corresponding p-value and, finally, we make a decision based on that value.

The assumptions of the two-sample t-test for population means are that both data sets are simple random samples and are independent of each other. We also need both response populations to be approximately normally distributed; otherwise, we need sample sizes large enough to apply the central limit theorem. Whether or not the population variances are equal is also crucial, because it determines whether we use the pooled or the unpooled approach. Finally, we need to know how to interpret the p-value, the decision, and our final conclusion. These are all very important when conducting a hypothesis test.

For more articles, see this profile:

https://likhithakakanuru.medium.com/
