
Hypothesis testing | Parametric and nonparametric tests

This article was published as part of the Data Science Blogathon

Introduction

Hypothesis testing is one of the most important concepts in statistics and is widely used by statisticians, machine learning engineers, and data scientists.

In hypothesis testing, statistical tests are used to check whether the null hypothesis is rejected or not rejected. These statistical tests assume a null hypothesis of no relationship or no difference between groups.

So, in this article, we will discuss the statistical tests used for hypothesis testing, including both parametric and nonparametric tests.

1. What are parametric tests?

2. What are nonparametric tests?

3. Parametric tests for hypothesis tests

T test
Z test
F test
ANOVA

4. Nonparametric tests for hypothesis tests

Chi-square test
Mann-Whitney U test
Kruskal-Wallis H test

Let's begin.

Parametric tests

The basic principle behind parametric tests is that we have a fixed set of parameters that determine a probabilistic model, which may also be used in machine learning.

Parametric tests are those tests for which we have prior knowledge of the population distribution (i.e., normal) or, if not, can easily approximate it with a normal distribution, which is possible with the help of the Central Limit Theorem.
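To make the role of the Central Limit Theorem concrete, here is a minimal simulation sketch. The exponential population, the sample size of 50, the 2000 repetitions, and the function name `sample_means` are all assumptions chosen for illustration; none of them come from the article.

```python
import random
import statistics


def sample_means(n, repetitions, seed=0):
    """Draw `repetitions` samples of size n from an exponential(1)
    population (clearly non-normal) and return the mean of each sample."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(repetitions)
    ]


means = sample_means(n=50, repetitions=2000)

# The exponential(1) population has mean 1 and standard deviation 1, so the
# theorem predicts sample means centred near 1 with spread near 1 / sqrt(50).
print("mean of sample means:", round(statistics.fmean(means), 3))
print("spread of sample means:", round(statistics.stdev(means), 3))
print("predicted spread:", round(50 ** -0.5, 3))
```

Even though the individual observations are strongly skewed, the sample means cluster symmetrically around the population mean, which is what lets a normal approximation be used.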

The parameters of the normal distribution are the mean and the standard deviation.

Ultimately, the classification of a test as parametric depends entirely on the assumptions made about the population. Many parametric tests are available, and they serve purposes such as the following:

Finding the confidence interval for the population mean when the standard deviation is known.
Finding the confidence interval for the population mean when the standard deviation is unknown.
Finding the confidence interval for the population variance.
Finding the confidence interval for the difference of two means when the standard deviation is unknown.

Nonparametric tests

In nonparametric tests, we make no assumptions about the parameters of the population under study. In fact, these tests do not depend on the population.
Therefore, there is no fixed set of parameters, and no distribution (normal distribution, etc.) of any kind is available for use.

This is also why nonparametric tests are called distribution-free tests.
Nowadays, nonparametric tests are gaining popularity and influence. Some of the reasons are:

The main reason is that they do not require the strict conditions that parametric tests demand.
The second reason is that we do not need to make assumptions about the population on which we are doing the analysis.
Most of the available nonparametric tests are very easy to apply and understand, i.e., their complexity is very low.


T test

1. It is a parametric hypothesis test based on Student's t distribution.

2. It tests the significance of the difference between mean values when the sample size is small (i.e., less than 30) and the population standard deviation is not available.

3. Assumptions of this test:

The population distribution is normal.
Samples are random and independent.
The sample size is small.
The population standard deviation is unknown.

4. The Mann-Whitney U test is a nonparametric counterpart of the T test.

A T test can be:

One-sample T test: Compares a sample mean with the population mean.

t = (x̄ − μ) / (s / √n)

where,

x̄ is the sample mean

s is the standard deviation of the sample

n is the sample size

μ is the mean of the population
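As a rough illustration, the one-sample statistic can be computed as the difference between the sample mean and μ, divided by the estimated standard error s/√n. The function name `one_sample_t` and the measurements below are hypothetical; only the standard library is assumed.

```python
import math
import statistics


def one_sample_t(sample, mu):
    """Return the one-sample t statistic for `sample` against the
    hypothesised population mean `mu`."""
    n = len(sample)
    x_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    return (x_bar - mu) / (s / math.sqrt(n))


# Hypothetical measurements tested against a hypothesised mean of 5.0
data = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.7]
t = one_sample_t(data, mu=5.0)
print(f"t = {t:.3f} with {len(data) - 1} degrees of freedom")
```

The resulting value is then compared with the critical value of Student's t distribution for n − 1 degrees of freedom.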

Two-sample T test: Compares the means of two different samples.

t = (x̄₁ − x̄₂) / √(s₁²/n₁ + s₂²/n₂)

where,

x̄₁ is the sample mean of the first group

x̄₂ is the sample mean of the second group

s₁ is the standard deviation of sample 1

s₂ is the standard deviation of sample 2

n₁ and n₂ are the sizes of sample 1 and sample 2
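The two-sample statistic can be sketched in the same way. This version assumes the unequal-variance (Welch) form, in which each sample variance is divided by its own sample size; that choice, the function name `two_sample_t`, and the data are assumptions for illustration.

```python
import math
import statistics


def two_sample_t(sample1, sample2):
    """Return the two-sample t statistic, using each sample's own
    variance and size in the standard error (unequal-variance form)."""
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = statistics.fmean(sample1), statistics.fmean(sample2)
    var1, var2 = statistics.variance(sample1), statistics.variance(sample2)
    standard_error = math.sqrt(var1 / n1 + var2 / n2)
    return (mean1 - mean2) / standard_error


# Hypothetical scores for two independent groups
group_a = [12.1, 11.8, 13.0, 12.6, 12.9]
group_b = [11.2, 11.9, 11.5, 12.0, 11.1, 11.4]
print(f"t = {two_sample_t(group_a, group_b):.3f}")
```

A positive value indicates that the first sample mean is larger than the second; the sign flips if the arguments are swapped.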

Conclusion:

If the value of the test statistic is greater than the table value -> Reject the null hypothesis.
If the value of the test statistic is less than the table value -> Do not reject the null hypothesis.

Z test

1. It is a parametric hypothesis test.

2. It is used to determine whether the means are different when the population variance is known and the sample size is large (i.e., greater than 30).

3. Assumptions of this test:

The population distribution is normal.
Samples are random and independent.
The sample size is large.
The standard deviation of the population is known.

A Z test can be:

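One-sample Z test: Compares a sample mean with the population mean.

z = (x̄ − μ) / (σ / √n)

where x̄ is the sample mean, μ is the mean of the population, σ is the standard deviation of the population, and n is the sample size.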

Two-sample Z test: Compares the means of two different samples.

z = (x̄₁ − x̄₂) / √(σ₁²/n₁ + σ₂²/n₂)

where,

x̄₁ is the sample mean of the first group

x̄₂ is the sample mean of the second group

σ₁ is the standard deviation of population 1

σ₂ is the standard deviation of population 2

n₁ and n₂ are the sizes of sample 1 and sample 2
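Both Z tests can be sketched with the standard library, because the statistic is referred to the standard normal distribution. The function names, the two-sided p-value, and the repeated-pattern data below are assumptions made for illustration; the population standard deviations are taken as known, as the test requires.

```python
import math
from statistics import NormalDist, fmean


def two_sided_p(z):
    """Two-sided p-value of a z statistic under the standard normal."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def one_sample_z(sample, mu, sigma):
    """z statistic and p-value for a sample mean against `mu`,
    with known population standard deviation `sigma`."""
    z = (fmean(sample) - mu) / (sigma / math.sqrt(len(sample)))
    return z, two_sided_p(z)


def two_sample_z(sample1, sample2, sigma1, sigma2):
    """z statistic and p-value for the difference of two sample means,
    with known population standard deviations."""
    standard_error = math.sqrt(
        sigma1 ** 2 / len(sample1) + sigma2 ** 2 / len(sample2)
    )
    z = (fmean(sample1) - fmean(sample2)) / standard_error
    return z, two_sided_p(z)


# Hypothetical samples of 36 observations each (n > 30)
group_a = [98, 102, 101, 99, 103, 97] * 6
group_b = [97, 99, 98, 96, 100, 98] * 6

z1, p1 = one_sample_z(group_a, mu=99, sigma=3)
z2, p2 = two_sample_z(group_a, group_b, sigma1=3, sigma2=3)
print(f"one-sample: z = {z1:.3f}, p = {p1:.4f}")
print(f"two-sample: z = {z2:.3f}, p = {p2:.4f}")
```

A small p-value (for example, below a chosen significance level of 0.05) leads to rejecting the null hypothesis of equal means.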

F test

1. It is a parametric hypothesis test based on Snedecor's F distribution.

2. It is a test of the null hypothesis that two normal populations have the same variance.

3. An F test is regarded as a comparison of the equality of sample variances.

4. The F statistic is simply a ratio of two variances.

5. It is calculated as:

F = s₁²/s₂²

6. By changing the variances in the ratio, the F test becomes a very flexible test. It can then be used to:

Test the overall significance of a regression model.
Compare the fits of different models.
Test the equality of means.

7. Assumptions of this test:

The population distribution is normal.
Samples are drawn randomly and independently.
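The variance ratio F = s₁²/s₂² is straightforward to compute. The sketch below returns the ratio together with the two degrees of freedom needed to look up a critical value; the function name `f_statistic` and the data are hypothetical.

```python
import statistics


def f_statistic(sample1, sample2):
    """Return (F, df1, df2), where F is the ratio of the two sample
    variances and df1, df2 are their degrees of freedom."""
    var1 = statistics.variance(sample1)  # s1 squared, n - 1 denominator
    var2 = statistics.variance(sample2)  # s2 squared, n - 1 denominator
    return var1 / var2, len(sample1) - 1, len(sample2) - 1


# Hypothetical measurements from two processes
process_a = [20.1, 19.8, 20.5, 20.9, 19.6, 20.3]
process_b = [20.0, 20.1, 19.9, 20.2, 20.0]
f_value, df1, df2 = f_statistic(process_a, process_b)
print(f"F = {f_value:.3f} with ({df1}, {df2}) degrees of freedom")
```

A ratio close to 1 is consistent with equal variances; a ratio far from 1 is evidence against the null hypothesis.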

ANOVA

1. Also called analysis of variance, it is a parametric hypothesis test.

2. It is an extension of the T test and the Z test.

3. It is used to test the significance of differences in mean values among more than two sample groups.

4. It uses the F test to statistically test the equality of means and the relative variance between them.

5. Assumptions of this test:

The population distribution is normal.
Samples are random and independent.
The sample variances are homogeneous.

6. One-way ANOVA and two-way ANOVA are its types.

7. F statistic = variance between the sample means / variance within the samples
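For the one-way case, the ratio of between-sample variance to within-sample variance can be sketched as follows. The function name `one_way_anova_f` and the example groups are assumptions for illustration; the mean squares use the usual degrees of freedom, k − 1 between groups and N − k within groups.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [fmean(g) for g in groups]

    # Variation of the group means around the grand mean
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of the observations around their own group mean
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical yields under three treatments
treatments = [[4.2, 4.8, 5.1, 4.5], [5.6, 5.9, 6.1, 5.4], [4.9, 5.2, 5.0, 5.3]]
print(f"F = {one_way_anova_f(treatments):.3f}")
```

A large F value means the group means differ by more than the spread inside the groups would explain.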

Chi-square test

1. It is a nonparametric hypothesis test.

2. As a nonparametric test, chi-square can be used:

as a goodness-of-fit test.
as a test of independence of two variables.

3. It helps assess the goodness of fit between a set of observed values and those expected theoretically.

4. It compares the observed frequencies with the expected frequencies.

5. The bigger the difference, the greater the chi-square value.

6. If there is no difference between the expected and observed frequencies, then the chi-square value is equal to zero.

7. It is also known as the “Goodness-of-fit test” which determines whether a particular distribution fits the observed data or not.

8. It is calculated as:

χ² = Σ (O − E)² / E

where O is the observed frequency and E is the expected frequency.

9. Chi-square is also used to test the independence of two variables.

10. Conditions for the chi-square test:

Observations are collected and recorded at random.
In the sample, all entities must be independent.
No group should contain very few items, say fewer than 10.
The total number of items should be reasonably large. Normally, it should be at least 50, however small the number of groups.

11. As a parametric test, chi-square is used to test the population variance based on the sample variance.

12. If we take each of a collection of sample variances, divide it by the known population variance, and multiply the ratio by (n-1), where n is the number of items in the sample, we obtain the chi-square values.

13. It is calculated as:

χ² = (n − 1)s² / σ²

where s² is the sample variance and σ² is the known population variance.
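Both uses of chi-square described above can be sketched in a few lines: the goodness-of-fit statistic sums the squared differences between observed and expected frequencies, each divided by the expected frequency, and the variance statistic multiplies the ratio of sample variance to known population variance by (n − 1). The function names and the data are hypothetical.

```python
import statistics


def chi_square_gof(observed, expected):
    """Goodness-of-fit statistic from observed and expected frequencies."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_variance(sample, population_variance):
    """Chi-square statistic for testing a sample variance against a
    known population variance."""
    n = len(sample)
    return (n - 1) * statistics.variance(sample) / population_variance


# Hypothetical counts for five equally likely categories (100 items in total)
observed = [18, 22, 20, 25, 15]
expected = [20, 20, 20, 20, 20]
print(f"goodness of fit: chi-square = {chi_square_gof(observed, expected):.2f}")

# Hypothetical sample tested against a known population variance of 2.0
sample = [9.8, 11.2, 10.5, 8.9, 12.1, 10.0]
print(f"variance test: chi-square = {chi_square_variance(sample, 2.0):.2f}")
```

When observed and expected frequencies match exactly, the goodness-of-fit statistic is zero, in line with the description above.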

Mann-Whitney U test

1. It is a nonparametric hypothesis test.

2. This test is used to investigate whether two independent samples were drawn from populations with the same distribution.

3. It is a true nonparametric counterpart of the T test and gives the most accurate estimates of significance, especially when sample sizes are small and the population is not normally distributed.

4. It is based on the comparison of each observation in the first sample with each observation in the other sample.

5. The test statistic used here is “U”.

6. The maximum value of U is n₁*n₂, and the minimum value is zero.

7. It is also known as:

Mann-Whitney Wilcoxon test.
Mann-Whitney Wilcoxon rank test.

8. Mathematically, U is given by:

U₁ = R₁ – n₁(n₁+1) / 2

where n₁ is the size of sample 1 and R₁ is the sum of ranks in sample 1.

U₂ = R₂ – n₂(n₂+1) / 2

When consulting the significance tables, the smaller of U₁ and U₂ is used. The sum of the two values is given by,

U₁ + U₂ = {R₁ – n₁(n₁+1) / 2} + {R₂ – n₂(n₂+1) / 2}

Knowing that R₁ + R₂ = N(N+1) / 2 and N = n₁ + n₂, and doing some algebra, we find that the sum is:

U₁ + U₂ = n₁*n₂
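The rank-sum formulas can be checked numerically with a short sketch. It ranks the pooled observations, sums the ranks of each sample, and applies U = R – n(n+1)/2 to each. Giving tied values the average of their ranks is a common convention assumed here, and the function names and data are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 to N; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def mann_whitney_u(sample1, sample2):
    """Return (U1, U2) computed from the rank sums of the pooled data."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))
    r1 = sum(ranks[:n1])  # sum of ranks in sample 1
    r2 = sum(ranks[n1:])  # sum of ranks in sample 2
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = r2 - n2 * (n2 + 1) / 2
    return u1, u2


# Hypothetical reaction times for two independent groups
group_a = [1.8, 2.4, 2.1, 3.0]
group_b = [2.9, 3.4, 3.1, 3.8, 2.6]
u1, u2 = mann_whitney_u(group_a, group_b)
print(f"U1 = {u1}, U2 = {u2}, smaller U = {min(u1, u2)}")
print("U1 + U2 equals n1 * n2:", u1 + u2 == len(group_a) * len(group_b))
```

The smaller of the two values is the one compared with the significance tables.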

Kruskal-Wallis H test

1. It is a nonparametric hypothesis test.

2. This test is used to compare two or more independent samples of equal or different sample sizes.

3. It extends the Mann-Whitney U test, which is used to compare only two groups.

4. One-way ANOVA is the parametric equivalent of this test, which is why it is also known as 'one-way ANOVA on ranks'.

5. It uses ranks instead of the actual data.

6. It does not assume that the population is normally distributed.

7. The test statistic used here is “H”.
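The article does not give the formula for H, so the sketch below assumes the standard textbook form, H = 12 / (N(N+1)) × Σ(Rᵢ² / nᵢ) – 3(N+1), where Rᵢ is the rank sum of group i, nᵢ is its size, and N is the total number of observations. No correction for ties is applied, and the function names and data are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 to N; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Return the Kruskal-Wallis H statistic (without tie correction)."""
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)

    weighted_rank_sums = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])  # R_i for this group
        weighted_rank_sums += rank_sum ** 2 / len(g)
        start += len(g)

    return 12 / (n_total * (n_total + 1)) * weighted_rank_sums - 3 * (n_total + 1)


# Hypothetical scores for three independent groups of different sizes
scores = [[27, 31, 29, 35], [38, 40, 36], [22, 25, 30, 24, 26]]
print(f"H = {kruskal_wallis_h(scores):.3f}")
```

The value of H is then compared with a chi-square critical value with (number of groups − 1) degrees of freedom, which is the usual large-sample approximation.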

This completes today's discussion!!

Final notes

Thank you for reading!

Hope you enjoyed the article and increased your knowledge about statistical tests for hypothesis testing in statistics.

Please feel free to contact me by email.

Is something not mentioned, or do you want to share your thoughts? Feel free to comment below and I'll get back to you.

For the remaining articles, refer to the link.

About the Author

Aashi Goyal

Currently, I am pursuing my Bachelor of Technology (B.Tech) in Electronics and Communication Engineering at Guru Jambheshwar University (GJU), Hisar. I am very enthusiastic about statistics, machine learning, and deep learning.

The media shown in this article is not the property of DataPeaker and is used at the author's discretion.
