Hypothesis testing | Parametric and nonparametric tests

This article was published as part of the Data Science Blogathon

Introduction

Hypothesis testing is one of the most important concepts in statistics and is widely used by statisticians, machine learning engineers, and data scientists.

In hypothesis testing, statistical tests are used to check whether the null hypothesis is rejected or not rejected. These statistical tests assume a null hypothesis of no relationship or no difference between groups.

So, in this article, we will discuss the statistical tests used for hypothesis testing, including both parametric and nonparametric tests.

Table of Contents

1. What are parametric tests?

2. What are nonparametric tests?

3. Parametric tests for hypothesis tests

  • T-test
  • Z test
  • F-test
  • ANOVA

4. Nonparametric tests for hypothesis tests

  • Chi squared
  • Mann-Whitney U test
  • Kruskal-Wallis H test

Let us begin.

Parametric tests

The basic principle behind parametric tests is that we have a fixed set of parameters that determine a probabilistic model, which can also be used in machine learning.

Parametric tests are those for which we have prior knowledge of the population distribution (that is, normal) or, if not, can easily approximate it by a normal distribution with the help of the Central Limit Theorem.

The parameters of the normal distribution are the mean and the standard deviation.

Ultimately, whether a test is classified as parametric depends entirely on the assumptions made about the population. There are many parametric tests available, some of which are used to:

  • Find the confidence interval for the population mean when the standard deviation is known.
  • Find the confidence interval for the population mean when the standard deviation is unknown.
  • Find the confidence interval for the population variance.
  • Find the confidence interval for the difference of two means when the standard deviation is unknown.
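As a rough illustration of the first item in the list, a confidence interval for a population mean with a known standard deviation can be sketched with Python's standard library. The function name `mean_confidence_interval` and the numbers are assumptions made for this example and do not come from the article.

```python
import math
from statistics import NormalDist

def mean_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for a population mean when sigma is known."""
    # Two-sided critical value of the standard normal distribution
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical summary statistics, chosen only for illustration
lower, upper = mean_confidence_interval(x_bar=100.0, sigma=15.0, n=36)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

When the standard deviation is unknown, the critical value would come from the t-distribution instead, which is not available in the standard library.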

Nonparametric tests

In nonparametric tests, we do not make any assumptions about the parameters of the population we are studying. In fact, these tests do not depend on the population.
Therefore, there is no fixed set of parameters, and no distribution (normal distribution, etc.) of any kind is available for use.

This is why nonparametric tests are also called distribution-free tests.
Nowadays, nonparametric tests are gaining popularity and influence. Some of the reasons behind this are:

  • The main reason is that there is no need to meet the strict requirements that come with parametric tests.
  • The second reason is that we do not need to make assumptions about the population on which we are doing the analysis.
  • Most of the nonparametric tests available are very easy to apply and to understand, that is, their complexity is very low.

T-test

1. It is a parametric hypothesis test based on Student's t-distribution.

2. Essentially, it tests the significance of the difference between mean values when the sample size is small (that is, less than 30) and the population standard deviation is not available.

3. Assumptions of this test:

  • The population distribution is normal.
  • Samples are random and independent.
  • The sample size is small.
  • The population standard deviation is unknown.

4. The Mann-Whitney U test is the nonparametric counterpart of the T-test.

A T test can be:

One-sample T-test: Compare a sample mean with the population mean.

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

where,

$\bar{x}$ is the sample mean

$s$ is the standard deviation of the sample

$n$ is the sample size

$\mu$ is the mean of the population
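The one-sample statistic, the difference between the sample mean and the population mean divided by the standard error s/√n, can be sketched as follows. The function name `one_sample_t` and the data are hypothetical and serve only as an illustration.

```python
import math
import statistics

def one_sample_t(sample, mu):
    """Return the one-sample t statistic and its degrees of freedom."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    t = (x_bar - mu) / (s / math.sqrt(n))
    return t, n - 1

# Hypothetical measurements compared with an assumed population mean of 50
weights = [49.2, 50.1, 50.8, 49.5, 51.0, 50.4, 49.9, 50.6]
t_stat, df = one_sample_t(weights, mu=50.0)
print(f"t = {t_stat:.3f}, df = {df}")
```

The resulting statistic is then compared with the t-table value for the chosen significance level and `df` degrees of freedom.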

Two-sample T-test: Compare the means of two different samples.

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

where,

$\bar{x}_1$ is the sample mean of the first group

$\bar{x}_2$ is the sample mean of the second group

$s_1$ is the standard deviation of sample 1

$s_2$ is the standard deviation of sample 2

$n_1$ and $n_2$ are the sizes of sample 1 and sample 2

Conclusion:

  • If the value of the test statistic is greater than the critical value from the table -> Reject the null hypothesis.
  • If the value of the test statistic is less than the critical value from the table -> Do not reject the null hypothesis.
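The two-sample comparison and the decision rule above can be sketched together. This is a minimal example that assumes the unpooled form of the statistic (each sample variance divided by its own sample size) and a two-sided test, so the absolute value of t is compared with the table value. The helper names, the data, and the critical value 2.306 (two-sided 5% level with 8 degrees of freedom) are assumptions for illustration.

```python
import math
import statistics

def two_sample_t(sample1, sample2):
    """Two-sample t statistic using each sample's own variance."""
    n1, n2 = len(sample1), len(sample2)
    x1, x2 = statistics.mean(sample1), statistics.mean(sample2)
    v1, v2 = statistics.variance(sample1), statistics.variance(sample2)
    return (x1 - x2) / math.sqrt(v1 / n1 + v2 / n2)

def decide(t_stat, critical_value):
    """Apply the table-lookup rule for a two-sided test."""
    if abs(t_stat) > critical_value:
        return "reject the null hypothesis"
    return "do not reject the null hypothesis"

# Hypothetical scores for two independent groups
group_a = [12.1, 11.8, 12.6, 12.0, 11.5]
group_b = [13.0, 12.9, 13.4, 12.7, 13.1]
t_stat = two_sample_t(group_a, group_b)
print(f"t = {t_stat:.3f}:", decide(t_stat, critical_value=2.306))
```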

Z test

1. It is a parametric hypothesis test.

2. It is used to determine whether the means are different when the population variance is known and the sample size is large (that is, greater than 30).

3. Assumptions of this test:

  • The population distribution is normal.
  • Samples are random and independent.
  • The sample size is large.
  • The standard deviation of the population is known.

A Z test can be:

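One-sample Z test: Compare a sample mean with the population mean.

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

where $\bar{x}$ is the sample mean, $\mu$ is the population mean, $\sigma$ is the population standard deviation, and $n$ is the sample size.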
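A one-sample Z test divides the gap between the sample mean and the population mean by the standard error σ/√n. The sketch below also derives a two-sided p-value from the standard normal distribution. The function name `one_sample_z` and the summary statistics are hypothetical.

```python
import math
from statistics import NormalDist

def one_sample_z(x_bar, mu, sigma, n):
    """Return the z statistic and its two-sided p-value."""
    z = (x_bar - mu) / (sigma / math.sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical summary statistics for a sample of 36 observations
z_stat, p_value = one_sample_z(x_bar=102.0, mu=100.0, sigma=6.0, n=36)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
```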

Two-sample Z test: Compare the means of two different samples.

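$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

where,

$\bar{x}_1$ is the sample mean of the first group

$\bar{x}_2$ is the sample mean of the second group

$\sigma_1$ is the standard deviation of population 1

$\sigma_2$ is the standard deviation of population 2

$n_1$ and $n_2$ are the sizes of sample 1 and sample 2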
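For two samples with known population standard deviations, the difference of the sample means is divided by the square root of the summed variances, each scaled by its sample size. The sketch below compares the result with a critical value computed from the standard normal distribution rather than read from a table. The function name, the numbers, and the 5% significance level are assumptions for this example.

```python
import math
from statistics import NormalDist

def two_sample_z(x1, x2, sigma1, sigma2, n1, n2):
    """z statistic for the difference of two sample means."""
    return (x1 - x2) / math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)

alpha = 0.05  # assumed significance level
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value

# Hypothetical summary statistics for two large samples
z = two_sample_z(x1=51.0, x2=50.0, sigma1=4.0, sigma2=3.0, n1=100, n2=100)
verdict = "reject" if abs(z) > z_crit else "do not reject"
print(f"z = {z:.2f}, critical value = {z_crit:.2f}: {verdict} the null hypothesis")
```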

F-test

1. It is a parametric hypothesis test based on Snedecor's F-distribution.

2. It is a test for the null hypothesis that two normal populations have the same variance.

3. An F test is considered a comparison of the equality of the sample variances.

4. The F statistic is simply a ratio of two variances.

5. It is calculated as:

$$F = \frac{s_1^2}{s_2^2}$$

6. By changing the variances in the ratio, the F-test becomes a very flexible test. It can then be used to:

  • Test the overall significance of a regression model.
  • Compare the fits of different models.
  • Test for equality of means.

7. Assumptions of this test:

  • The population distribution is normal.
  • Samples are drawn randomly and independently.
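The variance ratio at the heart of the F-test can be sketched in a few lines. The function name `f_statistic` and the data are hypothetical. The degrees of freedom are returned because they are needed to look up the critical value in an F table.

```python
import statistics

def f_statistic(sample1, sample2):
    """Ratio of two sample variances with its degrees of freedom."""
    v1 = statistics.variance(sample1)  # s1 squared
    v2 = statistics.variance(sample2)  # s2 squared
    return v1 / v2, len(sample1) - 1, len(sample2) - 1

# Hypothetical measurements from two machines
machine_a = [20.1, 19.8, 20.5, 20.0, 19.6, 20.3]
machine_b = [20.9, 19.1, 21.2, 18.8, 20.4, 19.5]
f_stat, df1, df2 = f_statistic(machine_a, machine_b)
print(f"F = {f_stat:.3f} with ({df1}, {df2}) degrees of freedom")
```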

ANOVA

1. Also called analysis of variance, it is a parametric hypothesis test.

2. It is an extension of the T test and the Z test.

3. It is used to test the significance of the differences in mean values among more than two sample groups.

4. It uses the F-test to statistically test the equality of means and the relative variance between them.

5. Assumptions of this test:

  • The population distribution is normal.
  • Samples are random and independent.
  • Homogeneity of the sample variance.

6. Its types are one-way ANOVA and two-way ANOVA.

7. F statistic = variance between the sample means / variance within the samples
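For the one-way case, the ratio in item 7 can be computed as the mean square between groups divided by the mean square within groups. The following is a minimal sketch under that standard definition. The function name `one_way_anova_f` and the data are assumptions for illustration.

```python
import statistics

def one_way_anova_f(*groups):
    """F statistic for one-way ANOVA with its degrees of freedom."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within, k - 1, n_total - k

# Hypothetical yields under three treatments
f_stat, df_between, df_within = one_way_anova_f(
    [4.1, 3.9, 4.4, 4.0],
    [4.8, 5.1, 4.9, 5.3],
    [4.2, 4.5, 4.3, 4.1],
)
print(f"F = {f_stat:.3f} with ({df_between}, {df_within}) degrees of freedom")
```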

Chi-square test

1. It is a nonparametric hypothesis test.

2. As a nonparametric test, chi-square can be used as:

  • a goodness-of-fit test.
  • a test of independence of two variables.

3. It helps to assess the goodness of fit between a set of observed values and those expected theoretically.

4. It makes a comparison between the expected frequencies and the observed frequencies.

5. The bigger the difference, the greater the chi-square value.

6. If there is no difference between the expected and observed frequencies, then the chi-square value is equal to zero.

7. It is also known as the “Goodness-of-fit test” which determines whether a particular distribution fits the observed data or not.

8. It is calculated as:

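$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

where $O_i$ are the observed frequencies and $E_i$ are the expected frequencies.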

9. Chi-square is also used to test the independence of two variables.

10. Conditions for the chi-square test:

  • Collect and record random observations.
  • In the sample, all entities must be independent.
  • No group should contain very few items, say fewer than 10.
  • The total number of items should be reasonably large. Normally, it should be at least 50, however small the number of groups.

11. As a parametric test, chi-square is used to test a population variance based on the sample variance.

12. If we take each of a collection of sample variances, divide it by the known population variance, and multiply the ratio by (n-1), where n is the number of items in the sample, we obtain the chi-square values.

13. It is calculated as:

$$\chi^2 = \frac{(n-1)\,s^2}{\sigma^2}$$
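Both uses of chi-square described in this section can be sketched briefly: the goodness-of-fit statistic sums the squared differences between observed and expected frequencies, each divided by the expected frequency, and the variance statistic multiplies the ratio of the sample variance to the known population variance by (n-1). The function names and the die-roll data below are hypothetical.

```python
import statistics

def chi_square_gof(observed, expected):
    """Goodness-of-fit statistic from observed and expected frequencies."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi_square_variance(sample, sigma_squared):
    """Statistic for testing a population variance from one sample."""
    n = len(sample)
    return (n - 1) * statistics.variance(sample) / sigma_squared

# Hypothetical counts from 60 rolls of a die; a fair die expects 10 per face
observed = [8, 12, 9, 11, 10, 10]
expected = [10] * 6
chi2 = chi_square_gof(observed, expected)
print(f"chi-square = {chi2:.2f} with {len(observed) - 1} degrees of freedom")
```

As the section notes, the statistic is zero when observed and expected frequencies agree exactly and grows as they diverge.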

Mann-Whitney U test

1. It is a nonparametric hypothesis test.

2. This test is used to investigate whether two independent samples were selected from populations that have the same distribution.

3. It is a true nonparametric counterpart of the T-test and provides the most accurate estimates of significance, especially when sample sizes are small and the population is not normally distributed.

4. It is based on the comparison of each observation in the first sample with each observation in the other sample.

5. The test statistic used here is “U”.

6. The maximum value of U is $n_1 n_2$ and the minimum value is zero.

7. It is also known as:

  • Mann-Whitney Wilcoxon test.
  • Mann-Whitney Wilcoxon rank test.

8. Mathematically, U is given by:

$$U_1 = R_1 - \frac{n_1(n_1+1)}{2}$$

where $n_1$ is the sample size of sample 1, and $R_1$ is the sum of the ranks in sample 1.

$$U_2 = R_2 - \frac{n_2(n_2+1)}{2}$$

When consulting the tables of significance, the smaller of the values $U_1$ and $U_2$ is used. The sum of the two values is given by

$$U_1 + U_2 = \left\{R_1 - \frac{n_1(n_1+1)}{2}\right\} + \left\{R_2 - \frac{n_2(n_2+1)}{2}\right\}$$

Knowing that $R_1 + R_2 = N(N+1)/2$ and $N = n_1 + n_2$, and doing some algebra, we find that the sum is:

$$U_1 + U_2 = n_1 n_2$$
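The rank-sum calculation can be sketched as follows: pool the two samples, rank them, sum the ranks of each sample, and subtract n(n+1)/2 for that sample. The helper `average_ranks`, which gives tied values the average of their ranks, and the data are assumptions for this illustration.

```python
def average_ranks(values):
    """Rank values from 1 to N, giving ties the average of their ranks."""
    sorted_vals = sorted(values)
    rank_of = {}
    for v in set(values):
        first = sorted_vals.index(v) + 1   # first 1-based position of v
        count = sorted_vals.count(v)       # number of tied observations
        rank_of[v] = first + (count - 1) / 2
    return [rank_of[v] for v in values]

def mann_whitney_u(sample1, sample2):
    """Return (U1, U2) computed from the rank sums of the pooled data."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))
    r1 = sum(ranks[:n1])  # rank sum of sample 1
    r2 = sum(ranks[n1:])  # rank sum of sample 2
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = r2 - n2 * (n2 + 1) / 2
    return u1, u2

# Hypothetical reaction times for two independent groups
group_a = [1.8, 2.4, 2.1, 3.0, 2.6]
group_b = [2.9, 3.4, 3.1, 2.5, 3.8, 3.3]
u1, u2 = mann_whitney_u(group_a, group_b)
print(f"U1 = {u1}, U2 = {u2}, U = {min(u1, u2)}")
```

The two values always add up to the product of the sample sizes, which gives a quick check on the arithmetic.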

Kruskal-Wallis H test

1. It is a nonparametric hypothesis test.

2. This test is used to compare two or more independent samples of the same or different sample sizes.

3. Extends the Mann-Whitney U test, which is used to compare only two groups.

4. One-way ANOVA is the parametric equivalent of this test, which is why it is also known as "one-way ANOVA on ranks".

5. It uses ranks instead of the actual data.

6. It does not assume that the population is normally distributed.

7. The test statistic used here is “H”.
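The article does not give the formula for H. A commonly used form, without a correction for ties, is H = 12 / (N(N+1)) × Σ(Rᵢ² / nᵢ) − 3(N+1), where Rᵢ is the rank sum of group i, nᵢ is its size, and N is the total number of observations. The sketch below assumes that form. The function names and the data are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 to N, giving ties the average of their ranks."""
    sorted_vals = sorted(values)
    rank_of = {}
    for v in set(values):
        first = sorted_vals.index(v) + 1
        count = sorted_vals.count(v)
        rank_of[v] = first + (count - 1) / 2
    return [rank_of[v] for v in values]

def kruskal_wallis_h(*groups):
    """H statistic (no tie correction) for two or more independent groups."""
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    weighted_sum = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])  # R_i for this group
        weighted_sum += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * weighted_sum - 3 * (n_total + 1)

# Hypothetical scores for three independent groups of different sizes
h_stat = kruskal_wallis_h([27, 31, 24, 29], [35, 38, 33], [22, 25, 21, 26, 23])
print(f"H = {h_stat:.3f}")
```

H is usually compared with a chi-square critical value with (number of groups − 1) degrees of freedom.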

This completes today's discussion!!

Final notes

Thank you for reading!

I hope you enjoyed the article and increased your knowledge of statistical tests for hypothesis testing.

Please feel free to contact me by email.

Is something not mentioned, or would you like to share your thoughts? Feel free to comment below and I'll get back to you.

For my other articles, refer to the link.

About the Author

Aashi Goyal

Currently, I am pursuing my Bachelor of Technology (B.Tech) in Electronics and Communication Engineering at Guru Jambheshwar University (GJU), Hisar. I am very enthusiastic about statistics, machine learning and deep learning.

The media shown in this article is not the property of DataPeaker and is used at the author's discretion.
