Statistics in Excel | 10 statistical functions in Microsoft Excel

Contents

Overview

  • Microsoft Excel is an excellent tool for learning and executing statistical functions
  • Here there is 12 Statistical functions in Excel that you must master for a successful analytics career

We are going to excel in statistics!

“Statistics is the grammar of science”. – Karl Pearson

Let's make it a little more relevant to us: statistics is the grammar of data science. You will notice that almost all successful data science or analytics professionals have a solid understanding of statistics., but does that mean you need a master's degree in the subject?

Absolutely not!

giphy-9522744

We will approach this article using the famous Pareto Principle: the 80% consequences come from 20% of the causes. Therefore, we will focus mainly on the 20% of the concepts and functions that are useful the 80% weather. Now, the question as an analyst is, How to start implementing statistics?

Of course, you can implement and practice these concepts in programming languages ​​like Python and R. But wait a second and think: What is the most used tool in most organizations?

It's Microsoft Excel! Honestly, Excel is the Swiss Army Knife for analytics professionals that helps you focus on what's important (statistics in our case) and handles the rest of the calculations and customizations by itself.

This article is for everyone who is starting their journey towards statistics using Excel.. we will use 10 Key Statistical Functions in Excel to Answer Questions for a Fictitious Sports Company, Khelo, while we examine your data.

If you are a beginner in the world of analytics and Excel, I highly recommend that you follow these free courses:

How we will approach these statistical functions 1st in Excel

I have divided these statistical functions in Excel into two categories:

  • Basic statistical functions
  • Intermediate statistical functions

Let's start!

Understanding the data and posing the problem

We will solve some key questions about our fictitious sports apparel and equipment company: Khelo. These are the columns we have:

  • Team
  • Number of items sold
  • Cost of each item
  • % Off
  • Income

pexels-jopwell-1325735-300x200-9491425

In the course of this article, we will answer these questions:

  1. How many items have discount?
  2. How many items / equipment sells the store?
  3. What is the number of products sold without discount?
  4. Are there products sold that cost more than 2000 together with a discount rate higher than 50%?
  5. What is the average number of products sold?
  6. What is the median number of products sold?
  7. What is the most frequent discount percentage?
  8. What is the standard deviation of the number of products sold?
  9. Is there a relationship between the number of products sold and the discount percentage?

Basic statistical functions in Excel

MS Excel provides a variety of useful statistical functions. Let's start with some of the basic but extremely powerful features. Honestly, You will find that you are using the basic statistical functions on 90% of time and 10% the remainder of your time is taken by intermediate and advanced functions.

We will mainly talk about the different types of counting functions here. They are very similar to other functions like sum, maximum, minimum, average.

1. Counting function

We use the count function when we need to count the number of cells that contain a number. Remind ONLY NUMBERS! Let's see the function:

  • COUNT (valor1, [value2],…)

Then, let's try to find the answer to our first question: How many items had a discount?

excel-count-8779758

There is 11 discounted products.

2. Counta function

While the counting function only counts numerical values, the COUNT function counts all cells in a range that are not empty. The function is useful for counting cells that contain any kind of information, including error values ​​and empty text.

  • TELL (valor1, [value2],…)
We will answer the second question using the countta function, since it can count all the values ​​that are not empty: How many items / equipment sells the store?
excel-counta-3754656
The total number of items sold by the store is 13.

3. Countblank

the COUNTBLANK The function counts the number of empty cells in a cell range. Cells with formulas that return empty text are also counted here, but cells with zero values ​​are not counted. This is a great function to summarize empty cells while analyzing the data.

Summarizing the empty cells is the requirement for our third question: What products are not in the discount section? Let's apply the function!
excel-countblamk-6550682

There is only 2 non-discounted items.

4. Countifs function

Countifs is one of the most used statistical functions in Excel. The COUNTIF function applies one or more conditions to the cells in the given range and returns only those cells that meet all the conditions.

  • COUNTIFS (criteria_range1, criterios1, [criteria_range2, criteria2]…)
Note: Each new range must have the same number of rows and columns as the criteria_range1 argument. Ranges do not have to be adjacent to each other.
This function seems perfect to answer the fourth question: Are there products sold that cost more than 2000 together with a discount rate higher than 50%?
excel-countifs-8271563
The questions seemed complex, but it was very easy to find the answer in Excel. Solo 1 product, namely, shoes, it costs more than 2000 and is sold at a discount rate higher than 20%.
Marvelous, ¿no? We have analyzed some basic statistical functions in MS Excel so far. Then, Let's take a look at the intermediate statistical functions.

Intermediate statistical functions in Excel

Here we will discuss some of the intermediate statistical functions in MS Excel related to central tendency and dispersion. These functions are very useful in our day to day as an analyst.

5. Average function

The most common function that we habitually use in our daily life is the average (or average). The AVERAGE function simply returns the arithmetic mean of all cells in a given range:

  • AVERAGE (number 1, [number2],…)
But there is a simple downside to using averages: are prone to outliers. Therefore, can paint a very unrealistic picture in our analysis. Let's find out the average amount of goods sold: scattered
excel-avg-6923148
The average turns out to be ~ 365,2. We will also do similar calculations for the cost.
average-7471306

6. Median function

The outlier problem can be solved using another function for central tendency: the median. The median function returns the mean value of the given cell range. The syntax is quite simple:

  • MEDIAN (number 1, [number2],…)
Let's find the median number of products sold in our sports store and see how close it is to our average value:
excel-median-7966583
We see that the median turns out to be ~ 320, which is quite close to the average value. It means that there is not much fluctuation in our data. Let's see if this is the case for cost of goods:
median-3867810
The median and average value of the cost of each item vary widely. For instance, the cost of a ball is 50 but the cost of a bat is 2000, resulting in high dispersion.

7. Mode function

For numerical values, the mean and median are usually sufficient, but what about categorical values? Here, the mode enters the scene. The mode returns the most frequent and repeated value in the given range of values:

  • MODE.SNGL (number 1,[number2],…)
Note: MODE.SNGL returns only one value, while MODE.MULT returns an array of the most common values.

Good, this is a simple one. Let's find the most frequent discount value given by the sports store:

excel-mode-5221583
This discount value is 10%.

8. Standard deviation function

The standard deviation is one of the ways to quantify the dispersion. It is a measure of the dispersion of the values ​​of the mean value.

Here, we will use the STDEV.P function which is used to calculate the standard deviation based on the entire population given as arguments:

  • STDEV.P (number 1,[number2],…)
Note: The STDEV.P function assumes that its arguments are the entire population. If that is not the case, you can use the STDEV.S function ().
For a large sample size, the standard deviation of the population and samples will return approximately similar values. Previously, we have calculated the mean and median to get an idea of ​​the central tendency. Let's find out the standard deviation to see the level of dispersion:
excel-stdev-5719843stdev-5699933
As expected, the standard deviation of the quantity sold is less, which means that the dispersion is smaller while the standard deviation of the cost of the products is high.

9. Quartile functions

This is another function with abundant applications in the industry.. Helps us divide the population into groups. QUARTILES.INC returns the quartile of a data set, based on percentile values ​​of 0 a 1, inclusive.

For instance, you can use this function to know the 25% top of your customer base.

  • CUARTIL.INC (headquarters, room)

10. Correlation function

The CORREL function () it's my personal favorite. Provides really powerful information that is not obvious to the naked eye. the CORREL The function returns the correlation coefficient of two cell ranges. But what is that? Basically, tells us how strong the relationship is between the two variables.

Note: Does not show any cause and effect relationship.

The correlation value range is between -1 Y 1.

Let's go to our last and most interesting question: Is there a relationship between the quantity of goods sold and the discount percentage?

excel-correl-3765448

Good, the correlation turns out to be ~ 0.8, which is quite high. These seem to be positively related, which means more discount, more quantity sold.

Final notes

In this article, we analyze more than 10 statistical functions for beginners and intermediates in MS Excel, ranging from simple count () up to advanced run (). Statistics is one of the most important tools in an analyst's kit and you can achieve many of your statistical goals simply by using Excel..

I recommend that you check out the following additional resources in Excel:

I will cover advanced statistical functions in the future.. Let me know some of your favorite statistical functions and I will try to incorporate them in my next articles..

Subscribe to our Newsletter

We will not send you SPAM mail. We hate it as much as you.