Have passed 16 months since I wrote my blog post. 5 Common mistakes of DAX self-taught students at PowerPivotPro.com. I train many people in my training courses, including my Power BI online training course, and I also help many more in various forums. The Error number 1 what I see in self-taught DAX students with Excel experience is the overuse and overuse of calculated columns instead of measures.
Attracted like a magnet
As I mentioned in the previous post, Excel users are drawn to calculated columns like a magnet.
Perhaps a better analogy would be “Drawn to a flame like a moth” with sometimes equally dangerous side effects.
I think this is because Excel users are very comfortable working on the table structure that exists in the Power Pivot window. At the same time, self-taught Power Pivot users regularly don't know any better.
Take a look at the Power Pivot window below (Adventure Works Sales Chart). When an untrained DAX user who is used to working in Excel looks at this table (which looks a lot like a spreadsheet), it “natural” is to think “I have sales (1 then), I have got Cost (2), but I need Margin $ (Sales less Cost). Then, they jump and write a simple calculation column for the margin (3 then).
But this is 100% incorrect. Please, do not do it! Instead, use Measurements.
Reasons calculated columns can be a hindrance
I have been very critical of calculated columns in the past to the point where I have suggested they are evil (as implied in the chart below). This is not strictly true, but they can certainly be bad in the hands of the uninitiated, so i advise new users to stay away unless they are clear why it is the right approach.
Here is a list of reasons why calculated columns can be bad.
- The results of your formula are calculated and stored in the table for each row in the table. This takes up disk space and also uses memory space, and this will make your report less efficient. If your table has 100 rows, it probably doesn't matter. If your table has 100 million rows, then it can be a big problem.
- Compression on calculated columns may not be as good as compression on imported columns. This is because the calculated columns are compressed last. Therefore, Power Pivot doesn't give calculated columns the same prioritization consideration that it gives imported columns.
- Calculated columns are recalculated each time the workbook is updated with the data source. This will slow down the update procedure (more and more for each additional calculated column).
- You can unintentionally create circular reference conflicts in your tables (specifically data tables) for reasons that are very difficult for students to understand (and beyond the scope of this blog to explain).
Benefits of using measures instead of calculated columns
The most practical reason you shouldn't write Calculated Columns is that you just don't need to. Power Pivot was built for measurements. If the calculated columns are bad, then The measurements are angelic. Here are the benefits of using measures instead of calculated columns.
- Your workbooks will be smaller and faster. No this “materializing” the results of all possible calculations in your table and storing them in memory and on disk (as with a calculated column). Instead of measurements are calculated on the fly in memory on demand, and measurement calculations are generally very fast. The only results that materialize are the results that should be displayed within your pivot table.
- Pivot tables naturally filter your data model before calculations are complete. This means that Measures should only be calculated on the filtered subset of data, further reducing the number of calculations required.
Take the calculated column mentioned previously. You can get exactly the same result by typing the following measurements. Please note that these are measurements, no calculation columns.
Total Sales = SUM(Sales[SalesAmount]) Total Cost = SUM(Sales[TotalProductCost]) Total Margin = [Total Sales] - [Total Cost]
General rules of when calculated columns are correct
I still believe that calculated columns are bad in the wrong hands, despite this, as I have built deep knowledge, I have learned more about the specific conditions when it's okay to use calculated columns. These are my general rules of thumb on when it's okay to use calculated columns. For Excel users who are still learning; if you don't know which one to use, then I recommend that you assume that you must use a measure at least until you are sure why you should use a calculated column. Taking this into account, it is generally okay to use computed columns in the following circumstances.
- When you need to use the results to filter your data model. You cannot use a measure in rows, columns, filters or slicers in your pivot table. If you want / need to do this, then you need a column and not a measure. It is preferable to add the column to your data source instead of using a calculated column. The benefits of loading it from source include:
- which can reuse the source column in other workbooks without rewriting the calculated column
- there is an opportunity to improve compression on import
- updating your workbook will be faster.
But for the times when it is not feasible to import from the source, then a calculated column is your friend.
- When you are improving a lookup table (dimension). Lookup tables are regularly smaller (less rows) Y, therefore, the negative impacts of the calculated columns are not that important.
- When the results of the calculated column have a low cardinality. The uniqueness of values in a column is the enemy of compression in Power Pivot. Then, if your calculated column returns 2 possible unique values (as an example, Yes or no) and this column helps you write a complex measure, then there are no problems with the calculated column. Columns calculated with a low cardinality compress very well.
- When you have a very complicated formula that is very demanding on your report to the point where poor runtime performance of a measure calculation is so bad that it is preferable “precalculate” the result and store it in a column. A self-taught Excel user is unlikely to write such a complex formula in the early days of learning Power Pivot, but it is important to note that this is a valid use case anyway.
Where to learn more
- Subscribe to blogs and read everything you can. You can subscribe to my blog at the top left of this page.
- read a book. I book Supercarga Power BI covers calculated columns, as well as all the other topics you need to learn to be good at Power BI Power Pivot for Excel. My book focuses on giving you theory and practice. I keep a curated list of other great Power BI and Power Query books in my knowledge base.
- Do some live training. Students of my courses live in Australia walk away with deep knowledge that is difficult to replicate in other learning environments. If you really want to get a deep understanding, consider a live training course.
- If you prefer online training, you can sign up for me Supercharge Power BI Online Trainingg for guided learning.