Level: advanced
I was helping someone on Power Pivot Forum last week with a tricky calculated column question. What's more, you may know that I always advise beginners to stay away from calculated columns. Most of the time you don't need calculated columns because in most circumstances a measure is a better choice to use (read more about that here). This time, but nevertheless, the required formula is likely to be complex, the runtime performance of such a formula as a measure might be slow, in addition to the fact that the number of possible unique values of the DAX Formula is small. When these conditions are met, a calculated column can be a good solution.
My process is more important than my solution
I mean before I start that this blog post is about the process I went through a working solution. I think the process is much more interesting and much more useful than the final formula. What's more, when you look at the final formula, you might think that you don't even understand what the formula does, much less know how to write it. But here is the turning point: when I look at the final formula, I think the same. That is why the process of writing this formula is so important.
Here is the final formula.
If you want to learn how to write a formula like this, keep reading.
= CALCULATE( MAX(Changes[new_value]), FILTER(Changes, Changes[member_id] = EARLIER(Changes[member_id])), FILTER(Changes, Changes[change_type] = "Type"), FILTER( Changes, Changes[start_date] = CALCULATE( MAX(Changes[start_date]), FILTER( CALCULATETABLE( Changes, FILTER(Changes, Changes[member_id] = EARLIEST(Changes[member_id])), FILTER(Changes, Changes[change_type] = "Type") ), Changes[start_date] < EARLIEST(Changes[start_date]) ) ) ) )
The problem explained
There is a data table (see example below). The table contains information about changes in the membership status of members of a club. Anytime, a member has
- A state (active or inactive)
- One type (full, partial, limited)
As he “condition” As the “type” are indicated by a code. Status and type codes may change over time. When these change, a change log is created showing the old_value and the new_value. One row in the table equals one change.
The key points are:
- The change log (line) it may be for a change in the condition or a change in the writes as indicated in the Change_Type column.
- There is a Member_ID column that shows which member the change row belongs to.
- The date of the change is the start_date.
Target
The goal is to know what the “Actual state” of the type code for each member. at the time the new changelog was added. In other words, at the time this changelog was created, Was this customer a full member, partial or limited?
Therefore, the problem is i need to find the last one “type code” set before the current transaction date. An example will make it clear.
Example
Here is a filtered copy of the table for member_id 3100. Transactions are sorted in the order start_date (what is the date of the transaction). This table contains my final working calculated column formula “Current rate”
- The first 2 transactions (rows) happened on 28/1/1998 (start date). At this time there was no pre-existing entry for “Guy” Y, as such, the calculated column “Current rate” go blank (watch 1 up).
- The transaction in the queue 3 happened on 22/3/2013 (start date). At the moment, the “Guy” for this client it was “12” because this was the last code set as new_value before the 22/3/2013. In this case, the “Guy” the 28/1/1998 as it is shown in 2 up. As a result, my calculated column returns 12 for the record 3 (watch 2 up).
- Register 3 actually changed the "Type" value from 12 a 14, so the following records (4 a 7) should return the value 14. And they do what you can see in 3 up.
A different design?
Before sharing with you how I created this calculated column, it's worth noting that this would probably be easier to solve if there were 2 separate tables, one containing the transactions of “Guy” and another table containing the transactions of “Condition”. Putting that aside, this is a big challenge as it is, and working on challenging DAX problems will help you get better and stronger: a real DAX Ninja.
How I solved the problem: Step by Step
The beginning
I had a clear vision in my mind of what I had to do before starting. When solving any DAX problem, must think about “filter first, evaluate second”. In other words, filter the table they give me so that it only contains what I need, then take the values I am looking for.
With this in mind, for each record, needed:
- Check which member the single record belonged to
- Check the date of this record change
- Filter the entire table to:
- this member
- also for all records that were change_type = “type”
- also for all records prior to the current record date
- Later, once i had the above filtered table, I had to find the last record by start_date and then take the number from the new_value column
Paso 1. Start with anything
Frankly, I just wanted to write something to help me get started. So I started with something simple. I asked myself: “How many rows does each customer have in this table?”
I wrote this calculated column.
= CALCULATE( COUNTROWS(Changes), FILTER(Changes, Changes[member_id] = EARLIER(Changes[member_id])) )
The line 4 It's the key. The line 4 dice…
“Go and put a filter on the Changes table where the member_id equals member_id of this row”. The PREVIOUS function is necessary because a calculated column has a row context and so does the FILTER function.
The VAR syntax
If you have Excel 2016 o Power BI Desktop, you can use VAR syntax. I think the VAR syntax is easier to write and read. Here is the same formula with VAR syntax.
= VAR ThisMember = Changes[member_id] RETURN CALCULATE( COUNTROWS(Changes), FILTER(Changes, Changes[member_id] = ThisMember) )
Hope you agree that this VAR version is easier to read and understand. First, set a value for the variable on the line 2. Later, you can use this variable within your formulas without having to worry about inner and outer row context.
Well, I am up and running and I have solved the first problem. I wrote a formula that correctly detects the current member_id and filters the table for that member (as you can see below in WIP column).
Of course, I needed to use COUNTROWS since it is not possible to put a table in a column. Instead, I used COUNTROWS as a hack to be able to “watch” how big is the table for each member. I use COUNTROWS regularly when I am building complex formulas, since it is a good way to “watch” the virtual table that I am building. Another important point here is the test data. When I first wrote this test formula, actually had 7 rows in the sample table for each customer. This made it very difficult to check if the formula worked. Then, before doing anything else, I went back and changed my test data so that each member had a different number of rows. Then i could “watch” that the formula was working. Good test data is essential for efficient formula writing.
Paso 2. How many rows of “type” for this member?
Working on my high level mental plan, I decided to try filtering the table to count how many rows there were for each client where the record was change_type = "type". Here is my new formula.
= CALCULATE( COUNTROWS(Changes), FILTER(Changes, Changes[member_id] = EARLIER(Changes[member_id])), FILTER(Changes, Changes[change_type] = "Type") )
As you can see, I have returned to the original syntax (sin VAR). I just added a new filter in change_type = “type”. It was easy to check if this formula worked by looking at the results.
Paso 3. Extract something more useful
So far I have counted how many rows are in these test tables. Then, now that I had the correct table to use as my filter, I decided to try to extract the last start date from this table. This was as easy as changing the line 3 as you can see below.
= CALCULATE( MAX(Changes[start_date]), FILTER(Changes, Changes[member_id] = EARLIER(Changes[member_id])), FILTER(Changes, Changes[change_type] = "Type") )
And the new formula now correctly extracts the last start date from this table. Remember that the table contains all change_type = “type” for this member_id, so this start date is the date of the last transaction of “type” found.
Paso 4. Convert the 2 filters in a single table
The above formula has 2 filter functions. I really need it to be a single table to be able to use it. There are a few ways to do this, but i decided to convert it to a table as follows. Note that I am still using COUNTROWS so I can validate that it is still working.
= COUNTROWS( CALCULATETABLE( Changes, FILTER(Changes, Changes[member_id] = EARLIER(Changes[member_id])), FILTER(Changes,Changes[change_type] = "Type") ) )
The rows 3 a 7 now they are one “table” that i can use as i go. Instead, I could have used a single FILTER function with multiple filters using &&.
Paso 5. Get the latest date before this transaction date
This table contains all kinds of transactions. But I do not want to everybody type of transactions, I search the last transaction that occurred before this current transaction. In other words, I need to filter this table to remove any type of transactions that occurred on or after the date of is transaction. I decided to go back to VAR syntax here to make writing this formula easier.
= VAR thisStartDate = Changes[start_date] RETURN CALCULATE( MAX(Changes[start_date]), FILTER( CALCULATETABLE( Changes, FILTER(Changes, Changes[member_id] = EARLIER(Changes[member_id])), FILTER(Changes, Changes[change_type] = "Type") ), Changes[start_date] < thisStartDate ) )
Note that the rows 7-11 they are the table I produced in step 4. I wrapped this inside another FILTER to delete all records on or after the current record date, only saving logs that occurred before the current change log date. Therefore, the rows 6 a 13 they return a table containing all the records of ‘type’ what happened before this record. Then I use the CALCULATE statement to return the date of the last transaction type. As you will see in the resulting table below, the WIP column now returns blanks for any record that does not have a record of 'type’ for an earlier period in time.
Paso 6. Extract the registry value from “type” anterior
Now that I know the date of the record of 'type’ Right, I can proceed to extract the value of that record from column new_value. Here is my formula.
= VAR thisStartDate = Changes[start_date] VAR dateToUse = CALCULATE( MAX(Changes[start_date]), FILTER( CALCULATETABLE( Changes, FILTER(Changes, Changes[member_id] = EARLIER(Changes[member_id])), FILTER(Changes, Changes[change_type] = "Type") ), Changes[start_date] < thisStartDate) ) RETURN CALCULATE( MAX(Changes[new_value]), FILTER(Changes, Changes[member_id] = EARLIER(Changes[member_id])), FILTER(Changes, Changes[change_type] = "Type"), Changes[start_date] = dateToUse )
Notice how I took the result of the formula in step 5 and turned it into another VAR (rows 3 a 14). Later, I wrote a new formula (rows 16 a 21 up) to extract the value I need after filtering the member_id (line 18), el change_type (line 19) and the correct date of the transaction (line 20).
One last complication
I had one last problem that I ran into. El OP and http://powerpivotforum.com.au was using Excel 2010, so i couldn't provide the above formula (Excel 2010 does not support VAR syntax). So I had to remove the VAR syntax. First I removed the first VAR (line 2 of the step 6 anterior) as follows. Note how I had to use BEFORE on the line 11 to manage the context of the row.
= VAR dateToUse = CALCULATE( MAX(Changes[start_date]), FILTER( CALCULATETABLE( Changes, FILTER(Changes, Changes[member_id] = EARLIER( Changes[member_id])), FILTER(Changes, Changes[change_type] = "Type") ), Changes[start_date] < EARLIER(Changes[start_date]) ) ) RETURN CALCULATE( MAX(Changes[new_value]), FILTER(Changes, Changes[member_id] = EARLIER(Changes[member_id])), FILTER(Changes, Changes[change_type] = "Type"), FILTER(Changes, Changes[start_date] = dateToUse) )
Finally I removed the second VAR from the step 6 as follows. I took the lines 3-13 from the above formula and replaced dateToUse above on the line 19 with this formula.
= CALCULATE( MAX(Changes[new_value]), FILTER(Changes, Changes[member_id] = EARLIER(Changes[member_id])), FILTER(Changes, Changes[change_type] = "Type"), FILTER( Changes, Changes[start_date] = CALCULATE( MAX(Changes[start_date]), FILTER( CALCULATETABLE( Changes, FILTER(Changes, Changes[member_id] = EARLIEST(Changes[member_id])), FILTER(Changes, Changes[change_type] = "Type") ), Changes[start_date] < EARLIEST(Changes[start_date]) ) ) ) )
After making this change, note that this time on the lines 14 Y 17 I had to modify the formula to properly manage the context of the nested row, but this time I had to refer to EARLIER instead of EARLY. This is because there are 3 row context nested sets. The first is the calculated column, the second is the FILTER in the row 6 and the third is the FILTER in the row 11.
A final word
The key point of this post is that the final formula above is difficult to read and understand. You may think that this formula is written starting from the top and working your way down, but this is far from the truth. Complex DAX formulas rarely (if ever) are written from top to bottom. The trick to writing complex DAX formulas is to break the problem into pieces and solve each piece of the puzzle step by step.
Here is the sample_workbook and here are all the formulas I wrote.
Do you have a better solution?
I know there are some talented DAX Ninjas reading my blog (Jess, Owen to name 2). I always love to learn, so if you have a formula / better solution or interesting alternative approach than this, I would love to see it. Post in the comments.