DAX as a query language, part 2

Share on facebook
Share on twitter
Share on linkedin
Share on telegram
Share on whatsapp

Contents

Level: intermediate

In Myself last article I introduced DAX as a query language and explained why learning how to query your data model can be helpful. I covered the basics on using a tool like DAX Studio to query a database and return a data table as a result. In this article I am going to dig deeper and introduce the most common and useful query functions, including CROSSJOIN, SUMMARIZE y ADDCOLUMNS (there are more features that i don't cover in this article).

whos-coming-with-me-7198103

Here is the link to the PBIX data model again if you want to go ahead and do the examples yourself. I am using DAX Studio again to connect to Power BI Desktop running on my location machine. If you take the time to complete these examples, will significantly increase your learning and knowledge retention.

Lineage

before continuing, an interesting and important fact about Power Pivot (compared to traditional database tools) is the concept of lineage (pronounced LIN-E-AGE). When creating a new virtual table in a query or formula in Power Pivot, the new table will include an automatic one-to-many relationship to the table it was created from. Consider the following simple table from last time.

image_thumb-12-6469419

You can see above that this query produces a single column table of all unique product categories. This new virtual table preserves the lineage of the data model. In this case, the new virtual table comes from the Products table and, Thus, the new virtual table has lineage for the Products table. You can “you're welcome” this as shown below with the new table having a relation of 1 many with the Products table.

image_thumb-13-3816680

Please note the picture above it's just a visual simulation about what is happening. The virtual table is not materialized and you cannot actually see it in relationship view. But the virtual table exists (virtually) and the lineage of the Products table also exists, it's just that you can't really see. I recommend you learn to “to imagine” let this happen in the data model in your mind as it will help you understand how the new virtual table interacts with the rest of the data model, especially with regard to context transition.

All virtual tables have a lineage with the tables from which they come.

CROSS

CROSSJOIN is a function that can create a new table from 2 or more source tables. For this example, I'm going to join some virtual tables. The first virtual table is VALUES (Product[Category]) that, of course, returns a list of all unique product categories.

image_thumb-14-4808511

The second virtual table is a list of all possible Client genres.

image_thumb-15-4146844

Then, I will use CROSSJOIN to create a new table containing all unique joins from both tables.

image_thumb-16-6251972

In the table above there are 4 rows x 2 rows giving a total of 8 rows of all unique values. Continuing with the concept of “to imagine” the way these new virtual tables are related in the data model, it would look like this:

image_thumb-26-6836091

Remember this is just a simulation of what it looks like. These tables at the top are not materialized and you cannot see them in the data model. But can “imagine them” with this look and they behave in exactly the same way as they would if they were physical tables.

M x N can mean a big table

You should be careful with CROSSJOIN as, by definition, the resulting table will be mxn rows long, where m is the number of rows in the table 1 yn is the number of rows in the table 2. If I had to CROSSJOIN on the client table (18,484 rows ) with the Products table (397 rows) would end with more than 7 million rows. This in itself is not a problem for Power Pivot to create such a large table in memory, but it can definitely be a problem if you try to materialize the table. More on that next week.

SUMMARIZE

SUMMARIZE es, with much, my favorite DAX query function. SUMMARIZE can do similar things to CROSSJOIN, but nevertheless, CROSSJOIN can join tables that have no relationships, while SUMMARIZE can only join tables that are related to many-to-many relationship 1.

SUMMARIZE first takes a table and then one or more columns (accessed through a many-to-many relationship 1) that you want to include in the new summarized table.

ABSTRACT (, table[column], Table 2[column],….)

Here is an example.

image_thumb-17-2130599

The above query results are similar to the above CROSSJOIN query, but there is an important difference. SUMMARIZE will only return rows that actually exist in the data itself (note that there is only 6 rows up compared to 8 rows in the CROSSJOIN example).

Consider the relevant data model tables below.

image_thumb-18-9825996

Here is the SUMMARY formula written above.

EVALUATE
SUMMARIZE(Sales, Products[Category], Customers[Gender])

This query starts with the Sales table and then adds the Products[Category] column of the Products and Customers table[Gender] column in the Customers table. The 2 columns specified within the SUMMARIZE formula come from tables on the side 1 of the relationships of many to 1; this is allowed.

The following is not allowed and will not work.

EVALUATE
ABSTRACT (Products, Sales[CustomerKey])

It does not work because the Sales column[CustomerKey] cannot be accessed from the Products table through a many-to-relationship 1.

It is also possible to write a SUMMARIZE statement about any table. In the following example, the SUMMARIZE statement returns a list of all possible combinations of product category and color.

image_thumb-19-2606797

You could also achieve the same result with the ALL function (which would be an easier solution if you are only using a single table)

EVALUATE
ALL(Products[Category], Customers[Gender])

Add summary sales to the summary table

Up to now, the SUMMARIZE queries above are just lists of valid joins. Time to do something more interesting and add the [Total Sales] to these summary tables. before continuing, note that the following formulas are not best practices – there's a better way I'll cover later.

Consider the following formula

EVALUATE
SUMMARIZE(
    Products,
    Products[Category],
    Products[Color],
    "total sales", [Total Sales]
)

Note specifically that the table parameter in this formula is “Products”. Also note below that this formula returns blank rows (shown below).

image_thumb-20-7552575

This summary statement correctly summarizes all Product combinations[Category] and product[Color] in the product table and then for those products where there are sales, those sales are displayed next to the unique combination. But in some cases, unique combination actually has no sales, hence the blank rows.

Use sales as a table parameter

If I change the above formula and change the Products table to the Sales table, blank rows are no longer visible (see below).

image_thumb-21-1745823

SUMMARIZE will always find the unique combinations that actually exist in the selected data. Because this new formula starts in the Sales table, only Product combinations[Category] and product[Color] where there are actual sales are returned.

Context transition or no context transition?

Those of you who are familiar with the concept of context transition may be thinking that context transition is happening here.. That is a valid thing to assume, but this is not what is happening here. Consider the following formula.

image_thumb-22-3035107

Please note how I have changed the measurement [Total Sales] con SUM (Sales[ExtendedAmount]). With this new formula above, there is no CALCULATE that forces the context transition; but nevertheless, despite this, the table still returns the same result. This implies that SUMMARIZE does not operate in a row context. In fact, SUMMARIZE is a Vertipaq Storage Engine operation. The part that produces the valid combinations of columns is very efficient, but nevertheless, calculating total sales figures is very inefficient. For this reason, it is better to use ADDCOLUMNS to sum the sales totals (see below).

Columns

ADDCOLUMNS does exactly what it suggests: add new columns to a table in a query. The general syntax is as follows:

ADDCOLUMNS (, "Column name", ,….)

To demonstrate how this works, let me start with a formula above that produces the following table.

summarize_sales-6559889

The SUMMARY function returns a table of 2 columns with 15 rows: all possible combinations containing sales values. This table can be used as table parameter in ADDCOLUMNS formula as follows.

image_thumb-27-1623155

You can see above that this new table returns the Total Sales for each of the 15 possible combinations.

And it is possible to add as many new columns as you need to the summary table. Look down.

image_thumb-28-3462729

The important differences between ADDCOLUMNS and SUMMARIZE

Now you have seen that it is possible to add columns (as the summary of total sales) to a table using SUMMARIZE and also with ADDCOLUMNS. But there are some important differences between these 2 approaches.

ADDCOLUMNS has a row context

Unlike what I showed with SUMMARIZE earlier in this article, ADDCOLUMNS has a row context. Consider the following query.

image_thumb-29-4968734

When I change the measurement [Total Sales] con SUM (Sales[ExtendedAmount]) the results are wrong. This shows that ADDCOLUMNS operates in a row context.

Efficiency

When given the choice, you should choose to use ADDCOLUMNS in favor of SUMMARIZE to add these additional data columns. ADDCOLUMNS is much more efficient in the way it adds the values ​​to the SUMMARIZE table. SUMMARIZE uses a Vertipaq Storage Engine operation to produce the base table and then ADDCOLUMNS takes advantage of the lineage and context transition to add the value columns; This approach takes advantage of the special capabilities of Power Pivot to get the job done in the most efficient way. For more detailed coverage of this topic, should read this article from the Italians.

Other DAX query functions

I realize that I have not covered all the DAX Query functions in this series of articles. There's others, some of which are only available in newer versions of Power Pivot (for instance, Power BI Desktop, Excel 2016). If you are interested in more information, you can do some research online. I will be covering 1 final show next week: the ROW function.

Uses of DAX queries

Next week I will share my final article in this series where I explain some ways you can use DAX queries in the real world. Make sure to come back next week, or better yet subscribe to my weekly newsletter to receive notifications when there are new articles.

Subscribe to our Newsletter

We will not send you SPAM mail. We hate it as much as you.