January 28, 2024

Make Multi-Column Dimensional Tables with Power Query

A man making dimensions in the Fabric of space and time (pun intended)

Further Down the Rabbit Hole

Yesterday I posted an article about how you could make single-column dimension tables with Power Query. It generated some good discussion about the utility of making single-column dimension tables.

It is definitely true that if you have a smaller dataset, you really don’t have to go to such lengths. You won’t see a major difference if your dataset is less than a few million rows.

Also, if you had something like 15 text columns, you would not want to break out 15 single-column dimensional tables and start down the “let’s make my data model super-complicated” path. Choose wisely.

With that being said, I think the thought experiment was helpful in terms of seeing what Power Query could do. I have used List.Generate() with records a few times now, and I think it is a handy little pattern that can be used to perform a lot of bulk operations.
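As a quick illustration of that pattern, here is a minimal sketch (with a made-up sample table standing in for real data) that walks a list of column names with List.Generate() and emits one record per column, each holding a distinct single-column table:

    let
        // Made-up sample table standing in for a duplicated fact table
        Source = #table(
            {"customer_name", "product_name"},
            {{"Ada", "Widget"}, {"Ada", "Gadget"}, {"Grace", "Widget"}}
        ),
        Columns = Table.ColumnNames(Source),
        // One record per column: the column's name plus its distinct values as a nested table
        Dims = List.Generate(
            () => [i = 0],
            each [i] < List.Count(Columns),
            each [i = [i] + 1],
            each [Name = Columns{[i]}, Dim = Table.Distinct(Table.SelectColumns(Source, {Columns{[i]}}))]
        )
    in
        Dims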

I continued on and was able to apply these thoughts to multi-column dimensional tables. Here goes.

The Base Table

I went to Mockaroo and created the following synthetic dataset:

It is a very simple sales table, the kind of thing you might see if someone passed you an Excel file with everything mashed together into one table. I say all that having never worked in sales or the private sector outside of high school retail and fast-food jobs (lol).

Looking at the table, you have 3 text columns dealing with customer information and 2 columns dealing with product information. I am going to use a function to create dimension tables targeted at those columns later on.

After generating the data and importing it into Power BI, I again duplicated the table to help generate my dimension tables.

fnCreateMultiColDim

The first function operates on the copy of the fact table. It takes 3 parameters:

  • The table being operated on (usually the previous step)
  • A list of lists of grouped columns (more on that below)
  • A list of what you want to call the id columns in the dimension tables (in the same order as the second parameter)

To use this function, add a step on the duplicated table and write something like:
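(The column names below are placeholders based on the Mockaroo fields; adjust them to match your own groups.)

    = fnCreateMultiColDim(
        #"Previous Step",
        {
            {"customer_name", "customer_city", "customer_state"},  // customer group
            {"product_name", "product_category"}                   // product group
        },
        {"CustomerID", "ProductID"}
    )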

Note the list of lists. Essentially, the second parameter takes a list of grouped column lists, and each group will be broken out into its own distinct table.

Lastly, pass in a list of id column names. Make sure that this list is in the same order as the nested lists in the second parameter.

Once that is done, you are presented with a table of the nested dimensional tables:

You can then reference this multiple times and drill down to the nested tables to get your dimension tables.
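For reference, here is a rough sketch of one way a function like this could be written: select each column group, take the distinct rows, add an index column to act as the id, and nest everything into one table. This is just an outline of the idea, not necessarily the exact implementation:

    (SourceTable as table, ColumnGroups as list, IdNames as list) as table =>
    let
        // One record per column group: its id column name, its columns, and the
        // distinct rows of those columns with an index column added as the key
        DimRecords = List.Transform(
            List.Positions(ColumnGroups),
            (i) =>
                [
                    IdName = IdNames{i},
                    Columns = ColumnGroups{i},
                    Dim = Table.AddIndexColumn(
                        Table.Distinct(Table.SelectColumns(SourceTable, ColumnGroups{i})),
                        IdNames{i},
                        1,
                        1
                    )
                ]
        ),
        // Turn the records into a table so each nested dimension can be drilled into
        Result = Table.FromRecords(DimRecords)
    in
        Result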

fnCreateMultiDimID

This function is much simpler. Go to the fact table, add a step, then pass in the previous step name as the first parameter and the dimension table as the second parameter:
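(The step and table names below are just placeholders.)

    = fnCreateMultiDimID(#"Changed Type", DimCustomer)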

You should see the original columns removed and the id from the new dimension table added in their place.
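A rough sketch of how a function like this could work is below: merge the dimension back onto the fact table on the columns they share, expand just the id column, and drop the original text columns. Again, this is an outline of the idea rather than the exact implementation:

    (FactTable as table, DimTable as table) as table =>
    let
        // Columns present in both tables are the join keys; the column that only
        // exists in the dimension is assumed to be the id column
        SharedColumns = List.Intersect({Table.ColumnNames(FactTable), Table.ColumnNames(DimTable)}),
        IdColumn = List.Difference(Table.ColumnNames(DimTable), SharedColumns),
        Merged = Table.NestedJoin(FactTable, SharedColumns, DimTable, SharedColumns, "Dim", JoinKind.LeftOuter),
        Expanded = Table.ExpandTableColumn(Merged, "Dim", IdColumn),
        // Remove the original text columns now represented by the id
        Result = Table.RemoveColumns(Expanded, SharedColumns)
    in
        Result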

Closing Thoughts

The ultimate question is: is this useful? I am not entirely sure it is, as I have written it. There are a lot of things that should be considered:

  • What is your datasource?
  • How much data do you have?
  • If it is not a flat file, can you have a view written at the source?
  • If IT is unwilling to write a view and you have the ability to query, can you use Value.NativeQuery() to write SQL to model a star schema that way? (See the sketch after this list.)
  • What do these methods do in terms of refresh performance? Can you watch Jeopardy in the time it takes to refresh the data?
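On the Value.NativeQuery() point, the idea would be something along these lines (the server, database, and table names are made up):

    let
        Source = Sql.Database("myserver", "SalesDb"),
        // Build the customer dimension at the source instead of in M
        DimCustomer = Value.NativeQuery(
            Source,
            "SELECT DISTINCT customer_name, customer_city, customer_state FROM dbo.Sales"
        )
    in
        DimCustomer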

There are probably way more considerations that I am not taking into account. But I think that, in the end, this was useful as an “order of operations” coding exercise that could help newer developers adopt the star schema.

In the age of generative AI assisting people with their work, maybe something could be done here?
