March 17, 2024

Using Table.Profile() and Table.Schema() for Max Impact

Table.Profile and Table.Schema are two table functions that can by themselves provide a lot of information about a table in Power Query. They are quick, simple to call, and easy to remember. When performing exploratory data analysis (EDA), they come in super handy.

But they also have other use cases. Want to perform the same action across all columns of the same data type? Do you want to select all non-null columns? Read on then!

Disclaimer

These methods have been done repeatedly by Microsoft MVPs/SMEs/people that are amazing at Power BI. They are nothing really that new, but the thinking about using the functions in different ways is what is important. I find myself going back to these functions more and more as I streamline the first set of steps after pulling in the data.

Change All DateTimeZone Columns to Date with Table.Schema

Table.Schema exposes the data type of each column in the table. If it does that, then you can filter on certain data types, grab the column names, then transform their data types. That is exactly what this function does:

(tbl as table)=>

let

     // Select by data type and gab the column names

     Select = Table.SelectRows(Table.Schema(tbl), each _[Kind] = "datetimezone")[Name],
    
    // Get a count of the columns
    Count = List.Count(Select),
    
    
// Use TransformColumnTypes and List.Zip to change the data types
    
Result = Table.TransformColumnTypes(tbl, List.Zip({Select, List.Repeat({type date},Count)}))

in
    
    Result

List.Zip is the hero of this function, as it dynamically assigns the data type (type date) to the list of column names selected in the first step. With this zipped key-value list, passing that to Table.TransformColumnTypes changes the data types.

Why would you want to do this? Well, many times having date values down to the second is a level of granularity that is not needed for reporting. Additionally. having a column with many unique datatimezone values could affect performance. So, generally if you don’t need that level of granularity, you end up changing these columns to dates.

Select All Non-Null Columns with Table.Profile

Table.Profile provides descriptive statistics about your table. One column is a count of null values. If this value is equal to the row count of the table, then you have figured out which columns have all null values.

Here is the code:

(tbl as table)=>

let
    Source = 
            // Select those columns
        Table.SelectColumns(
            tbl, 
            // Select all non-null column names
            Table.SelectRows(
            // Add a helper column for filtering, if null
                Table.AddColumn(
                    Table.Profile(tbl), 
                    "IsNull", 
                    each if [Count] = Number.From([NullCount]) then 1 else 0, 
                    type number
                ), 
                each _[IsNull] = 0
            )
            [Column]
        )
in
    Source

This function adds a helper column to test whether the row count is the same as the null count. After adding this column, it grabs the column names of all non-null columns (having value 0 in the helper column) and then selects them.

Takeways

Again, these concepts are not new. But knowing table transformation functions and how to use Table.Schema and Table.Profile can help you streamline the first few query steps. Generally, I use these to figure out the condition I need to apply (all null columns, all datetimezone columns), grab the names of the columns, then do something else. Once you have the names in a list, other functions like List.Zip can help you further in terms of transforming column types and other scenarios.

You may also like