Calculate Z-Score with Power Query
A literal interpretation of a Z score as prompted to Copilot (kidding)
What is a Z-Score?
Assuming the data that you have resembles a normal distribution (big if), a z-score can be helpful in terms of comparing a single value with the average (mean) value. Getting the z-score can provide you with a way of determining just how high (or low) a value is.
The normal distribution (Source: Wikipedia)
If the data approximates a normal distribution, then 68.2% of the values should be within +/- 1 standard deviation from the mean, 95.4% of values should be within +/- 2 standard deviations, and 99.7% of values should be within +/- 3 standard deviations.
Power Query has List.StandardDeviation, but it does not have a dedicated function for finding the z-score. So I made one.
What is the Math?
Basically the z-score is calculated by looping through each value in the column, subtracting it from the mean of that column, and then dividing that by the standard deviation of that column.
How Do You Do This With M?
Here is the code (and here is the gist):
(val as number,col as list) =>
let
stdev = List.StandardDeviation(col),
mean = List.Average(col),
z = (val - mean)/stdev
in
z
As far as functions go it is pretty simple. Pass in the value for that column and the whole list of values and it finds the standard deviation and mean and calculates the z-score accordingly.
I just added a custom column and the trick is to pass in the value which is just the reference to the column name ( [Total] ) and then pass in the whole list of values ( base[Total] ). In the second parameter base is the name of the previous applied step. The each keyword is applied to the first parameter and not the second this way.
Is This Helpful?
Maybe? I can see instances where this could be used to help find outliers in a simple manner. If the data approximates a normal distribution, then anything more than +/- 3 standard deviations is outside of 99.7% of what would be expected. Having this calculated as a column in Power Query, or DAX, could be useful.
Comments are closed.