Get Quartile Stats with Power Query
The Four Horsemen of the Sci-pocalypse
Knowing what the quartiles are for a range of values is another decent way of quickly understanding your dataset. If you can segment the values of a column into quartiles, then investigation into poor performing segments and even identification of outliers becomes easier.
Now DAX has quick ways of calculating this with PERCENTILE.INC and PERCENTILE.EXC (and even aggregator versions of these), so figuring this out is very doable with just DAX. That being said, I wanted to understand the logic and understand it better, so I decided to code a combined function with M to do all sorts of things.
fnGetQuartileStats
Here is the code (and the gist):
(val as number, col as list) =>
let
l = List.Sort(col),
count = List.Count(l),
q1 = count * 0.25,
q2 = count * 0.5,
q3 = count * 0.75,
fnRunCheck = (check as number)=>
let
get_q = if check = Number.IntegerDivide(check,1) then List.Average({l{Number.Round(check,0)-1}, l{Number.Round(check,0)}}) else l{Number.Round(check,0)}
in
get_q,
q1_val = fnRunCheck(q1),
q2_val = fnRunCheck(q2),
q3_val = fnRunCheck(q3),
iqr =q3_val - q1_val,
return = [
quartile = if val <= q1_val then 1 else if val <= q2_val then 2 else if val <= q3_val then 3 else 4,
q1= q1_val,
q2 = q2_val,
q3 = q3_val,
interquartile_range = iqr,
outlier = if val < q1_val - (1.5*iqr) then true else if val > q3_val + (1.5 * iqr) then true else false
]
in
return
This function takes a value and the corresponding column and calculates:
- What quartile it falls into (1, 2, 3, or 4)
- What the 1st Quartile value was
- What the 2nd Quartile value was
- What the 3rd Quartile value was
- What the Inter-Quartile Range (IQR) was (Q3 value – Q1 value)
- If the current value is an outlier (using the 1.5 * IQR as a benchmark)
How Do I Use This?
All you have to do is add a custom column and call the function. Pass in the row value first, then the column (append the prior applied step name to the column name – in this example the prior step was “base”):
The function will return a nested record that you can expand:
Is This Useful?
I think so. I like being able to calculate many things at once and choose my own adventure afterwards. Returning records are great for that!
I can also see this being applied to help with simple segmentation and having another way to find outliers.
Comments are closed.