Use Regex in Power BI with Python & Power Query
Even snake_case text can’t hide from Regex and Python
Knowing where to infuse things like Python to your Power BI projects can make your life a lot easier. One of the ways you can start (if you already haven’t) is to use Python to apply regular expressions.
What is a regular expression (regex)? It is essentially a way to define a pattern and manipulate text (match, replace, etc). For most data processing steps, you probably don’t need it. But there may come a point where you get multiple lines of text in a column and need to pull certain text out. Or maybe you need to check to see if it contains personally identifiable information. You could apply a bunch of different steps to capture the text, but would those steps apply to every row?
I have seen people do wonders in terms of writing M to parse all sorts of things. So, I am not saying it is impossible to just stick to M, possibly with custom functions that take care of what you need. I try to do that a lot, but sometimes you hit a wall. Enter regex!
How Do I Use This With Power Query?
I won’t cover how to set up Python with Power Query in Power BI. There is a Microsoft Learn article that details how to do this.
Once you set up Python, I have a written a very basic function that allows you to call the re library in Python and return a new table with matches. Note: this function performs matches, not any of the other things like replacing text based on matches (as an example).
Here is the code:
(tbl as table, pattern as text, base_col as text, new_col as text)=>
let
Script = "import re#(lf)import pandas as pd#(lf)#(lf)pattern = r'" & pattern &
"'#(lf)dataset['" & new_col & "'] = dataset['"
& base_col & "'].str.extract(pattern)",
Run_Python = Python.Execute(Script,[dataset=tbl]),
return = Run_Python{[Name="dataset"]}[Value]
in
return
It takes four parameters:
- The table you are processing
- The regular expression match pattern
- The column name you want to apply the match to
- The name of the new column that extracts the text matching the pattern
To use it, add it as a blank query, give it a name (e.g. fnPyRegexMatch) and add a step to the query table you want to use it on, and pass in the parameters. That simple.
But How Do I Even Know What Patterns To Use?
That’s the tougher part, but luckily with Copilot in Bing you have a helper. Also there is this cool site called AutoRegex where you can pass in what you are looking for (in actual words) and it will generate regex for you.
There are also community sites like Regex101 that have community patterns and an online debugger to help you not only learn but also test.
Closing Thoughts
This is just another tool in your toolbox with Power BI. Unfortunately, Power Query in Excel does not have the ability to call Python so that is out, but you can likely call the re library from Python in Excel (outside of Power Query).
This is a very, very simple case of how languages like Python and R can extend Power BI to make more complicated tasks simpler. I have been trying to extend my knowledge more and have done some things I previously thought would be too difficult to do in Power BI with a few lines of Python.
There will be some that say, “Why don’t you just do it all in Python?” I certainly do from time to time, but if 90-95% of the work could be done in Power BI and writing a little script in Python could get me across the finish line without bogging down refresh, then why not use it in Power BI?
Comments are closed.