Dynamically Remove Characters with Character.ToNumber()
When I was writing more VBA, I found a post somewhere that showed how to pull out specific groups of characters by their ASCII number. That little function worked really, really well with dirty data. It was not as powerful as a Regular Expression, but if I had some text that was moderately messed up, I could try this little function and it worked pretty well.
I have been trying to put together a personal #shared library in M and thought this would be cool to replicate. So I did.
The Code
Here is a link to the gist: RemoveUnwantedCharacters.pq (github.com)
The function takes 6 parameters:
(
str as text,
optional keep_upper as logical,
optional keep_lower as logical,
optional keep_nums as logical,
optional keep_specials as logical,
optional keep_chars as text
)
First comes the string you want to work with. After that is four optional boolean parameters that allow you to either keep uppercase, keep lowercase, keep numbers, or keep special characters. Lastly, you can put a string in like “.@” and the function would exclude the period and ampersand from being removed.
Example Dataset
id | text |
---|---|
1 | H3ll0! Th!s 1s @ t3st. |
2 | D@t@ $3t w!th r@nd0m ch@r@ct3rs. |
3 | T3xt w!th numb3rs & $pec!@l ch@r@ct3rs. |
4 | R@nd0m t3xt: #1, @2, $3, %4, ^5. |
5 | M!x3d ch@r@ct3rs: *&^%$#@!12345 |
6 | T3st!ng d@t@ w!th $pec!@l ch@r@ct3rs. |
7 | R@nd0m numb3rs: 12345, 67890, 54321. |
8 | Sp3c!@l ch@r@ct3rs: !@#$%^&*()_+ |
9 | T3xt w!th r@nd0m ch@r@ct3rs & numb3rs. |
10 | Dumm¥ d@t@ s3t w!th r@nd0m ch@r@ct3rs. |
Output
Keep only special characters and “H”:
Keep numbers and exclamation points:
Keep alphas and spaces:
Wrap Up
There are many ways to accomplish this task, but I like this function for a quick way to clean a string up. I am adding it to the library I am building and hope to add more. If you have a novel way of doing this with Power Query, let me know.
Comments are closed.