September 12, 2024

Dynamically Remove Characters with Character.ToNumber()

When I was writing more VBA, I found a post somewhere that showed how to pull out specific groups of characters by their ASCII number. That little function worked really, really well with dirty data. It was not as powerful as a Regular Expression, but if I had some text that was moderately messed up, I could try this little function and it worked pretty well.

I have been trying to put together a personal #shared library in M and thought this would be cool to replicate. So I did.

The Code

Here is a link to the gist: RemoveUnwantedCharacters.pq (github.com)

The function takes 6 parameters:

(
     str as text, 
     optional keep_upper as logical, 
     optional keep_lower as logical, 
     optional keep_nums as logical, 
     optional keep_specials as logical, 
     optional keep_chars as text
)

First comes the string you want to work with. After that is four optional boolean parameters that allow you to either keep uppercase, keep lowercase, keep numbers, or keep special characters. Lastly, you can put a string in like “.@” and the function would exclude the period and ampersand from being removed.

Example Dataset

idtext
1H3ll0! Th!s 1s @ t3st.
2D@t@ $3t w!th r@nd0m ch@r@ct3rs.
3T3xt w!th numb3rs & $pec!@l ch@r@ct3rs.
4R@nd0m t3xt: #1, @2, $3, %4, ^5.
5M!x3d ch@r@ct3rs: *&^%$#@!12345
6T3st!ng d@t@ w!th $pec!@l ch@r@ct3rs.
7R@nd0m numb3rs: 12345, 67890, 54321.
8Sp3c!@l ch@r@ct3rs: !@#$%^&*()_+
9T3xt w!th r@nd0m ch@r@ct3rs & numb3rs.
10Dumm¥ d@t@ s3t w!th r@nd0m ch@r@ct3rs.
Courtesy of Copilot

Output

Keep only special characters and “H”:

Keep numbers and exclamation points:

Keep alphas and spaces:

Wrap Up

There are many ways to accomplish this task, but I like this function for a quick way to clean a string up. I am adding it to the library I am building and hope to add more. If you have a novel way of doing this with Power Query, let me know.

You may also like