Press "Enter" to skip to content

Removing Diacritics with Power Query

Chris Webb gets rid of them scribbles what you sometimes find on perfectly good letters:

…then the output is “un garon trs g Nol”. As you can see, removing all the characters leads to unreadable text. Instead, what you have to do is find all the letters with diacritics (accents and other glyphs that can be added to characters) and remove the diacritics. Doing this may be ungrammatical and make it harder to understand the meaning of the text but in most cases the text will still be readable.

The bad news is that there is no straightforward way to do this in Power Query, and indeed there is no straightforward way to do this at all because there are no hard-and-fast rules about what to replace a letter with a diacritic with: should “ö” become “o” or “oe” for example? My first thought was to create a big lookup table with all the rules of what to replace each character with in, similar to the approach taken here for solving this problem in Google Sheets. Building a comprehensive lookup table would be gigantic task though.

Chris does take a different approach, though do read the comments because there are scenarios in which a simple removal of the diacritic can lead to a not-so-subtle alteration of the phrase.