How do I remove all lines containing any non-ASCII keyboard characters?

I have tried many regular expressions, but none of them works as it should. I even tried [^\x00-\x7F]+, but it didn't select all the characters.

My next idea was to use [^a-z0-9``~!@#$%^&*()-_=+[]{}\|;:'"<>,./?], but that still doesn't work, because some of the allowed characters are still selected, such as \ / | { } [ ] $ # ^ ( )

  1. If a line contains any character not in the list below, I want to remove it or bookmark it:

    0123456789`~!@#$%^&*()-_=+[]{}\/|;:'"<>,.?
    abcdefghijklmnopqrstuvwxyz
    ABCDEFGHIJKLMNOPQRSTUVWXYZ
    
  2. Sample input (more characters like these are listed at https://en.wikipedia.org/wiki/List_of_Unicode_characters):

    0123456789`~!@#$%^&*()-_=+[]{}\|;:'"<>,./?
    abcdefghijklmnopqrstuvwxyz
    ABCDEFGHIJKLMNOPQRSTUVWXYZ
    ¤©ª«¬¯°±²³´µ¶·¸¹º»¼½¾¿÷ÆIJŒœƔƕƋƕ
    ƜƝƢƸƾDžNJNjǽǾǼɁɀȾɎʒəɼʰʲʱʴʳʵʶʷʸˁˀˇˆ˟ˠ
    ˩˧Ͱͱͳʹͼͻͺ͵ͿΏΔΘΞΛΣΠΦΧΨΩΪΫάέήίΰαβδε
    θηκλμξπςρφχψωϊϋϏώϑϐϓϒϔϕϖϠϟϞϝϜϡϢ
    ϤϣϧϫϬϮϯϰϱ₠₡₢₣₤₥₦₧₨₩₪₫€₭₮₯₰₱₲
    ₳₴₵₶₷₸₹₺₻₼₽₾₿⅐⅑⅒⅓⅔⅕⅖⅗⅘⅙⅚⅛⅜
    ⅝⅞⅟℠℡™℣ℤ℥Ω℧ℨ℩KÅℬℭ℮ℯ⇀⇁ↀↁↂↃↄ
    ⇔⇕⇖⇗⇘⇙⇚⇛⇜⇝⇞⇟⇠⇡⇢⇣⇤⇥⇦⇧⇨⅀⅁⅂⅃⅄ⅅ
    ⅆⅇⅈⅉ⅊⅋⅌⅍ⅎ⅏ⅱⅲⅳⅴⅵⅶⅷⅸⅹⅺⅻⅼⅽ
    
  3. Expected result:

    0123456789`~!@#$%^&*()-_=+[]{}\|;:'"<>,./?
    abcdefghijklmnopqrstuvwxyz
    ABCDEFGHIJKLMNOPQRSTUVWXYZ
    
DavidPostill

3 Answers

[^\x00-\x7F] works fine. However, if you want to use a long character class like [^a-z0-9``~!@#$%^&*()-_=+[]{}\|;:'"<>,./?], you have to escape the characters that have a special meaning inside a class (i.e. - [ ] \) and add the line-break characters \r and \n.

Your regex becomes:

 [^a-z0-9``~!@#$%^&*()\-_=+\[\]{}\\|;:'"<>,./?\r\n]
 #                    ^    ^ ^   ^            ^^^^
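
As a rough cross-check outside Notepad++, the escaped class can be tried in Python. This is only a sketch: Python's re module is an assumption standing in for Notepad++'s regex engine (the two agree on these escapes but are not identical in general), A-Z is added because Python matches case-sensitively by default, and the sample string is made up for illustration.

```python
import re

# Escaped character class from the answer, as a raw string.
# Assumption: Python's re is used here in place of Notepad++'s engine.
# A-Z is added explicitly because Python is case-sensitive by default.
DISALLOWED = re.compile(
    r"""[^a-zA-Z0-9`~!@#$%^&*()\-_=+\[\]{}\\|;:'"<>,./?\r\n]"""
)

# Hypothetical sample: the first line holds only allowed characters,
# the second line ends with two characters outside the list.
sample = "abc{}[]\\|/$#^()\nxyz©½\n"

# Only the two non-ASCII characters should be reported.
print(DISALLOWED.findall(sample))
```

With the escapes in place, characters such as \ / | { } [ ] $ # ^ ( ) are no longer reported as matches.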

  • Ctrl+H
  • Find what: [^a-z0-9``~!@#$%^&*()\-_=+\[\]{}\\|;:'"<>,./?\r\n]+$ (again, [^\x00-\x7F] works fine and is more readable)
  • Replace with: LEAVE EMPTY
  • check Wrap around
  • check Regular expression
  • Replace all

Result for given example:

0123456789`~!@#$%^&*()-_=+[]{}\|;:'"<>,./?
abcdefghijklmnopqrstuvwxyz
ABCDEFGHIJKLMNOPQRSTUVWXYZ
Toto

If you don't mind which tool you use and aren't tied to Notepad++, you could install Bash for Windows 10, as I showed here https://superuser.com/a/1252271/715210 (sorry, I always come back to your questions with Linux workarounds ;) )

I have a solution, but unfortunately it also loses the apostrophe '

  1. Open Bash for Windows from the Start menu.
  2. Go to the folder where your file is located with cd /mnt/c/path/folder (drive C: is available at /mnt/c).
  3. If your file is named foo.txt, you can generate a file bar.txt with this command:

    cat foo.txt | tr -cd '[:alnum:]\n\r~!@#$%^&*()-_=+{}\|;:<>,./?"`' | sed '/^$/d' > bar.txt

Explanation of the parts:

cat foo.txt outputs the text file, and the pipe | passes that output to the command tr -cd, which removes every character that is not in the list between '...'. A second pipe passes the result to sed, which removes the empty lines. Last but not least, > bar.txt redirects the output to the file bar.txt.
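
For comparison, the same idea (delete every character outside an allowed set, then drop lines that end up empty) can be sketched in Python. This is not the command above, only an approximation of it: the function name strip_disallowed is hypothetical, and the allowed set is built from Python's string constants, so unlike the tr call it keeps the apostrophe.

```python
import string

# Characters to keep: ASCII letters, digits and ASCII punctuation.
# Assumption: this mirrors the list in the question; spaces are not in
# that list, so they are removed as well.
ALLOWED = set(string.ascii_letters + string.digits + string.punctuation)


def strip_disallowed(text: str) -> str:
    """Delete disallowed characters, then drop lines left empty."""
    cleaned_lines = []
    for line in text.splitlines():
        kept = "".join(ch for ch in line if ch in ALLOWED)
        if kept:  # plays the role of sed '/^$/d'
            cleaned_lines.append(kept)
    return "\n".join(cleaned_lines)


print(strip_disallowed("abc©\n½¾\nXYZ'[]"))
```

Note that, like the tr pipeline, this strips characters out of a line rather than removing the whole line, so a line that mixes allowed and disallowed characters survives in shortened form.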

chloesoe

In Notepad++ this is easy:

  1. menu Search > Mark...

  2. Find what: [^\x00-\x7F]
    ☑ Mark line
    (•) Regular expression

  3. Press Find All

  4. menu Search > Bookmark > Remove bookmarked lines
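
The mark-and-remove steps above can also be sketched as a small Python filter for use outside Notepad++. It reuses the same pattern, [^\x00-\x7F], and drops every line in which the pattern matches; the function name drop_non_ascii_lines and the sample text are made up for illustration.

```python
import re

# Same pattern as in the Mark dialog: any character outside ASCII.
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def drop_non_ascii_lines(text: str) -> str:
    """Keep only the lines that contain no non-ASCII character."""
    return "\n".join(
        line for line in text.split("\n") if not NON_ASCII.search(line)
    )


sample = "abc\n¤©ª\nABC\nxΩy\n"
print(drop_non_ascii_lines(sample))
```

Splitting on "\n" only is a deliberate simplification: with Windows line endings the trailing \r stays attached to each kept line, so the file's line endings are preserved.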

miroxlav