I have a .csv file that contains entries in multiple foreign languages (Russian, Japanese, Arabic, etc.). For example, a column entry looks like this: <U+03BA><U+03BF><U+03C5>. I want to remove the rows that contain this kind of info. I tried various solutions, all of them with no result:
test_fb5 <- read_csv('test_fb_data.csv', encoding = 'UTF-8')
or, applied to a column:
gsub("[<].*[>]", "")` or `sub("^\\s*<U\\+\\w+>\\s*", "")
or
gsub("\\s*<U\\+\\w+>$", "")
It seems that R 4.1.0 doesn't match the respective characters. I cannot find a way to attach a small chunk of the file here, so here is a capture of how it prints:
                        address
33085                                           9848a 33 avenue nw t6n 1c6 edmonton ab canada alberta
33086                                 1075 avenue laframboise j2s 4w7 sainthyacinthe qc canada quebec
33087 <U+03BA><U+03BF><U+03C5><U+03BD><U+03BF><U+03C5>p<U+03B9>tsa 18050 spétses greece attica region
33088                                       390 progress ave unit 2 m1p 2z6 toronto on canada ontario
                                                     name
33085                                md legals canada inc
33086                             les aspirateurs jpg inc
33087 p<U+03AC>t<U+03C1>a<U+03BB><U+03B7><U+03C2>patralis
33088                    wrench it up plumbing mechanical
                                                               category
33085 general practice attorneys divorce  family law attorneys notaries
33086                                                              <NA>
33087               mediterranean restaurants fish  seafood restaurants
33088            plumbing services damage restoration  mold remediation
             phone
33085  17808512828
33086  14507781003
33087 302298072134
33088  14168005050
The 3308x values are the row numbers of the dataset. Thank you for your time!
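One thing I have not been able to verify is whether the <U+03BA>-style sequences are literal text in the file or just how my console renders characters it cannot display. I assume a quick check along these lines would tell (using the data frame from above):

# TRUE means the file really contains the literal text "<U+03BA>" etc.;
# FALSE means they are genuine Unicode characters that only print this way
any(grepl("<U+", test_fb5$address, fixed = TRUE))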
 
     
    