I am seeing strange behaviour with data.table and the %in% operator. I am loading a file whose header contains Russian letters encoded in UTF-8:
d = fread(filename, sep="\t", encoding="UTF-8", verbose=TRUE)
bar=names(d)
bar
 [1] "Дата, Время"      "Состояние"        "Ia, A"            "Ib, A"            "Ic, A"           
 [6] "Дисб.I"           "акт.P,кВт"        "P, кВА"           "cos"              "Загр., %"        
[11] "Uвх.AB,В"         "Uвх.BC,В"         "Uвх.CA,В"         "Дисб. U, %"       "R, кОм"          
[16] "F Турб.вращ.,Гц"  "Приток,куб.м/cут" "Отбор,куб.м/cут"  "P, ат."           "Расход, куб.м/c" 
[21] "Tдвиг, °C"        "Tжид, °C"         "Pвыкид, ат."      "Tвыкид, °C"       "Вибр X/Y, м/с2"  
[26] "Вибр Z, м/с2"     "Pвыс.р, ат."      "Iутеч, мA"        "Tобм, °C"         "Акт.энерг,кВт"   
[31] "Реакт.энерг,кВАр" "Вход1,ед."        "Вход2,ед."        "Вход3,ед."        "Вход4,ед."       
[36] "Вход5,ед."        "Вход6,ед."        "Вход7,ед."        "Вход8,ед."        "Статусн.сообщ."
I have one of the values hardcoded in my code:
foo="Uвх.AB,В"
and I am trying to do the following:
if (foo %in% bar) { ... }
To my surprise,
foo %in% bar
[1] FALSE
but
foo==bar
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
[17] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[33] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Notice the TRUE in the 11th position. The reason seems to be the encoding:
Encoding(foo)
[1] "UTF-8"
Encoding(bar)
 [1] "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown"
[10] "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown"
[19] "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown"
[28] "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown"
[37] "unknown" "unknown" "unknown" "unknown"
On data.table's side this is a bit strange, because I asked for encoding="UTF-8" in fread. On the other hand, the difference in behaviour between %in% (that is, match) and == is also very strange.
I sense the wrongness of the universe. Could somebody explain why %in% behaves so strangely with encodings, and what the correct way of using it is?
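For reference, here is a minimal sketch of the workaround I am considering. It assumes the column names really are UTF-8 bytes that merely lack the encoding mark. The variable bar_marked and the two-element stand-in for names(d) are made up for illustration, so the example runs without my file.

```r
# Stand-in for the hardcoded value; \u escapes yield a UTF-8-marked string.
foo <- "U\u0432\u0445.AB,\u0412"

# Stand-in for names(d): same bytes, but with the encoding mark removed,
# mimicking the "unknown" result shown above (an assumption about the cause).
bar <- c("cos", foo)
Encoding(bar) <- "unknown"

# Declare (not convert) the bytes as UTF-8.
# ASCII-only strings such as "cos" always stay "unknown".
bar_marked <- bar
Encoding(bar_marked) <- "UTF-8"

print(Encoding(bar_marked))
print(foo %in% bar_marked)   # TRUE: same bytes, same declared encoding
```

With the real table, the re-marked names could presumably be written back with setnames(d, bar_marked), but I have not confirmed that this is the intended way to handle it.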
