
I have a set of plain text files with a mix of Hebrew and English. These files are from the late 90s to early 2000s and were apparently written in NisusWriter.

When I open the text files, the English lines render correctly, but the Hebrew lines are jumbled mojibake like this:

 Â∆˙ŸÙœÏ∆À˙À̆††ÂŸ‰À÷Õ·†††‡Œ˙†††‰ÀÚ⁄·«„À‰††††

I wrote a loop that ran iconv with every encoding it supports, but none of the outputs came out readable.
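For reference, the same brute-force idea can be sketched in Python instead of an iconv loop. This is only an illustration: the helper name `candidate_decodings` is hypothetical, and the input is assumed to be the first few Hebrew bytes from the hexdump below. It tries every codec name in Python's alias table and keeps only results that contain Hebrew letters (U+05D0 to U+05EA). Like the iconv loop, it can only find encodings the tool actually ships.

```python
from encodings.aliases import aliases

# First Hebrew bytes in the hexdump below (starting at offset 0x23).
SAMPLE = bytes.fromhex("f8 d9 f6 cd e4 a0 ac")

def candidate_decodings(raw: bytes):
    """Yield (codec, text) for every codec that decodes `raw` into text
    containing at least one Hebrew letter (U+05D0..U+05EA)."""
    for codec in sorted(set(aliases.values())):
        try:
            text = raw.decode(codec)
        except (ValueError, LookupError):
            # ValueError covers UnicodeDecodeError; LookupError covers codecs
            # that are unavailable here or are not bytes-to-text (e.g. base64_codec).
            continue
        if any("\u05d0" <= ch <= "\u05ea" for ch in text):
            yield codec, text

if __name__ == "__main__":
    for codec, text in candidate_decodings(SAMPLE):
        print(f"{codec:15} {text!r}")
```

If nothing useful is printed, that points to an encoding missing from the codec list rather than a problem with the file.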

Running hexdump -C on the first three lines (one English, one jumbled Hebrew, and one transliterated into Latin characters) gave the following. In the ASCII column the jumbled Hebrew shows up only as . characters, but the hex column shows these are bytes in the 0x80 to 0xFF range.

00000010  50 2e 20 31 31 30 20 20  2d 20 41 56 4f 44 41 48  |P. 110  - AVODAH|
00000020  0d 0d 20 f8 d9 f6 cd e4  a0 ac a0 a0 a0 e9 d9 e9  |.. .............|
00000030  cb a0 a0 a0 e0 db ec dd  e4 cd d8 e9 f0 e5 c6 a0  |................|
00000040  ac a0 a0 a0 e1 c6 d9 f2  cc ee c6 d9 ea cb a0 a0  |................|
00000050  a0 e9 cf f9 dd d9 f8 cb  e0 cd ec a0 ac 0d 0d 52  |...............R|
00000060  65 2d 74 7a 65 68 d5 2c  20 20 20 20 41 64 6f 6e  |e-tzeh.,    Adon|
00000070  61 69 20 20 20 20 20 20  45 2d 6c 6f 2d 68 65 69  |ai      E-lo-hei|
00000080  d5 2d 6e 75 20 2c 20 20  20 20 20 20 20 62 65 2d  |.-nu ,       be-|
00000090  61 6d 2d 63 68 61 d5 20  20 20 20 20 20 20 20 79  |am-cha.        y|
000000a0  69 73 2d 72 61 2d 65 6c  d5 20 0d 62 65 20 70 6c  |is-ra-el. .be pl|
000000b0  65 61 73 65 64 2c 20 20  20 20 41 64 6f 6e 61 69  |eased,    Adonai|
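The dots are simply how hexdump -C draws its right-hand column: any byte outside printable ASCII is shown as a dot, so that column says nothing about the Hebrew itself. A minimal Python imitation of the column, assuming the C locale where only 0x20 to 0x7E count as printable (the function name `ascii_column` is made up for this sketch):

```python
def ascii_column(raw: bytes) -> str:
    """Mimic the |...| column of `hexdump -C` in the C locale:
    printable ASCII (0x20-0x7E) is shown as-is, every other byte as '.'."""
    return "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in raw)

# Row 00000050 from the dump: Hebrew bytes, two CRs, then 'R'.
row = bytes.fromhex("a0 e9 cf f9 dd d9 f8 cb e0 cd ec a0 ac 0d 0d 52")
print(ascii_column(row))  # '.' * 15 + 'R', matching the dump
```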
phuclv
Ezra A

1 Answer


I found the solution in the end and thought I'd write it down in case others had similar problems.

I used a site I found to try every encoding under the sun until I got one where the Hebrew rendered correctly, albeit with some mysterious separator characters.

https://www.motobit.com/util/charset-codepage-conversion.asp

I verified this using @user1686's suggestion: take the hex values from hexdump and cross-reference them with the encoding table.
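For the letters alone, the cross-referencing reduces to arithmetic. In the Mac OS Hebrew table, bytes 0xE0 to 0xFA hold the 27 Hebrew letters in the same order as Unicode's Hebrew block, so a letter byte maps to U+05D0 + (byte − 0xE0). The sketch below uses a hypothetical helper, `mac_hebrew_letter`, on the letter bytes of the first Hebrew word in the dump (f8, f6, e4, skipping the vowel-point bytes d9 and cd). They come out as resh, tsadi, he, i.e. רצה, which matches the "Re-tzeh" transliteration.

```python
import unicodedata

def mac_hebrew_letter(byte: int) -> str:
    """Map a Mac OS Hebrew letter byte (0xE0-0xFA) to its Unicode letter.
    Only the letter range is handled; vowel points and punctuation are not."""
    if not 0xE0 <= byte <= 0xFA:
        raise ValueError(f"0x{byte:02x} is not a Hebrew letter byte")
    return chr(0x05D0 + (byte - 0xE0))

# Letter bytes of the first Hebrew word in the dump (points d9, cd left out).
for b in (0xF8, 0xF6, 0xE4):
    ch = mac_hebrew_letter(b)
    print(f"0x{b:02x} -> U+{ord(ch):04X} {unicodedata.name(ch)}")
# 0xf8 -> U+05E8 HEBREW LETTER RESH
# 0xf6 -> U+05E6 HEBREW LETTER TSADI
# 0xe4 -> U+05D4 HEBREW LETTER HE
```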

In the end, it turned out the file was encoded as x-mac-hebrew.
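Once the encoding is known, converting a file is a byte-to-character lookup. The sketch below is a hedged illustration, not a verified codec. As far as I know, Python's standard library has no Mac Hebrew codec, so it uses a hypothetical partial table (`MAC_HEBREW_PARTIAL`) that contains only the bytes appearing in the dump above, each checked against the transliterated line. Any other byte above 0x7F becomes U+FFFD, so gaps stay visible. It also assumes the CR-only (0d) line endings seen in the dump. The names `decode_mac_hebrew` and `convert_file` are made up for this example.

```python
# PARTIAL, hypothetical Mac OS Hebrew table: only bytes seen in the dump,
# checked against the transliterated line. Verify against Apple's full
# mapping before relying on it; unmapped bytes become U+FFFD.
MAC_HEBREW_PARTIAL = {
    0xA0: " ",       # space (plain ASCII here is a simplification)
    0xAC: ",",       # comma (plain ASCII here is a simplification)
    0xC6: "\u05bc",  # dagesh / shuruk dot
    0xCB: "\u05b8",  # qamats
    0xCC: "\u05b7",  # patah
    0xCD: "\u05b5",  # tsere
    0xCF: "\u05b4",  # hiriq
    0xD9: "\u05b0",  # sheva
    0xDB: "\u05b1",  # hataf segol
    0xDD: "\u05b9",  # holam
}
# Letters 0xE0-0xFA follow Unicode order starting at U+05D0 (alef).
MAC_HEBREW_PARTIAL.update({b: chr(0x05D0 + b - 0xE0) for b in range(0xE0, 0xFB)})

def decode_mac_hebrew(raw: bytes) -> str:
    """Decode with the partial table. ASCII passes through unchanged and
    CR line endings become LF (assumes CR-only endings, as in the dump)."""
    out = []
    for b in raw:
        if b < 0x80:
            out.append(chr(b))
        else:
            out.append(MAC_HEBREW_PARTIAL.get(b, "\ufffd"))
    return "".join(out).replace("\r", "\n")

def convert_file(src: str, dst: str) -> None:
    """Read a Mac Hebrew text file and write it back out as UTF-8."""
    with open(src, "rb") as f:
        text = decode_mac_hebrew(f.read())
    with open(dst, "w", encoding="utf-8", newline="\n") as f:
        f.write(text)
```

Usage would be something like `convert_file("in.txt", "out.txt")`, where both file names are placeholders. Any U+FFFD in the output marks a byte that still needs an entry from the full table. The English transliteration's 0xd5 bytes are one example.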

Ezra A