Is there any option in Tesseract to extract text written in a non-Latin Unicode script?
When I try it on Devanagari text, such as Hindi or Marathi, it produces the wrong output.
It appears that only Hindi is supported out of the box.
You need to use the -l lang option:
tesseract 1.png output -l hin
You can train Tesseract to recognise other languages written in the Devanagari script, such as Marathi.
See "How to use the tools provided to train Tesseract 3.0x for a new language".
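Before running the command, it can help to confirm that the Hindi data is actually installed. The following is only a minimal sketch, assuming a Tesseract build that supports `--list-langs` (3.02 or later, as far as I know). The helpers `has_lang` and `ocr_with_lang` are names I made up for illustration, not part of Tesseract:

```shell
#!/bin/sh
# has_lang CODE: hypothetical helper. Reads a language listing on stdin
# and succeeds if CODE appears as a whole line.
has_lang() {
    grep -qxF -- "$1"
}

# ocr_with_lang IMAGE OUTBASE CODE: hypothetical wrapper that runs
# Tesseract only if the requested language data is installed.
ocr_with_lang() {
    if ! command -v tesseract >/dev/null 2>&1; then
        echo "tesseract not found on PATH" >&2
        return 1
    fi
    # Some builds print the listing on stderr, so merge both streams.
    if ! tesseract --list-langs 2>&1 | has_lang "$3"; then
        echo "no traineddata installed for '$3'" >&2
        return 1
    fi
    # Tesseract appends .txt to the output base name itself.
    tesseract "$1" "$2" -l "$3"
}
```

With these defined, `ocr_with_lang 1.png output hin` either writes `output.txt` or reports why it could not run.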
TESSERACT(1) Manual Page
OPTIONS
...
-l lang
The language to use. If none is specified, English is assumed.
Multiple languages may be specified, separated by plus characters.
Tesseract uses 3-character ISO 639-2 language codes. (See LANGUAGES)
...
LANGUAGES
There are currently language packs available for the following
languages:
ara (Arabic), aze (Azerbaijani), bul (Bulgarian), cat (Catalan), ces
(Czech), chi_sim (Simplified Chinese), chi_tra (Traditional Chinese),
chr (Cherokee), dan (Danish), dan-frak (Danish (Fraktur)), deu
(German), ell (Greek), eng (English), enm (Old English), epo
(Esperanto), est (Estonian), fin (Finnish), fra (French), frm (Old
French), glg (Galician), heb (Hebrew), hin (Hindi), hrv (Croatian),
hun (Hungarian), ind (Indonesian), ita (Italian), jpn (Japanese), kor
(Korean), lav (Latvian), lit (Lithuanian), nld (Dutch), nor
(Norwegian), pol (Polish), por (Portuguese), ron (Romanian), rus
(Russian), slk (Slovakian), slv (Slovenian), sqi (Albanian), spa
(Spanish), srp (Serbian), swe (Swedish), tam (Tamil), tel (Telugu),
tgl (Tagalog), tha (Thai), tur (Turkish), ukr (Ukrainian), vie
(Vietnamese)
To use a non-standard language pack named foo.traineddata, set the
TESSDATA_PREFIX environment variable so the file can be found at
TESSDATA_PREFIX/tessdata/foo.traineddata and give Tesseract the
argument -l foo.
Source: TESSERACT(1) Manual Page
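Two points from the manual text, combining languages with `+` and loading a custom pack through TESSDATA_PREFIX, can be sketched roughly as follows. The helper names are my own and `foo` is just the placeholder used in the manual. The directory layout follows the 3.0x manual text quoted above, and other Tesseract releases may interpret TESSDATA_PREFIX differently, so treat this as an assumption to verify against your version:

```shell
#!/bin/sh
# join_langs CODE...: hypothetical helper that builds the value for -l
# by joining language codes with '+'.
join_langs() {
    (IFS=+; printf '%s\n' "$*")
}

# traineddata_path PREFIX CODE: where the quoted manual text says the
# file for CODE must live, given a TESSDATA_PREFIX value.
traineddata_path() {
    printf '%s/tessdata/%s.traineddata\n' "${1%/}" "$2"
}

# ocr_custom PREFIX IMAGE OUTBASE CODE...: hypothetical wrapper that
# checks each language file exists before calling Tesseract.
ocr_custom() {
    prefix=$1 image=$2 outbase=$3
    shift 3
    for code in "$@"; do
        f=$(traineddata_path "$prefix" "$code")
        [ -f "$f" ] || { echo "missing $f" >&2; return 1; }
    done
    # Set TESSDATA_PREFIX for this one invocation only.
    TESSDATA_PREFIX=$prefix tesseract "$image" "$outbase" -l "$(join_langs "$@")"
}
```

For example, `ocr_custom "$HOME/ocr" 1.png output foo` expects the file `$HOME/ocr/tessdata/foo.traineddata`, and `join_langs hin eng` prints `hin+eng`, which is the form the -l option takes for multiple languages. Note that all the language files named must then sit in the same tessdata directory.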