We're a small group promoting the spread of Unicode in India, where legacy encodings are deeply entrenched. I have a problem when I convert a document containing Unicode text in any Indic language to PDF. The text displays as intended, but when I copy and paste it, part of the content turns into gibberish.
I am using InDesign CC for typesetting on Windows 7. I can export to EPUB just fine, but the exported PDF has this problem. I also tried printing to the Adobe PDF printer and to PrimoPDF, which only made it worse. Looking at PDFs on the internet, it turns out this problem exists in every Unicode-encoded Indic PDF I checked (and probably in all East Asian complex scripts). Is this a problem in the PDF specification?
For an example, see this PDF: http://www.rajbhasha.nic.in/pdf/dolebook-4.pdf
Copy any text from it and compare it with the original. You'll see that characters have been replaced by other characters and that unnecessary white space has crept in.
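To pin down exactly which characters change, the comparison can be done code point by code point instead of by eye. Here is a minimal Python sketch using only the standard library. The function names `describe` and `codepoint_diff` and the two sample strings are my own illustration, not output from the PDF above; paste your own source text and copied text in their place.

```python
import difflib
import unicodedata


def describe(text):
    """Return one 'U+XXXX NAME' label per code point in text."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')}"
        for ch in text
    ]


def codepoint_diff(original, pasted):
    """List the code-point-level differences between two strings.

    Both strings are normalized to NFC first so that purely
    canonical differences are not reported as errors.
    """
    a = describe(unicodedata.normalize("NFC", original))
    b = describe(unicodedata.normalize("NFC", pasted))
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Hypothetical sample: the word as typed, and a made-up "pasted"
# version in which the vowel sign has moved in front of the consonant.
original = "\u0939\u093F\u0928\u094D\u0926\u0940"
pasted = "\u093F\u0939\u0928\u094D\u0926\u0940"

for tag, before, after in codepoint_diff(original, pasted):
    print(tag, before, after)
```

Each reported entry names the operation (insert, delete, or replace) and the affected code points, which makes it easier to tell reordered signs apart from substituted characters or stray spaces.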
We're promoting Unicode on the grounds that it makes copy-pasting and searching/indexing easier, and this problem completely undermines that argument. Any ideas?