PDFs often contain fonts without explicit mappings to Unicode, preventing us from extracting correct text from them -- curse you, Adobe!
I need to process PDFs in batch on a Linux system. I have several example files containing words hyphenated across line breaks, and no tool I have tried can identify the hyphens, so the extracted text always contains a lot of broken half-words.
Is there a way to supply the missing character mappings myself, rather than having the tool drop the undefined symbols?
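
To illustrate the kind of thing I have in mind, here is a minimal Python sketch of a post-processing step. It assumes the extraction tool emits some placeholder code point for each unmapped glyph instead of dropping it. The code point U+E00D and the names `GLYPH_MAP`, `apply_glyph_map`, and `join_hyphenated` are made up for the example and do not come from any real tool or PDF.

```python
import re

# Hypothetical table: placeholder code points an extractor might emit for
# unmapped glyphs, mapped to the characters they should have been.
# U+E00D is an assumed example value, not taken from a real PDF.
GLYPH_MAP = {"\ue00d": "-"}


def apply_glyph_map(text, glyph_map):
    """Replace each placeholder character with its intended Unicode text."""
    return text.translate(str.maketrans(glyph_map))


def join_hyphenated(text):
    """Rejoin words that were split by a hyphen at the end of a line."""
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)


raw = "hyphen\ue00d\nated lines and bro\ue00d\nken words"
fixed = join_hyphenated(apply_glyph_map(raw, GLYPH_MAP))
print(fixed)
```

This only works if the undefined symbols survive extraction in some recognizable form, which is exactly what I cannot get from the tools I have tried. It would also wrongly merge words that legitimately keep their hyphen at a line break.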