Scenario
I have an application that uses iTextSharp to scour PDF files for hyperlinks.
Hyperlinks in PDFs are a subtype of annotation object (/Link annotations) in the file structure. My code therefore (1) reads a file, (2) loops through its pages, (3) gets the annotations collection for each page, and (4) extracts the hyperlink annotations from that collection.
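The four steps above can be sketched as follows. This is a minimal sketch that assumes the iTextSharp 5.x API. The class and method names (HyperlinkScanner, ExtractUris) are hypothetical. It only keeps /Link annotations whose /A entry is a URI action, so internal links that use /Dest or GoTo actions are skipped. A page without an /Annots key is skipped explicitly instead of being dereferenced.

```csharp
using System.Collections.Generic;
using iTextSharp.text.pdf;

// Hypothetical helper: (1) read, (2) loop pages, (3) get /Annots,
// (4) keep /Link annotations that carry a /URI action.
public static class HyperlinkScanner
{
    public static List<string> ExtractUris(PdfReader reader)
    {
        var uris = new List<string>();
        // iTextSharp page numbers are 1-based.
        for (int page = 1; page <= reader.NumberOfPages; page++)
        {
            PdfDictionary pageDict = reader.GetPageN(page);
            // GetAsArray resolves indirect references and returns null
            // when the page dictionary has no /Annots key.
            PdfArray annots = pageDict.GetAsArray(PdfName.ANNOTS);
            if (annots == null) continue;

            for (int i = 0; i < annots.Size; i++)
            {
                PdfDictionary annot = annots.GetAsDict(i);
                if (annot == null) continue;
                // Only /Subtype /Link annotations are hyperlinks.
                if (!PdfName.LINK.Equals(annot.GetAsName(PdfName.SUBTYPE))) continue;

                // The link's action lives under /A; keep only /S /URI actions.
                PdfDictionary action = annot.GetAsDict(PdfName.A);
                if (action == null || !PdfName.URI.Equals(action.GetAsName(PdfName.S))) continue;

                PdfString uri = action.GetAsString(PdfName.URI);
                if (uri != null) uris.Add(uri.ToUnicodeString());
            }
        }
        return uris;
    }
}
```

A caller would open the file with `new PdfReader(path)`, pass the reader to `ExtractUris`, and call `reader.Close()` afterwards. Taking a `PdfReader` rather than a path keeps the helper usable with in-memory PDFs too.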
Issue
Sometimes the PDF dictionary object representing a given page has no annotations collection (no /Annots key, PdfName.ANNOTS in iTextSharp). Attempts to get that collection then return null. This is a problem because it happens now and then on pages that have plainly visible, clickable links.
Note that "clickable" is the important word here. I understand there may be URLs present in the plain text, but I don't care about those, only real hyperlinks.
Code
I found a similar SO question (http://stackoverflow.com/questions/6959076/reading-hyperlinks-from-pdf-file), but the answer provided there is almost exactly the code I'm already using. The key difference is this:
// My code
var pdfAnnotations = (PdfArray)PdfReader.GetPdfObject(pageDict.Get(PdfName.ANNOTS));
foreach (var annotation in pdfAnnotations.ArrayList) { /* ... */ }

// Chris' code
var annotsArray = pageDict.GetAsArray(PdfName.ANNOTS);
foreach (var annotation in annotsArray.ArrayList) { /* ... */ }

// My pageDict.Get() and Chris' pageDict.GetAsArray() calls both
// return null because there is no ANNOTS key present in pageDict,
// so dereferencing .ArrayList throws a NullReferenceException.
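To check whether the file contains URI actions anywhere, and not just in page /Annots arrays, you could walk every object in the cross-reference table. The sketch below assumes the iTextSharp 5.x API, and the names UriActionScan and FindAllUriActions are hypothetical. It only looks at dictionaries that are URI actions themselves or that carry one under /A. It does not follow /AA (additional actions) or chained /Next actions. If it finds nothing, the file holds no URI actions in those places, which at least narrows down where the clickable regions come from.

```csharp
using System.Collections.Generic;
using iTextSharp.text.pdf;

// Hypothetical diagnostic: scan every indirect object for URI actions,
// independent of which page (if any) references them.
public static class UriActionScan
{
    public static HashSet<string> FindAllUriActions(PdfReader reader)
    {
        // A set, because an indirect action object can be seen twice:
        // once on its own and once via the /A entry that references it.
        var uris = new HashSet<string>();
        // Object 0 is the head of the free list, so start at 1.
        for (int i = 1; i < reader.XrefSize; i++)
        {
            PdfObject obj = reader.GetPdfObject(i);
            if (obj == null || !obj.IsDictionary()) continue;
            var dict = (PdfDictionary)obj;

            // The object may itself be an action, or hold one under /A
            // (as Link annotations do).
            AddIfUriAction(dict, uris);
            AddIfUriAction(dict.GetAsDict(PdfName.A), uris);
        }
        return uris;
    }

    private static void AddIfUriAction(PdfDictionary action, HashSet<string> uris)
    {
        if (action == null || !PdfName.URI.Equals(action.GetAsName(PdfName.S))) return;
        PdfString uri = action.GetAsString(PdfName.URI);
        if (uri != null) uris.Add(uri.ToUnicodeString());
    }
}
```

Comparing this result with a per-page scan of /Annots shows whether URI actions exist in the file but are not reachable from the page dictionaries.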
Question
Why the null value? How can a PDF with plainly visible, clickable links have no annotations collection? Are there other PdfObject subtypes in the file structure that represent hyperlinks/URIs?
Thanks