As long as the figures in question are appropriately tagged (as they are in your example document), you can determine their bounding boxes based on the PDFBox PDFGraphicsStreamEngine.
You can actually reuse the BoundingBoxFinder from this answer (itself based on the PDFGraphicsStreamEngine), which determines the bounding box of all content on a page; you merely have to collect the bounding box information separately for each marked content sequence.
The following class does that by storing the bounding box information in a hierarchy of MarkedContent objects:
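import java.awt.geom.Rectangle2D;
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDPage;

// BoundingBoxFinder is the class from the referenced answer; its rectangle
// field and its add(Rectangle2D) method are used below.
public class MarkedContentBoundingBoxFinder extends BoundingBoxFinder {
    public MarkedContentBoundingBoxFinder(PDPage page) {
        super(page);
        // The root entry collects content outside of any marked content sequence.
        contents.add(content);
    }

    @Override
    public void processPage(PDPage page) throws IOException {
        super.processPage(page);
        // Close the root entry so that its box covers all remaining content.
        endMarkedContentSequence();
    }

    @Override
    public void beginMarkedContentSequence(COSName tag, COSDictionary properties) {
        MarkedContent current = contents.getLast();
        // Content seen so far belongs to the enclosing entry; merge it into that entry's box.
        if (rectangle != null) {
            if (current.boundingBox != null)
                add(current.boundingBox);
            current.boundingBox = rectangle;
        }
        // Start collecting afresh for the new sequence.
        rectangle = null;
        MarkedContent newContent = new MarkedContent(tag, properties);
        contents.addLast(newContent);
        current.children.add(newContent);

        super.beginMarkedContentSequence(tag, properties);
    }

    @Override
    public void endMarkedContentSequence() {
        MarkedContent current = contents.removeLast();
        if (rectangle != null) {
            // Merge the newly seen content with the box saved so far. The rectangle
            // stays set, so the enclosing entry will include it as well.
            if (current.boundingBox != null)
                add(current.boundingBox);
            current.boundingBox = (Rectangle2D) rectangle.clone();
        } else if (current.boundingBox != null)
            // Nothing new since the box was saved: restore it for the enclosing entry.
            rectangle = (Rectangle2D) current.boundingBox.clone();

        super.endMarkedContentSequence();
    }

    public static class MarkedContent {
        public MarkedContent(COSName tag, COSDictionary properties) {
            this.tag = tag;
            this.properties = properties;
        }

        public final COSName tag;
        public final COSDictionary properties;
        public final List<MarkedContent> children = new ArrayList<>();
        public Rectangle2D boundingBox = null;
    }

    // Root of the hierarchy and the stack of currently open marked content sequences.
    public final MarkedContent content = new MarkedContent(COSName.DOCUMENT, null);
    public final Deque<MarkedContent> contents = new ArrayDeque<>();
}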
(MarkedContentBoundingBoxFinder utility class)
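To make the bookkeeping in beginMarkedContentSequence and endMarkedContentSequence easier to follow, here is a minimal sketch that replays the same stack logic without PDFBox. The class name MarkedContentNestingSketch, its Node type, and the draw/begin/end methods are hypothetical stand-ins for illustration only; draw plays the role of whatever BoundingBoxFinder does when it encounters page content.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical, PDFBox-free replay of the nesting logic (illustration only).
final class MarkedContentNestingSketch {
    // Stand-in for MarkedContent, using a plain String tag.
    static final class Node {
        final String tag;
        final List<Node> children = new ArrayList<>();
        Rectangle2D box;

        Node(String tag) {
            this.tag = tag;
        }
    }

    final Node root = new Node("Document");
    final Deque<Node> stack = new ArrayDeque<>();
    // Box of the content seen since the last begin event (or carried over from an end event).
    Rectangle2D rectangle;

    MarkedContentNestingSketch() {
        stack.add(root);
    }

    // Stand-in for the engine reporting some drawn content.
    void draw(double x, double y, double w, double h) {
        Rectangle2D r = new Rectangle2D.Double(x, y, w, h);
        if (rectangle == null)
            rectangle = r;
        else
            rectangle.add(r);
    }

    // Mirrors beginMarkedContentSequence: park the running box in the enclosing node.
    void begin(String tag) {
        Node current = stack.getLast();
        if (rectangle != null) {
            if (current.box != null)
                rectangle.add(current.box);
            current.box = rectangle;
        }
        rectangle = null;
        Node child = new Node(tag);
        stack.addLast(child);
        current.children.add(child);
    }

    // Mirrors endMarkedContentSequence: finalize the node and hand its box to the parent.
    void end() {
        Node current = stack.removeLast();
        if (rectangle != null) {
            if (current.box != null)
                rectangle.add(current.box);
            current.box = (Rectangle2D) rectangle.clone();
        } else if (current.box != null) {
            rectangle = (Rectangle2D) current.box.clone();
        }
    }
}
```

Calling begin("Figure"), draw(...), end() for each sequence and a final end() for the root leaves each child with its own box and the root with the union of all of them, which is the behaviour the real class relies on.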
You can apply it to a PDPage pdPage like this:
MarkedContentBoundingBoxFinder boxFinder = new MarkedContentBoundingBoxFinder(pdPage);
boxFinder.processPage(pdPage);
MarkedContent markedContent = boxFinder.content;
(excerpt from DetermineBoundingBox helper method drawMarkedContentBoundingBoxes)
You can output the bounding boxes from that markedContent object like this:
void printMarkedContentBoundingBoxes(MarkedContent markedContent, String prefix) {
    StringBuilder builder = new StringBuilder();
    builder.append(prefix).append(markedContent.tag.getName());
    builder.append(' ').append(markedContent.boundingBox);
    System.out.println(builder.toString());
    for (MarkedContent child : markedContent.children)
        printMarkedContentBoundingBoxes(child, prefix + " ");
}
(DetermineBoundingBox helper method)
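If you only need the boxes of the figures rather than a printout of the whole tree, the same recursion can collect them into a list. The following is a rough sketch under assumptions: FigureBoxCollector and its Node type are hypothetical, with Node standing in for MarkedContent and a plain String replacing the COSName tag so that the example runs without PDFBox. Against the real class you would compare markedContent.tag.getName() instead.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (illustration only): collects the boxes of all nodes with a given tag.
final class FigureBoxCollector {
    // Stand-in for MarkedContent: tag name, bounding box, and child nodes.
    static final class Node {
        final String tag;
        final Rectangle2D boundingBox;
        final List<Node> children = new ArrayList<>();

        Node(String tag, Rectangle2D boundingBox) {
            this.tag = tag;
            this.boundingBox = boundingBox;
        }
    }

    // Returns the non-null boxes of all nodes in the tree whose tag equals the given name.
    static List<Rectangle2D> collect(Node node, String tagName) {
        List<Rectangle2D> result = new ArrayList<>();
        collect(node, tagName, result);
        return result;
    }

    private static void collect(Node node, String tagName, List<Rectangle2D> result) {
        // Sequences without any content have no box and are skipped.
        if (tagName.equals(node.tag) && node.boundingBox != null)
            result.add(node.boundingBox);
        for (Node child : node.children)
            collect(child, tagName, result);
    }
}
```

Calling collect(root, "Figure") walks the tree depth-first, so the boxes come back in the order in which the sequences appear in the content stream.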
In the case of your example document you get:
Document java.awt.geom.Rectangle2D$Double[x=90.35800170898438,y=758.10498046875,w=128.63946533203125,h=10.2509765625]
 Figure java.awt.geom.Rectangle2D$Double[x=90.35800170898438,y=758.10498046875,w=44.6771240234375,h=10.2509765625]
 P java.awt.geom.Rectangle2D$Double[x=136.79600524902344,y=760.1184081963065,w=43.137100359018405,h=6.383056943803922]
 Figure java.awt.geom.Rectangle2D$Double[x=184.2926788330078,y=758.10498046875,w=34.70478820800781,h=10.2509765625]
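As a plausibility check on the output above, the Document box should be the union of the two Figure boxes and the P box, because the root entry accumulates the boxes of everything inside it. The small sketch below shows that union with java.awt.geom.Rectangle2D only; the class name BoundingBoxUnion and its union helper are hypothetical and not part of the answer's code.

```java
import java.awt.geom.Rectangle2D;

// Hypothetical helper (illustration only): the smallest rectangle enclosing all given boxes.
final class BoundingBoxUnion {
    static Rectangle2D union(Rectangle2D... boxes) {
        Rectangle2D result = null;
        for (Rectangle2D box : boxes) {
            if (result == null)
                // Clone so that the caller's first rectangle is not modified.
                result = (Rectangle2D) box.clone();
            else
                result.add(box);
        }
        return result;
    }

    public static void main(String[] args) {
        // Values copied from the output for the example document.
        Rectangle2D figure1 = new Rectangle2D.Double(90.35800170898438, 758.10498046875, 44.6771240234375, 10.2509765625);
        Rectangle2D paragraph = new Rectangle2D.Double(136.79600524902344, 760.1184081963065, 43.137100359018405, 6.383056943803922);
        Rectangle2D figure2 = new Rectangle2D.Double(184.2926788330078, 758.10498046875, 34.70478820800781, 10.2509765625);
        System.out.println(union(figure1, paragraph, figure2));
    }
}
```

Up to floating point rounding, the printed rectangle has the same x, y, width, and height as the Document line of the output.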
Similarly, you can draw the bounding boxes onto the PDF using the drawMarkedContentBoundingBoxes methods of DetermineBoundingBox. In the case of your example document you get:
