If you take advantage of two Python packages, pypandoc and panflute, you can do it quite pythonically in a few lines (sample code below).
Given a text file example.md, and assuming you have Python 3.3+ and have already run pip install pypandoc panflute (pypandoc also needs a working pandoc installation to call), place the sample code in the same folder and run it from the shell or from, e.g., IDLE.
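
    import io

    import pypandoc
    import panflute

    def action(elem, doc):
        # Collect every image and link element encountered during the walk
        if isinstance(elem, panflute.Image):
            doc.images.append(elem)
        elif isinstance(elem, panflute.Link):
            doc.links.append(elem)

    if __name__ == '__main__':
        # Convert the Markdown file to pandoc's JSON AST
        data = pypandoc.convert_file('example.md', 'json')
        # panflute.load expects a stream, so wrap the string in StringIO
        doc = panflute.load(io.StringIO(data))
        doc.images = []
        doc.links = []
        # No prepare hook is passed: the lists are already initialized above
        doc = panflute.run_filter(action, doc=doc)

        print("\nList of image URLs:")
        for image in doc.images:
            print(image.url)

        print("\nList of link URLs:")
        for link in doc.links:
            print(link.url)
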
The steps are:
- Use pypandoc to obtain a JSON string that contains the AST of the Markdown document.
- Load it into panflute to create a Doc object (panflute requires a stream, so we use StringIO).
- Use the run_filter function to iterate over every element and extract the Image and Link objects.
- Then you can print the URLs, alt text, etc.