I have a folder with a lot of HTML files. I would like to extract only the text contained in the body of each file and save it to a txt file. How can I do that?

Meds
  • 389

1 Answer

You can loop over each file in the directory and use a command-line browser such as lynx or w3m to render the HTML as plain text, redirecting the output to a text file.

Lynx example:

lynx -dump in.html > out.txt

w3m example:

w3m -dump in.html > out.txt
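
To cover the "iterate over each file" part, here is a minimal sketch of a loop around the lynx command above. The function name `html_to_txt` and the choice to write each `.txt` next to its source `.html` are my own assumptions, not something the question specified, so adjust them to your layout.

```sh
#!/bin/sh
# html_to_txt is a hypothetical helper name chosen for this sketch.
# It renders every *.html file in the given directory (default: the
# current one) to a .txt file with the same base name.
html_to_txt() {
    dir=${1:-.}
    for f in "$dir"/*.html; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$f" ] || continue
        # ${f%.html} strips the .html suffix before .txt is appended.
        lynx -dump "$f" > "${f%.html}.txt"
    done
}

# Example call (assumed path):
# html_to_txt /path/to/folder
```

To use w3m instead, replace the lynx line with `w3m -dump "$f" > "${f%.html}.txt"`. Note that the loop only looks at the top level of the folder, not at subfolders.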
rbialon
  • 128