The title says it all: after some Googling and several days of tinkering with code, I cannot figure out how to download the plain text of a web page.
strip_tags() removes the tags themselves but leaves the JavaScript and CSS that sat inside the <script> and <style> elements, and trying to clean that up with regular expressions causes problems of its own.
Is there any way (simple or complicated) to download a web page (say, a Wikipedia article) as plain text using PHP?
I download the page with PHP's file_get_contents(), like this:
$homepage = file_get_contents('http://www.example.com/');
As I said, I tried strip_tags() and similar approaches, but I can't get clean plain text out of the result.
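To make the problem concrete, here is a minimal reproduction. The inline $html string is made-up sample markup standing in for the downloaded page (an assumption for illustration, not the real page I am fetching), so the snippet runs without network access:

```php
<?php
// Hypothetical sample markup standing in for the result of file_get_contents().
$html = '<html><head><style>p { color: red; }</style>'
      . '<script>var x = 1;</script></head>'
      . '<body><p>Hello world</p></body></html>';

// strip_tags() drops the tags but keeps the text between them,
// so the CSS rule and the JavaScript statement end up in the output.
$text = strip_tags($html);

echo $text, "\n";
```

The output still contains the CSS rule and the JavaScript statement next to "Hello world", which is exactly the junk I am trying to get rid of.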
I've also tried http://millkencode.googlecode.com/svn/trunk/htmlxtractor/ContentExtractor.php to extract the main content, but it doesn't seem to work.