what are the advantages and disadvantages of the following libraries?
From the above i've used QP and it failed to parse invalid HTML, and simpleDomParser, that does a good job, but it kinda leaks memory because of the object model. But you may keep that under control by calling $object->clear(); unset($object); when you dont need an object anymore.
Are there any more scrapers? What are your experiences with them? I'm going to make this a community wiki, may we'll build a useful list of libraries that can be useful when scraping.
i did some tests based Byron's answer:
    <?
    include("lib/simplehtmldom/simple_html_dom.php");
    include("lib/phpQuery/phpQuery/phpQuery.php");
    echo "<pre>";
    $html = file_get_contents("http://stackoverflow.com/search?q=favorite+programmer+cartoon");
    $data['pq'] = $data['dom'] = $data['simple_dom'] = array();
    $timer_start = microtime(true);
    $dom = new DOMDocument();
    @$dom->loadHTML($html);
    $x = new DOMXPath($dom);
    foreach($x->query("//a") as $node)
    {
         $data['dom'][] = $node->getAttribute("href");
    }
    foreach($x->query("//img") as $node)
    {
         $data['dom'][] = $node->getAttribute("src");
    }
    foreach($x->query("//input") as $node)
    {
         $data['dom'][] = $node->getAttribute("name");
    }
    $dom_time =  microtime(true) - $timer_start;
    echo "dom: \t\t $dom_time . Got ".count($data['dom'])." items \n";
    $timer_start = microtime(true);
    $doc = phpQuery::newDocument($html);
    foreach( $doc->find("a") as $node)
    {
       $data['pq'][] = $node->href;
    }
    foreach( $doc->find("img") as $node)
    {
       $data['pq'][] = $node->src;
    }
    foreach( $doc->find("input") as $node)
    {
       $data['pq'][] = $node->name;
    }
    $time =  microtime(true) - $timer_start;
    echo "PQ: \t\t $time . Got ".count($data['pq'])." items \n";
    $timer_start = microtime(true);
    $simple_dom = new simple_html_dom();
    $simple_dom->load($html);
    foreach( $simple_dom->find("a") as $node)
    {
       $data['simple_dom'][] = $node->href;
    }
    foreach( $simple_dom->find("img") as $node)
    {
       $data['simple_dom'][] = $node->src;
    }
    foreach( $simple_dom->find("input") as $node)
    {
       $data['simple_dom'][] = $node->name;
    }
    $simple_dom_time =  microtime(true) - $timer_start;
    echo "simple_dom: \t $simple_dom_time . Got ".count($data['simple_dom'])." items \n";
    echo "</pre>";
and got
dom:         0.00359296798706 . Got 115 items 
PQ:          0.010568857193 . Got 115 items 
simple_dom:  0.0770139694214 . Got 115 items 
 
    