- preg_match( '/<title>(.*)<\/title>/',.....)
- preg_match("/src=[\"']?([^\"']?.*(png|jpg|gif))[\"']?/i",....)
    0
            
            
         
    
    
        Felix Kling
        
 
    
    
        runeveryday
        
- 
                    Looks like they would extract information from an HTML page: the title and the addresses of images. – Felix Kling Feb 10 '11 at 08:43
2 Answers
6
            The first extracts the contents of an HTML title tag.
The second extracts images' src attributes from an HTML document, but it is very imperfect (it won't catch references to image resources that end in .jpeg or have no extension at all).
Regular expressions are not a good idea for parsing HTML! They are far from fireproof. One should use an HTML parser instead.
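For what it's worth, here is a minimal sketch of the parser route using PHP's bundled DOM extension. The sample markup and the variable names are made up for illustration; real pages will need their own error handling.

```php
<?php
// Hypothetical sample markup, only for illustration.
$html = '<html><head><title>Example page</title></head>'
      . '<body><img src="a.png"><img src=\'photos/b.jpeg\'><img src="/img?id=3"></body></html>';

$doc = new DOMDocument();
// Collect libxml warnings internally instead of printing them;
// real-world markup is rarely perfectly valid.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// Title: take the text of the first <title> element, if there is one.
$titleNodes = $doc->getElementsByTagName('title');
$title = $titleNodes->length > 0 ? $titleNodes->item(0)->textContent : null;

// Images: read the src attribute of every <img>, whatever the extension.
$sources = [];
foreach ($doc->getElementsByTagName('img') as $img) {
    if ($img->hasAttribute('src')) {
        $sources[] = $img->getAttribute('src');
    }
}
```

Unlike the second regex, this picks up the `.jpeg` file and the extensionless URL as well, because it reads the attribute rather than guessing from the file ending.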
- 
                    @Pekka Yes, always tell'em [to not do that](http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html). +1 – Linus Kleen Feb 10 '11 at 08:45
- 
                    Note that the first regex will fail if the line (or, in multi-line mode, the _entire document_) has multiple `<title>` elements. That may be unlikely for this specific case but in general produces very bad results. – Chris Lutz Feb 10 '11 at 08:48
- 
                    why the edit? `The regexes will probably both do a half-way decent job - if part of an existing project, you can probably leave them be. But they are far from fireproof, and if you're building stuff from scratch, don't use this approach.` Most people will continue to use bad code, but they should be encouraged to fix it instead. – beggs Feb 10 '11 at 08:49
- 
                    Also I think the second regex is terrible too, for the same reason. It's very lazy about validating what can and can't be in a string and may grab too much unless I'm badly mistaken. – Chris Lutz Feb 10 '11 at 08:51
- 
                    @beggs I'd say it depends on the situation. If it's a newbie finding his way through production code, it won't be their first priority. In general however, you're right, edited that out. @Chris good points! – Pekka Feb 10 '11 at 08:51
- 
                    @runeveryday they are patterns and delimiters. `(.*)<\/title>` means "grab everything up to the next occurrence of `</title>` and return it as part of the result". The `/` is used as a delimiter around the expression. There's some more info here: http://www.regular-expressions.info/php.html – Pekka Feb 10 '11 at 09:12
- 
                    Thank you, I know, but about the second line of code: why is there no / delimiter after `]?/i`? – runeveryday Feb 10 '11 at 09:17
- 
                    @runeveryday `i` is a flag that comes after the delimiter, specifying case insensitive search (in order to also catch `JPG`, `GIF` ....) – Pekka Feb 10 '11 at 09:17
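To make the points from the comments concrete, here is a small sketch of the "grabs too much" behaviour and of the `i` flag. The input strings are hypothetical; the two long patterns are the ones from the question, and the lazy `.*?` variant and the short extension pattern are added only for comparison.

```php
<?php
// Hypothetical input: two <title> elements on one line.
$line = '<title>first</title><title>second</title>';

// Greedy .* runs on to the LAST </title> on the line.
preg_match('/<title>(.*)<\/title>/', $line, $greedy);
// $greedy[1] is 'first</title><title>second'

// Lazy .*? stops at the FIRST </title> (comparison only, still not a parser).
preg_match('/<title>(.*?)<\/title>/', $line, $lazy);
// $lazy[1] is 'first'

// The second regex has the same problem: .* swallows everything up to the
// last "png", "jpg" or "gif" on the line, even outside the src attribute.
$tag = '<img src="a.png" alt="my gif">';
preg_match("/src=[\"']?([^\"']?.*(png|jpg|gif))[\"']?/i", $tag, $src);
// $src[1] is 'a.png" alt="my gif'

// The trailing i sits after the closing / delimiter and makes the
// match case-insensitive.
$insensitive = preg_match('/\.(png|jpg|gif)$/i', 'PHOTO.JPG'); // 1
$sensitive   = preg_match('/\.(png|jpg|gif)$/', 'PHOTO.JPG');  // 0
```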
0
            
            
        1) Matches anything between <title> and </title>, i.e. an HTML page's title, so running it against <title>foo</title> captures foo.
2) Matches any string following src= that ends in png, jpg or gif.  Used to extract the URLs of images in HTML code.
Per @Pekka's answer: don't do this in real world code.
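If you just want to see what the two calls return, something like the following should do. The input strings and variable names are made up; the patterns are the ones from the question.

```php
<?php
// 1) Title pattern: index 0 is the whole match, index 1 the captured group.
preg_match('/<title>(.*)<\/title>/', '<title>foo</title>', $m);
// $m[0] is '<title>foo</title>', $m[1] is 'foo'

// 2) Image pattern: group 1 is the URL, group 2 the extension that matched.
$pattern = "/src=[\"']?([^\"']?.*(png|jpg|gif))[\"']?/i";
preg_match($pattern, '<img src="logo.png">', $img);
// $img[1] is 'logo.png', $img[2] is 'png'

// A .jpeg file is not found at all, since "jpeg" is none of png|jpg|gif.
$found = preg_match($pattern, '<img src="pic.jpeg">', $none);
// $found is 0
```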
 
    
    
        beggs
        
 
    