Like the DOMDocument class in PHP, is there a class in Ruby (i.e. core Ruby) to parse an HTML document and get the values of its node elements?
    4 Answers
49
There is no built-in HTML parser (yet), but some very good ones are available as gems, Nokogiri in particular.
Meta-answer: for common needs like this one, I'd recommend checking the Ruby Toolbox site. You'll notice that Nokogiri is the top recommendation for HTML parsers.
 
    
    
        the Tin Man
        
        Marc-André Lafortune
        
6
            
            
Ruby Cheerio is a jQuery-style HTML parser for Ruby: a much-simplified take on Nokogiri, aimed at crawlers. It is the Ruby version of the popular Node.js package cheerio.
Follow the link for a simple crawler example.
# Install the gem first (shell command): gem install ruby-cheerio
require 'ruby-cheerio'

jQuery = RubyCheerio.new("<html><body><h1 class='one'>h1_1</h1><h1>h1_2</h1></body></html>")
# Print the text of every <h1> element.
jQuery.find('h1').each do |head_one|
  p head_one.text
end
# Get attribute values, as in jQuery.
p jQuery.find('h1.one')[0].prop('h1', 'class')
# Chain calls, as in jQuery.
p jQuery.find('body').find('h1').first.text
 
    
    
        dineshsprabu
        
5
            
            
You can also try Oga by Yorick Peterse.
It is an XML/HTML parser written in Ruby that does not require system libraries such as libxml. You can find it at https://github.com/YorickPeterse/oga
 
    
    
        microspino
        
 
    