How do I remove HTTP links with ActiveSupport's "starts_with" using Nokogiri?

Question

When I try this:

item.css("a").each do |a|
  if !a.starts_with? 'http://'
     a.replace a.content
  end
end

I get:

NoMethodError: undefined method 'starts_with?' for #<Nokogiri::XML::Element:0x1b48a60>

EDIT:

I'm sure there is a cleaner way, but this seems to work.

item.css("a").each do |a|
  unless a["href"].blank?
    if !a["href"].starts_with? 'http://' 
      a.replace a.content
    end
  end
end

score 1 · Accepted Answer · answered May 16 '11 at 01:29

The problem is that you're calling starts_with? on an object that doesn't implement it. ActiveSupport adds starts_with? to String, not to Nokogiri nodes.

item.css("a").each do |a|

will yield XML nodes in a. Those are Nokogiri::XML::Element objects, not strings. What you want to test is the link target, which is an attribute of the node and comes back as a string when accessed like this:

a['href']

So, you want to use something like this:

item.css("a").each do |a|
  if !a['href'].starts_with?('http://')
    a.replace(a.content)
  end
end
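
One caveat with that loop: a['href'] returns nil for an <a> tag that has no href attribute, and calling starts_with? on nil raises NoMethodError. A minimal sketch of a nil-safe check follows, using Ruby's core String#start_with? so it does not depend on ActiveSupport. The helper name http_link? is made up for illustration and is not part of Nokogiri or ActiveSupport.

```ruby
# Hypothetical helper, not part of Nokogiri or ActiveSupport.
# Returns true only when href is a string beginning with "http://".
# nil (an <a> tag with no href attribute) is coerced to "" by to_s.
def http_link?(href)
  href.to_s.start_with?('http://')
end

p http_link?('http://bar')         # true
p http_link?('doesnt_start_with')  # false
p http_link?(nil)                  # false
```

Inside the loop the test could then read a.replace(a.content) unless http_link?(a['href']). Note that this also unwraps anchors with no href at all, whereas the EDIT in the question leaves those alone.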

The downside to this is you have to walk through every <a> tag in the document, which can be slow on a big page with lots of links.

An alternate way to go about it is to use XPath's starts-with function:

require 'nokogiri'

item = Nokogiri::HTML('<a href="doesnt_start_with">foo</a><a href="http://bar">bar</a>')
puts item.to_html

which outputs:

>> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
>> <html><body>
>> <a href="doesnt_start_with">foo</a><a href="http://bar">bar</a>
>> </body></html>

Here's how to do it using XPath:

item.search('//a[not(starts-with(@href, "http://"))]').each do |a|
  a.replace(a.content)
end
puts item.to_html

Which outputs:

>> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
>> <html><body>foo<a href="http://bar">bar</a>
>> </body></html>

The advantage of using XPath to find the nodes is that the matching runs in compiled C (libxml2), so Ruby only loops over the nodes that actually need replacing.

VERY thorough answer. Thank you. – pcasa May 16 '11 at 19:35

score 0 · Answer 2 · answered May 07 '11 at 15:37

Shouldn't that method be String#start_with? instead?

buruzaemon


Tried that too just in case, but same error. Using rails 1.9.2. Edited question, meant !a.starts_with? – pcasa May 07 '11 at 15:38
