I have been attempting to use IE automation to google search a string of text in Excel. I want to return the hyperlink for the website of the first result in another cell in excel. Is this possible? I have a list of 60,000 records that I need to google search and return the hyperlink for the website in the first result. Is there another approach to this that you would reccomend? I appreciate the help in advance.
2 Answers
As its 60,000 records i recommend use xmlHTTP object instead of using IE.
HTTP requests a easier, and a lot faster
Sub XMLHTTP()
Dim url As String, lastRow As Long, i As Long
Dim XMLHTTP As Object, html As Object, objResultDiv As Object, objH3 As Object, link As Object
Dim start_time As Date
Dim end_time As Date
lastRow = Range("A" & Rows.Count).End(xlUp).Row
Dim cookie As String
Dim result_cookie As String
start_time = Time
Debug.Print "start_time:" & start_time
For i = 2 To lastRow
url = "https://www.google.co.in/search?q=" & Cells(i, 1) & "&rnd=" & WorksheetFunction.RandBetween(1, 10000)
Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")
XMLHTTP.Open "GET", url, False
XMLHTTP.setRequestHeader "Content-Type", "text/xml"
XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
XMLHTTP.send
Set html = CreateObject("htmlfile")
html.body.innerHTML = XMLHTTP.ResponseText
Set objResultDiv = html.getelementbyid("rso")
Set objH3 = objResultDiv.getelementsbytagname("h3")
For Each link In objH3
If link.className = "r" Then
Cells(i, 2) = link.innerText
Cells(i, 3) = link.getelementsbytagname("a")(0).href
DoEvents
End If
Next
Next
end_time = Time
Debug.Print "end_time:" & end_time
Debug.Print "done" & "Time taken : " & DateDiff("n", start_time, end_time) & " :minutes"
MsgBox "done" & "Time taken : " & DateDiff("n", start_time, end_time)
End Sub
Using CSS3 Selector
Sub XMLHTTP1()
Dim url As String, i As Long, lastRow As Long
Dim XMLHTTP As Object, html As New HTMLDocument, objResultDiv As HTMLAnchorElement
lastRow = Range("A" & Rows.Count).End(xlUp).Row
For i = 2 To lastRow
url = "https://www.google.co.in/search?q=" & Cells(i, 1) & "&rnd=" & WorksheetFunction.RandBetween(1, 10000)
Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")
XMLHTTP.Open "GET", url, False
XMLHTTP.setRequestHeader "Content-Type", "text/xml"
XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
XMLHTTP.send
Set html = New HTMLDocument
html.body.innerHTML = XMLHTTP.ResponseText
Set objResultDiv = html.querySelector("div#rso h3.r a")
Cells(i, 2) = objResultDiv.innerText
Cells(i, 3) = objResultDiv.href
DoEvents
Next
End Sub
Output

HTH
Santosh
- 12,175
- 4
- 41
- 72
-
1Thanks! That works well except after about 100 records I recieve a Run Error 80070005 Access is denied. Any clues on why? – Collin Hendo Jul 08 '13 at 21:38
-
1@CollinHendo Nope, will have to see your data for that. Alternatively you can add 'On error resume next' on top of code. If the solution was helpful pls vote. – Santosh Jul 09 '13 at 00:47
-
I could provide you a sample of my data. How could I do that? – Collin Hendo Jul 09 '13 at 12:52
-
Actually the error occurs even if I attempt to use the code on the sample that you provided.Any hints? – Collin Hendo Jul 09 '13 at 13:19
-
It is a strange error. After running about 100-125 record, I get the access denied and cannot use it in any other workbook or anything. – Collin Hendo Jul 09 '13 at 13:49
-
Is it possible that my server cuts off access after pulling 100 or so records and then I am on a delay or cut off? – Collin Hendo Jul 09 '13 at 15:11
-
@CollinHendo Apologies for late response. You can share the workbook using dropbox or google drive and give me the link. I shall get back to you as feasible to me. – Santosh Jul 09 '13 at 18:10
-
https://www.dropbox.com/s/o1rj4fqht2I1ws6/Sample.xlsm Thanks! That's just a sample. As you can see, it ended after about 110 records. – Collin Hendo Jul 09 '13 at 19:37
-
@CollinHendo The link is not working for me. Please make sure you share the link publicly. – Santosh Jul 10 '13 at 03:35
-
My apologies. I thought it was in my public folder. Here is the correct link https://www.dropbox.com/s/t4qh8vo4g88ulu3/Sample.xlsm – Collin Hendo Jul 10 '13 at 10:16
-
I too get this error, just about after 60 entries, any resolutions to this please. – Vasim Nov 06 '13 at 07:58
-
@Vasim I will get back to you on this by tommorow. Can you take a screenshot of issue and give the link. – Santosh Nov 06 '13 at 09:41
-
[link](https://drive.google.com/file/d/0B9u_K1HbgiEKUk5JMlBZdWRzZFE/edit?usp=sharing), it stops at "xmlHttp.send" after sometime (5min or so) it again works - stops after some next 60 entries and so on..... – Vasim Nov 06 '13 at 10:16
-
@Vasim I was able to simulate the issue. With lack of time i will try to fix it in weekend. In the meanwhile you may try to add some delay or make async request. – Santosh Nov 08 '13 at 05:16
-
1@Vasim I have updated the code. Give it a try and let me know the response. I tested for about 140 records which ran uninterrupted. Alternatively you may download the sample file from [here](http://goo.gl/VudD5z) – Santosh Nov 14 '13 at 18:26
-
@Santosh I was trying the get the data of a span tag with class name "_Ex", the code doesn't seem to work. I used the following: `code` Set html = CreateObject("htmlfile") html.body.innerHTML = XMLHTTP.ResponseText Set objResultDiv = html.getelementbyId("rso") Set objH3 = objResultDiv.getelementsbyTagName("span")(0) Set link = objH3.findElementsbyClassName("_Ex") `code` – Krithi07 Apr 21 '16 at 08:10
-
Great script. +1. My question is, I have thousands of records. I want to check with google. Can i try without limitation ? If i try to thousands of records, Will google think me that DDOS attack like that ? Is it legit ? – Venkat Dec 11 '16 at 14:02
The links seem to be consistently within within H3 tags. Normally you might use something like the following to check until the page has loaded:
Private Declare Sub Sleep Lib "kernel32" (ByVal nMilliseconds As Long)
Sub UseIE()
Dim ie As Object
Dim thePage As Object
Dim strTextOfPage As String
Set ie = CreateObject("InternetExplorer.Application")
'ie.FullScreen = True
With ie
'.Visible = True
.Navigate "http://www.bbc.co.uk"
While Not .ReadyState = READYSTATE_COMPLETE '4
Sleep 500 'wait 1/2 sec before trying again
Wend
End With
Set thePage = ie.Document
'more code here
End Sub
However, I would, instead, repeatedly try to reference the A element within the first H3 using getElementsByTagName("H3"), get the first of these elements, then look within this for the A-link and its href-attribute.
In JavaScript the attempts to reference non-existent elements would return undefined but from VBA it will probably need error-handling code.
Once I had obtained the href I would stop the navigation (not sure of the command for this, probably ie.Stop) or navigate to the next page immediately.
The first link(s) will, however, often be sponsored links and the href returned is a little garbled. The text of these sponsored links appear to include em tags. I might use this information to discard these links and look further down the page.
I don't know if there is a better way to do this.
- 19,232
- 5
- 47
- 69