For a web crawler project in C#, I am trying to execute the JavaScript and Ajax on a crawled page so I can retrieve its fully rendered source.
I am using an existing web crawler (Abot) that needs a valid HttpWebResponse object, so I cannot simply use the driver.Navigate().GoToUrl() method to retrieve the page source.
The crawler downloads the page source, and I want to execute the existing JavaScript/Ajax inside that source.
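For context, this is roughly how the downloaded HTML reaches my code in Abot (a sketch from memory; I'm assuming the PageCrawlCompleted event and the CrawledPage.Content.Text property, whose names may differ between Abot versions):

    using Abot.Crawler;

    PoliteWebCrawler crawler = new PoliteWebCrawler();
    crawler.PageCrawlCompleted += (sender, e) =>
    {
        // Abot has already downloaded the page at this point;
        // this string is what I want to execute JavaScript/Ajax on
        string html = e.CrawledPage.Content.Text;
    };
    crawler.Crawl(new Uri("http://www.newegg.com/"));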
In a sample project I tried the following without success:
    using System;
    using System.IO;
    using System.Net;
    using OpenQA.Selenium.PhantomJS;

    // Download the page source once and save it to a local temp file...
    WebClient wc = new WebClient();
    string content = wc.DownloadString("http://www.newegg.com/Product/Product.aspx?Item=N82E16834257697");
    string tmpPath = Path.Combine(Path.GetTempPath(), "temp.htm");
    File.WriteAllText(tmpPath, content);

    // ...then let PhantomJS render the local copy
    var driverService = PhantomJSDriverService.CreateDefaultService();
    var driver = new PhantomJSDriver(driverService);
    driver.Navigate().GoToUrl(new Uri(tmpPath));
    string renderedContent = driver.PageSource;
    driver.Quit();
You need the following NuGet packages to run the sample: https://www.nuget.org/packages/phantomjs.exe/ and http://www.nuget.org/packages/selenium.webdriver
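(For reference, these are the Package Manager Console commands for the two package IDs above:)

    PM> Install-Package phantomjs.exe
    PM> Install-Package Selenium.WebDriver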
The problem is that the code hangs at GoToUrl(): the program takes several minutes to terminate, and I never even get driver.PageSource.
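To at least keep GoToUrl() from blocking indefinitely while I debug this, a page-load timeout can be set (a sketch; I'm assuming the SetPageLoadTimeout method from the Selenium.WebDriver version I have, which may be named differently in other versions):

    driver.Manage().Timeouts().SetPageLoadTimeout(TimeSpan.FromSeconds(30));
    try
    {
        driver.Navigate().GoToUrl(new Uri(tmpPath));
    }
    catch (WebDriverTimeoutException)
    {
        // PhantomJS gave up waiting (presumably on external resources);
        // whatever DOM was built so far should still be in driver.PageSource
    }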
In contrast, navigating to the live URL directly returns the correct HTML:

    driver.Navigate().GoToUrl("http://www.newegg.com/Product/Product.aspx?Item=N82E16834257697");
    string renderedContent = driver.PageSource;
But I don't want to download the data twice. The crawler (Abot) already downloads the HTML, and I just want to execute the JavaScript and Ajax inside it.
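One idea I have (an untested sketch using naive string replacement; the href below is just the page's directory): inject a &lt;base&gt; element into the downloaded HTML before handing it to PhantomJS, so that relative script and Ajax URLs resolve against the original host instead of the local temp directory:

    // naive injection; a real implementation should use an HTML parser
    string baseTag = "<base href=\"http://www.newegg.com/Product/\" />";
    File.WriteAllText(tmpPath, content.Replace("<head>", "<head>" + baseTag));
    driver.Navigate().GoToUrl(new Uri(tmpPath));

I don't know whether that is the right way to avoid the second download, though.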
Thank you!