"Crawler" is a generic term for any program (such as a robot or spider) used to automatically discover and scan websites by following links from one webpage to another. Google's main crawler is called Googlebot.
Questions tagged [google-crawlers]
395 questions
                    
36 votes, 4 answers
        Passing arguments to process.crawl in Scrapy python
I would like to get the same result as this command line:
scrapy crawl linkedin_anonymous -a first=James -a last=Bond -o output.json
My script is as follows:
import scrapy
from linkedin_anonymous_spider import LinkedInAnonymousSpider
from…
        
yusuf
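
To reproduce that command line from a script, Scrapy's `CrawlerProcess` can be used directly. Below is a minimal sketch, assuming Scrapy 2.1 or later (for the `FEEDS` setting) and the asker's `linkedin_anonymous_spider` module. Keyword arguments given to `process.crawl()` are forwarded to the spider's constructor, which is what `-a name=value` does on the command line, and `FEEDS` takes the place of `-o output.json`. The helper names `parse_spider_args` and `run_linkedin_spider` are hypothetical.

```python
def parse_spider_args(pairs):
    """Turn -a style strings such as "first=James" into keyword arguments."""
    kwargs = {}
    for pair in pairs:
        name, sep, value = pair.partition("=")
        if not sep or not name:
            raise ValueError(f"expected NAME=VALUE, got {pair!r}")
        kwargs[name] = value
    return kwargs


def run_linkedin_spider(spider_args, output="output.json"):
    """Hypothetical wrapper: script equivalent of `scrapy crawl ... -a ... -o output`."""
    # Imported here so parse_spider_args can be used without Scrapy installed.
    from scrapy.crawler import CrawlerProcess
    from linkedin_anonymous_spider import LinkedInAnonymousSpider  # asker's module

    # FEEDS (Scrapy >= 2.1) plays the role of the -o option.
    process = CrawlerProcess(settings={"FEEDS": {output: {"format": "json"}}})
    # Extra keyword arguments are passed on to the spider's __init__,
    # as with -a first=James -a last=Bond.
    process.crawl(LinkedInAnonymousSpider, **parse_spider_args(spider_args))
    process.start()  # blocks until the crawl has finished
```

Calling `run_linkedin_spider(["first=James", "last=Bond"])` should then behave like the command above; with the default `scrapy.Spider.__init__`, the values end up as `self.first` and `self.last` on the spider.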
 
27 votes, 2 answers
Is including <meta name="fragment" content="!"> harmful for pages with hashbang?
Google says this about the meta tag:
The following important restrictions apply:
- The meta tag may only appear in pages without hash fragments.
- Only "!" may appear in the content field.
- The meta tag must appear in the head of the document.
Source:…
        
Christoph
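
Under Google's AJAX crawling scheme (which Google deprecated in 2015), a crawler rewrote a `#!` URL into an `_escaped_fragment_` query parameter before fetching it, and the `fragment` meta tag requested the same rewrite for pages whose URL has no hash fragment. The sketch below, with a hypothetical function name, shows both cases that the restrictions above distinguish. The set of escaped characters follows my reading of the scheme's specification and should be checked against it.

```python
from urllib.parse import urlsplit, urlunsplit


def _escape_fragment_char(ch):
    """Percent-encode controls, space, '#', '%', '&', '+' and non-ASCII characters."""
    if ord(ch) <= 0x20 or ch in "#%&+" or ord(ch) >= 0x7F:
        return "".join(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return ch


def escaped_fragment_url(url):
    """Return the URL a crawler following the AJAX crawling scheme would request.

    - "#!state" URLs: the state moves into the _escaped_fragment_ parameter.
    - URLs without a hash (pages carrying <meta name="fragment" content="!">):
      an empty _escaped_fragment_ parameter is appended.
    """
    scheme, netloc, path, query, fragment = urlsplit(url)
    if fragment and not fragment.startswith("!"):
        raise ValueError("plain #fragment URLs are not covered by the scheme")
    state = fragment[1:]  # empty when the URL has no hash at all
    escaped = "".join(_escape_fragment_char(ch) for ch in state)
    param = f"_escaped_fragment_={escaped}"
    new_query = f"{query}&{param}" if query else param
    return urlunsplit((scheme, netloc, path, new_query, ""))
```

A hashbang URL therefore already produces an `_escaped_fragment_` request on its own, which is the case the meta tag is not meant for.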
 
23 votes, 3 answers
        Avoid crawling part of a page with "googleoff" and "googleon"
I am trying to tell Google and other search engines not to crawl some parts of my web page.
What I do is:…
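
As far as I know, the `googleoff`/`googleon` comments were documented for the Google Search Appliance rather than for Googlebot, so public Google Search may simply ignore them. To illustrate what the markers delimit, here is a sketch of how an indexer that honors them might drop the excluded regions. The function name and the regex approach are assumptions, not how any Google product is implemented, and nested regions are not handled.

```python
import re

# <!--googleoff: KIND--> ... <!--googleon: KIND--> with the same KIND, non-greedy.
_REGION = re.compile(
    r"<!--\s*googleoff:\s*(index|anchor|snippet|all)\s*-->"
    r".*?"
    r"<!--\s*googleon:\s*\1\s*-->",
    re.DOTALL | re.IGNORECASE,
)


def strip_googleoff_regions(html, kinds=("index", "all")):
    """Remove regions switched off for any of the given kinds; keep the rest."""
    def replace(match):
        return "" if match.group(1).lower() in kinds else match.group(0)
    return _REGION.sub(replace, html)
```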

20 votes, 1 answer
        Display an article rating in Google search results
I'm writing a review site where the community rates posts. I have noticed that Google can pick up on these ratings and display them in its search results. Does anyone know how this is achieved?
An example is a review site like IGN, where in their…
        
woot586
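
Rating stars in results come from structured data on the page. A minimal sketch, assuming schema.org `AggregateRating` markup embedded as JSON-LD (one of the formats Google documents for review snippets). The default `Product` type, the function name and the 1 to 5 scale are illustrative assumptions, and Google still decides on its own whether to show the stars.

```python
import json


def aggregate_rating_jsonld(item_name, ratings, item_type="Product", best=5, worst=1):
    """Build a <script> tag with schema.org AggregateRating markup for community ratings."""
    if not ratings:
        raise ValueError("need at least one rating to publish an aggregate")
    data = {
        "@context": "https://schema.org",
        "@type": item_type,  # hypothetical default; pick the type that matches the page
        "name": item_name,
        "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": round(sum(ratings) / len(ratings), 1),
            "ratingCount": len(ratings),
            "bestRating": best,
            "worstRating": worst,
        },
    }
    # Escape "</" so a name containing "</script>" cannot close the tag early.
    payload = json.dumps(data).replace("</", "<\\/")
    return f'<script type="application/ld+json">{payload}</script>'
```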
 
17 votes, 4 answers
        Is it possible to control the crawl speed by robots.txt?
We can tell bots whether or not to crawl our website in robots.txt. On the other hand, we can control the crawl speed in Google Webmaster Tools (how much Googlebot crawls the website). I wonder if it is possible to limit the crawler activities by…
        
Googlebot
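
robots.txt has a non-standard `Crawl-delay` line that some crawlers (Bing, for example) read, while Google documents that Googlebot ignores it, which fits the question's observation that Googlebot's rate is set in Google Webmaster Tools instead. Python's standard `urllib.robotparser` can read the value, which makes it easy to check what a file asks of a given crawler. The robots.txt content and the helper name below are made-up examples.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt: slow down bingbot, block /private/ for everyone else.
ROBOTS_TXT = """\
User-agent: bingbot
Crawl-delay: 10

User-agent: *
Disallow: /private/
"""


def crawl_delay_for(robots_txt, user_agent):
    """Return the Crawl-delay (seconds) that robots_txt sets for user_agent, or None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)
```

Here `crawl_delay_for(ROBOTS_TXT, "Googlebot")` returns `None`, because only the `*` group applies to it and that group sets no delay.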
 
17 votes, 3 answers
Why do search engine crawlers not run JavaScript?
I have been working with some advanced JavaScript applications that use a lot of Ajax requests to render my page. To make the applications crawlable (by Google), I have to follow https://developers.google.com/webmasters/ajax-crawling/?hl=fr . This…
        
Khanh TO
 
16 votes, 3 answers
        Should I list PDFs in my sitemap file?
Should I add PDFs to my XML sitemap? 
I want to know if Google will crawl the PDFs.
        user132114
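
The sitemap protocol only asks for URLs, so a PDF can be listed like any HTML page, and Google does index PDF files. A minimal sketch of generating such a file with the standard library; the URLs are placeholders and the function name is hypothetical.

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap.xml document listing each URL in its own <url><loc> entry."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # ElementTree escapes & and <
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

For example, `build_sitemap(["https://example.com/", "https://example.com/docs/manual.pdf"])` yields one `<url>` entry per page, the PDF included.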

15 votes, 4 answers
        Does googlebot keep sessions when crawling?
When Googlebot crawls pages, does it have a session? For example, I am storing some variables in the session and using them in my site's pages. When Googlebot crawls these pages, will I still have the session variables? In my global.asax I am storing…
        
Furkan Gözükara
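
Google's documentation says Googlebot clears cookies and storage across page loads, so it is safest to treat it as a visitor with a fresh, empty session on every page. One way to cope, sketched here as a plain WSGI app in Python rather than the asker's ASP.NET setup, is to render each page from defaults whenever the session value is missing. The `lang` cookie, the language set and the default are hypothetical.

```python
from http.cookies import SimpleCookie

SUPPORTED_LANGS = {"en", "de", "fr"}  # hypothetical set of page languages
DEFAULT_LANG = "en"                   # used when no (valid) session cookie is sent


def preferred_language(environ):
    """Read a hypothetical 'lang' cookie, falling back to DEFAULT_LANG."""
    cookies = SimpleCookie(environ.get("HTTP_COOKIE", ""))
    morsel = cookies.get("lang")
    if morsel is not None and morsel.value in SUPPORTED_LANGS:
        return morsel.value
    return DEFAULT_LANG


def app(environ, start_response):
    """WSGI app that renders a complete page even for a cookie-less crawler."""
    lang = preferred_language(environ)
    body = f'<html lang="{lang}"><body>Article text ({lang})</body></html>'.encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```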
 
12 votes, 5 answers
Google crawler in Search Console can't find routes in React using GitHub Pages
My problem is that the crawler in Google Search Console can't find sub-routes in React.
The URL is https://huynhsamha.github.io/crypto, and the crawler can fetch and render the homepage (route /) and static files such as /robots.txt and /favicon.ico, but it can't find…
        
Ha. Huynh
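
A likely cause is that GitHub Pages serves static files only: a sub-route has no file behind it, so the server answers 404 before the React router ever runs, while / and files such as /robots.txt do exist. One common workaround, sketched below, is a post-build step that copies index.html into a folder for each client-side route so every listed route returns 200 with the app shell. The route names, the build path and the function name are hypothetical; the asker's actual routes are not shown.

```python
import shutil
from pathlib import Path


def copy_index_for_routes(build_dir, routes):
    """Copy build_dir/index.html to build_dir/<route>/index.html for each route.

    A static host then answers /<route>/ with 200 and the app shell, and the
    client-side router renders the matching view. Routes must be known at
    build time; "/" is skipped because index.html already covers it.
    """
    build = Path(build_dir)
    index = build / "index.html"
    created = []
    for route in routes:
        relative = route.strip("/")
        if not relative:
            continue
        target_dir = build / relative
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / "index.html"
        shutil.copyfile(index, target)
        created.append(target)
    return created
```

Since the site lives under /crypto, the router would also need that base path (for example React Router's `basename` prop on `BrowserRouter`).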
 
11 votes, 5 answers
How to tell if a web request is coming from Google's crawler?
From the HTTP server's perspective.
        
orph
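
The User-Agent header alone can be spoofed, so the check Google documents is a reverse DNS lookup on the client IP, a test that the host name ends in googlebot.com or google.com, and a forward lookup confirming that the name resolves back to the same IP. A sketch with Python's `socket` module, assuming IPv4 (`gethostbyname_ex` only returns IPv4 addresses). The lookups are injectable so the logic can be tested offline, and the function name is hypothetical.

```python
import socket

GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_google_crawler(ip, reverse_lookup=socket.gethostbyaddr,
                      forward_lookup=socket.gethostbyname_ex):
    """Verify an IPv4 address with reverse DNS plus a confirming forward lookup."""
    try:
        host = reverse_lookup(ip)[0]         # e.g. crawl-66-249-66-1.googlebot.com
    except OSError:                          # socket.herror and gaierror subclass OSError
        return False
    if not host.rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False
    try:
        addresses = forward_lookup(host)[2]  # (name, aliases, ipv4_addresses)
    except OSError:
        return False
    return ip in addresses
```

Both lookups are network calls, so a real server would normally cache the result per IP.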
 
9 votes, 2 answers
        Fetch as Google - Googlebot (desktop) not rendering page correctly
I'm having an issue with getting Googlebot to correctly render my webpage(s).
It's rendering the header and one "row" of my page (just the page's top background picture), and then failing to render anything beyond that, not even the footer, missing…
        
Lukeg4
 
8 votes, 1 answer
        SEO for Angular 2 (Non-Universal) apps
I have a deployed Angular 2 app working nicely in production. The issue is that web crawlers are not able to crawl and index the whole site; I only see the main index page/route being crawled. FYI, my application is not using Universal…
        
Corporal
 
7 votes, 2 answers
Indexing AngularJS app - Googlebot simulation vs site:domain
I have recently created a webpage using AngularJS and I'm currently trying to get it indexed by Google using pushState.
I've done quite a bit of research and found out that I can use the Googlebot simulator in Google Webmaster Tools to simulate a…
        
Backer
 
6 votes, 4 answers
        robots.txt content itself is indexed?
The contents of my robots.txt file are themselves indexed and show up in Google search results. It's only Google, not Yahoo, for example.
I really think Google should understand not to index the contents of my robots file as it's only there…
        
michael
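
A plain-text file cannot carry a meta robots tag, but an HTTP `X-Robots-Tag: noindex` response header should keep robots.txt out of the search results while crawlers keep reading its rules. A minimal sketch as WSGI middleware with hypothetical names; on Apache or nginx the same header would normally be set in the server configuration instead.

```python
def noindex_robots_txt(app):
    """Wrap a WSGI app so /robots.txt responses carry X-Robots-Tag: noindex."""
    def wrapped(environ, start_response):
        def start(status, headers, exc_info=None):
            if environ.get("PATH_INFO") == "/robots.txt":
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)
        return app(environ, start)
    return wrapped
```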
 
6 votes, 1 answer
        GitHub repository not listing in Google search - no way to submit url
I made my GitHub repo public a week ago, but it is still not visible in Google search, even when I search for it like site:https://github.com/user/reponame. Answers to similar questions on Stack Overflow suggest feeding the URL of the repo to Google…
        
JenyaKh