Questions tagged [robots.txt]

8 questions
4
votes
0 answers

Is automated web site access legal?

Many web sites include clauses in their terms of service prohibiting automated access. One example is in eBay's robots.txt file: The use of robots or other automated means to access the eBay site without the express permission of…
2
votes
2 answers

What software is needed for membership websites and how can they still be indexed by Google?

I notice that in some cases paywalled news articles seem to have been indexed by Google because excerpts from the story appear in the search hit. However, when I go to these web sites using a Googlebot (robot) identity the information is not there…
Tyler Durden
  • 6,333
1
vote
1 answer

Apache won't start, port 80 in use by a system process, found baiduspider

OK, so I have uninstalled IIS on my Windows server and decided to try XAMPP to host my domains. Port 80 is in use and I have tried all of the fixes that I have come across over the past 2 days. I needed to figure out what is using process ID 4…
1
vote
1 answer

How can we know which URLs robots.txt allows us to crawl if we don't know which folder a URL belongs to?

I'm going to code a web crawler, but first I want to know what I will be able to crawl. Tell me if I'm wrong, but in robots.txt websites indicate folders, not URLs, that can and can't be crawled, so how can we know to which folder a URL…
DevAb
  • 113
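
As a rough illustration of the question above: Disallow rules in robots.txt are matched as path prefixes, so a crawler does not need to know which "folder" a URL lives in; it only checks whether the URL's path starts with a rule. The sketch below uses Python's standard urllib.robotparser. The robots.txt content, the example.com URLs, and the helper name can_crawl are assumptions made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /tmp
"""

def can_crawl(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

if __name__ == "__main__":
    # Hypothetical URLs; each is checked by prefix against the rules above.
    for url in (
        "https://example.com/",
        "https://example.com/public/a.html",
        "https://example.com/private/a.html",
        "https://example.com/tmpfile.html",
    ):
        print(url, can_crawl(ROBOTS_TXT, url))
```

Note that the rule /tmp also matches /tmpfile.html, because the comparison is a plain prefix test on the path rather than a folder lookup.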
0
votes
0 answers

Index a whole website that has blocked Google?

I tried to do a site:site.com [search terms] search on Google, but site.com has blocked Google from indexing it via its robots.txt. How can I get around this? Can I download and index the whole site myself somehow and then search my own private index?
d-b
  • 956
0
votes
1 answer

Can ROS be run on Windows 10, 64-bit?

I am trying to install ROS (Robot Operating System) on my computer running Windows 10, 64-bit. Is it possible? And is there a process for doing it?
-1
votes
1 answer

Googlebot blocked by robots.txt

So recently I've tested my site with the Google mobile-friendly test and the main loading issue was "Googlebot blocked by robots.txt". My robots.txt does allow Googlebot, I think? What do you think, guys? What's the problem here?
-1
votes
1 answer

How to prevent Google from indexing

We have set up one of our web sites on a server; the site is built in PHP with the Symfony framework. My requirement is to prevent Google from indexing it. Below is my robots.txt. Can this also be prevented using .htaccess? User-agent: * Disallow: So how do I prevent it and…
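
For what it is worth, the rule quoted in this excerpt (User-agent: * with an empty Disallow:) permits crawling of everything, whereas Disallow: / blocks every path. A small sketch with Python's standard urllib.robotparser shows the difference. The example.com URL and the helper name is_allowed are assumptions for illustration, and the sketch models crawl permission only, not whether a page ends up in a search index.

```python
from urllib.robotparser import RobotFileParser

# The rule from the excerpt: an empty Disallow value blocks nothing.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
# The variant that blocks crawling of every path.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

def is_allowed(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

if __name__ == "__main__":
    url = "https://example.com/page"  # hypothetical URL
    print("empty Disallow:", is_allowed(ALLOW_ALL, url))
    print("Disallow: /   :", is_allowed(BLOCK_ALL, url))
```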