I am using the following to check for (internet) connection errors in my spider.py:
import scrapy
from scrapy.exceptions import CloseSpider
from scrapy.spidermiddlewares.httperror import HttpError
from twisted.internet.error import DNSLookupError

def start_requests(self):
    for url in self.start_urls:
        # errback is called for connection-level failures (DNS errors, timeouts, etc.)
        yield scrapy.Request(url, callback=self.parse, errback=self.handle_error)

def handle_error(self, failure):
    if failure.check(DNSLookupError):   # or failure.check(UnknownHostError)?
        request = failure.request
        self.logger.error('DNSLookupError on: %s', request.url)
        print("\nDNS Error! Please check your internet connection!\n")
    elif failure.check(HttpError):
        response = failure.value.response
        self.logger.error('HttpError on: %s', response.url)
    # Close on any failure that reaches this errback:
    print('\nSpider closed because of Connection issues!\n')
    raise CloseSpider('Because of Connection issues!')
    ...
However, when the spider runs and the connection is down, I still get a Traceback (most recent call last): message. I would like to get rid of this by handling the error and shutting the spider down cleanly.
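One alternative I have considered (untested sketch) is to ask the crawler engine to stop directly instead of raising CloseSpider from the errback. I am assuming here that self.crawler.engine.close_spider(spider, reason) can be called from an errback, and 'dns_failure' is just an arbitrary reason string of my choosing:

def handle_error(self, failure):
    if failure.check(DNSLookupError):
        self.logger.error('DNSLookupError on: %s', failure.request.url)
        # Untested sketch: stop the crawl via the engine instead of
        # raising CloseSpider (which may only work inside callbacks).
        self.crawler.engine.close_spider(self, 'dns_failure')

This doesn't silence the traceback either, as far as I can tell.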
The output I get is:
2018-10-11 12:52:15 [NewAds] ERROR: DNSLookupError on: https://x.com
DNS Error! Please check your internet connection!
2018-10-11 12:52:15 [scrapy.core.scraper] ERROR: Error downloading <GET https://x.com>
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/twisted/internet/defer.py", line 1384, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/usr/lib/python3.6/site-packages/twisted/python/failure.py", line 408, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/usr/lib/python3.6/site-packages/scrapy/core/downloader/middleware.py", line 43, in process_request
    defer.returnValue((yield download_func(request=request,spider=spider)))
  File "/usr/lib/python3.6/site-packages/twisted/internet/defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/usr/lib/python3.6/site-packages/twisted/internet/endpoints.py", line 954, in startConnectionAttempts
    "no results for hostname lookup: {}".format(self._hostStr)
twisted.internet.error.DNSLookupError: DNS lookup failed: no results for hostname lookup: x.com.
From this output you can see the following:
- I am able to partially handle the (first?) DNSLookupError, but...
- shutting down the spider does not seem to happen fast enough, so the spider continues trying to download the URL, causing a different error (ERROR: Error downloading), ...
- ...which possibly causes a second error: twisted.internet.error.DNSLookupError?
How can I handle the [scrapy.core.scraper] ERROR: Error downloading and make sure the spider gets shut down properly?
(Or: How can I check the internet connection on spider startup?)
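For the second option, I imagine something like the following at the top of start_requests. This is a rough, untested sketch: socket.create_connection and the 8.8.8.8:53 probe are just my assumptions for a quick reachability test, and I am not sure CloseSpider is honoured inside start_requests:

import socket

def start_requests(self):
    # Untested sketch: probe connectivity once before yielding any requests.
    # Connecting to a public DNS server (8.8.8.8:53) is one arbitrary way to
    # test reachability; any reliably-up host/port would do.
    try:
        socket.create_connection(('8.8.8.8', 53), timeout=3).close()
    except OSError:
        raise CloseSpider('No internet connection on startup!')
    for url in self.start_urls:
        yield scrapy.Request(url, callback=self.parse, errback=self.handle_error)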