I have a Scrapy project that uses custom middleware and a custom pipeline to check and store entries in a Postgres DB. The middleware looks a bit like this:
class ExistingLinkCheckMiddleware(object):
    def __init__(self):
        ... # open connection to database

    def process_request(self, request, spider):
        ... # before each request, check in the DB
            # that the page hasn't been scraped before
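To make the shape of that middleware concrete, here is a minimal, scrapy-free sketch. It uses sqlite3 in place of the Postgres connection so it is self-contained, and a local `SeenRequestError` exception stands in for `scrapy.exceptions.IgnoreRequest`; the `mark_scraped` helper is hypothetical, added only so the dedup check can be exercised:

```python
import sqlite3

class SeenRequestError(Exception):
    """Raised to drop a request whose URL is already recorded
    (stands in for scrapy.exceptions.IgnoreRequest in this sketch)."""

class ExistingLinkCheckMiddleware:
    def __init__(self, db_path=":memory:"):
        # sqlite3 stands in for the Postgres connection here
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS scraped (url TEXT PRIMARY KEY)"
        )

    def process_request(self, url, spider=None):
        # Before each request, check in the DB whether the
        # page has been scraped before; if so, drop it
        row = self.conn.execute(
            "SELECT 1 FROM scraped WHERE url = ?", (url,)
        ).fetchone()
        if row is not None:
            raise SeenRequestError(url)
        return None  # continue processing the request normally

    def mark_scraped(self, url):
        # Hypothetical helper: record a URL as already scraped
        self.conn.execute(
            "INSERT OR IGNORE INTO scraped (url) VALUES (?)", (url,)
        )
        self.conn.commit()

    def close(self):
        self.conn.close()
```

A real Scrapy middleware would receive `Request` objects rather than bare URL strings and raise `IgnoreRequest`, but the connection-handling shape is the same.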
The pipeline looks similar:
class MachinelearningPipeline(object):
    def __init__(self):
        ... # open connection to database

    def process_item(self, item, spider):
        ... # save the item to the database
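For reference, Scrapy calls `open_spider(spider)` and `close_spider(spider)` on item pipelines at startup and shutdown, which is where connection setup and teardown usually live. A self-contained sketch of the pipeline above, again substituting sqlite3 for Postgres so it runs without a database server:

```python
import sqlite3

class MachinelearningPipeline:
    """Sketch of the pipeline; sqlite3 stands in for Postgres
    so the example is self-contained."""

    def open_spider(self, spider=None):
        # Scrapy calls this when the spider starts
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE IF NOT EXISTS items (data TEXT)")

    def process_item(self, item, spider=None):
        # Save the item to the database
        self.conn.execute(
            "INSERT INTO items (data) VALUES (?)", (str(item),)
        )
        self.conn.commit()
        return item  # pass the item on to any later pipeline stage

    def close_spider(self, spider=None):
        # Scrapy calls this when the spider finishes
        self.conn.close()
```

Opening the connection in `open_spider` rather than `__init__` keeps the connection's lifetime tied to the spider run.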
It works fine, but I can't find a way to cleanly close these database connections when the spider finishes, which irks me.
Does anyone know how to do that?