I'm a beginner in Python, and I'm using Scrapy for a personal web project.
I use Scrapy to extract data from several websites repeatedly, so on every crawl I need to check whether a link is already in the database before adding it. I did this in a pipelines.py class:
from scrapy.exceptions import DropItem
import memcache

# memcached client used to remember links that are already stored
memc2 = memcache.Client(['127.0.0.1:11211'], debug=1)

class DuplicatesPipeline(object):
    def process_item(self, item, spider):
        # memc2.get returns None when the link has not been seen yet
        if memc2.get(item['link']) is None:
            return item
        else:
            raise DropItem('Duplication %s' % item['link'])
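The pipeline is enabled in settings.py, roughly like this (just a sketch; myproject stands in for my real project package, and the priority value is arbitrary):

# settings.py -- register the duplicates pipeline
ITEM_PIPELINES = {
    'myproject.pipelines.DuplicatesPipeline': 300,
}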
But I heard that using a middleware is better for this task.
I found middleware a little hard to use in Scrapy; can anyone please point me to a good tutorial? Any advice is welcome.
Thanks.
Edit:
I'm using MySQL and memcache.
Here is my attempt based on @Talvalin's answer:
# -*- coding: utf-8 -*-
from scrapy.exceptions import IgnoreRequest
import MySQLdb as mdb
import memcache
connexion = mdb.connect('localhost','dev','passe','mydb')
memc2 = memcache.Client(['127.0.0.1:11211'], debug=1)
class IgnoreDuplicates():
    def __init__(self):
        #clear memcache object
        memc2.flush_all()
        #update memc2
        with connexion:
            cur = connexion.cursor()
            cur.execute('SELECT link, title FROM items')
            for item in cur.fetchall():
                memc2.set(item[0], item[1])
    def precess_request(self, request, spider):
        #if the url is not in memc2 keys, it returns None.
        if memc2.get(request.url) is None:
            return None
        else:
            raise IgnoreRequest()
# in settings.py
DOWNLOADER_MIDDLEWARES = {
    'myproject.middlewares.IgnoreDuplicates': 543,
    'scrapy.contrib.downloadermiddleware.httpproxy.HttpProxyMiddleware': 500,
}
But it seems that the process_request method is ignored when crawling.
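To check whether the method is being called at all, I was thinking of adding a quick log line at the top of it. Here is a rough sketch (it uses Python's standard logging module; the logger name and message text are just for debugging):

import logging
import memcache
from scrapy.exceptions import IgnoreRequest

logger = logging.getLogger(__name__)
memc2 = memcache.Client(['127.0.0.1:11211'], debug=1)

class IgnoreDuplicates(object):
    def process_request(self, request, spider):
        # log every request the middleware sees, just to confirm it runs
        logger.info('IgnoreDuplicates checking %s', request.url)
        # drop requests whose URL is already cached in memcache
        if memc2.get(request.url) is None:
            return None
        raise IgnoreRequest()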
Thanks in advance.