How do I get a Scrapy pipeline to fill my MongoDB with my items? Here is what my code looks like at the moment; it is based on the information I got from the Scrapy documentation.
I also want to mention that I have tried returning items instead of yielding them, as well as using item loaders (there is a sketch of that attempt right after the spider code below). All methods seem to have the same outcome.
On that note, I want to mention that if I run the command
mongoimport --db mydb --collection mycoll --drop --jsonArray --file ~/path/to/scrapyoutput.json
my database gets populated (as long as I yield and don't return items). I would really love to get this pipeline working, though.
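(For context, that JSON file is just Scrapy's feed export, i.e. what I get from running the spider with something like scrapy crawl congress -o scrapyoutput.json, so the items themselves are definitely being scraped.)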
Okay, so here is my code.
Here is my spider:
    import scrapy
    from scrapy.selector import Selector
    from scrapy.loader import ItemLoader
    from scrapy.spiders import CrawlSpider, Rule
    from scrapy.linkextractors import LinkExtractor
    from scrapy.http import HtmlResponse
    from capstone.items import CapstoneItem
    class CongressSpider(CrawlSpider):
        name = "congress"
        allowed_domains = ["www.congress.gov"]
        start_urls = [
            'https://www.congress.gov/members',
        ]
        # creating a rule for my crawler: only continue to the next page, don't follow any other links
        rules = (Rule(LinkExtractor(allow=(), restrict_xpaths=("//a[@class='next']",)), callback="parse_page", follow=True),)

        def parse_page(self, response):
            for search in response.selector.xpath(".//li[@class='compact']"):
                yield {
                    'member': ' '.join(search.xpath("normalize-space(span/a/text())").extract()).strip(),
                    'state': ' '.join(search.xpath("normalize-space(div[@class='quick-search-member']//span[@class='result-item']/span/text())").extract()).strip(),
                    'District': ' '.join(search.xpath("normalize-space(div[@class='quick-search-member']//span[@class='result-item'][2]/span/text())").extract()).strip(),
                    'party': ' '.join(search.xpath("normalize-space(div[@class='quick-search-member']//span[@class='result-item'][3]/span/text())").extract()).strip(),
                    'Served': ' '.join(search.xpath("normalize-space(div[@class='quick-search-member']//span[@class='result-item'][4]/span//li/text())").extract()).strip(),
                }
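For reference, the item loader attempt I mentioned above looked roughly like this (a reconstructed sketch, not my exact code; it uses the same XPaths as parse_page above and the field names from my items.py below, and without output processors each field comes back as a list):

    # rough sketch of the ItemLoader variant of parse_page (same class, rules and XPaths as above)
    from scrapy.loader import ItemLoader
    from capstone.items import CapstoneItem

    def parse_page(self, response):
        for search in response.selector.xpath(".//li[@class='compact']"):
            loader = ItemLoader(item=CapstoneItem(), selector=search)
            loader.add_xpath('member', "normalize-space(span/a/text())")
            loader.add_xpath('state', "normalize-space(div[@class='quick-search-member']//span[@class='result-item']/span/text())")
            # ... 'District', 'party' and 'served' are added the same way
            yield loader.load_item()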
Here are my settings:
    BOT_NAME = 'capstone'
    SPIDER_MODULES = ['capstone.spiders']
    NEWSPIDER_MODULE = 'capstone.spiders'
    ITEM_PIPLINES = {'capstone.pipelines.MongoDBPipeline': 300,}
    MONGO_URI = 'mongodb://localhost:27017'
    MONGO_DATABASE = 'congress'
    ROBOTSTXT_OBEY = True
    DOWNLOAD_DELAY = 10
Here is my pipeline.py:
    import pymongo
    from pymongo import MongoClient
    from scrapy.conf import settings
    from scrapy.exceptions import DropItem
    from scrapy import log
    class MongoDBPipeline(object):
        collection_name= 'members'
        def __init__(self, mongo_uri, mongo_db):
            self.mongo_uri = mongo_uri
            self.mongo_db = mongo_db
        @classmethod
        def from_crawler(cls, crawler):
            return cls(
                mongo_uri=crawler.settings.get('MONGO_URI'),
                mongo_db=crawler.settings.get('MONGO_DATABASE', 'items')
            )
        def open_spider(self,spider):
            self.client = pymongo.MongoClient(self.mongo_uri)
            self.db = self.client[self.mongo_db]
        def close_spider(self, spider):
            self.client.close()
        def process_item(self, item, spider):
            self.db[self.collection_name].insert(dict(item))
            return item
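To rule out the connection itself, this is the kind of bare pymongo check (outside Scrapy) I can run against the same URI, database and collection as above; it is just a sketch that inserts a dummy document:

    # standalone sanity check with bare pymongo, using the same names as the settings/pipeline above
    import pymongo

    client = pymongo.MongoClient('mongodb://localhost:27017')   # MONGO_URI
    db = client['congress']                                      # MONGO_DATABASE
    db['members'].insert_one({'member': 'sanity check'})         # dummy document into the pipeline's collection
    print(db['members'].count())                                 # should be at least 1 if the insert worked
    client.close()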
Here is my items.py:
    import scrapy
    class CapstoneItem(scrapy.Item):
        member = scrapy.Field()
        state = scrapy.Field()
        District = scrapy.Field()
        party = scrapy.Field()
        served = scrapy.Field()
Last but not least, my output looks like this:
    2017-02-26 20:44:41 [scrapy.core.engine] INFO: Closing spider (finished)
    2017-02-26 20:44:41 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
    {'downloader/request_bytes': 8007,
    'downloader/request_count': 24,
    'downloader/request_method_count/GET': 24,
    'downloader/response_bytes': 757157,
    'downloader/response_count': 24,
    'downloader/response_status_count/200': 24,
    'finish_reason': 'finished',
    'finish_time': datetime.datetime(2017, 2, 27, 4, 44, 41, 767181),
    'item_scraped_count': 2139,
    'log_count/DEBUG': 2164,
    'log_count/INFO': 11,
    'request_depth_max': 22,
    'response_received_count': 24,
    'scheduler/dequeued': 23,
    'scheduler/dequeued/memory': 23,
    'scheduler/enqueued': 23,
    'scheduler/enqueued/memory': 23,
    'start_time': datetime.datetime(2017, 2, 27, 4, 39, 58, 834315)}
    2017-02-26 20:44:41 [scrapy.core.engine] INFO: Spider closed (finished)
So it seems to me like I am not getting any errors and my items were scraped. If I run it with -o myfile.json I can import that file into my MongoDB, but the pipeline just isn't doing anything!
     mongo
     MongoDB shell version: 3.2.12
     connecting to: test
     Server has startup warnings: 
      2017-02-24T18:51:24.276-0800 I CONTROL  [initandlisten] 
      2017-02-24T18:51:24.276-0800 I CONTROL  [initandlisten] **    WARNING: /sys/kernel/mm/transparent_hugepage/enabled is 'always'.
     2017-02-24T18:51:24.276-0800 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
     2017-02-24T18:51:24.276-0800 I CONTROL  [initandlisten] 
     2017-02-24T18:51:24.276-0800 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/defrag is 'always'.
     2017-02-24T18:51:24.276-0800 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
     2017-02-24T18:51:24.276-0800 I CONTROL  [initandlisten] 
     > show dbs
     congress  0.078GB
     local     0.078GB
     > use congress
     switched to db congress
     > show collections
     members
     system.indexes
     > db.members.count()
     0
     > 
I suspect my problem has to do with my settings file. I am new to Scrapy and MongoDB, and I have a feeling I haven't told Scrapy where my MongoDB is correctly. Here are some other sources I found and tried to use as examples, but everything I tried led to the same result (scraping was done, Mongo was empty): https://realpython.com/blog/python/web-scraping-and-crawling-with-scrapy-and-mongodb/ and https://github.com/sebdah/scrapy-mongodb. I have a bunch more sources but not enough reputation to post more, unfortunately. Anyway, any thoughts would be much appreciated. Thanks.
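One thing I haven't figured out is how to confirm that Scrapy is even loading the pipeline. As far as I understand, Scrapy prints an "Enabled item pipelines:" line at startup, and the configured pipelines can also be checked from a scrapy shell inside the project, roughly like this (I believe the crawler object is available in the shell):

    # inside `scrapy shell`, check which item pipelines Scrapy actually sees;
    # if MongoDBPipeline does not show up here, process_item is never called
    print(crawler.settings.getdict('ITEM_PIPELINES'))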