How can I get the request url in Scrapy's parse() function? I have a lot of urls in start_urls and some of them redirect my spider to homepage and as result I have an empty item. So I need something like item['start_url'] = request.url to store these urls. I'm using the BaseSpider.
 
- did this method work? – NKelner Nov 20 '13 at 22:33
- instead of storing them aside, during scraping you can access `requested_url`, check below my answer – Rohan Khude Dec 13 '17 at 12:19
 
5 Answers
The 'response' variable that's passed to parse() has the info you want. You shouldn't need to override anything.
e.g.:
def parse(self, response):
    print("URL: " + response.request.url)
 
- But that is not the request url, but the response url. Scrapy's middleware handles redirections, therefore you can obtain a different url. – gusridd Jan 20 '16 at 13:09
- If the url has redirection, then it gives the redirected url, not the provided url – Rohan Khude Dec 13 '17 at 09:42
 
The request object is accessible from the response object, therefore you can do the following:
def parse(self, response):
    item = MyItem()  # assuming an Item class defined elsewhere
    item['start_url'] = response.request.url
 
There is no need to store the requested URLs separately; note that Scrapy does not necessarily process URLs in the same sequence as they appear in start_urls. During scraping you can use:
response.request.meta['redirect_urls']
This gives the list of URLs the request passed through while being redirected, e.g. ['http://requested_url', 'https://redirected_url']; the final URL after the redirects is response.url itself.
To access the first URL in the list (the one originally requested), use
response.request.meta['redirect_urls'][0]
For more detail, see the Scrapy documentation on RedirectMiddleware:
RedirectMiddleware
This middleware handles redirection of requests based on response status.
The urls which the request goes through (while being redirected) can be found in the redirect_urls Request.meta key.
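The fallback when no redirect occurred (the redirect_urls key is absent) can be sketched without running a crawl; below a plain dict stands in for response.meta, and all URLs are illustrative:

```python
# A plain dict stands in for response.meta here; no Scrapy required.
# When a request was redirected, RedirectMiddleware records the URLs
# it passed through under the 'redirect_urls' key.
def original_url(meta, final_url):
    # The first entry of redirect_urls is the URL originally requested;
    # if no redirect happened, the key is absent and the final URL
    # is the original one.
    return meta.get('redirect_urls', [final_url])[0]

meta_redirected = {'redirect_urls': ['http://requested_url', 'https://redirected_url']}
print(original_url(meta_redirected, 'https://final_url'))  # http://requested_url
print(original_url({}, 'https://final_url'))               # https://final_url
```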
Hope this helps you
 
- I believe all you need is: `redirect_urls = response.meta.get("redirect_urls")` – Jack Oct 29 '20 at 10:44
 
You can override BaseSpider's make_requests_from_url(url) to assign the start_url to the item, then use the Request.meta special keys to pass that item to the parse function. (Note that make_requests_from_url is deprecated in newer Scrapy versions; overriding start_requests, as in the answer below, is preferred.)
from scrapy.spider import BaseSpider
from scrapy.http import Request

class MySpider(BaseSpider):

    # override method
    def make_requests_from_url(self, url):
        item = MyItem()
        # assign the start url
        item['start_url'] = url
        request = Request(url, dont_filter=True)
        # set meta['item'] to use the item in the next callback
        request.meta['item'] = item
        return request

    def parse(self, response):
        # access and do something with the item in parse
        item = response.meta['item']
        item['other_url'] = response.url
        return item
Hope that helps.
 
Python 3.5, Scrapy 1.5.0:
import scrapy
from scrapy.http import Request

class MySpider(scrapy.Spider):

    # override method
    def start_requests(self):
        for url in self.start_urls:
            item = {'start_url': url}
            request = Request(url, dont_filter=True)
            # set meta['item'] to use the item in the next callback
            request.meta['item'] = item
            yield request

    # use the meta variable
    def parse(self, response):
        url = response.meta['item']['start_url']
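In Scrapy 1.7+, cb_kwargs is the recommended way to pass per-request data to a callback instead of meta. The mechanics can be sketched without Scrapy by calling the callback directly, as the engine would (all names and URLs here are illustrative):

```python
# Sketch of the cb_kwargs idea (Scrapy 1.7+): keyword arguments stored
# on the request are passed straight into the callback. A plain dict
# stands in for the Request object; no Scrapy import needed.
def parse(response_url, start_url):
    # In a real spider this would be: def parse(self, response, start_url):
    return {'start_url': start_url, 'final_url': response_url}

request = {
    'url': 'http://requested_url',
    'cb_kwargs': {'start_url': 'http://requested_url'},
}
# Simulate what the engine does: invoke the callback with cb_kwargs.
item = parse('https://final_url', **request['cb_kwargs'])
print(item)  # {'start_url': 'http://requested_url', 'final_url': 'https://final_url'}
```

In a real spider you would write `yield Request(url, dont_filter=True, cb_kwargs={'start_url': url})` in start_requests, and the start_url arrives as a named parameter of parse.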