I have this code:
def somefunc(self):
    ...
    if self.mynums >= len(self.totalnums):
        if 1 == 1:
            return self.crawlSubLinks()
    for num in self.nums:
        if 'hello' not in num:
            continue
        if 0 == 1:
            # Even though this branch is never reached, when using yield here the
            # crawler stops execution after the return statement above.
            # When using return instead of yield, execution continues as expected - why?
            print("in it!")
            yield SplashRequest(numfunc['asx'], self.xo, endpoint='execute',
                                args={'lua_source': self.scripts['xoscript']})

def crawlSubLinks(self):
    self.start_time = timer()
    print("IN CRAWL SUB LINKS")
    for link in self.numLinks:
        yield scrapy.Request(link, callback=self.examinenum, dont_filter=True)
As you can see, the SplashRequest is never reached, so its implementation is not important here. The goal is simply to keep sending requests by returning self.crawlSubLinks(). Now here is the problem:
When I use return before the unreachable SplashRequest, the crawler continues as expected and processes the new requests from crawlSubLinks. However, when I use yield before that same unreachable SplashRequest, the crawler stops after the return statement! Whether an unreachable line uses yield or return should not matter at all, right?
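Here is a minimal sketch, without Scrapy or Splash, of what I believe is the same behavior (the names with_yield, without_yield and sub are made up for illustration):

```python
def sub():
    # Stand-in for crawlSubLinks: a generator that produces one item.
    yield "item"

def with_yield():
    if 1 == 1:
        return sub()       # the returned generator is never handed to the caller's loop
    if 0 == 1:
        yield "unreachable"  # the mere presence of yield makes this a generator function

def without_yield():
    if 1 == 1:
        return sub()       # ordinary function: the caller receives sub()'s generator directly
    # no yield anywhere in this body

# Iterating over with_yield() produces nothing, because its return value
# is swallowed when the generator finishes; without_yield() works as expected.
print(list(with_yield()))     # -> []
print(list(without_yield()))  # -> ['item']
```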
Why is this? I have been told it has something to do with the behavior of Python itself. But how can I then keep a yield statement inside the for loop while still returning from the if statement above the loop, without getting back a generator?
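For reference, this is the kind of delegation I think I am after; if I understand correctly, yield from forwards every item from a sub-generator instead of returning it (outer and sub are made-up names, not my real code):

```python
def sub():
    yield 1
    yield 2

def outer(flag):
    if flag:
        # Forward every item from sub() to the caller instead of returning it.
        yield from sub()
        return  # a bare return just ends the generator early
    yield 99

print(list(outer(True)))   # -> [1, 2]
print(list(outer(False)))  # -> [99]
```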
