i built myself a scraper. Having multiple targets on the same page i wanted to create a list which contains all 'url's' which then should get scraped. The scraping takes some time and i need to scrape them at the same time. Because i do not want to 'maintain' x Skripts for x url's i thougt of multiprocessing and spawning a process for each url in the 'list'. After some duckduckgo and reading for example here: https://keyboardinterrupt.org/multithreading-in-python-2-7/ and here: When should we call multiprocessing.Pool.join? i came up with the code provided.
Executed in a cmd line, the code Executes the main loop but without entering the scrape() function (inside would be some print messages which are not outputed). No Error message is given and the script exits like normal.
What am i missing?
I am using Python 2.7 on a win x64.
I already read:
Threading pool similar to the multiprocessing Pool?
https://docs.python.org/2/library/threading.html
https://keyboardinterrupt.org/multithreading-in-python-2-7/
but i didn't help.
def main():
    try:
        from multiprocessing import process
        from multiprocessing.pool import ThreadPool
        from multiprocessing import pool
        thread_count = 10 # Define the limit of concurrent running threads
        thread_pool = ThreadPool(processes=thread_count) # Define the thread pool to keep track of the sub processes
        known_threads = {}
        list=[]
        list=def_list() # Just assigns the url's to the list
        for entry in range(len(list)):
            print 'starting to scrape'
            print list[entry]
            known_threads[entry] = thread_pool.apply_async(scrape, args=(list[entry]))
        thread_pool.close() # After all threads started we close the pool
        thread_pool.join() # And wait until all threads are done
        except Exception, err:
            print Exception, err, 'Failed in main loop'
        pass