I have a file with 100,000 URLs that I need to request and then process. The processing takes a non-negligible amount of time compared to the request itself, so multithreading alone only gives me a partial speed-up. From what I have read, I think the multiprocessing module, or something similar, would offer a more substantial speed-up because it can use multiple cores. I'm guessing I want multiple processes, each with multiple threads, but I'm not sure how to set that up (I've included a sketch of what I'm imagining after my current code).
Here is my current code, using threading (based on What is the fastest way to send 100,000 HTTP requests in Python?):
from threading import Thread
from queue import Queue
import requests
from bs4 import BeautifulSoup
import sys
concurrent = 100
def worker():
    # each thread repeatedly pulls a URL, fetches it, and processes it
    while True:
        url = q.get()
        html = get_html(url)
        process_html(html)
        q.task_done()
def get_html(url):
    try:
        html = requests.get(url, timeout=5, headers={'Connection': 'close'}).text
        return html
    except requests.RequestException:
        print("error", url)
        return None
def process_html(html):
    if html is None:
        return
    soup = BeautifulSoup(html, 'html.parser')
    text = soup.get_text()
    # do some more processing
    # write the text to a file
q = Queue(concurrent * 2)
for i in range(concurrent):
    t = Thread(target=worker, daemon=True)
    t.start()
try:
    for url in open('text.txt'):
        q.put(url.strip())
    q.join()
except KeyboardInterrupt:
    sys.exit(1)
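
Here is roughly what I'm imagining: a minimal sketch where each process runs its own pool of threads, all feeding from a single shared multiprocessing.JoinableQueue. The num_processes and threads_per_process values are guesses on my part, and I haven't tested this at scale. Is this the right general shape?

import sys
from multiprocessing import JoinableQueue, Process
from threading import Thread

import requests
from bs4 import BeautifulSoup

num_processes = 4          # guess: roughly one per core
threads_per_process = 25   # guess: tune for time spent waiting on I/O

def get_html(url):
    try:
        return requests.get(url, timeout=5, headers={'Connection': 'close'}).text
    except requests.RequestException:
        print("error", url)
        return None

def process_html(html):
    if html is None:
        return
    soup = BeautifulSoup(html, 'html.parser')
    text = soup.get_text()
    # do some more processing
    # write the text to a file

def worker(q):
    # each thread pulls URLs until the main process's q.join() returns
    while True:
        url = q.get()
        process_html(get_html(url))
        q.task_done()

def run_pool(q):
    # body of one process: start its own pool of I/O threads, then wait
    for _ in range(threads_per_process):
        Thread(target=worker, args=(q,), daemon=True).start()
    q.join()  # blocks until every queued URL has been marked done

if __name__ == '__main__':
    q = JoinableQueue(num_processes * threads_per_process * 2)
    for _ in range(num_processes):
        Process(target=run_pool, args=(q,), daemon=True).start()
    try:
        for url in open('text.txt'):
            q.put(url.strip())
        q.join()
    except KeyboardInterrupt:
        sys.exit(1)

One thing I'm unsure about is whether sharing a single JoinableQueue across all the processes like this is reasonable, or whether each process should instead be handed its own slice of the URL list up front.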