I'm using IPython.parallel to process a large amount of data on a cluster. The remote function I run looks like:
def evalPoint(point, theta):
    # do some complex calculation
    return (cost, grad)
which is invoked by this function:
def eval(theta, client, lview, data):
    async_results = []
    for point in data:
        # evaluate current data point
        ar = lview.apply_async(evalPoint, point, theta)
        async_results.append(ar)
    # wait for all results to come back
    client.wait(async_results)
    # and retrieve their values
    values = [ar.get() for ar in async_results]
    # unzip the list of (cost, grad) tuples
    totalCost, totalGrad = zip(*values)
    avgGrad = np.mean(totalGrad, axis=0)
    avgCost = np.mean(totalCost, axis=0)
    return (avgCost, avgGrad)
If I run the code:
client = Client(profile="ssh")
client[:].execute("import numpy as np")
lview = client.load_balanced_view()
for i in xrange(100):
    eval(theta, client, lview, data)
the memory usage keeps growing until I eventually run out (the machine has 76GB of memory). I've simplified evalPoint to do nothing, to make sure the calculation itself wasn't the culprit.
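The no-op version I tested was essentially this (a reconstruction; the exact stand-in values don't matter):

def evalPoint(point, theta):
    # stand-in that does no real work: constant cost, zero gradient
    return (0.0, np.zeros_like(theta))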
The first part of eval was copied from IPython's documentation on how to use the load balancer. The second part (unzipping and averaging) is fairly straightforward, so I don't think that's responsible for the memory leak. Additionally, I've tried manually deleting objects in eval and calling gc.collect(), with no luck.
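The cleanup attempt looked roughly like this (a sketch; I tried several variations of which names to delete):

import gc
import numpy as np

def eval(theta, client, lview, data):
    # same body as above, plus explicit cleanup at the end
    async_results = [lview.apply_async(evalPoint, point, theta)
                     for point in data]
    client.wait(async_results)
    values = [ar.get() for ar in async_results]
    totalCost, totalGrad = zip(*values)
    avgGrad = np.mean(totalGrad, axis=0)
    avgCost = np.mean(totalCost, axis=0)
    # drop the local references and force a collection --
    # this had no visible effect on memory usage
    del async_results, values, totalCost, totalGrad
    gc.collect()
    return (avgCost, avgGrad)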
I was hoping someone with IPython.parallel experience could point out something obvious I'm doing wrong, or could confirm that this is in fact a memory leak.
Some additional facts:
- I'm using Python 2.7.2 on Ubuntu 11.10
- I'm using IPython version 0.12
- I have engines running on servers 1-3, and the client and hub running on server 1. I get similar results if I keep everything on just server 1.
- The only thing I've found that resembles a memory leak in IPython involves %run, which I believe was fixed in this version of IPython (and in any case, I am not using %run)
Update
Also, I tried switching the hub's task logging from in-memory to SQLiteDB, in case that was the issue, but I still see the same problem.
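For reference, I made the switch in the controller's config file; I believe the relevant option is HubFactory.db_class (treat the exact class path as an assumption from my setup):

# ipcontroller_config.py
c = get_config()
# select the hub's task-database backend (SQLiteDB instead of the
# in-memory DictDB)
c.HubFactory.db_class = 'IPython.parallel.controller.sqlitedb.SQLiteDB'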
Response (1)
The memory consumption is definitely in the controller (I verified this by (a) running the client on another machine and (b) watching top). I hadn't realized that the non-SQLiteDB backends would still consume memory, so I hadn't bothered purging.
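The purging I'm doing in the tests below looks roughly like this (a sketch; it combines the hub-side purge with clearing the client's local caches, which as I understand it also keep a copy of every result):

# after each eval(): drop completed results from the hub's database
client.purge_results('all')
# and drop the copies cached on the client itself
client.results.clear()
client.metadata.clear()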
If I use DictDB and purge, I still see the memory consumption go up, but at a much slower rate. It was hovering around 2GB for 20 invocations of eval().
If I use MongoDB and purge, it looks like mongod is taking around 4.5GB of memory and ipcluster about 2.5GB.
If I use SQLite and try to purge, I get the following error:
File "/usr/local/lib/python2.7/dist-packages/IPython/parallel/controller/hub.py", line 1076, in purge_results
  self.db.drop_matching_records(dict(completed={'$ne':None}))
File "/usr/local/lib/python2.7/dist-packages/IPython/parallel/controller/sqlitedb.py", line 359, in drop_matching_records
  expr,args = self._render_expression(check)
File "/usr/local/lib/python2.7/dist-packages/IPython/parallel/controller/sqlitedb.py", line 296, in _render_expression
  expr = "%s %s"%null_operators[op]
TypeError: not enough arguments for format string
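As far as I can tell, that's a bug in sqlitedb.py itself: a two-slot format string is applied to a single value rather than a tuple, which reproduces the same error in isolation (illustration only; I'm assuming null_operators[op] is a plain string such as "IS NULL"):

>>> "%s %s" % "IS NULL"   # two slots, one non-tuple value
TypeError: not enough arguments for format string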
So, I think that if I use DictDB, I might be okay (I'm going to try a run tonight). I'm not sure whether some memory consumption is still expected (I also purge in the client, like you suggested).