Here is the current code:
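import glob

import numpy as np
from skimage import color, io

# KNN is assumed to be a fitted classifier defined elsewhere (not shown here).
def load_data():
    files = glob.glob('../manga-resized/sliced_images/*.png')
    L = []
    target_dist = []
    i = 0
    for fl in files:
        # Convert the RGB image to LAB and keep the lightness channel as input
        image = color.rgb2lab(io.imread(fl))
        L.append(image[:,:,:1])
        # Flatten the a and b channels to one (a, b) row per pixel
        ab = np.vstack(image[:,:,1:])
        #print('ab shape: ', ab.shape)
        #print('KNN prediction shape: ', KNN.predict_proba(ab).shape)
        target_dist.append(KNN.predict_proba(ab))
        i+=1
        print(i)
    print("finished creating L and target_dist")
    # Each np.asarray call copies the whole list into a new array
    X = np.asarray(L)
    y = np.asarray(target_dist)
    #  remember to .transpose these later to 0,3,1,2
    print('X shape: ',X.shape,'y shape: ',y.shape)
    return X,y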
Currently I get the Killed: 9 message after i=391. My computer has 16 GB of RAM, but I think I am doing this very inefficiently. Eventually I hope to do this with nearly 1 million files, let alone 400. I feel this should be possible, because I know people train on datasets much larger than 400 files. So what am I doing wrong? Is there a memory leak? I thought those couldn't happen in Python. Is there any other reason for the Killed: 9 error?
Thanks.
Edit: here is the output of ulimit -a:
Alexs-MBP-6:manga-learn alex$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
file size               (blocks, -f) unlimited
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 256
pipe size            (512 bytes, -p) 1
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 709
virtual memory          (kbytes, -v) unlimited
Here is the output with memory usage printed, after file 221:
https://bpaste.net/show/26109a193e43 . Clearly the available memory is decreasing, but there is still some left by the time the process gets Killed: 9.
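As a rough sanity check on those numbers, the memory that L and target_dist should need can be estimated by hand. The sketch below is only a back-of-envelope calculation: the helper name estimate_bytes is made up for illustration, the image size and class count are hypothetical placeholders (not the real values for this dataset), and it assumes 8-byte float64 elements. It doubles the total because the lists are still alive while np.asarray copies them into new arrays.

```python
def estimate_bytes(n_files, height, width, n_classes, itemsize=8):
    """Rough size in bytes of the data held by load_data (hypothetical helper).

    itemsize=8 assumes float64 elements; pass 4 for float32.
    """
    pixels = n_files * height * width
    l_bytes = pixels * 1 * itemsize               # one lightness value per pixel
    target_bytes = pixels * n_classes * itemsize  # one probability row per pixel
    # np.asarray copies each list into a new array while the list still exists,
    # so the peak is roughly twice the size of the data itself.
    peak_bytes = 2 * (l_bytes + target_bytes)
    return l_bytes, target_bytes, peak_bytes


# Placeholder values only: substitute the real image size and class count.
l_bytes, target_bytes, peak_bytes = estimate_bytes(
    n_files=400, height=64, width=64, n_classes=10
)
print("estimated peak: %.2f GiB" % (peak_bytes / 2.0 ** 30))
```

Plugging in the real image dimensions and the number of classes returned by KNN.predict_proba should show whether the arrays alone could plausibly exhaust 16 GB.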
Edit 2: I have read elsewhere that np.asarray is very inefficient. Additionally, when I take that part out of the function, it runs fine and does not get killed. I have seen alternatives such as np.fromiter, but those only cover 1D arrays, not the two 4-dimensional arrays, X and y, that need to be returned here. Does anyone know the correct NumPy way to fill these arrays?
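One common pattern for this is to preallocate the output arrays once with np.empty and write each image's result straight into its slot, so no Python list and no second copy is ever created. The following is a minimal sketch under stated assumptions, not a drop-in fix: fill_preallocated and fake_predict_proba are hypothetical names, every image is assumed to have the same height and width, the number of classes is assumed to be known in advance, and float32 is chosen here only to halve the footprint. The synthetic images and the fake classifier stand in for the real files and KNN.

```python
import numpy as np


def fill_preallocated(lab_images, predict_proba, n_files, height, width,
                      n_classes, dtype=np.float32):
    """Fill X and y in place instead of building lists (hypothetical helper)."""
    # Allocate both arrays once, up front, at their final size.
    X = np.empty((n_files, height, width, 1), dtype=dtype)
    # y keeps the (n_files, pixels, n_classes) layout the posted code produces;
    # it can be reshaped to 4 dimensions afterwards without copying.
    y = np.empty((n_files, height * width, n_classes), dtype=dtype)
    for i, image in enumerate(lab_images):
        X[i] = image[:, :, :1]
        # Same result as np.vstack(image[:, :, 1:]): one (a, b) row per pixel.
        ab = image[:, :, 1:].reshape(-1, 2)
        y[i] = predict_proba(ab)
    return X, y


def fake_predict_proba(ab):
    # Hypothetical stand-in for KNN.predict_proba: uniform over 5 classes.
    return np.full((ab.shape[0], 5), 1.0 / 5)


# Synthetic stand-ins for the LAB images, produced lazily one at a time.
rng = np.random.default_rng(0)
images = (rng.random((4, 4, 3)) for _ in range(3))

X, y = fill_preallocated(images, fake_predict_proba,
                         n_files=3, height=4, width=4, n_classes=5)
print(X.shape, y.shape)
```

Because the images come from a generator, only one image is held in memory at a time besides X and y. For a dataset much larger than RAM, np.memmap could in principle take the place of np.empty so that the arrays live on disk, though that is an untested suggestion here.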
 
    