I am having a python's pickled object which generates a 180 Mb file. When I unpickle it, the memory usage explode to 2 or 3Gb. Do you have similar experience? Is it normal?
The object is a tree containing a dictionary : each edge is a letter, and each node is a potential word. So to store a word you need as much edges as the length of this word. So, the first level is 26 node maximum, the second one is 26^2, the third 26^3, etc... For each node being a word I have an attribute pointing toward the informations about the word (verb, noun, definition, etc...).
I have words of about 40 characters maximum. I have around half a million entry. Everything goes fine till I pickle (using a simple cpickle dump) : it gives a 180 Mb file. I am on Mac OS, and when I unpickle these 180 Mb, the OS give 2 or 3 Gb of "memory / virtual memory" to the python process :(
I don't see any recursion on this tree : the edges have nodes having themselves an array of array. No recursion involved.
I am a bit stuck : the loading of these 180 Mb is around 20 sec (not speaking about the memory issue). I have to say my CPU is not that fast : core i5, 1.3Ghz. But my hard drive is an ssd. I only have 4Gb of memory.
To add these 500 000 word in my tree, I read about 7 000 files containing each one about 100 words. Making this reading make the memory allocated by mac os going up to 15 Gb, mainly on virtual memory :( I have been using the "with" statement ensuring the closing of each file, but doesn't really help. Reading a file take around 0.2 sec for 40 Ko. Seems quite long to me. Adding it to the tree is much faster (0.002 sec).
Finally I wanted to make an object database, but I guess python is not suitable to that. Maybe I will go for a MongoDB :(
class Trie():
    """
    Class to store known entities / word / verbs...
    """
    longest_word  = -1
    nb_entree     = 0
    def __init__(self):
        self.children    = {}
        self.isWord      = False
        self.infos       =[]
    def add(self, orthographe, entree):  
        """
        Store a string with the given type and definition in the Trie structure.
        """
        if len(orthographe) >Trie.longest_word:
            Trie.longest_word = len(orthographe)
        if len(orthographe)==0:
            self.isWord = True
            self.infos.append(entree)
            Trie.nb_entree += 1
            return True
        car = orthographe[0]
        if car not in self.children.keys():
            self.children[car] = Trie()
        self.children[car].add(orthographe[1:], entree)
 
     
     
    