Given the discussion diverged from the one on performance of networkx to the optimal way to store "almost" complete graph I will concentrate on summarizing the rationale behind rather using tuples than frozenset type for the dictionary key for the moment.
I was trying to find a confirmation for that but given slightly more methods frozenset can take a bit more memory than tuple. From this question I learned that hashing algorithm has been reimplemented which helps for performance of dictionary inserts and lookups (which takes hash of the key on the way) but on the other hand Python is heavily optimized for tuples, lists, and strings of various lengths which makes me wonder if 2-tuples are still not faster than frozenset if only for this reason.
Now, when we consider NumPy arrays - the reason they may be better for the task is manifold:
- The memory is contiguous which helps dramatically with cache
locality (important thing when doing a traverse of the whole
array).
NumPy is more optimal than ordinary lists for a larger scale of data (say tens of thousands of values).
- Individual values are stored more efficiently (see below for explanation).
In your case you seem to need to store 2 values - one float, one int. You could allocate 2 2-dim ndarrays - one of int and one of float32 type. You can fill in the array either diagonally and create a special accessor method (that would check both orders of indices - this may be slower) or fill in both indices (say: 1,2 and 2,1).
I am assuming that you do not require both values at all times therefore decoupling int and float32 values would be actually beneficial for the performance of algorithms using respective values. The memory consumed by ndarrays should be smaller and consecutive processing of the indices much faster than in case of a dictionary which quite randomly jumps around the memory.