My code needs to work with a large array of structures containing multiple strings.
In practice, the whole array will contain about 25k structures of about 256 bytes each, so it needs about 6 MiB of heap space.
// example
struct element {
    char foo[32];
    char bar[16];
    ...
}; // sizeof(struct element) = 256
I was concerned about the performance of calloc because it zeroes all the memory, and I don't need every byte to be initialized, so I used element_arr = malloc(num_elements * sizeof(struct element)).
I allocate the array at runtime as I don't know num_elements at compile time.
For my code to work, I only need the first byte of each member (foo, bar, etc.) to be zero; the rest can stay uninitialized.
Say I have 8 string members per struct: then I only need 8 of the 256 zeroed bytes (about 3%), and the other 97% of the cleared bytes are wasted, as they will eventually be overwritten by real data.
I see a few options:
- zero everything at once, e.g. with calloc, which (I hope) makes use of vector instructions to write large blocks of aligned zeroes.
- memset each 256-byte struct element before filling it with real data.
- assign 0 to the first byte of each member of struct element before using it (*element->foo = 0; ...). With optimizations at -O3, this translates to a chain of mov instructions. It is cumbersome to write language-wise (but can be taken care of).
        mov     byte ptr [rdi + 152], 0
        mov     byte ptr [rdi + 208], 0
        mov     byte ptr [rdi + 200], 0
        mov     byte ptr [rdi + 128], 0
        ...
It looks similar for arm64.
- make a very conservative assumption about the size of element_arr (e.g. 64 MB) and place it in a zero-initialized section of memory (the OS then has to zero the memory for me).
char element_arr[64 * 1000 * 1000] = {0};
(checking num_elements < 250000 to be sure)
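To make the heap-based options above concrete, here is a minimal sketch in C11. Only foo and bar come from the struct shown earlier; the other six members and all function names (alloc_zeroed, alloc_uninit, element_clear_all, element_clear_strings) are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed layout: only foo and bar are from the question. The remaining
 * members are invented so that the struct has 8 strings and 256 bytes. */
struct element {
    char foo[32];
    char bar[16];
    char s2[16];
    char s3[64];
    char s4[32];
    char s5[32];
    char s6[32];
    char s7[32];
};
_Static_assert(sizeof(struct element) == 256, "layout assumption");

/* Option 1: zero everything at once. */
struct element *alloc_zeroed(size_t n)
{
    return calloc(n, sizeof(struct element));
}

/* Options 2 and 3 start from uninitialized memory. */
struct element *alloc_uninit(size_t n)
{
    if (n > SIZE_MAX / sizeof(struct element))
        return NULL;                /* n * size would overflow */
    return malloc(n * sizeof(struct element));
}

/* Option 2: clear one whole element just before filling it. */
void element_clear_all(struct element *e)
{
    assert(e != NULL);
    memset(e, 0, sizeof *e);
}

/* Option 3: write a terminator to the first byte of each string member
 * and leave all other bytes untouched. */
void element_clear_strings(struct element *e)
{
    assert(e != NULL);
    e->foo[0] = '\0';
    e->bar[0] = '\0';
    e->s2[0] = '\0';
    e->s3[0] = '\0';
    e->s4[0] = '\0';
    e->s5[0] = '\0';
    e->s6[0] = '\0';
    e->s7[0] = '\0';
}
```

With option 3, element_clear_strings has to be kept in sync with the struct definition by hand, which is the cumbersome part mentioned above.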
Does it make a difference which option I choose? What would you suggest?
Edit: @John Bayko The individual structures are filled incrementally, but all strings need to start with '\0', otherwise the algorithm can't distinguish between a real string (already filled) and uninitialized garbage.
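A rough sketch of that "empty or filled" convention, assuming a member counts as filled once its first byte is non-zero. The helper names member_is_filled and member_fill_once are hypothetical and not taken from the real code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A member counts as filled once its first byte is non-zero. This only
 * works if the first byte was set to '\0' before the first check. */
int member_is_filled(const char *member)
{
    assert(member != NULL);
    return member[0] != '\0';
}

/* Copy src into a still-empty member with the given capacity.
 * Returns 1 on success, 0 if the member is already filled or src
 * (including its terminator) does not fit. */
int member_fill_once(char *member, size_t capacity, const char *src)
{
    assert(member != NULL && src != NULL);
    size_t len = strlen(src);
    if (member_is_filled(member) || len >= capacity)
        return 0;
    memcpy(member, src, len + 1);   /* len + 1 copies the '\0' as well */
    return 1;
}
```

One consequence of this convention is that an empty string cannot be told apart from a member that was never filled.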
After reading the other answers I'm convinced that it probably won't be a problem anytime soon. It's good to know that the simplest solution (calloc) is a good one in the majority of use cases.
I profiled my code on my dev machine and indeed, the time spent on allocation is negligible.
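For anyone who wants to repeat such a measurement, a minimal sketch using clock() from the C standard library could look like this. It is not the profiling setup actually used; the function name time_calloc_seconds and the constant ELEMENT_SIZE are hypothetical. Each element is written once after the allocation, because calloc may hand out pages that the OS only zeroes and maps on first access.

```c
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

enum { ELEMENT_SIZE = 256 };    /* assumed sizeof(struct element) */

/* Returns the CPU time in seconds spent on one calloc of n elements
 * plus one write per element, or -1.0 if the allocation fails. */
double time_calloc_seconds(size_t n)
{
    if (n == 0)
        return 0.0;

    clock_t start = clock();
    unsigned char *p = calloc(n, ELEMENT_SIZE);
    if (p == NULL)
        return -1.0;

    /* Touch every element so that lazily provided pages are really used. */
    for (size_t i = 0; i < n; i++)
        ((volatile unsigned char *)p)[i * ELEMENT_SIZE] = 1;

    clock_t end = clock();
    free(p);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

clock() reports processor time with limited resolution, so a single 6 MiB allocation may well measure as zero; repeating the call in a loop and averaging gives a more usable number.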
Thanks for your replies.