I am working on a toy file system and use a bitset to keep track of used and unused pages. The bitset is an array of ints, so that I can use GCC's built-in bit operations. I am not using `std::bitset` because it will not be available in the final environment (an embedded system).
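A minimal sketch of such a bitmap, assuming one bit per page (1 = used, 0 = free) and `uint32_t` words rather than plain `int`, so that complements and shifts are well defined. The names `bitmap_test` and `bitmap_find_free` are hypothetical. The point is how a GCC/Clang built-in such as `__builtin_ctz` can find the first free page in a word without testing bits one at a time:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WORD_BITS 32u   /* bits per map word (assumes uint32_t words) */

/* Hypothetical helper: return 1 if the page is marked used, 0 if free. */
static int bitmap_test(const uint32_t *map, size_t nbits, size_t page)
{
    assert(page < nbits);
    return (int)((map[page / WORD_BITS] >> (page % WORD_BITS)) & 1u);
}

/* Hypothetical helper: index of the first free page, or -1 if none.
   nbits is the number of pages; padding bits past nbits are ignored. */
static long bitmap_find_free(const uint32_t *map, size_t nbits)
{
    size_t nwords = (nbits + WORD_BITS - 1) / WORD_BITS;
    for (size_t w = 0; w < nwords; w++) {
        uint32_t free_bits = ~map[w];          /* 1 bits now mark free pages */
        if (free_bits != 0) {
            /* __builtin_ctz is undefined for 0, hence the check above */
            size_t page = w * WORD_BITS + (size_t)__builtin_ctz(free_bits);
            return page < nbits ? (long)page : -1;
        }
    }
    return -1;
}
```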
According to Linux `perf`, allocating files takes 35% of the total runtime during the tests, and 45% of that time is spent setting bits inside a loop with:

    #define BIT_SET(a,b) ((a) |= (1ULL<<(b)))

According to `perf`, 42% of that time is spent in the `or` instruction. Deleting is a bit faster, but there most of the time is spent in the `and` operation that clears the bits. Toggling the bits with `xor` instead did not make any difference.
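For reference, a per-bit loop of the kind described above might look roughly like this. It is a hypothetical reconstruction, not the original code: `set_pages_loop`, `clear_pages_loop` and `BIT_CLEAR` are made-up names, and `uint32_t` words are assumed, so the bit index passed to `BIT_SET` must be the position within the word (0 to 31). Each page costs a separate read-modify-write of a word, so `count` pages mean `count` separate `or`/`and` updates, which may explain why those instructions dominate the profile.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WORD_BITS 32u
#define BIT_SET(a,b) ((a) |= (1ULL<<(b)))      /* macro from the question */
#define BIT_CLEAR(a,b) ((a) &= ~(1ULL<<(b)))   /* hypothetical counterpart */

/* Mark pages [start, start + count) as used, one bit per iteration. */
static void set_pages_loop(uint32_t *map, size_t nbits, size_t start, size_t count)
{
    assert(start <= nbits && count <= nbits - start);
    for (size_t i = start; i < start + count; i++)
        BIT_SET(map[i / WORD_BITS], i % WORD_BITS);
}

/* Mark pages [start, start + count) as free, one bit per iteration. */
static void clear_pages_loop(uint32_t *map, size_t nbits, size_t start, size_t count)
{
    assert(start <= nbits && count <= nbits - start);
    for (size_t i = start; i < start + count; i++)
        BIT_CLEAR(map[i / WORD_BITS], i % WORD_BITS);
}
```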
Basically, I am wondering whether there is a smarter way to set multiple bits at once. For example, if the user requests 10 pages of space, I would like to set all 10 bits in one operation, but the problem is that the range can span word boundaries. Are there any GCC/Clang intrinsics I should be aware of?
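One common way to set or clear a run of bits in one go is to split the range at word boundaries: a partial mask for the first word, whole-word updates for the words in the middle, and a partial mask for the last word. A minimal sketch, assuming `uint32_t` words and 1 = used; `range_mask` and `bitmap_fill_range` are hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WORD_BITS 32u

/* Mask with bits [lo, lo + n) set; requires 0 < n and lo + n <= 32.
   The shift is done in 64 bits so n == 32 is not an undefined 32-bit shift. */
static uint32_t range_mask(unsigned lo, unsigned n)
{
    return (uint32_t)(((1ULL << n) - 1u) << lo);
}

/* Mark pages [start, start + count) as used (true) or free (false).
   Touches each affected word once instead of once per page. */
static void bitmap_fill_range(uint32_t *map, size_t nbits,
                              size_t start, size_t count, bool used)
{
    assert(start <= nbits && count <= nbits - start);
    size_t w = start / WORD_BITS;
    unsigned off = (unsigned)(start % WORD_BITS);

    while (count > 0) {
        unsigned n = WORD_BITS - off;          /* bits left in this word */
        if (n > count)
            n = (unsigned)count;
        uint32_t mask = range_mask(off, n);    /* all ones for a full middle word */
        if (used)
            map[w] |= mask;
        else
            map[w] &= ~mask;
        count -= n;
        off = 0;                               /* later words start at bit 0 */
        w++;
    }
}
```

With 32-bit words, allocating 10 pages costs one or two word updates instead of ten. For long runs, the full middle words could also be written directly (all ones or zero) or with `memset`, since every byte of such a word is either `0xFF` or `0x00`.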