Represent the number in binary, then look for the most significant bit (the highest nonzero bit). Naively you can do this by right shifting one bit at a time until it is zero - that was "one too many". That is basically the approach you tried. A bit faster would be a binary search. For 32 bit integer, shift right by 16; if still > 0, right shift by 8, etc. I'm sure you can figure it out from here. 
Code example:
typedef unsigned long long int ulli;
ulli floor2(ulli num){
  int msb = 8*sizeof(num)/2;
  ulli temp = ((ulli)1)<<msb;
  while(msb>1){
    msb/=2; // using divide for clarity
    if(temp>num) temp>>=msb; else temp<<=msb;
  }
  if (temp>num) temp/=2;
  return temp;
}
I ran some benchmarks of this algorithm against the topBit as well as the builtIn method. A loop with 10M iterations, generating a "long" random number, takes 362 ms on my system (with no compiler optimization). If the loop includes one of the methods of calculation, times increase as follows:
=============  total    net
builtin:         407     45
binary search:   579    215
topBit:         2295   1933
The built in method is definitely the fastest by a significant margin - not really surprising! With 64 bit numbers, topBit will on average need 32 loops (half the bits are set, so get skipped) and binary only 5 loops, so you would expect about 6x speed difference; that is roughly what you see. When I define ulli as unsigned short (16 bit), the time difference is about 2x.