If n=2 is quite small compared to a.shape[0], it might be worthwhile to use this little function.  The basic idea is to calculate a mask that is just large enough to give the desired number of final rows.  Here I do it iteratively.  Normally iteration is slow, but if the number of iterations is small enough, the time saved elsewhere can be worth it.
import numpy as np

def mask(a):
    # the row condition being filtered on
    return a[:,0]>1
def paul_filter1(a,n):
    # incremental w/ sum: grow the mask until it covers n True rows
    j = a.shape[0]
    for i in xrange(n,j+1):
        am = mask(a[:i,:])
        if np.sum(am)>=n:
            j = i
            break
    return a[am,:]
Note that the mask am can be shorter than the dimension it is indexing.  Effectively the rest is padded with False.  I haven't checked whether this behaviour is documented.
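A quick sketch of what that padding amounts to; on newer numpy versions a too-short boolean index raises an IndexError, so here the padding is written out explicitly on an assumed toy array:

import numpy as np
a = np.arange(12).reshape(4,3)        # hypothetical 4x3 array
am = np.array([False, True, True])    # mask covering only the first 3 rows
# pad the short mask with False so it covers all 4 rows
am_full = np.concatenate([am, np.zeros(a.shape[0]-am.size, dtype=bool)])
print(a[am_full,:])                   # rows 1 and 2; the unmasked row 3 is treated as False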
In this small example, paul_filter1 is 3x slower than a[a[:,0]>1,:][:2,:].
But with a larger array, say a2=np.tile(a,[1000,1]), the time for paul_filter1 stays the same, while the 'brute force' version keeps slowing down because it has to apply the mask to more rows.  Of course these timings depend on where in a the desired rows are located.  There won't be any savings if paul_filter1 has to use nearly all of the rows.
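For concreteness, a small usage sketch (the array here is just an assumed toy example, run in the same Python 2 / older-numpy environment as the rest of the answer):

a = np.array([[5,0],[0,1],[9,2],[4,3],[0,4]])
print(a[a[:,0]>1,:][:2,:])     # brute force: full mask, then trim -> [[5 0] [9 2]]
print(paul_filter1(a,2))       # stops after scanning 3 rows -> same result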
edit
Addressing Bi Rico's concerns about the repeated np.sum (even though that is fast compiled code), we could incrementally build the list of indices that where would return:
def paul_filter3(a,n):
    # incremental: build the list of qualifying row indices directly
    j = a.shape[0]
    # start with the indices of the True rows among the first n
    am = mask(a[:n,:])
    am = np.where(am)[0].tolist()
    if len(am)<n:
        # scan the remaining rows one at a time until n indices are collected
        for i in xrange(n,j):
            if mask(a[[i],:]):
                am.append(i)
                if len(am)>=n:
                    break
    am = np.array(am)
    return a[am,:]
For small n this is even faster.
Something closer to the original method is to calculate the full mask, but then trim it.  cumsum can be used to find the minimum length that covers n True rows.
def paul_filter4(a,n):
    # full mask, trimmed using cumsum
    am = mask(a)
    # index of the row where the running count of True rows first reaches n
    j = np.cumsum(am).searchsorted(n)
    # the shortened mask is padded with False beyond row j
    return a[am[:j+1],:]
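To see why the searchsorted trim works, here is a tiny worked sketch with an assumed mask:

import numpy as np
am = np.array([True, False, True, True, False, True])   # assumed example mask
print(np.cumsum(am))                  # [1 1 2 3 3 4]
j = np.cumsum(am).searchsorted(3)     # first position where the count reaches 3
print(j)                              # 3
print(am[:j+1])                       # [ True False  True  True] -> exactly 3 True rows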
Tested with a 1000x1000 random array of integers (1:12), the times are (using 20 instead of 2, and tweaking the mask so more rows are False):
In [172]: timeit paul_filter4(a,20)
1000 loops, best of 3: 690 us per loop
In [173]: timeit paul_filter3(a,20)
1000 loops, best of 3: 1.22 ms per loop
In [175]: timeit paul_filter1(a,20)
1000 loops, best of 3: 994 us per loop
In [176]: timeit rico_filter(a,20)
1000 loops, best of 3: 668 us per loop
In [177]: timeit marco_filter(a,20)
10 loops, best of 3: 21 ms per loop  
rico_filter using where is fastest, but my alternative using cumsum isn't far behind.  The incremental filters are similar in speed, roughly half as fast as the two leaders.
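rico_filter itself isn't reproduced here, but a where-based filter of that general shape, together with the kind of test array described above, might look like this (both the function and the array setup are my assumptions, not the code that was actually timed):

import numpy as np
def where_filter(a,n):
    # full where, then keep only the first n qualifying row indices
    idx = np.where(a[:,0]>1)[0][:n]
    return a[idx,:]
a = np.random.randint(1,12,(1000,1000))   # assumed test array, integers 1..11
print(where_filter(a,20).shape)           # (20, 1000)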
In this a as generated and tested, most rows are True.  This is consistent with marco's concern that the limit condition selects only a small subset of the rows passing the logical condition.  Under these conditions, Bi Rico's concern that paul_filter1 could blow up is unrealistic.
If I change the testing parameters so that all of the rows of a have to be tested (a[:,0]>11),
then the filters using where and cumsum take just as long as the original.  The incremental filters are slower, by a factor of 15 or more, though my first attempt using np.sum is the fastest of that style.
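For reference, the worst case amounts to swapping in a condition like this (a sketch; assuming the test values run from 1 to 11, no row passes, so the incremental filters end up examining every row):

def mask(a):
    # worst-case condition from the text: a[:,0]>11 is never (or almost never) True
    return a[:,0]>11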