If you want a performance improvement, this is the fastest swapping operation, faster than std::swap() on standard C++ compilers. Because it copies raw bytes with memcpy, it only works for trivially copyable element types:
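```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// Swaps `size` consecutive elements at p and q (the ranges must not overlap).
// memcpy copies raw bytes, so T must be trivially copyable.
template<typename T=int>
void swap(T* p, T* q, std::size_t size)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy-based swap needs a trivially copyable T");
    T* ptmp = new T[size];                   // temporary buffer on the heap
    std::memcpy(ptmp, p, sizeof(T)*size);    // ptmp <- p
    std::memcpy(p, q, sizeof(T)*size);       // p    <- q
    std::memcpy(q, ptmp, sizeof(T)*size);    // q    <- ptmp
    delete[] ptmp;
}
```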
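You can make it even faster by replacing the call to new with (T*)alloca(sizeof(T)*size) and removing the delete[], because stack memory is released automatically when the function returns. But alloca is limited: it allocates on the function's stack, so a large size can overflow the stack, and it is not part of standard C++. You would call it like this: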
```cpp
// line 5 of the question's code
    swap(A[j], A[i], cols);  // cols: number of elements per row (hypothetical name)
    //int t1 = A[j][0];
    // ...
// line 18
```
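Putting the alloca variant and the call together, here is a minimal self-contained sketch. It assumes A is a fixed-size ROWS x COLS int array; ROWS, COLS, block_swap_stack and swap_rows are hypothetical names, and the function is renamed so it cannot be confused with std::swap. Since alloca is non-standard, the sketch assumes <alloca.h> on Linux/macOS and _alloca from <malloc.h> on MSVC.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// alloca is not standard C++; pick the platform's spelling (assumption).
#if defined(_MSC_VER)
#include <malloc.h>
#define STACK_ALLOC(bytes) _alloca(bytes)
#else
#include <alloca.h>
#define STACK_ALLOC(bytes) alloca(bytes)
#endif

// Hypothetical name: memcpy-based swap with a stack buffer instead of new[].
// The buffer is freed automatically on return, so there is no delete[].
// p and q must not overlap; a large size can overflow the stack.
template<typename T>
void block_swap_stack(T* p, T* q, std::size_t size)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy-based swap requires a trivially copyable type");
    const std::size_t bytes = sizeof(T) * size;
    void* tmp = STACK_ALLOC(bytes);
    std::memcpy(tmp, p, bytes);
    std::memcpy(p, q, bytes);
    std::memcpy(q, tmp, bytes);
}

constexpr std::size_t ROWS = 3; // hypothetical matrix dimensions
constexpr std::size_t COLS = 4;

// Swaps rows i and j. A[i] decays to int*, so each row moves as one block.
void swap_rows(int (&A)[ROWS][COLS], std::size_t i, std::size_t j)
{
    if (i != j) // copying a row onto itself would be an overlapping memcpy
        block_swap_stack(A[i], A[j], COLS);
}
```

Each call moves a whole row with three memcpy calls, so the per-element loop disappears from the caller; how much that gains depends on the row length and on the platform's memcpy.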
This is from the documentation of std::swap():
Non-array: Constant: Performs exactly one construction and two assignments (although each of these operations works on its own complexity).
Array: Linear in N: performs a swap operation per element.
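The array case can be observed directly. In this sketch, Counted is a hypothetical element type that counts its move constructions and move assignments, and swap_and_count is a hypothetical helper. Swapping two arrays of N elements with std::swap (the array overload in <utility>, C++11 and later) should cost N constructions and 2N assignments: one construction and two assignments per element, matching the two documentation lines above.

```cpp
#include <cstddef>
#include <utility>

// Hypothetical element type that counts moves, to make std::swap's work visible.
struct Counted {
    int value = 0;
    static std::size_t constructions; // move constructions performed
    static std::size_t assignments;   // move assignments performed

    Counted() = default;
    explicit Counted(int v) : value(v) {}
    Counted(Counted&& other) noexcept : value(other.value) { ++constructions; }
    Counted& operator=(Counted&& other) noexcept
    {
        value = other.value;
        ++assignments;
        return *this;
    }
};

std::size_t Counted::constructions = 0;
std::size_t Counted::assignments = 0;

// Swaps two arrays with the standard array overload and returns the
// number of element-level operations it performed.
template<std::size_t N>
std::size_t swap_and_count(Counted (&a)[N], Counted (&b)[N])
{
    Counted::constructions = 0;
    Counted::assignments = 0;
    std::swap(a, b); // one std::swap per element
    return Counted::constructions + Counted::assignments;
}
```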
Because this swap() operates on a whole block of memory rather than element by element, it is faster than std::swap(). Both are still linear in N; the gain is in the constant factor. I confirmed the results by profiling with AQtime.
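If you want to reproduce the comparison without AQtime, a rough std::chrono harness is enough. This is a minimal sketch under stated assumptions: int rows, a heap temporary as in the memcpy version above, and std::swap_ranges standing in for element-by-element swapping. The names elementwise_swap, block_swap and time_ms are hypothetical. No timings are shown, because they depend on the compiler, flags and hardware.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstring>

// Element-by-element swap: one std::swap per element, like std::swap on arrays.
void elementwise_swap(int* p, int* q, std::size_t n)
{
    std::swap_ranges(p, p + n, q);
}

// Block swap through a heap buffer, as in the memcpy version above.
void block_swap(int* p, int* q, std::size_t n)
{
    int* tmp = new int[n];
    std::memcpy(tmp, p, n * sizeof(int));
    std::memcpy(p, q, n * sizeof(int));
    std::memcpy(q, tmp, n * sizeof(int));
    delete[] tmp;
}

// Hypothetical helper: wall-clock time of `reps` calls to f, in milliseconds.
template<typename F>
double time_ms(F f, int reps)
{
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r)
        f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

For example, with two rows a and b of length n, compare time_ms([&] { elementwise_swap(a, b, n); }, 100000) against the same call using block_swap. Build with optimizations enabled; debug builds distort both numbers.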
For anyone worried about spatial locality, cache misses, cache alignment or cache friendliness: memcpy implementations are often written with SIMD instructions, which make it possible to move 128 bits at a time. SIMD instructions are assembly instructions that perform the same operation on every element of a vector register (16 bytes wide with SSE), and that includes load and store instructions.
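To illustrate what "128 bits at a time" means, here is a sketch of a byte-range swap using the SSE2 intrinsics _mm_loadu_si128 and _mm_storeu_si128 from <emmintrin.h>. It is not how any particular memcpy is written; it only shows 16-byte loads and stores, and it swaps in registers without a temporary buffer. It assumes an x86 target with SSE2 and falls back to a plain byte loop otherwise; simd_swap_bytes is a hypothetical name.

```cpp
#include <cstddef>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define HAVE_SSE2 1
#endif

// Swaps n bytes between p and q (the ranges must not overlap).
// With SSE2, each iteration loads and stores 16 bytes (128 bits) per side.
void simd_swap_bytes(unsigned char* p, unsigned char* q, std::size_t n)
{
    std::size_t i = 0;
#ifdef HAVE_SSE2
    for (; i + 16 <= n; i += 16) {
        // Unaligned loads/stores, since p and q need not be 16-byte aligned.
        __m128i vp = _mm_loadu_si128(reinterpret_cast<const __m128i*>(p + i));
        __m128i vq = _mm_loadu_si128(reinterpret_cast<const __m128i*>(q + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(p + i), vq);
        _mm_storeu_si128(reinterpret_cast<__m128i*>(q + i), vp);
    }
#endif
    // Remaining tail bytes (or the whole range when SSE2 is unavailable).
    for (; i < n; ++i) {
        unsigned char t = p[i];
        p[i] = q[i];
        q[i] = t;
    }
}
```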
For anyone who is confused, here is how std::swap() for arrays is implemented in the <utility> header of Microsoft Visual C++ 2012:
```cpp
        // TEMPLATE FUNCTION swap
template<class _Ty,
    size_t _Size> inline
    void swap(_Ty (&_Left)[_Size], _Ty (&_Right)[_Size])
    {   // exchange arrays stored at _Left and _Right
    if (&_Left != &_Right)
        {   // worth swapping, swap ranges
        _Ty *_First1 = _Left;
        _Ty *_Last1 = _First1 + _Size;
        _Ty *_First2 = _Right;
        for (; _First1 != _Last1; ++_First1, ++_First2)
            _STD iter_swap(_First1, _First2);
        }
    }
```
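Stripped of the library's reserved names, that loop is an element-by-element exchange, the same job std::swap_ranges(left, left + N, right) does in one call. A portable sketch mirroring it (array_swap is a hypothetical name):

```cpp
#include <algorithm>
#include <cstddef>

// Mirrors the VC 2012 array overload: skip self-swap, then exchange
// the arrays one element at a time via std::iter_swap.
template<typename T, std::size_t N>
void array_swap(T (&left)[N], T (&right)[N])
{
    if (&left != &right) {
        T* first1 = left;
        T* last1 = first1 + N;
        T* first2 = right;
        for (; first1 != last1; ++first1, ++first2)
            std::iter_swap(first1, first2);
    }
}
```

Each iteration performs one full swap of a single element, which is the "Linear in N: performs a swap operation per element" behavior quoted from the documentation.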