I'm debugging an "Access violation" exception in a large C++ application (Visual Studio 2015). The application is built from several libraries, and the problem occurs in one of them (SystemC), although I suspect the source of the problem is elsewhere.
What I see is a function call that corrupts the address of a data member that the caller accesses afterwards.
m_update_phase = true;
m_prim_channel_registry->perform_update();
m_update_phase = false;
inline
void
sc_prim_channel_registry::perform_update()
{
    for( int i = m_update_last; i >= 0; -- i ) {
        m_update_array[i]->perform_update();
    }
    m_update_last = -1;
}
(These are excerpts from systemc\kernel\sc_simcontext.cpp and systemc\communication\sc_prim_channel.h, part of the SystemC library)
The error happens after several iterations through the code above. The call to m_prim_channel_registry->perform_update() throws the exception "0xC0000005: Access violation writing location 0x0F4CD9E9".
This happens only when the application is built in the Release configuration.
Looking at the assembly code, I see that the function sc_prim_channel_registry::perform_update() was inlined, and the inner function call m_update_array[i]->perform_update() seems to corrupt the stack frame of the calling function.
When m_update_last = -1; is executed, &m_update_last no longer points to a valid memory location and the exception is thrown.
(m_update_last is a simple native member of class sc_prim_channel_registry with type int)
    m_update_phase = true;
    m_prim_channel_registry->perform_update();
1034D99E  mov         eax,dword ptr [esi+10h]  
1034D9A1  mov         byte ptr [esi+0A3h],1  
1034D9A8  mov         dword ptr [ebp-18h],eax  
1034D9AB  mov         ebx,dword ptr [eax+28h]  
1034D9AE  test        ebx,ebx  
1034D9B0  js          $LN163+0FEh (1034D9D0h)  
1034D9B2  mov         esi,eax  
1034D9B4  mov         eax,dword ptr [esi+20h]  
1034D9B7  mov         edi,dword ptr [eax+ebx*4]  
1034D9BA  mov         ecx,edi  
1034D9BC  mov         eax,dword ptr [edi]  
1034D9BE  call        dword ptr [eax+14h]  
1034D9C1  sub         ebx,1  
1034D9C4  mov         byte ptr [edi+1Ch],0  
1034D9C8  jns         $LN163+0E2h (1034D9B4h)  
1034D9CA  mov         esi,dword ptr [this]  
1034D9CD  mov         eax,dword ptr [ebp-18h]  
1034D9D0  mov         dword ptr [eax+28h],0FFFFFFFFh  
    m_update_phase = false;
The exception is thrown at address 1034D9D0.
So the last instructions executed are:
0F97D9CD  mov         eax,dword ptr [ebp-18h]  
0F97D9D0  mov         dword ptr [eax+28h],0FFFFFFFFh  
The m_prim_channel_registry pointer is held in [ebp-18h] and eax, and [eax+28h] is m_update_last.
Looking in the watch window at esp and ebp before the inner call to perform_update(), I see that:
    ebp-18h 0x0022fd5c  unsigned int
    esp 0x0022fd60  unsigned int
This is strange. The difference between them is only 4 bytes, so the next push onto the stack will make them equal and overwrite [ebp-18h]!
[ebp-18h] holds a copy of this->m_prim_channel_registry. The call at 1034D9BE (call dword ptr [eax+14h]) pushes its return address onto the stack and, in doing so, corrupts the contents of [ebp-18h]. It looks like the stack has grown (downwards) too far and is corrupting the caller's frame.
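The arithmetic behind that observation can be written down as a small sketch. It assumes a 32-bit x86 target, where call pushes a 4-byte return address; the helper names are made up for illustration, and the two addresses are the ones from the watch window above.

```cpp
#include <cstdint>

// Hypothetical helper: value of esp right after a 32-bit x86 `call`,
// which pushes a 4-byte return address (assumption: no arguments pushed).
constexpr std::uint32_t esp_after_call(std::uint32_t esp) {
    return esp - 4;
}

// Values read from the watch window just before the inner call.
constexpr std::uint32_t kSavedRegistrySlot = 0x0022fd5c; // address of [ebp-18h]
constexpr std::uint32_t kEspBeforeCall     = 0x0022fd60; // esp

// True when the pushed return address lands exactly on the slot that
// holds the saved copy of m_prim_channel_registry.
constexpr bool return_address_overwrites_slot() {
    return esp_after_call(kEspBeforeCall) == kSavedRegistrySlot;
}
```

With the observed values the check is true, which matches what I see: the return address is written over the saved registry pointer.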
My questions are:
- Am I analyzing the issue correctly? Did I miss something here?
- What could cause such a corruption? I assume the issue is not related to either the compiler or the SystemC library, probably something that happened earlier someplace else.
- What are the techniques for debugging such a corruption?
Update
I believe I found the problem, but I can't say I understand this completely.
In the same function (sc_simcontext::crunch) where the outer perform_update() is invoked, SystemC method processes are also invoked:
    // execute method processes
    sc_method_handle method_h = pop_runnable_method();
    while( method_h != 0 ) {
        try {
            method_h->execute();
        }
        catch( const sc_exception& ex ) {
            cout << "\n" << ex.what() << endl;
            m_error = true;
            return;
        }
        method_h = pop_runnable_method();
    }
These methods are deferred function calls registered earlier.
One of these methods was returning by executing ret 4, which shrank the stack frame a little more each time it was called, until the corruption described above happened.
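To make the drift concrete, here is a rough model of it. It assumes 32-bit x86 and a caller that pushes no arguments and expects a plain ret; a callee that returns with ret N then leaves esp N bytes higher after every call. The function names and the starting esp used in the note below are hypothetical; only the slot address comes from my debugging session.

```cpp
#include <cstdint>

// Net change of esp after one call/return pair on 32-bit x86, assuming the
// caller pushed no arguments: `call` pushes 4 bytes, `ret N` pops 4 + N.
inline std::uint32_t esp_drift_per_call(std::uint32_t ret_imm) {
    return ret_imm;  // -4 + (4 + ret_imm)
}

// Counts how many mismatched calls it takes until the next push
// (which writes at esp - 4) lands on or above `slot`.
inline std::uint32_t calls_until_push_hits(std::uint32_t esp,
                                           std::uint32_t slot,
                                           std::uint32_t ret_imm) {
    if (ret_imm == 0) {
        return 0;  // balanced stack: no drift, nothing to count
    }
    std::uint32_t calls = 0;
    while (esp - 4 < slot) {
        esp += esp_drift_per_call(ret_imm);
        ++calls;
    }
    return calls;
}
```

For example, with a made-up healthy esp of 0x0022fd40, the slot at 0x0022fd5c and ret 4, the model gives 8 calls before a push overwrites the slot. That fits the exception showing up only after several iterations.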
And how did I manage to register a corrupted SystemC method?
Apparently it's a bad idea to use SC_METHOD(f) when f is a virtual function of the module. Doing so caused a different, unrelated "random" function to be called.
I'm not exactly sure why it happens this way or why this limitation exists. I also don't remember seeing any warning about using virtual member functions as SystemC methods, but it was definitely the problem. When debugging the method registration in the SC_METHOD call itself, I could see that the function pointer inside pointed to a different function than the one given to the SC_METHOD macro.
To fix the problem I called SC_METHOD(wrapper_f), where wrapper_f is a simple non-virtual member function of the module that does nothing but call f, the original virtual function. That's it.
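The shape of the workaround looks roughly like the sketch below. It is plain C++ and does not use SystemC: module_base, derived_module and run_registered are hypothetical stand-ins for the module class and for whatever the kernel does with the registered pointer, so this shows only the wrapper pattern and does not reproduce the original bug.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a SystemC module (the real one derives from sc_module).
struct module_base {
    std::vector<std::string> log;

    virtual ~module_base() {}

    // The virtual function I originally passed to SC_METHOD.
    virtual void f() { log.push_back("module_base::f"); }

    // Non-virtual wrapper: this is what gets registered instead,
    // i.e. SC_METHOD(wrapper_f) in the real module constructor.
    void wrapper_f() { f(); }  // virtual dispatch happens here, in ordinary code
};

struct derived_module : module_base {
    void f() override { log.push_back("derived_module::f"); }
};

// Hypothetical stand-in for the kernel invoking a registered method later on.
typedef void (module_base::*method_ptr)();

inline void run_registered(module_base& m, method_ptr p) {
    (m.*p)();
}
```

Calling run_registered(d, &module_base::wrapper_f) on a derived_module d still ends up in derived_module::f, so the override behaviour is kept while the registered pointer itself refers to a non-virtual function.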
 
     
    