[omniORB] Re: Access violation in giopRendezvouser::execute() after ORB->destroy()

Dietmar May dcmay at dmis.com
Mon Mar 20 18:23:46 GMT 2006


This problem appears to be triggered by debugging. It is NOT caused by 
debugging, but it is frequently and almost repeatably triggered by 
setting breakpoints in application code related to ORB and POA shutdown.

The giopServer::deactivate() method is being called:

omni::giopServer::deactivate()
omni::giopServer::stop()
omni::omniObjAdapter::adapterInactive()
omni::omniOrbPOA::do_destroy()
omni::omniOrbPOA::destroy()
omni::omniOrbPOA::shutdown()
omniOrbORB::actual_shutdown()
 ...

Roughly halfway through this method is code that terminates the 
rendezvousers and then performs a timedwait() on a condition variable:

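// ask every rendezvouser to stop
for(; p != &pd_rendezvousers; p = p->next) {
  ((giopRendezvouser*)p)->terminate();
}
// compute the absolute deadline, 'timeout' seconds from now
omni_thread::get_time(&s,&ns,timeout);
// wait until the deadline passes (loop condition elided)
while( ... )
  go = pd_cond.timedwait(s,ns);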

It appears that the timeout is exceeded because the process is stopped 
in the debugger; the code therefore goes on to perform the remaining 
shutdown operations for the giopServer. Presumably, somewhere after this 
finishes, the giopServer gets deleted. Meanwhile, the rendezvouser is 
still chugging away merrily, blissfully unaware that its server has been 
deleted out from under it.
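
The timing problem described above can be shown outside omniORB. The 
following is only a minimal sketch in standard C++, not omniORB code: 
Server, notify_done(), wait_for_worker() and the durations are 
hypothetical names invented for illustration. A worker that is stalled 
(as it would be by a breakpoint) outlasts the timed wait, so the wait 
reports a timeout even though the worker has not finished.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for giopServer: records whether its worker is done.
struct Server {
    std::mutex mtx;
    std::condition_variable cond;
    bool worker_done = false;

    // Called by the worker as its last action (analogous to notifyRzDone()).
    void notify_done() {
        std::lock_guard<std::mutex> lock(mtx);
        worker_done = true;
        cond.notify_all();
    }

    // Waits up to 'timeout' for the worker; returns false on timeout.
    bool wait_for_worker(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mtx);
        return cond.wait_for(lock, timeout, [this] { return worker_done; });
    }
};

// Returns true if the worker finished before the shutdown timeout expired.
bool shutdown_saw_worker_finish(std::chrono::milliseconds worker_stall,
                                std::chrono::milliseconds shutdown_timeout) {
    Server server;
    std::thread worker([&server, worker_stall] {
        std::this_thread::sleep_for(worker_stall);  // simulates a debugger pause
        server.notify_done();
    });
    bool finished = server.wait_for_worker(shutdown_timeout);
    // In the reported crash the server would be deleted at this point even
    // when finished == false. This sketch joins first to stay well-defined.
    worker.join();
    return finished;
}
```

With a short stall and a generous timeout the function returns true; 
with a stall longer than the timeout it returns false, which is the 
situation in which the real shutdown code carries on regardless.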

I've removed my breakpoints, and everything seems to be OK now. So, 
perhaps few applications - maybe even none, if all goes well - would 
ever crash outside of a debugger because of this. However, it seems like 
the code could handle this case more cleanly: at the least, perhaps by 
nulling the pd_server pointer for each rendezvouser before it gets 
snuffed (and of course, testing for null :-). Someone having greater 
familiarity with the code could probably suggest a more elegant solution.
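
As a rough illustration of the null-and-test idea, here is a sketch in 
standard C++. It is not omniORB code and not a proposed patch: Worker, 
Server, detach() and finish() are hypothetical names, and the sketch 
assumes the worker object outlives the server. Note that the pointer has 
to be guarded by a lock owned by the worker; a bare null check would 
still race with the deletion.

```cpp
#include <mutex>

struct Server;  // hypothetical stand-in for giopServer

// Hypothetical stand-in for giopRendezvouser.
class Worker {
public:
    explicit Worker(Server* server) : server_(server) {}

    // Called by the server before it goes away: clears the back-pointer.
    void detach() {
        std::lock_guard<std::mutex> lock(mtx_);
        server_ = nullptr;
    }

    // Last step of the worker's run (analogous to calling notifyRzDone()).
    // Returns true if the server was still attached and was notified.
    bool finish();

private:
    std::mutex mtx_;   // guards server_; lives in the worker, not the server
    Server* server_;
};

struct Server {
    int done_notifications = 0;
    Worker* worker = nullptr;   // assumption: one worker, for brevity

    void notify_done() { ++done_notifications; }

    // Detach the worker first so it can never call into a dead server.
    ~Server() {
        if (worker) worker->detach();
    }
};

bool Worker::finish() {
    // The lock is held across the call, so ~Server() blocks in detach()
    // until any notification already in progress has returned.
    std::lock_guard<std::mutex> lock(mtx_);
    if (!server_) return false;   // server already gone: skip the call
    server_->notify_done();
    return true;
}
```

A real fix would also have to respect the lock ordering that already 
exists inside the server; the sketch ignores that for brevity.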

Dietmar May

> While debugging this and tracing through the calls (in disassembly 
> mode), I noticed that literally during the call to 
> giopServer::notifyRzDone() - that is, while values are being pushed 
> onto the stack - the pd_server object gets deleted.
>
> Looking at other threads active at the time, I couldn't help noticing 
> one calling "operator delete". The call stack for this thread is 
> interesting:
>
> operator delete()
> omniObjTableEntry::`scalar deleting destructor`
> omniObjTableEntry::loseRef()
> omniObjRef::_setIdentity()
> omniObjRef::_disable()
> omniObjRef::_shutdown()
> omniOrbORB::actual_shutdown()
> omniOrbORB::do_shutdown()
> omniOrbORB::shutdown()
> // ... app code calls ORB->shutdown(1) ...
>
> It appears that at the time that the giopServer is deleted, 
> omniObjRef::_shutdown() is iterating a list of object references and 
> deleting the object references stored therein.
>
> omniObjRef::_shutdown() has a couple of mutex locks 
> (omni_traced_mutex_lock objects), but there don't appear to be any 
> locks active in the giopRendezvouser::execute() thread at the time it 
> dereferences its now deleted pd_server object.
>
>> CORBA::ORB->shutdown(1);
>>  omni_thread::join();  //wait for ORB->run() thread to end
>>  CORBA::ORB->destroy();
>>
>> The first two complete successfully. During the call to destroy(), 
>> the application pauses for a few seconds, then an access violation 
>> occurs.
>>
>> This occurs during a call to giopRendezvouser::execute(), and is due 
>> to omniORB's use of an already deleted pd_server object, by calling 
>> the pd_server->notifyRzDone() method (which is the last line in the 
>> execute() function). The pointer value looks like it was valid at 
>> one time (and it was used successfully in the execute() function 
>> earlier without crashing); but at the time of the access violation, 
>> the vtable is (already re)set to garbage, as are the other members of 
>> the pd_server object.
>>
>> The call stack for the ORB::destroy() thread is:
>>
>> omniAsyncInvoker::~omniAsyncInvoker()
>> omni::ORBAsyncInvoker::~ORBAsyncInvoker()
>> omni::ORBAsyncInvoker::`scalar deleting destructor`
>> omni::omni_corbaOrb_initialiser::detach()
>> omniOrbORB::destroy()
>> // ... application code calling ORB->destroy () ...
>>
>> The call stack for the crashing giopRendezvouser::execute() method is:
>>
>> omni_mutex_lock::omni_mutex_lock() //mutex 'this' is garbage
>> omni::giopServer::notifyRzDone()   //giopServer 'this' is garbage
>> omni::giopRendezvouser::execute()  //everything in 'this' looks OK except pd_server
>> omniAsyncWorker::real_run()
>> omniAsyncWorkerInfo::run()
>> omniAsyncWorker::run()
>> omni_thread_wrapper()


