[omniORB] Deadlock on mutex lock in Rope_iterator while UnMarshalObjRef

Bjorn Jorde bjorde@tumbleweed.com
Tue, 13 Jul 1999 19:39:52 -0700


We're running into a situation where multiple threads are hanging
on a mutex in the Rope_iterator.

We're using omniORB 2.6.1 on Solaris 2.6.

We're not sure which thread is causing this deadlock, but it happens
when calling an operation that returns an object reference. The
reply arrives from the server, but the call never returns from
unmarshalling the object reference.

It is reproducible, but only happens under heavy load in certain
configurations.

I'm including the stack trace of the call that is hanging (there are
several threads hanging at the same point). I'm also including the
stack trace of another thread that is trying to lock the same mutex.

Here's the stack of the blocked thread (never returns):

   0xef5a7a94(0x2181c8, 0x8, 0x2, 0x218218, 0x218218, 0xef5ad800)
   func_native_pool_thread_main(0x0, 0x0, 0x6, 0xef3652b0, 0x8, 0x0)
   posta_main(pb = 0xaa898, sn = 0x296200, rq = 0x1f7c90)
   HttpGatewayAppI::Process(this = 0xc6938, cgip = 0xeb773bf4)
   TransportManagerC::Process(this = 0xc6cb8, transportp = CLASS)
   TransactionManagerC::Process(this = 0xc6c90, transport = CLASS)
   TransactionManagerC::HandleTransaction(this = 0xc6c90, txn = CLASS)
   _proxy_TW_SessionManager::LockSession(this = 0x123bb8, id = 1941569693, session_key = 0x295620 "WEI628KBX7")
   TW_Session::unmarshalObjRef(s = CLASS)
   CORBA::UnMarshalObjRef(repoId = 0xece7bdf4 "IDL:TW_Session:1.0", s = CLASS)
   omni::createObjRef(mostDerivedRepoId = 0xf4938 "IDL:TW_SessionCentral:1.0", targetRepoId = 0xece7bdf4 "IDL:TW_Session:1.0", profiles = 0x222ab8, release = '\001')
   ropeFactory::iopProfilesToRope(profiles = 0x222ab8, objkey = 0x295458 "7\x8b\xb8y\xb5\xe4-\x9a", keysize = 12U, rope = CLASS)
   tcpSocketMToutgoingFactory::findOrCreateOutgoing(this = 0xc56a8, addr = 0x222b18)
   Rope_iterator::Rope_iterator(this = 0xeb7730b8, a = 0xc56a8)
   omni_mutex::lock(this = 0xc56a8)
   _mutex_lock(0xc56a8, 0xef3652b0, 0x2956b0, 0x50000000, 0x50495000, 0x0)
   _mutex_adaptive_lock(0xc56a8, 0x4c00, 0xef3652b0, 0x1, 0x4d58, 0xfffeffff)
   _park(0xeb773de0, 0xeb773e80, 0x0, 0xeb773e5c, 0xeb773e58, 0xeb773e54)
   _lwp_sema_wait(0xeb773e80, 0xc1, 0x0, 0x0, 0xff00, 0xff)


And here's another thread trying to lock the same mutex (see
omni_mutex::lock):


   omni_thread_wrapper(ptr = 0x29d090)
   tcpSocketWorker::run(this = 0x29d090, arg = 0x296018)
   omniORB::giopServerThreadWrapper::run(this = 0xb08e0, fn = 0xee46d5d0 = &tcpSocketWorker::_realRun(void*), arg = 0x296018)
   tcpSocketWorker::_realRun(arg = 0x296018)
   GIOP_S::dispatcher(s = 0x296018)
   GIOP_S::HandleRequest(this = 0xea30dae4, byteorder = '\0')
   _sk_TW_HttpGateway::dispatch(this = 0x120844, _0RL_s = CLASS, _0RL_op = 0xea30db34 "NotifyStatusDelta", _0RL_response_expected = '\001')
   _sk_TW_Component::dispatch(this = 0x120830, _0RL_s = CLASS, _0RL_op = 0xea30db34 "NotifyStatusDelta", _0RL_response_expected = '\001')
   _CORBA_Unbounded_Sequence::operator <<=(this = 0xea30d4fc, s = CLASS)
   _CORBA_Sequence::operator <<=(this = 0xea30d4fc, s = CLASS)
   TW_ComponentStatusDelta::operator <<=(this = 0x296f18, _n = CLASS)
   _CORBA_ObjRef_Member::operator <<=(this = 0x296f18, s = CLASS)
   TW_Component_Helper::unmarshalObjRef(s = CLASS)
   TW_Component::unmarshalObjRef(s = CLASS)
   CORBA::UnMarshalObjRef(repoId = 0xece59a04 "IDL:TW_Component:1.0", s = CLASS)
   omni::createObjRef(mostDerivedRepoId = 0x2b9968 "IDL:TW_Booter:1.0", targetRepoId = 0xece59a04 "IDL:TW_Component:1.0", profiles = 0x1ee3d0, release = '\001')
   ropeFactory::iopProfilesToRope(profiles = 0x1ee3d0, objkey = 0x1ee3b8 "7:", keysize = 12U, rope = CLASS)
   tcpSocketMToutgoingFactory::findOrCreateOutgoing(this = 0xc56a8, addr = 0x2959c8)
   Rope_iterator::Rope_iterator(this = 0xea30cc70, a = 0xc56a8)
   omni_mutex::lock(this = 0xc56a8)
   _mutex_lock(0xc56a8, 0xef3652b0, 0x1ee400, 0x50000000, 0x50495000, 0x0)
   _mutex_adaptive_lock(0xc56a8, 0x4c00, 0xef3652b0, 0x1, 0x4d58, 0xfffeffff)
   _swtch(0xebb83d80, 0xea30dfe0, 0xea30de60, 0xea30de5c, 0xea30de58, 0xea30de54)


We actually have several threads in each of the above states.
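
Neither trace shows the thread that actually owns the mutex. One way
to narrow that down might be to record the owner on every lock. The
following is only a minimal sketch in standard C++: it uses std::mutex
rather than omni_mutex so that it compiles without omniORB, and
OwnerTrackingMutex and its methods are hypothetical names, not part of
omniORB or omnithread.

```cpp
#include <mutex>
#include <thread>

// Hypothetical diagnostic wrapper (an assumption for illustration only):
// a mutex that remembers which thread currently holds it.
class OwnerTrackingMutex {
public:
    void lock() {
        inner_.lock();                         // may block, like omni_mutex::lock
        std::lock_guard<std::mutex> g(meta_);  // protects the bookkeeping only
        owner_ = std::this_thread::get_id();
        held_ = true;
    }

    void unlock() {
        {
            std::lock_guard<std::mutex> g(meta_);
            held_ = false;
            owner_ = std::thread::id();        // reset to "no thread"
        }
        inner_.unlock();
    }

    // True while some thread is between lock() and unlock().
    bool held() const {
        std::lock_guard<std::mutex> g(meta_);
        return held_;
    }

    // Id of the holding thread, or a default-constructed id when free.
    std::thread::id owner() const {
        std::lock_guard<std::mutex> g(meta_);
        return owner_;
    }

private:
    std::mutex inner_;         // the lock callers actually contend on
    mutable std::mutex meta_;  // short-lived lock around owner_ and held_
    std::thread::id owner_;
    bool held_ = false;
};
```

With something like this in place, a watchdog thread or a debugger
could read owner() while the other threads are stuck and compare it
against the thread list. If the owner turns out to be a thread that is
no longer inside the locked region, the lock was leaked.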

Has anybody else run into a similar problem?

Any chance that the thread holding the lock returned without unlocking
the mutex?
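
To make the scenario I have in mind concrete, here is a minimal sketch
of how an early return can leak a lock, and how a scoped guard avoids
it. This is not omniORB code: CountingMutex, lookup_leaky and
lookup_guarded are hypothetical names, and std::mutex stands in for
omni_mutex so the sketch compiles on its own.

```cpp
#include <mutex>

// Hypothetical stand-in for the mutex guarding the rope list. It counts
// lock and unlock calls so that a leaked lock shows up as an imbalance.
// The counters are only touched while the inner mutex is held; balanced()
// is meant to be read once the threads under test are quiet.
struct CountingMutex {
    std::mutex inner;
    int locks = 0;
    int unlocks = 0;

    void lock()   { inner.lock(); ++locks; }
    void unlock() { ++unlocks; inner.unlock(); }
    bool balanced() const { return locks == unlocks; }
};

// Buggy pattern: the early return skips unlock(), so every later caller
// of lock() blocks forever.
bool lookup_leaky(CountingMutex& m, bool not_found) {
    m.lock();
    if (not_found) {
        return false;  // bug: the mutex is still held here
    }
    m.unlock();
    return true;
}

// Same logic with a scoped guard: the guard's destructor calls unlock()
// on every exit path, including early returns and exceptions.
bool lookup_guarded(CountingMutex& m, bool not_found) {
    std::lock_guard<CountingMutex> guard(m);
    if (not_found) {
        return false;
    }
    return true;
}
```

After lookup_leaky(m, true) the counts no longer balance, which is the
state I suspect we are ending up in. After lookup_guarded the counts
always balance. If I recall correctly, omnithread has its own scoped
lock class (omni_mutex_lock) for this purpose.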

Thanks,
Bjorn


-- 
Bjørn Jorde				bjorn@tumbleweed.com
Senior Software Engineer		(650)216-2028
Tumbleweed Communications Corp.		<http://www.tumbleweed.com>