[omniORB] Deadlock in omniORB 4.0.6 ?

Duncan Grisby duncan at grisby.org
Fri Oct 14 14:52:01 BST 2005


On Friday 14 October, "Wernke zur Borg" wrote:

> During my ongoing investigation of failure reports concerning an omniORB
> based application I have come across a process that has 99 threads waiting
> on giopStream::sleepOnRdLock(). I guess that some limit was reached, and
> that no more threads were dispatched. The omniORB version is 4.0.6.

omniORB limits each connection to 100 threads by default (the
maxServerThreadPerConnection parameter). Your stack traces show 100
threads trying to read from one connection (99 in sleepOnRdLock and 1 in
recv), so that explains why it stopped at that point.

> The application was again blocked in a sense that expected upcalls were not
> performed. Please note that this is a slowly running application with just a
> few upcalls every so many seconds - therefore I believe this situation must
> have accumulated over quite some time and I am suspecting a deadlock
> somewhere. I am pretty sure that the real origin of the problem lies in the
> application code, however I cannot see any upcall that would be blocking in
> application code.

I don't think it's a direct application code issue. Of the 100 threads
handling the connection, one is trying to read a request from the
connection, so it hasn't reached the application code yet. It could
conceivably be memory corruption by the application (or omniORB)
confusing omniORB's thread dispatch.

> Unfortunately I do not have a core file but only a pstack printout, which is
> attached to this posting. Please have a look - any hint will be appreciated
> as to what could be the reason for this situation.

Are you able to get a set of stack traces with the C++ names demangled?
That would make it much easier to read what's going on.

Is this a repeatable problem? If so, please try running your
application and getting several stack traces over time. That will show
whether the blocked threads are accumulating gradually or suddenly
appearing all at once.
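
To compare snapshots, something like the following rough sketch could
count the threads sitting in a given function in each pstack dump. It
is only an illustration: the thread-header patterns are my assumption
about typical Solaris and Linux pstack output, the SAMPLE text is made
up, and count_threads_in is a hypothetical helper, not part of omniORB.
It matches by substring, so it should also work on mangled names.

```python
import re

# Assumed thread-header formats: Solaris pstack ("---- lwp# 3 / thread# 3 ----")
# and Linux pstack ("Thread 3 (...)"). Adjust for other tools.
THREAD_HEADER = re.compile(r"lwp#\s*\d+|^Thread\s+\d+")

def count_threads_in(dump_text, needle):
    """Count threads whose stack has at least one frame mentioning `needle`."""
    count = 0
    in_thread = False
    matched = False
    for line in dump_text.splitlines():
        if THREAD_HEADER.search(line):
            # A new thread starts: close off the previous one.
            if in_thread and matched:
                count += 1
            in_thread, matched = True, False
        elif in_thread and needle in line:
            matched = True
    # Close off the last thread in the dump.
    if in_thread and matched:
        count += 1
    return count

# Made-up sample in the assumed Solaris layout, for illustration only.
SAMPLE = """\
-----------------  lwp# 1 / thread# 1  --------------------
 ff2c1234 lwp_park (0, 0, 0)
 ff1a0000 giopStream::sleepOnRdLock (...)
-----------------  lwp# 2 / thread# 2  --------------------
 ff2c5678 recv (...)
-----------------  lwp# 3 / thread# 3  --------------------
 ff2c1234 lwp_park (0, 0, 0)
 ff1a0000 giopStream::sleepOnRdLock (...)
"""

print(count_threads_in(SAMPLE, "sleepOnRdLock"))  # 2
```

Running it over dumps taken a few minutes apart and watching the
sleepOnRdLock count would show whether the threads build up gradually.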

Either way, I don't know how it can be happening. The traces show
omniORB dispatching multiple threads to handle a connection while one is
still reading from it. What's meant to happen is that one thread is
dedicated to handling the connection (since I believe you are using the
thread per connection policy). Only when that thread is busy in an upcall
should other threads be dispatched to handle extra incoming calls on the
connection. Assuming your clients don't send interleaved calls (which
omniORB clients don't by default), you could try setting
maxServerThreadPerConnection to 1 to prevent extra threads being
dispatched. That ought to avoid the problem, but it doesn't explain why
it's happening in the first place.
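
As a sketch of that workaround, assuming the server reads its settings
from an omniORB.cfg file, the entry would look something like this:

```
# Workaround only: allow a single thread per incoming connection.
# Only suitable if clients never interleave calls on a connection.
maxServerThreadPerConnection = 1
```

The same parameter can be given on the command line as
-ORBmaxServerThreadPerConnection 1.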

The only way to diagnose it further will be to run with -ORBtraceLevel
25 -ORBtraceThreadId 1 so we can see what's going on.
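
If changing the command line is awkward, the equivalent omniORB.cfg
entries should be along these lines (a sketch, assuming the process
picks up that configuration file):

```
# Verbose tracing, with the thread id included in each trace message.
traceLevel = 25
traceThreadId = 1
```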

Cheers,

Duncan.

-- 
 -- Duncan Grisby         --
  -- duncan at grisby.org     --
   -- http://www.grisby.org --


