[omniORB] Bug in omniOrb thread handling?

Thu Jul 21 11:55:48 BST 2005

Hi Luke,

> On Wed, 2005-07-20 at 18:29 +0200, Thomas Richter wrote:
> > sorry to bring up this issue again, but I haven't received
> > a useful comment on this one here.
> 
> Did you see the email from Thomas Lockhart querying the large
> maxServerThreadPoolSize setting?  Your kernel/glibc may not be
> configured to allow creating so many threads, though I doubt your thread
> pool is growing that large from what you say in your email.

*Sigh* Please read my post through to the end. There *are* only
eleven threads running. The point is that on a continuously
running system, omniORB starts threads one after another and
also stops them again, but the stopping is incomplete
and zombies remain.

> > I've scanned thru the source, but found nowhere any "pthread_join"
> > that could have been called. Thus to my analysis, the zombies remain
> > around here because they are never joined back when they die away, and
> > at some point the process table just overruns. At that point, the
> > communication from clients to the server breaks down because no
> > working thread can be allocated for a new task at hand.
> 
> OmniORB doesn't use pthread calls directly; it uses the omnithread
> library which provides a cross platform C++ thread API.

Sorry, but isn't it irrelevant how many layers of software are
wrapped around pthread? The fact is that either pthread_join is never
run or pthread_detach is never run, and whether it is omnithread or
omniORB that fails to call it is beyond what I should need to care about.

> In this case the omniAsyncWorker thread is started as "detached" (due to
> the call to start() rather than start_undetached() in its constructor).
> This results in a call to pthread_detach() when using pthreads, and so a
> pthread_join is not needed (nor allowed).

And where do the zombies come from? I can only conclude that either
a) pthread_detach() isn't run, or
b) pthread_join() isn't run.

In either case: I *do* see the zombies, and the zombies break
my application because there are no more slots for new threads.
There are *never* more than a couple of hundred threads running
simultaneously, but dying threads are never cleaned up correctly
and mess up the process table.
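
To make the two cases concrete, here is a minimal sketch in C, assuming
plain POSIX threads. The names worker, run_detached, run_joined and churn
are made up for illustration; this is not omnithread or omniORB code, only
the pattern I would expect to find somewhere underneath it.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical thread body: does nothing and exits immediately. */
static void *worker(void *arg)
{
    (void)arg;
    return NULL;
}

/* Case (a): detach the thread, so its resources are released
 * automatically when it exits. Returns 0 on success. */
int run_detached(void)
{
    pthread_t tid;
    int rc = pthread_create(&tid, NULL, worker, NULL);
    if (rc != 0)
        return rc;
    return pthread_detach(tid);
}

/* Case (b): leave the thread joinable and join it, so its
 * resources are released by the join. Returns 0 on success. */
int run_joined(void)
{
    pthread_t tid;
    int rc = pthread_create(&tid, NULL, worker, NULL);
    if (rc != 0)
        return rc;
    return pthread_join(tid, NULL);
}

/* Start n short-lived threads one after another using the given
 * starter and return how many of the starts failed. */
int churn(int n, int (*start_one)(void))
{
    int failures = 0;
    for (int i = 0; i < n; i++) {
        if (start_one() != 0)
            failures++;
    }
    return failures;
}
```

With a working pthread implementation, churn(1000, run_joined) should
report no failures, because every exited thread gives its slot back. If
neither the detach nor the join takes place, each exited thread keeps its
slot, and that is the behaviour I am seeing.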

> Since you are finding that these threads are not being cleaned up, then
> perhaps your system's pthread implementation is faulty?  

That is entirely possible; this is a SuSE 9.0 system with a 2.4.21 kernel.

> Are you interfering with signals / signal handling at all?

Nope.

So long,
	Thomas