[omniORB] Bug in omniOrb thread handling?

Thomas Richter thor at math.TU-Berlin.DE
Wed Jul 20 19:29:01 BST 2005


Hi folks,

sorry to bring up this issue again, but I haven't received
a useful comment on it so far.

/* snip */

Hi Duncan, hi mailing list,

is there a serious problem with the omniORB thread management?
I'm using version 4.0.6 on a SuSE 9.1 Linux system:

thor@mersenne:~> uname -a
Linux mersenne 2.4.21-273-smp4G #1 SMP Mon Jan 17 13:19:07 UTC 2005 i686 i686 i386 GNU/Linux

The omniORB is running as a server with the following options set:

maxServerThreadPoolSize   = 4096
threadPerConnectionPolicy = 0

The problem is that, if I run the server over an extended period of
time in which clients, once in a while, use methods of the server, the
server dies away because it is no longer able to create threads.

I'm currently debugging this problem. As far as I can tell so far,
omniAsyncInvoker::pd_nthreads is ten, whereas
omniAsyncInvoker::pd_totalthreads is 11, so the thread pool per se
is fine. When the problem appears, the following call in posix.cc, line
584, fails:

   THROW_ERRORS(pthread_create(&posix_thread, attr, omni_thread_wrapper,
				(void*)this));

In other words, the kernel is no longer able to create new threads for me.
Debugging this further, "ps -ef" on that machine reveals:

thor@mersenne:~> ps -ef | grep "defunct" | wc
   1005    9045   63318

Thus, omniORB has built up an enormous number of "zombies", and the
kernel no longer seems able to allocate a free process ID.
I have verified that these really are zombie threads from the mentioned
application.
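
To check whether the failing pthread_create call is really hitting a
resource limit, its return value can be inspected. The following is a
minimal sketch in plain pthreads, not omniORB code; worker() and
create_and_report() are hypothetical names used only for illustration.
An EAGAIN result would be consistent with the kernel running out of
thread/process slots, but the sketch does not prove that this is what
happens inside posix.cc.

```cpp
#include <pthread.h>
#include <cerrno>
#include <cstdio>
#include <cstring>

// Hypothetical no-op thread body, for illustration only.
static void* worker(void*) { return nullptr; }

// Hypothetical helper: create a joinable thread and, on failure, print the
// error number that pthread_create returned. Returns 0 on success.
int create_and_report(pthread_t* tid) {
    // pthread_create reports errors through its return value, not errno.
    int rc = pthread_create(tid, nullptr, worker, nullptr);
    if (rc != 0) {
        std::fprintf(stderr, "pthread_create failed: %d (%s)%s\n",
                     rc, std::strerror(rc),
                     rc == EAGAIN ? " [insufficient resources or limit hit]" : "");
    }
    return rc;
}
```

A caller that gets 0 back owns a joinable thread and still has to join
or detach it.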

I've scanned through the source, but found no call to "pthread_join"
anywhere. So by my analysis, the zombies remain
around because they are never joined when they exit, and
at some point the process table simply overflows. At that point, the
communication from clients to the server breaks down because no
worker thread can be allocated for a new task.
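
To make the suspicion concrete: in POSIX threads, a thread that has
exited keeps its resources until it is either joined or detached. The
following is a minimal sketch in plain pthreads of the two ways to
release them. It is not omniORB code, and whether omniORB already does
one of these internally is not something the sketch establishes; task(),
run_detached() and run_joined() are hypothetical names.

```cpp
#include <pthread.h>

// Hypothetical no-op thread body, for illustration only.
static void* task(void*) { return nullptr; }

// Option 1: create the thread detached. Its resources are released
// automatically when it exits, and it must not be joined.
int run_detached() {
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc != 0) return rc;
    rc = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    if (rc == 0) {
        pthread_t tid;
        rc = pthread_create(&tid, &attr, task, nullptr);
    }
    pthread_attr_destroy(&attr);
    return rc;  // 0 on success, otherwise a pthread error number
}

// Option 2: create the thread joinable (the default) and join it, which
// waits for it to finish and then releases its resources.
int run_joined() {
    pthread_t tid;
    int rc = pthread_create(&tid, nullptr, task, nullptr);
    if (rc != 0) return rc;
    return pthread_join(tid, nullptr);
}
```

A thread created joinable and then neither joined nor detached is the
case that would leave finished threads lying around.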

Any fix, any comments?

Greetings,
	Thomas

/* snip */

To mention this once again: at the time the server breaks down,
the omniORB core recognizes only eleven running threads, but
thousands of zombies are lying around waiting to be cleaned up.

Why doesn't that happen?

So long,
	Thomas




