[omniORB] Omniorb threads dying without reason

Wed Sep 20 19:45:58 BST 2006

On Monday 18 September, "Nicusor Tanase" wrote:

[...]
> omniORB: (4) throw giopStream::CommFailure from
> giopStream.cc:834(0,NO,COMM_FAILURE_UnMarshalArguments)
> omniORB: (4) Server connection refcount = 1
> omniORB: (4) AsyncInvoker: thread id = 4 has exited. Total threads = 15
> 
> While checking the omniORB code, I found that line "giopStream.cc:834"
> should never be reached under any conditions, at least according to
> the comments.  giopStream.cc:

[...]
>     834       errorOnReceive(rsz,__FILE__,__LINE__,buf,0);
>     835       // never reaches here.

You are misinterpreting the comment -- the comment means that
errorOnReceive throws an exception, so the flow of control never reaches
the following line. It doesn't mean that the call to errorOnReceive is
never reached.

Anyway, the exception is not a sign of a problem. It simply means that a
network connection was closed, so the server thread that was trying to
read from the connection saw a read failure. The exception is caught
higher up in the code, and everything carries on fine.
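
To illustrate both points, here is a minimal sketch. It is not the
omniORB source: CommFailureSketch, errorOnReceiveSketch, receiveSketch
and workerHandlesClosedConnection are hypothetical names made up for
this example. It only shows the general pattern of a helper that always
throws, and a caller further up that catches the exception.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for giopStream::CommFailure (assumption).
struct CommFailureSketch : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Always throws, so nothing after a call to it is ever executed.
[[noreturn]] void errorOnReceiveSketch(const char* file, int line) {
    throw CommFailureSketch(std::string(file) + ":" + std::to_string(line));
}

// rsz <= 0 models a failed read, e.g. the peer closed the connection.
int receiveSketch(int rsz, bool& reachedAfterCall) {
    if (rsz <= 0) {
        errorOnReceiveSketch(__FILE__, __LINE__);  // this call IS reached
        reachedAfterCall = true;                   // never reaches here
    }
    return rsz;
}

// The code "higher up": it catches the failure and carries on.
// Returns true if the exception was caught and the line after the
// throwing call was never executed.
bool workerHandlesClosedConnection() {
    bool reachedAfterCall = false;
    try {
        receiveSketch(0, reachedAfterCall);
    } catch (const CommFailureSketch&) {
        // Connection is gone: release it and go back to waiting for work.
        return !reachedAfterCall;
    }
    return false;
}
```

Calling workerHandlesClosedConnection() returns true: the throwing call
is reached, the line after it is not, and the caller recovers.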

> On the other hand, in other cases we get UnmarshallingException, but
> the threads continue to work normally.
> 
> omniORB: (18) Dispatching remote call 'SubscriptionQuery5' to: root/my
> poa<smServer> (active)
> omniORB: (18) sendChunk: to giop:tcp:10.230.169.177:56903 453 bytes
> omniORB: (18) giopWorker task execute.
> omniORB: (18) throw giopStream::CommFailure from
> giopStream.cc:834(0,NO,COMM_FAILURE_UnMarshalArguments)
> omniORB: (18) Server connection refcount = 1
> omniORB: (18) Server connection refcount = 0
> omniORB: (18) Server close connection from giop:tcp:10.230.169.177:56930
> omniORB: (18) giopWorker task execute.
> 
> The difference I noticed is that in the second case, the worker
> thread is being used for another job. It might be that, due to low
> load, the thread is being destroyed in the first case.

Yes, that's exactly what's happening. Threads are stopped if they're
idle for 10 seconds, so thread 4 didn't have anything to do and exited.
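
As a rough picture of that behaviour, here is a sketch of a worker
thread that exits once it has been idle for a given period. It is
written with the C++ standard library and is not how omniORB itself is
implemented; IdleTimeoutWorker and its members are hypothetical names,
and the timeout is a constructor argument so that the idea can be tried
with something much shorter than 10 seconds.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

class IdleTimeoutWorker {
public:
    explicit IdleTimeoutWorker(std::chrono::milliseconds idle)
        : idle_(idle), thread_([this] { run(); }) {}

    // The thread ends by itself once idle, so join() returns promptly.
    ~IdleTimeoutWorker() { thread_.join(); }

    // Queue a task. Returns false if the worker has already exited.
    bool submit(std::function<void()> task) {
        std::lock_guard<std::mutex> lock(m_);
        if (exited_) return false;
        tasks_.push(std::move(task));
        cv_.notify_one();
        return true;
    }

    bool exited() {
        std::lock_guard<std::mutex> lock(m_);
        return exited_;
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(m_);
        for (;;) {
            // Wait for work; give up if none arrives within idle_.
            bool haveWork = cv_.wait_for(
                lock, idle_, [this] { return !tasks_.empty(); });
            if (!haveWork) {
                exited_ = true;  // idle for too long: the thread exits
                return;
            }
            std::function<void()> task = std::move(tasks_.front());
            tasks_.pop();
            lock.unlock();
            task();              // run the task without holding the lock
            lock.lock();
        }
    }

    std::chrono::milliseconds idle_;
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    bool exited_ = false;
    std::thread thread_;  // declared last so the other members exist first
};
```

A worker that keeps receiving tasks stays alive, which matches thread 18
in your second log; one that gets nothing for the whole idle period
exits, like thread 4 in the first.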

> Another difference is the reference count on the server-side
> object. In the second case it reaches zero, while in the first case
> it remains 1.

That just means that another thread was holding on to the connection at
the time the connection was closed. It's not in and of itself a sign of
a problem.
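
The general idea of reference counting can be sketched as below. This
is an illustration only, not omniORB's connection class:
ConnectionSketch, incrRef, decrRef and isTornDown are hypothetical
names, and the log strings are made up rather than copied from omniORB.

```cpp
#include <string>
#include <vector>

class ConnectionSketch {
public:
    void incrRef() { ++refcount_; }

    // Drop one reference and return the remaining count. The object is
    // only torn down when the last holder lets go.
    int decrRef(std::vector<std::string>& log) {
        --refcount_;
        log.push_back("refcount = " + std::to_string(refcount_));
        if (refcount_ == 0) {
            tornDown_ = true;
            log.push_back("torn down");
        }
        return refcount_;
    }

    bool isTornDown() const { return tornDown_; }

private:
    int refcount_ = 0;
    bool tornDown_ = false;
};
```

With two holders, the first decrRef leaves the count at 1 and the object
stays around; only the second brings it to 0. A count of 1 after a
thread lets go therefore just says that someone else still holds a
reference.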

> Any idea what could cause the thread problem is welcome (including
> the explanation that there is no problem :) ).

Everything you've posted so far looks completely normal, and is not a
sign of any kind of problem. Whatever is causing the slowdown you're
seeing is something else, not these connection closures.

Cheers,

Duncan.

-- 
 -- Duncan Grisby         --
  -- duncan at grisby.org     --
   -- http://www.grisby.org --