[omniORB] orb shutdown hangs in giopServer::deactivate(): the solution

Renzo Tomaselli renzo.tomaselli at tecnotp.it
Mon Oct 20 10:23:23 BST 2003


Hi all,
    here is the solution to this blocking.
Things were a bit more complicated than expected: a connection survives
(thus blocking the ORB later on at shutdown time) if a late worker thread
is still running after the one which detects the client shutdown (internal
comm. failure) AND that last thread is a temporary worker.
A temporary worker is created in the thread-per-connection model (the
default) when the dedicated thread is busy in an up-call. In the thread-pool
model, all workers are temporary threads.
This behavior is responsible for the apparently random appearance of this
blocking.
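
The rule above for when a worker is temporary can be sketched as a small
decision function. This is only an illustrative model, not omniORB source:
the names ThreadModel, nextWorkerIsTemporary and checkTemporaryExamples are
hypothetical and were made up for this sketch.

```cpp
#include <cassert>

// Hypothetical model of the two threading policies described above.
// Nothing here is taken from the omniORB sources.
enum class ThreadModel { PerConnection, Pool };

// Returns true if the worker that serves the next request on a connection
// is a temporary worker, according to the description in this message.
constexpr bool nextWorkerIsTemporary(ThreadModel model,
                                     bool dedicatedThreadInUpcall) {
    // Thread-pool model: all workers are temporary threads.
    if (model == ThreadModel::Pool) return true;
    // Thread-per-connection model: a temporary worker is only needed
    // while the dedicated thread is busy in an up-call.
    return dedicatedThreadInUpcall;
}

// Small self-check covering the three cases named in the text.
inline void checkTemporaryExamples() {
    assert(nextWorkerIsTemporary(ThreadModel::Pool, false));
    assert(nextWorkerIsTemporary(ThreadModel::PerConnection, true));
    assert(!nextWorkerIsTemporary(ThreadModel::PerConnection, false));
}
```

In the thread-per-connection model the hang therefore needs a request to
arrive while the dedicated thread is still in an up-call, which fits the
apparently random appearance.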
Anyway, a simple fix requires inserting the following line of code at lines
#940 and #962 of giopServer.cc:

// last worker and conn. is already dead
if (conn->pd_n_workers == 1 && conn->pd_dying) return 1;

This gives the current worker a chance to perform another attempt, in which
it detects that the connection is gone (exit_on_error = 1,
giopWorker.cc:218). Then the connection is removed.
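
To make the condition easier to reason about, here is a minimal stand-alone
sketch of the check. It is not the giopServer.cc code: ConnState,
lastWorkerMustRetry and checkRetryExamples are hypothetical names, and only
the two fields pd_n_workers and pd_dying are taken from the fix above. I
assume that returning 1 means "make one more attempt", as described.

```cpp
#include <cassert>

// Hypothetical stand-in for the connection state; only the two fields
// used by the fix are modelled here.
struct ConnState {
    int  pd_n_workers;  // number of workers currently serving the connection
    bool pd_dying;      // set once a comm. failure has been detected
};

// Mirrors the one-line fix: returns 1 when the caller is the last worker
// on a connection that is already dying, so that it makes one more
// attempt and can notice that the connection is gone. Returns 0 otherwise.
constexpr int lastWorkerMustRetry(const ConnState& conn) {
    return (conn.pd_n_workers == 1 && conn.pd_dying) ? 1 : 0;
}

// Small self-check of the cases discussed in this thread.
inline void checkRetryExamples() {
    assert(lastWorkerMustRetry(ConnState{1, true})  == 1);  // last worker, dying
    assert(lastWorkerMustRetry(ConnState{2, true})  == 0);  // another worker remains
    assert(lastWorkerMustRetry(ConnState{1, false}) == 0);  // connection still alive
}
```

Only the last worker takes the extra pass; with other workers still
attached, or with a live connection, the sketch returns 0 and nothing
changes.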
I kindly ask everyone who met this problem before to try this fix.
Thanks,

Renzo Tomaselli

----- Original Message -----
From: "Renzo Tomaselli" <renzo.tomaselli at tecnotp.it>
To: "Omniorb list" <omniorb-list at omniorb-support.com>
Sent: Friday, October 17, 2003 2:32 PM
Subject: Re: [omniORB] orb shutdown hangs in giopServer::deactivate()


> Hi Matej,
>     this pattern is similar to what I do. However I noticed that the
> blocking connection is owned by a server strand. The client strand seems
> not relevant to the blocking.
> Now I can reproduce the problem by introducing some delay in servant
> destructors.
> Basically, my server holds two strands/connections, since the client deals
> with parallel threads. I cannot easily reproduce this problem with just
> one connection, but nevertheless it appeared after long sessions within
> only one connection context.
> Then my client holds some refs to server objects which export a "close"
> method to deactivate them.
> When a server worker gets such a request, we run into the following
> sequence:
>
> - the worker dispatches the operation to the servant, which deactivates
> itself.
> - a reply is sent to the client, which is free to invoke further
> operations, such as closing other server objects.
> - if lastInvokationHasCompleted, then the object destructor is called.
> Since the reply has already been sent back, this may occur in parallel
> for several objects at the same time.
> - the client exits, so that connections to the server are shut down.
> - another server worker raises an internal comm. failure
> (inputRaiseCommFailure), which is caught by the involved dispatcher and
> forces some cleanup (in giopWorker::real_execute()), such as setting the
> strand to a DYING status.
>
> All troubles seem due to the parallelism between this worker and other
> workers which are removing local identities, which may end up setting the
> connection to a TIMEOUT status.
> When things go wrong, we finally end up with a connection which has no
> more workers, is dying, but has pd_refcount = 1, which prevents it from
> being removed. Much as if the very last worker terminated without
> affecting the connection ref. counter.
> I'm working full time on this issue with the MSVC debugger; I'll keep the
> list informed.
> Bye,
>
> Renzo Tomaselli
>
> ----- Original Message -----
> From: "Matej Kenda" <matej.kenda at hermes.si>
> To: "Renzo Tomaselli" <renzo.tomaselli at tecnotp.it>
> Cc: "Omniorb list" <omniorb-list at omniorb-support.com>
> Sent: Friday, October 17, 2003 11:24 AM
> Subject: Re: [omniORB] orb shutdown hangs in giopServer::deactivate()
>
>
> We have a much simpler case to reproduce this problem:
>
> A servant A is created and its reference is passed as a parameter to
> another servant (B) that is running in another ORB (on the same or
> another host). Function B_var->Process(A_ptr) is called.
>
> The object reference (A_ptr) is used as a callback within the function
> call (B::Process(A_ptr)) on the callee.
>
> After Process() finishes, A is destroyed and the ORB containing A goes
> down.
> ...
>
>
>



