[omniORB] problem with bidirectional feature - BUG OR FEATURE

Fernando A. de Araujo Filho maverick@elogica.com.br
Fri Dec 6 17:40:02 2002


Hi again,

I am still stuck on this problem. Sorry for the noise and my poor English ...
I have run traces to try to find out whether this is
a bug or expected behaviour.
I carefully read the paper CORBAControls2002.pdf, written by Duncan Grisby.
As I do not know the core of the omniORB implementation well enough to
understand it in depth,
I am sending the trace below.

Basically, in a bidirectional "conversation", when the server is dying, it
sends the client
a GIOP::CloseConnection message. This message always raises a CommFailure
exception.
If g->pd_strand->biDir is TRUE, nothing is done.
When the server restarts and the client tries to call it, the
client always gets a valid RopeLink
whose giopStrand is in the giopStrand::DYING state. In this case,
pd_cond.timedwait returns 0 (a timeout) and
a TRANSIENT_CallTimedout exception is raised.
From that point on, the RopeLink always yields the same giopStrand in the
DYING state,
a TRANSIENT_CallTimedout exception is raised each time, and
we never call the server again.
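
To make the sequence above concrete, here is a minimal toy model of the decision I believe acquireClient is making. It is only a sketch: ToyStrand, StrandState, AcquireResult and toyAcquire are hypothetical names, not omniORB code, and I am assuming the rope allows at most one strand (max == 1), which is what "ndying >= max" with a single dying strand suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the omniORB types named in the trace.
enum class StrandState { Active, Dying };

struct ToyStrand {
  StrandState state;
  bool busy;
};

enum class AcquireResult { ReusedStrand, CreatedStrand, MustWait };

// Simplified model (an assumption, not the real giopRope::acquireClient):
// reuse an idle ACTIVE strand, else create a strand if the rope is below
// its limit, else the caller has to wait.
inline AcquireResult toyAcquire(std::vector<ToyStrand>& strands,
                                std::size_t max) {
  assert(max > 0);
  std::size_t nbusy = 0, ndying = 0;
  for (ToyStrand& s : strands) {
    if (s.state == StrandState::Dying) { ++ndying; continue; }
    if (s.busy) { ++nbusy; continue; }
    s.busy = true;                       // idle ACTIVE strand: reuse it
    return AcquireResult::ReusedStrand;
  }
  if (nbusy + ndying < max) {
    strands.push_back({StrandState::Active, true});  // room for a new strand
    return AcquireResult::CreatedStrand;
  }
  // A DYING strand that is never removed keeps counting against max,
  // so every later call ends up here.
  return AcquireResult::MustWait;
}
```

In this model, once the only strand is marked Dying and nothing removes it from the list, every later toyAcquire call with max == 1 returns MustWait, which matches what I see in the trace.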

What I cannot understand is this:
if I do not use ANY bidir feature, the problem does not occur. The server can
die and restart
without any trouble, and the client can always call the server again.

The question is:

is this a bug or a feature of BIDIR mode?

The trace is below:

************************************
FIRST TIME CALLING THE SERVER

WE ARE IN

 IOP_C_Holder::IOP_C_Holder(const omniIOR* ior,
      const CORBA::Octet* key,
      CORBA::ULong keysize,
      Rope* rope,
      omniCallDescriptor* calldesc) : pd_rope(rope) {

  OMNIORB_ASSERT(calldesc);
  pd_iop_c = rope->acquireClient(ior,key,keysize,calldesc);
}

WE HAVE CALLED
IOP_C*
BiDirClientRope::acquireClient(const omniIOR* ior,
          const CORBA::Octet* key,
          CORBA::ULong keysize,
          omniCallDescriptor* calldesc) {

  GIOP_C* giop_c = (GIOP_C*) giopRope::acquireClient(ior,key,keysize,calldesc);
...

WE TRY TO ACQUIRE THE CLIENT

IOP_C*
giopRope::acquireClient(const omniIOR* ior,
   const CORBA::Octet* key,
   CORBA::ULong keysize,
   omniCallDescriptor* calldesc) {
...
  // NO RopeLink EXISTS YET ON THE FIRST CALL
  RopeLink* p = pd_strands.next;
 ...
  // Reach here if we haven't got a strand to grab a GIOP_C.
  if ((nbusy + ndying) < max) {
    // Create a new strand.
...
    giopStrand* s =
      new giopStrand(pd_addresses[pd_addresses_order[pd_address_in_use]]);
    s->state(giopStrand::ACTIVE);
    s->RopeLink::insert(pd_strands);
    s->StrandList::insert(giopStrand::active);
    s->version = v;
    s->giopImpl = impl;
  }

  goto again;
...

// NOW WE HAVE THE FIRST RopeLink
// GET THE giopStrand REFERENCE
giopStrand* s = (giopStrand*)p;
STATE is ACTIVE
if (!giopStreamList::is_empty(s->clients)) {
...
else {
THERE ARE NO CLIENTS ON giopStreamList
CREATE A new GIOP_C with GIOP 1.2 impl
    g = new GIOP_C(this,s);
    ...
}
...
OK WE HAVE CONTACTED THE SERVER
EVERYTHING IS OK WHILE THE SERVER IS ALIVE, NO PROBLEM
*******************************************************

NOW OUR SERVER DIES
THE CLIENT, IN giopImpl12::unmarshalWildCardRequestHeader,
RECEIVES A GIOP::CloseConnection:
void
giopImpl12::unmarshalWildCardRequestHeader(giopStream* g) {
...
  case GIOP::CloseConnection:
    if (g->pd_strand->biDir) {
      // g->pd_strand->biDir IS TRUE, BUT NOTHING IS DONE
      // proper shutdown of a connection.
      // XXX what to do?
    }
    // CALL inputRaiseCommFailure
    inputRaiseCommFailure(g);
}

void
giopImpl12::inputRaiseCommFailure(giopStream* g) {
  CORBA::ULong minor;
  CORBA::Boolean retry;
  g->notifyCommFailure(0,minor,retry);
  g->pd_strand->state(giopStrand::DYING);
  giopStream::CommFailure::_raise(minor,
      (CORBA::CompletionStatus)g->completion(),
      0,__FILE__,__LINE__);
}

OK A CommFailure EXCEPTION IS RAISED
***********************************************************

OUR SERVER IS RESTARTED
WE TRY TO CALL IT AGAIN

CALL STACK
omni::IOP_C_Holder::IOP_C_Holder(const omniIOR * 0x016a1b30, const unsigned char * 0x016a1b88, unsigned long 21, omni::Rope * 0x0169e648, omniCallDescriptor * 0x047efaa0) line 69
omniRemoteIdentity::locateRequest(omniCallDescriptor & {...}) line 257 + 44 bytes
omniObjRef::_locateRequest() line 1049
omniObjRef::_assertExistsAndTypeVerified() line 395
omniObjRef::_invoke(omniCallDescriptor & {...}, unsigned char 1) line 732
DVRSafenetIdls::_objref_DVRSafenetEstacao::centralChecaEstacaoAtiva() line 2738

WE ARE IN
////////////////////////////////////////////////////////////////////////////
IOP_C_Holder::IOP_C_Holder(const omniIOR* ior,
      const CORBA::Octet* key,
      CORBA::ULong keysize,
      Rope* rope,
      omniCallDescriptor* calldesc) : pd_rope(rope) {

  OMNIORB_ASSERT(calldesc);
  pd_iop_c = rope->acquireClient(ior,key,keysize,calldesc);
}

CALL STACK
omni::BiDirClientRope::acquireClient(const omniIOR * 0x016a1b30, const unsigned char * 0x016a1b88, unsigned long 21, omniCallDescriptor * 0x047efaa0) line 483
omni::IOP_C_Holder::IOP_C_Holder(const omniIOR * 0x016a1b30, const unsigned char * 0x016a1b88, unsigned long 21, omni::Rope * 0x0169e648, omniCallDescriptor * 0x047efaa0) line 69 + 27 bytes
omniRemoteIdentity::locateRequest(omniCallDescriptor & {...}) line 257 + 44 bytes
omniObjRef::_locateRequest() line 1049
omniObjRef::_assertExistsAndTypeVerified() line 395
omniObjRef::_invoke(omniCallDescriptor & {...}, unsigned char 1) line 732

NOW WE WILL CALL giopRope::acquireClient(ior,key,keysize,calldesc);
IOP_C*
BiDirClientRope::acquireClient(const omniIOR* ior,
          const CORBA::Octet* key,
          CORBA::ULong keysize,
          omniCallDescriptor* calldesc) {

  GIOP_C* giop_c = (GIOP_C*) giopRope::acquireClient(ior,key,keysize,calldesc);
...
}

IN giopRope::acquireClient

IOP_C*
giopRope::acquireClient(const omniIOR* ior,
   const CORBA::Octet* key,
   CORBA::ULong keysize,
   omniCallDescriptor* calldesc) {
...
  RopeLink* p = pd_strands.next;
  for (; p != &pd_strands; p = p->next) {
    giopStrand* s = (giopStrand*)p;
    switch (s->state()) {
    case giopStrand::DYING:
      {
WE GET A RopeLink WITH A "giopStrand* s" STILL IN DYING STATE
        ndying++;
        break;
      }
...
WE ENTER THIS BRANCH:
  if (pd_oneCallPerConnection || ndying >= max) {
    // Wait for a strand to be unused.
    pd_nwaiting++;
    unsigned long deadline_secs,deadline_nanosecs;
    calldesc->getDeadline(deadline_secs,deadline_nanosecs);
    if (deadline_secs || deadline_nanosecs) {
THE pd_cond.timedwait CALL RETURNS 0
      if (pd_cond.timedwait(deadline_secs,deadline_nanosecs) == 0) {
        pd_nwaiting--;
WE THROW A TRANSIENT_CallTimedout EXCEPTION
        OMNIORB_THROW(TRANSIENT,TRANSIENT_CallTimedout,CORBA::COMPLETED_NO);
...
}
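
One possible reading of this step, which I cannot confirm from the omniORB source, is that nothing ever signals pd_cond because the dying strand is never released, so the timed wait can only expire. The sketch below illustrates that behaviour with the standard library rather than omnithread; ToyRope, strandAvailable and waitForStrand are hypothetical names used only for illustration.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical model of a rope waiting for a usable strand.
struct ToyRope {
  std::mutex mu;
  std::condition_variable cond;
  bool strandAvailable = false;  // would be set when a strand is freed

  // Returns true if a strand became available before the timeout and
  // false if the wait expired (the case that leads to the timeout
  // exception in the trace).
  bool waitForStrand(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mu);
    return cond.wait_for(lock, timeout,
                         [this] { return strandAvailable; });
  }
};
```

If no other thread ever sets strandAvailable and notifies cond, waitForStrand returns false on every call, however often the caller retries.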

FROM THAT POINT ON, THE RopeLink ALWAYS YIELDS THE SAME STRAND IN DYING STATE
AND WE NEVER CALL THE SERVER AGAIN ...

THE PROBLEM IS:
IF WE DON'T USE ANY BIDIR FEATURE, EVERYTHING WORKS FINE,
BUT IF pd_strand->biDir IS SET, THE PROBLEM ALWAYS OCCURS AFTER
THE SERVER HAS RESTARTED AND WE TRY TO CALL IT

Can anyone help?

Fernando A. de Araujo Filho
maverick@elogica.com.br