[omniORB] Assertion failed

Dietmar May dcmay@object-workshops.com
Thu, 29 Jul 1999 18:13:00 -0400


I was going to send this as a separate message, but maybe it is related
to this thread?

Platform is NT 4.0 SP4 [has IE 4.01 SP1].

I'm getting a COMM_FAILURE during a call to a local (but not colocated)
server. Usually this is an occasional transient failure (i.e. it occurs
once, then on a retry the call succeeds, and it happens once every 50
calls or so). I've never been able to debug what happens because of its
occasional nature.
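
A minimal sketch of the application-level retry described above. It only
illustrates the pattern: CommFailure and retry_on_comm_failure are
hypothetical names, not omniORB API, and CommFailure stands in for
CORBA::COMM_FAILURE so the example compiles without the ORB.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for CORBA::COMM_FAILURE so the sketch builds
// without omniORB; real code would catch CORBA::COMM_FAILURE instead.
struct CommFailure : std::runtime_error {
  CommFailure() : std::runtime_error("COMM_FAILURE") {}
};

// Invoke `call` up to `max_attempts` times, retrying only on CommFailure.
// The final failure is rethrown so a persistent fault is not hidden.
template <typename Call>
auto retry_on_comm_failure(Call&& call, int max_attempts)
    -> decltype(call()) {
  assert(max_attempts > 0);
  for (int attempt = 1; ; ++attempt) {
    try {
      return call();
    } catch (const CommFailure&) {
      if (attempt >= max_attempts) throw;  // give up: propagate to caller
    }
  }
}
```

A retry like this masks the occasional failure but does nothing for the
repeatable case, where every attempt fails the same way.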

However, today I ran into a problem with identical (application-level)
symptoms, and it was repeatable. Possibly this is a related (or
identical) problem.

Basically, the call to socket(INETSOCKET,SOCK_STREAM,0) returns
RC_INVALID_SOCKET. A second local (but non-colocated) server continues
to accept omniORB calls. The server that the socket was communicating
with is alive, and seems to be operational (at least if I attach to the
process with the MSVC debugger).

What would cause the socket to close while an application is running?
Should omniORB be trying to open another socket?

Dietmar

========================

in the low-level socket code

// tcpSocketMTfactory.cc

static
tcpSocketHandle_t
realConnect(tcpSocketEndpoint* r)
{
  ...
  if ((sock = socket(INETSOCKET,SOCK_STREAM,0)) == RC_INVALID_SOCKET) {
    return RC_INVALID_SOCKET;
  }
  ...
}

being called by:

void
tcpSocketStrand::ll_send(void* buf,size_t sz)
{
  ...
    if ((pd_socket = realConnect(pd_delay_connect)) == RC_INVALID_SOCKET) {
      _setStrandIsDying();
      throw CORBA::COMM_FAILURE(errno,CORBA::COMPLETED_NO);
    }
  ...
}
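
One way to narrow down why socket() fails is to record the error code at
the point of failure. The following is a sketch under stated assumptions,
not omniORB code: SocketResult, open_socket, last_socket_error and
kInvalidSocket are hypothetical names. Winsock reports socket errors
through WSAGetLastError() rather than errno; whether omniORB remaps errno
on NT has not been checked here.

```cpp
#include <cerrno>
#ifdef _WIN32
#include <winsock2.h>   // link with ws2_32; WSAStartup must already have run
typedef SOCKET socket_handle;
static const socket_handle kInvalidSocket = INVALID_SOCKET;
inline int last_socket_error() { return WSAGetLastError(); }
#else
#include <sys/socket.h>
typedef int socket_handle;
static const socket_handle kInvalidSocket = -1;
inline int last_socket_error() { return errno; }
#endif

// Hypothetical helper: the handle plus the error code captured
// immediately after a failed socket() call (0 on success).
struct SocketResult {
  socket_handle handle;
  int error;
};

inline SocketResult open_socket(int family, int type, int protocol) {
  SocketResult r;
  r.handle = socket(family, type, protocol);
  // Read the code right away, before any other call can overwrite it.
  r.error = (r.handle == kInvalidSocket) ? last_socket_error() : 0;
  return r;
}
```

Logging r.error when the handle is invalid would show whether the process
has, for example, run out of socket handles or buffer space.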

> Helmut,
>
> The trace message you are seeing tells you that the server thread
> (tcpSocketMT Worker) detects that the socket it is blocking on has been
> closed by the remote end. I think this is because your client process exits
> and so the socket was closed. Seeing the socket closedown, tcpSocketMT
> Worker throws a COMM_FAILURE exception which is caught in the outer most
> try loop of the thread which causes the thread to exit and the dtor on the
> strand called.
>
> This is all normal behaviour.
>
> I suggest you upgrade to egcs-1.1.2 (remember the --enable-threads option)
> and try again. It is one variable that I would like to remove.
>
> If the problem still persists, you really have to do some digging in your
> code to try to isolate a test case for me to work on. (Sometimes, it may
> just be a stack overflow that is causing the problem. You happen to define
> a huge array as an IDL operation argument for instance.)
>
> Sai-Lai
>
> >>>>> Helmut Swaczinna writes:
>
> > I've disabled the scavengers, but the problem still remains. It is
> > always the same: The server process crashes *after* a method invocation,
> > not during method execution. The client gets back the result. The last trace
> > messages of the crashed process are always the same:
>
> > #### Communication failure. Connection closed.
> > tcpSocketMT Worker thread: exits.
> > tcpSocketStrand::~Strand() close socket no. 25
>
> > I'm not sure, but it seems to me, that the crash occurs, when the
> > client-thread, who has called the server's method, terminates. The clients
> > are multi-threaded.
>
> > What can I do wrong in my code, to produce such behaviour?
>
> > I haven't tried egcs 1.1.2 yet.
>
> --
> Sai-Lai Lo                                   S.Lo@uk.research.att.com
> AT&T Laboratories Cambridge           WWW:   http://www.uk.research.att.com
> 24a Trumpington Street                Tel:   +44 1223 343000
> Cambridge CB2 1QA                     Fax:   +44 1223 313542
> ENGLAND