[omniORB] Assertion failed

Steven W. Brenneis brennes1@rjrt.com
Fri, 30 Jul 1999 09:02:05 -0400


Sai-Lai Lo wrote:
> 
> >>>>> Dietmar May writes:
> 
> > I was going to send this as a separate message, but maybe it is related
> > to this thread??  Platform is NT 4.0 SP4 [has IE 4.01 SP1].
> 
> > I'm getting a COMM_FAILURE during a call to a local (but not colocated)
> > server. Usually this is an occasional transient failure (ie. it occurs
> > once, then on a retry the call succeeds, and happens once every 50 calls
> > or so). I've never been able to debug what happens because of its
> > occasional nature.
> 
> Unless I'm mistaken, you are seeing the effect of the scavengers at work.
> If a connection has not been used for a period of time, the scavenger
> shuts down the socket.
> 
> > However, today I ran into a problem with identical (application-level)
> > symptoms, and it was repeatable. Possibly this is a related (or
> > identical) problem.
> 
> > Basically, the call to socket(INETSOCKET,SOCK_STREAM,0) returns
> > RC_INVALID_SOCKET. A second local (but non-colocated) server continues to
> > accept omniORB calls. The server that the socket was communicating with
> > is alive, and seems to be operational (at least if I attach to the
> > process with the MSVC debugger).
> 
> Someone with more knowledge on NT's socket library might be able to answer
> this. Could it be that you are running into a resource limit?
> 

I have seen the same error on NT 4.0, though very infrequently. I added
a call to WSAGetLastError, as the WinSock documentation suggests, and
the result was a very unsatisfying error code of 0.  I suspect a WinSock
bug.  I doubt it was a resource limit, since WinSock provides error
codes to cover that case.  In any event, the theoretical limit is 65536
sockets; the actual limit is determined by the amount of virtual memory
available and by whatever arcana Microsoft has not told us about.
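
For anyone who wants to repeat the check, here is a minimal sketch of
what I mean by capturing the error code. It is not the omniORB source:
the names open_tcp_socket, close_tcp_socket, SocketResult, socket_t and
kInvalidSocket are made up for illustration, and it uses plain AF_INET
rather than omniORB's own macros. The WinSock documentation advises
calling WSAGetLastError immediately after the failing call, because a
later call that succeeds may reset the code to 0. Whether that accounts
for the 0 I saw, I cannot say. On NT this assumes WSAStartup has already
succeeded and that you link with ws2_32.

```cpp
#include <cassert>

#ifdef _WIN32
#include <winsock2.h>
typedef SOCKET socket_t;
static const socket_t kInvalidSocket = INVALID_SOCKET;
#else
#include <cerrno>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
typedef int socket_t;
static const socket_t kInvalidSocket = -1;
#endif

// Hypothetical result type: the socket handle plus the error code
// captured at the point of failure (0 when the call succeeded).
struct SocketResult {
    socket_t fd;
    int error;
};

// Open a TCP socket and record the error code straight away on failure.
SocketResult open_tcp_socket() {
    SocketResult r;
    r.fd = socket(AF_INET, SOCK_STREAM, 0);
    r.error = 0;
    if (r.fd == kInvalidSocket) {
        // Read the code before making any other socket call.
#ifdef _WIN32
        r.error = WSAGetLastError();
#else
        r.error = errno;
#endif
    }
    return r;
}

// Close a socket previously returned by open_tcp_socket().
void close_tcp_socket(socket_t fd) {
    assert(fd != kInvalidSocket);
#ifdef _WIN32
    closesocket(fd);
#else
    close(fd);
#endif
}
```

Logging r.error together with a timestamp each time r.fd comes back
invalid should at least show whether the 0 is consistent.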

> > What would cause the socket to close while an application is running?
> 
> See answer above.
> 
> > Should omniORB be trying to open another socket?
> 
> Yes, it would in the case that the connection was shut down by a scavenger.
> omniORB would try to connect again; if that fails, it throws a COMM_FAILURE.
> If you want the ORB to try harder, see the chapter on setting up a system
> exception handler in the user guide.
> 
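
If you would rather retry at the application level than install a
handler in the ORB, something along these lines may do. This is only a
sketch: CommFailure is a hypothetical stand-in for CORBA::COMM_FAILURE
so that the example compiles without omniORB, and call_with_retry is a
name I made up. It is not the handler mechanism the user guide
describes.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for CORBA::COMM_FAILURE.
struct CommFailure : std::runtime_error {
    CommFailure() : std::runtime_error("COMM_FAILURE") {}
};

// Invoke call() up to max_attempts times. A CommFailure thrown by any
// attempt but the last is swallowed and the call is tried again; the
// failure from the last attempt propagates to the caller.
template <typename Call>
auto call_with_retry(Call call, unsigned max_attempts) -> decltype(call()) {
    assert(max_attempts > 0);
    for (unsigned attempt = 1; ; ++attempt) {
        try {
            return call();
        } catch (const CommFailure&) {
            if (attempt >= max_attempts) {
                throw;  // out of attempts: let the caller see the failure
            }
        }
    }
}
```

With real stubs you would catch CORBA::COMM_FAILURE instead and wrap the
remote invocation in the callable. Two or three attempts should be
enough to ride out a connection that a scavenger has closed.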

On NT clients with relatively short lifespans (say, less than a couple
of hours), we have disabled both the in- and out-scavengers.  Our
reasoning was that since these clients do not stay on the machine for
long, they are unlikely to cause resource problems.  We saw a
noticeable, though not exceptional, performance improvement.  These
clients receive irregular but frequent updates from a database server
via callbacks.

On the server side, we set the scavenger periods to various values
ranging from 5 minutes to 24 hours.  We have had no real problems
doing this.  Maybe someone has more information.  Remember that if a
rope grabs a strand with a dead socket, it will delete the strand and
create a new one.  This seems to be a performance trade-off against
having all the strands to a client deleted by the scavengers.  We used
to get frequent unexplained COMM_FAILUREs using the default (30 second)
scavenger period.  After changing the scavenger periods, the
infrequent COMM_FAILUREs we get are directly traceable to clients that
have crashed or exited without cleaning up their callbacks.
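
To make the rope and strand remark concrete, here is a toy model of the
behaviour as I understand it. It is emphatically not omniORB's internal
code: Rope, Strand, acquire, release, scavenge and strands_created are
all hypothetical names, and a "socket" here is just a flag.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy strand: an id plus a flag standing in for the socket state.
struct Strand {
    int id;
    bool socket_alive;
};

// Toy rope: keeps idle strands and hands one out on request.
class Rope {
public:
    Rope() : next_id_(0), created_(0) {}

    // Return a usable strand. Idle strands whose socket is dead are
    // discarded; if none is left, a new strand is created.
    Strand acquire() {
        while (!idle_.empty()) {
            Strand s = idle_.back();
            idle_.pop_back();
            if (s.socket_alive) {
                return s;
            }
            // Dead socket: drop this strand and keep looking.
        }
        Strand fresh;
        fresh.id = next_id_++;
        fresh.socket_alive = true;
        ++created_;
        return fresh;
    }

    // Hand a strand back once the call has finished.
    void release(const Strand& s) {
        assert(s.id < next_id_);  // must have come from this rope
        idle_.push_back(s);
    }

    // Model a scavenger pass: every idle strand loses its socket.
    void scavenge() {
        for (std::size_t i = 0; i < idle_.size(); ++i) {
            idle_[i].socket_alive = false;
        }
    }

    int strands_created() const { return created_; }

private:
    std::vector<Strand> idle_;
    int next_id_;
    int created_;
};
```

In this model a short scavenger period means strands_created() climbs
after every idle spell, which is the reconnect cost we were trying to
avoid; a long period leaves the replacement to happen on demand.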

Steve Brenneis

> --
> Sai-Lai Lo                                   S.Lo@uk.research.att.com
> AT&T Laboratories Cambridge           WWW:   http://www.uk.research.att.com
> 24a Trumpington Street                Tel:   +44 1223 343000
> Cambridge CB2 1QA                     Fax:   +44 1223 313542
> ENGLAND