[omniORB] Failure to connect to server/execute user functions
Mon, 15 Mar 99 16:12:24 EST
From: Sai-Lai Lo <S.Lo@uk.research.att.com>
Date: 12 Mar 1999 10:41:08 +0000
Remember that by default we have the scavengers scanning incoming and
outgoing connections every 30 seconds. They may close down connections when
they decide that the connections are idle. To eliminate this factor in
your debugging, turn off the scavengers by doing this:
If what you are doing is very computationally intensive, it may be the case
that some server threads never get scheduled before the incoming scavenger
decides the connection is idle. If that is the case, it may be worth putting in
a thread yield call in appropriate places to kick the NT scheduler into
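
The yield suggestion above might look roughly like the following. This is only a sketch under assumptions: it uses standard C++ threads (std::this_thread::yield) rather than the thread API that omniORB itself uses, and heavy_work, WorkResult and the stride parameter are hypothetical names invented for illustration.

```cpp
#include <cstdint>
#include <thread>

// Hypothetical result type: the computed sum and how often we yielded.
struct WorkResult {
    std::uint64_t sum;
    std::uint64_t yields;
};

// Hypothetical CPU-bound routine: sums 0..n-1 and gives up the rest of
// its time slice after every `stride` iterations, so that other runnable
// threads (for example, ones servicing requests) get a chance to run.
// A stride of 0 disables yielding.
WorkResult heavy_work(std::uint64_t n, std::uint64_t stride) {
    WorkResult r{0, 0};
    for (std::uint64_t i = 0; i < n; ++i) {
        r.sum += i;
        if (stride != 0 && (i + 1) % stride == 0) {
            std::this_thread::yield();  // a hint to the scheduler, not a guarantee
            ++r.yields;
        }
    }
    return r;
}
```

Calling heavy_work(1000, 100) yields ten times during the loop. How much this helps depends on the scheduler, since a yield is only a hint.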
The computational load that I put on my machine was an entirely separate
process, unrelated to the client/server.
This particular client itself is fairly simple, and only calls three
or four operations on the server, which return essentially
immediately, after doing very little work on the server side (creating
objects and associated bookkeeping). Unfortunately I was unable to
reproduce this behavior with another, much simpler, client/server
setup, so it could still be something completely stupid that we are
doing with our server.
Grasping at straws, I did in fact try this idleConnectionScanPeriod
change, but, if anything, that caused the problem to happen more
often. Generally speaking, the entire test does complete within 30
seconds, so it shouldn't be related.
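
To make that reasoning concrete, here is a toy model of a periodic idle scan. It is a sketch under stated assumptions, not omniORB's implementation: Connection, touch and scan are hypothetical names, and the real scavenger may well track idleness differently.

```cpp
#include <cstddef>
#include <vector>

// Toy model only; NOT omniORB's data structure.
struct Connection {
    bool open = true;
    bool used_since_scan = true;  // a freshly created connection counts as active
};

// Record activity on a connection (e.g. a request arrived or was sent).
void touch(Connection& c) {
    if (c.open) c.used_since_scan = true;
}

// One scavenger pass: close every open connection that saw no activity
// since the previous pass, and clear the activity flag on the others.
// Returns the number of connections closed by this pass.
std::size_t scan(std::vector<Connection>& conns) {
    std::size_t closed = 0;
    for (auto& c : conns) {
        if (!c.open) continue;
        if (!c.used_since_scan) {
            c.open = false;
            ++closed;
        } else {
            c.used_since_scan = false;
        }
    }
    return closed;
}
```

In this model a connection is closed only on the second pass after its last activity, so one that lives for less than a single scan period sees at most one pass and is never closed. That matches the expectation that a test finishing within 30 seconds should be unaffected, assuming the real scavenger behaves similarly.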
I have now successfully built omniORB on my own machine, and have
removed some catch (...)'s, which fixed the problem that I had with
the debugger -- now instead of catching the error, I am offered
just-in-time debugging, and can look at the errors in my program.
However, unfortunately, apparently whatever is going wrong with the
client connecting to the server in the heavy load situation is
unrelated to the catches that I removed; it's possible that I need to
find other catches to remove.
>>>>> Judy Anderson writes:
> We're having an intermittent connection problem which of course only
> occurs when we use our large application, and doesn't seem to occur
> when we try to reproduce it with a small test case.
> In particular, what's happening is that upon the client's attempt to
> get a handle on the server object, we will intermittently receive a
> communications failure. Originally this was happening about 1/3 of
> the time, but it seems to have receded back to 5% (worse, from a
> debugging standpoint). So, I have a batch file which launches 40
> clients with start/b, and one or two of them will fail to connect.
> Interestingly, if I crank the debug level on the server, I get
> messages that show me that all 40 clients succeeded in getting a
> thread started for them, but only 38 of them will get the printout
> from my user function. So, something is going wrong somewhere
> intermittently, between the receiving of the connection, and the
> calling of the user function. The clients are, of course, identical,
> and so there is no particular reason for one or another to fail. It
> is simply random.
> Likely, the communications failure is simply an indication that there
> was a problem -- as I have whined before, when there is a bug, I don't
> get into the debugger, but rather the thread handling the operation
> simply aborts or exits or something, and throws a communications
> failure to the client, and so it is difficult to tell exactly what
> might be wrong.
> You can imagine that I am somewhat frustrated by this bug. Does
> anybody have any clues?
> I know! We'll just tell our users, "be gentle with the server, it has
> a delicate constitution."
> Judy Anderson "yduJ"
Sai-Lai Lo S.Lo@uk.research.att.com
AT&T Laboratories Cambridge WWW: http://www.uk.research.att.com
24a Trumpington Street Tel: +44 223 343000
Cambridge CB2 1QA Fax: +44 223 313542