[omniORB] WinNT TCP connection problems

Clarke Brunt clarke.brunt at yeomannavigation.co.uk
Wed Feb 25 14:18:21 GMT 2004


We're using omniORB 4.0.3 on Windows NT4. We've been seeing occasional CORBA
exceptions because an attempt to connect a TCP/IP socket either fails
outright, or appears to connect to 255.255.255.255:65535 and then fails when
trying to send something. I already bothered Duncan a while ago with some traces for
this, and he first suggested that I change to the latest omniORB as the
logging was better. I now thought I'd try the mailing list to see whether
anyone else has seen anything similar.

A couple of days ago, I added omniORB exception handlers for TRANSIENT and
COMM_FAILURE, which merely return true (i.e. perform a retry) if n_retries
is less than 3.
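
For anyone wanting to reproduce this, here is a minimal sketch of that kind of handler. It uses stand-in types so that it compiles without omniORB: SystemExceptionInfo, kMaxRetries, should_retry and retry_handler are illustrative names, not the actual code, and passing a label through the cookie is an assumption. With omniORB itself, the handler would take the real CORBA exception type and would be registered with omniORB::installTransientExceptionHandler or omniORB::installCommFailureExceptionHandler.

```cpp
#include <cstdio>
#include <ctime>

// Stand-in for the exception argument. With omniORB the handler would take
// a const CORBA::TRANSIENT& or const CORBA::COMM_FAILURE& instead.
struct SystemExceptionInfo {
    unsigned long minor_code;  // vendor-specific minor code
    int completed;             // completion status as an integer
};

// Retry limit: retry only while fewer than 3 retries have been made.
constexpr unsigned long kMaxRetries = 3;

// true asks the ORB to retry the call; false lets the exception propagate.
constexpr bool should_retry(unsigned long n_retries) {
    return n_retries < kMaxRetries;
}

// Handler with the (cookie, n_retries, exception) shape. The cookie is
// assumed here to point at a label such as "TRANSIENT" for the log line.
inline bool retry_handler(void* cookie, unsigned long n_retries,
                          const SystemExceptionInfo& ex) {
    const char* label = static_cast<const char*>(cookie);
    std::printf("%ld %s, retries %lu, minor %lu, completed %d\n",
                static_cast<long>(std::time(nullptr)), label, n_retries,
                ex.minor_code, ex.completed);
    return should_retry(n_retries);
}
```

The retry decision is kept in its own function so that the limit can be checked without an ORB running.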

So here are some (slightly annotated) log fragments (I have the full logs
if anyone wants them).

The application had been running for a long time with no problems, then:

omniORB: Client attempt to connect to giop:tcp:database2.int.travelm8.com:2809
omniORB: throw giopStream::CommFailure from giopStream.cc:1070(0,NO,TRANSIENT_ConnectFailed)

At this point my Exception Handler prints out (first number is current time
as time_t):
1077709039 TRANSIENT, retries 0, minor 1096024066, completed 1

and omniORB retries successfully:

omniORB: Client attempt to connect to giop:tcp:database2.int.travelm8.com:2809
omniORB: Client opened connection to giop:tcp:172.19.50.24:2809
omniORB: sendChunk: to giop:tcp:172.19.50.24:2809 100 bytes
omniORB:
4749 4f50 0100 0100 5800 0000 0000 0000 GIOP....X.......
etc.

Then less than a second later:

omniORB: Client attempt to connect to giop:tcp:172.19.50.24:2780
omniORB: Client opened connection to giop:tcp:255.255.255.255:65535
omniORB: sendChunk: to giop:tcp:255.255.255.255:65535 34 bytes
omniORB:
4749 4f50 0100 0103 1600 0000 0200 0000 GIOP............
0e00 0000 fe67 4e2b 4000 0002 4300 0003 .....gN+@...C...
7e58                                    ~X
omniORB: throw giopStream::CommFailure from giopStream.cc:1103(0,NO,COMM_FAILURE_MarshalArguments)

At this point my Exception Handler prints out (first number is current time
as time_t):
1077709039 COMM_FAILURE, retries 0, minor 1096024067, completed 1

and omniORB retries successfully:

omniORB: Client connection refcount = 0
omniORB: Client close connection to giop:tcp:255.255.255.255:65535
omniORB: LocateRequest to remote: root<1484653312>
omniORB: Client attempt to connect to giop:tcp:172.19.50.24:2780
omniORB: Client opened connection to giop:tcp:172.19.50.24:2780
omniORB: sendChunk: to giop:tcp:172.19.50.24:2780 34 bytes
omniORB:
4749 4f50 0100 0103 1600 0000 0200 0000 GIOP............
etc.

And _still_ within the same second:

omniORB: Client attempt to connect to giop:tcp:172.19.50.28:2179
omniORB: Client opened connection to giop:tcp:255.255.255.255:65535
omniORB: sendChunk: to giop:tcp:255.255.255.255:65535 38 bytes
omniORB:
4749 4f50 0102 0103 1a00 0000 0200 0000 GIOP............
0000 0000 0e00 0000 fead 313b 4000 0001 ..........1;@...
6000 0000 0001                          `.....
omniORB: throw giopStream::CommFailure from giopStream.cc:1103(0,NO,COMM_FAILURE_MarshalArguments)

At this point my Exception Handler prints out (first number is current time
as time_t):
1077709039 COMM_FAILURE, retries 0, minor 1096024067, completed 1

and omniORB retries successfully:

omniORB: Client connection refcount = 0
omniORB: Client close connection to giop:tcp:255.255.255.255:65535
omniORB: LocateRequest to remote: root<16777216>
omniORB: Client attempt to connect to giop:tcp:172.19.50.28:2179
omniORB: Client opened connection to giop:tcp:172.19.50.28:2179
omniORB: sendChunk: to giop:tcp:172.19.50.28:2179 38 bytes
omniORB:
4749 4f50 0102 0103 1a00 0000 0200 0000 GIOP............
etc.

And then things work perfectly for some length of time (e.g. an hour or two)
until similar problem(s) next happen.
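
For reading the numbers my handler prints, here is a small sketch that splits a minor code into its parts. It assumes the standard CORBA layout (vendor minor codeset ID in the high 20 bits, vendor-specific value in the low 12 bits) and the standard CompletionStatus ordering. The function names are illustrative only.

```cpp
#include <cstdint>

// High 20 bits: vendor minor codeset ID.
constexpr std::uint32_t vendor_id(std::uint32_t minor_code) {
    return minor_code & 0xFFFFF000u;
}

// Low 12 bits: vendor-specific minor value.
constexpr std::uint32_t vendor_minor(std::uint32_t minor_code) {
    return minor_code & 0x00000FFFu;
}

// CORBA::CompletionStatus is ordered YES = 0, NO = 1, MAYBE = 2.
constexpr const char* completion_name(int completed) {
    return completed == 0 ? "COMPLETED_YES"
         : completed == 1 ? "COMPLETED_NO"
         : completed == 2 ? "COMPLETED_MAYBE"
         : "unknown";
}
```

With the values above, 1096024066 and 1096024067 share the prefix 0x41540000 and differ only in the low bits (2 and 3), which lines up with the TRANSIENT_ConnectFailed and COMM_FAILURE_MarshalArguments names in the trace. "completed 1" is COMPLETED_NO, matching the "NO" in the throw lines.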

So in the first case the connect() call itself fails, and in the other two
omniORB believes that it has connected to 255.255.255.255:65535 but then
fails to send (perhaps not surprisingly). In all cases, an immediate retry
works. In fact the third case was a connection to the local machine, though
using its proper address (172.19.50.28:2179) rather than localhost.
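
One detail that may or may not be significant: 255.255.255.255 and 65535 are what a 32-bit IPv4 address and a 16-bit port look like with every bit set. The sketch below only checks for that bit pattern, which could be handy when scanning full traces for these episodes. It is an observation about the values, not a diagnosis, and pack_ipv4 and is_all_ones_endpoint are illustrative names.

```cpp
#include <cstdint>

// Pack four dotted-quad octets into one 32-bit value, first octet highest.
constexpr std::uint32_t pack_ipv4(std::uint32_t a, std::uint32_t b,
                                  std::uint32_t c, std::uint32_t d) {
    return (a << 24) | (b << 16) | (c << 8) | d;
}

// True when both the address and the port have every bit set.
constexpr bool is_all_ones_endpoint(std::uint32_t addr, std::uint16_t port) {
    return addr == 0xFFFFFFFFu && port == 0xFFFFu;
}
```

The bogus endpoint from the trace matches this check, while the real endpoints such as 172.19.50.24:2780 do not.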

Whilst I suppose that this could be an omniORB problem, we have only
observed it on certain machines in our network, despite doing fairly similar
things on other machines. Could it be some kind of problem with WinNT's
TCP/IP implementation? Any comments welcome!

--
Clarke Brunt, Principal Software Engineer, Yeoman Navigation



