[omniORB] GIOP protocol error.

Wed Jun 11 13:31:39 BST 2003

Hi,

I'm using omniORB 4.0.1 on Win2000. Recently I have noticed that client processes intermittently print the following message during a CosNaming::NamingContext::_narrow() call on a CORBA::Object that references the name service:

omniORB: To endpoint: giop:tcp:127.0.0.1:3163. Send GIOP 1.0 MessageError because a protocol error has been detected. Connection is closed.

After some debugging, I have found that the client sends a GIOP 1.0 message to the name service (the _is_a request issued on behalf of _narrow()). The omniNames trace shows the receipt of this message and a subsequent reply, which is also in GIOP 1.0 format. However, the client trace shows that the message it receives is in GIOP 1.2 format and is a "close connection" message (message type 5) rather than a reply. The protocol error is picked up in src/lib/omniORB/orbcore/giopImpl10.cc:218 (function giopImpl10::inputMessageBegin()).
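For anyone who wants to check the header bytes in the traces below without reading them by hand, here is a minimal Python sketch that decodes the 12-byte GIOP message header. It is not omniORB code: the names parse_giop_header, GiopHeader and MESSAGE_TYPES are made up for illustration, and the layout assumed is the standard one (magic "GIOP", major and minor version, a byte whose low bit gives the byte order, message type, 32-bit body size).

```python
import struct
from typing import NamedTuple, Tuple

# GIOP message type codes 0..7, in the order defined by the GIOP specification.
MESSAGE_TYPES = (
    "Request", "Reply", "CancelRequest", "LocateRequest",
    "LocateReply", "CloseConnection", "MessageError", "Fragment",
)


class GiopHeader(NamedTuple):
    """Decoded 12-byte GIOP header (hypothetical helper type)."""
    version: Tuple[int, int]
    little_endian: bool
    message_type: str
    body_size: int


def parse_giop_header(data: bytes) -> GiopHeader:
    """Decode the first 12 bytes of a GIOP message."""
    if len(data) < 12 or data[:4] != b"GIOP":
        raise ValueError("not a GIOP message header")
    major, minor, flags, msg_type = data[4:8]
    # GIOP 1.0 stores a byte-order boolean here; 1.1 and later store flags
    # whose lowest bit has the same meaning (1 = little-endian).
    little_endian = bool(flags & 1)
    (size,) = struct.unpack_from("<I" if little_endian else ">I", data, 8)
    if msg_type < len(MESSAGE_TYPES):
        name = MESSAGE_TYPES[msg_type]
    else:
        name = "Unknown({})".format(msg_type)
    return GiopHeader((major, minor), little_endian, name, size)


# Header bytes copied from the traces in this message.
client_sent = bytes.fromhex("4749 4f50 0100 0100 5800 0000")
client_received = bytes.fromhex("4749 4f50 0102 0105 0000 0000")
omninames_sent = bytes.fromhex("4749 4f50 0100 0101 0d00 0000")

for label, raw in (("client sent", client_sent),
                   ("client received", client_received),
                   ("omniNames sent", omninames_sent)):
    print(label, parse_giop_header(raw))
```

On the three headers from the traces this gives a GIOP 1.0 Request with an 88-byte body (100 bytes on the wire), a GIOP 1.2 CloseConnection with an empty body, and a GIOP 1.0 Reply with a 13-byte body (25 bytes on the wire).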

Here are the traces:

Client trace:
=============

omniORB: sendChunk: to giop:tcp:127.0.0.1:3163 100 bytes
omniORB:
4749 4f50 0100 0100 5800 0000 0000 0000 GIOP....X.......
5c00 0000 01cd cdcd 0b00 0000 4e61 6d65 \...........Name
5365 7276 6963 65cd 0600 0000 5f69 735f Service....._is_
6100 6500 0000 0000 2800 0000 4944 4c3a a.e.....(...IDL:
6f6d 672e 6f72 672f 436f 734e 616d 696e omg.org/CosNamin
672f 4e61 6d69 6e67 436f 6e74 6578 743a g/NamingContext:
312e 3000                               1.0.
omniORB: inputMessage: from giop:tcp:127.0.0.1:3163 12 bytes
omniORB:
4749 4f50 0102 0105 0000 0000           GIOP........
omniORB: To endpoint: giop:tcp:127.0.0.1:3163. Send GIOP 1.0 MessageError because a protocol error has been detected. Connection is closed.

omniNames trace:
================

omniORB: inputMessage: from giop:tcp:127.0.0.1:1462 100 bytes
omniORB:
4749 4f50 0100 0100 5800 0000 0000 0000 GIOP....X.......
ea07 0000 01cd cdcd 0b00 0000 4e61 6d65 ............Name
5365 7276 6963 65cd 0600 0000 5f69 735f Service....._is_
6100 6500 0000 0000 2800 0000 4944 4c3a a.e.....(...IDL:
6f6d 672e 6f72 672f 436f 734e 616d 696e omg.org/CosNamin
672f 4e61 6d69 6e67 436f 6e74 6578 743a g/NamingContext:
312e 3000                               1.0.
omniORB: sendChunk: to giop:tcp:127.0.0.1:1462 25 bytes
omniORB:
4749 4f50 0100 0101 0d00 0000 0000 0000 GIOP............
ea07 0000 0000 0000 01                  .........
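
One way to check whether the two excerpts above show the same invocation is to compare request IDs. The following is a minimal Python sketch, not omniORB code, and the function name giop10_request_id is made up for illustration. It assumes the GIOP 1.0 layout for Request and Reply messages: the 12-byte message header, then the service context list, then the 32-bit request ID.

```python
import struct


def giop10_request_id(msg: bytes) -> int:
    """Return the request ID of a GIOP 1.0 Request or Reply message."""
    if msg[:4] != b"GIOP":
        raise ValueError("not a GIOP message")
    if (msg[4], msg[5]) != (1, 0):
        raise ValueError("this sketch only handles the GIOP 1.0 layout")
    if msg[7] not in (0, 1):
        raise ValueError("not a Request (0) or Reply (1) message")
    order = "<" if msg[6] & 1 else ">"  # byte 6: 1 = little-endian

    pos = 12  # the body starts right after the 12-byte message header
    (count,) = struct.unpack_from(order + "I", msg, pos)
    pos += 4
    # Skip any service contexts: each is a ulong id followed by a
    # length-prefixed octet sequence, with ulongs aligned to 4 bytes.
    for _ in range(count):
        _context_id, length = struct.unpack_from(order + "II", msg, pos)
        pos += 8 + length
        pos += -pos % 4
    (request_id,) = struct.unpack_from(order + "I", msg, pos)
    return request_id


# First 20 bytes of each message, copied from the traces above.
client_request = bytes.fromhex(
    "4749 4f50 0100 0100 5800 0000 0000 0000 5c00 0000")
omninames_request = bytes.fromhex(
    "4749 4f50 0100 0100 5800 0000 0000 0000 ea07 0000")
omninames_reply = bytes.fromhex(
    "4749 4f50 0100 0101 0d00 0000 0000 0000 ea07 0000")

for label, raw in (("client request", client_request),
                   ("omniNames request", omninames_request),
                   ("omniNames reply", omninames_reply)):
    print(label, giop10_request_id(raw))
```

On the bytes as posted, the client's request carries request ID 92 (0x5c), while the request and reply in the omniNames excerpt both carry 2026 (0x7ea), so the two excerpts may have been captured from different invocations.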

Debugging this has not been easy because the problem is intermittent. My only theory (and it is a complete guess) is that a timeout occurs at the client endpoint and the client then generates a close connection message to pass back up through its own stack, but with the wrong GIOP version. My gut feeling is that this probably isn't the case: omniNames was running and the machine wasn't under any great load, so the message shouldn't have timed out. But I haven't had any better ideas as to what would cause this spurious message to be received.

Any ideas?

Thanks,
Daniel. 

- Daniel Bell
- Software Engineer, Colorbus Pty Ltd
- Email: daniel.bell at colorbus.com.au
- Phone: 61 3 8574 8035
- WWW:   http://www.colorbus.com