[omniORB] GIOP protocol error.

Daniel Bell Daniel.Bell at colorbus.com.au
Thu Jul 17 19:34:37 BST 2003


Hi,

Here is a follow-up to my original message about a GIOP protocol error that I was seeing. The original message is at the bottom and explains what I was seeing. 

I have been able to reproduce this error using a single client and omniNames (omniORB 4.0.1 on Windows 2000). The client queries omniNames successfully, then waits for more than 3 minutes before querying omniNames again. On the repeat query, the client reports a GIOP protocol error: it expects a GIOP 1.0 reply but instead reads a GIOP 1.2 close connection message.
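To make the version mismatch concrete, here is a minimal Python sketch that decodes the fixed 12-byte GIOP header. It is not omniORB code; the function name parse_giop_header is made up for illustration. The two byte strings are the first 12 bytes of the client's request and of the unexpected message, copied from the client trace in the original message below.

```python
import struct

# GIOP message type codes, as defined by the CORBA GIOP specification.
MESSAGE_TYPES = {
    0: "Request", 1: "Reply", 2: "CancelRequest", 3: "LocateRequest",
    4: "LocateReply", 5: "CloseConnection", 6: "MessageError", 7: "Fragment",
}


def parse_giop_header(data):
    """Decode the fixed 12-byte GIOP header (hypothetical helper)."""
    if len(data) < 12 or data[:4] != b"GIOP":
        raise ValueError("not a GIOP header")
    major, minor, flags, msg_type = data[4], data[5], data[6], data[7]
    # Byte 6 is the byte-order boolean in GIOP 1.0 and a flags octet in
    # later versions; in both cases bit 0 set means little-endian.
    little_endian = bool(flags & 0x01)
    (size,) = struct.unpack("<I" if little_endian else ">I", data[8:12])
    return {
        "version": (major, minor),
        "little_endian": little_endian,
        "type": MESSAGE_TYPES.get(msg_type, "Unknown"),
        "body_size": size,
    }


# Header of the request the client sent (100 bytes in total on the wire).
request = parse_giop_header(bytes.fromhex("47494f500100010058000000"))
# The 12 bytes the client read back instead of a reply.
unexpected = parse_giop_header(bytes.fromhex("47494f500102010500000000"))

print(request)
print(unexpected)
```

The request decodes as a GIOP 1.0 Request with an 88-byte body (88 + 12 header bytes = the 100 bytes in the trace), while the message read back is a GIOP 1.2 CloseConnection with an empty body.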

Due to some misconfiguration on my part, I had the client configured to NOT do any idle connection scanning and omniNames to do the default idle connection scanning. (I have now fixed this configuration, but I believe that this is still a valid issue.)

Three minutes after the client has queried omniNames, omniNames decides that the connection is idle and starts shutting it down. It sends a close connection message to the client and then closes its end of the socket. The client logs nothing to indicate that it has received the close connection message. When the client next queries omniNames, it tries to send its message on the original socket (which omniNames has already closed). The message never reaches omniNames, but the client believes it was sent successfully. The client then tries to read the reply from the socket and instead reads the GIOP 1.2 close connection message, which causes the GIOP protocol error (the client's query was sent using GIOP 1.0).
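The role of the two idle timeouts can be sketched as follows. This is only a toy model, not omniORB's scanner: first_to_close is a made-up function, None stands for "idle scanning disabled", the 180 seconds corresponds to the three minutes described above, and the 120 seconds for the client is an assumed illustrative value rather than a documented default.

```python
def first_to_close(client_idle_limit, server_idle_limit):
    """Return (end, seconds) for the end that closes an idle connection first.

    A limit of None means that end never scans for idle connections.
    Returns None if neither end scans. On a tie the client is reported.
    """
    limits = {"client": client_idle_limit, "server": server_idle_limit}
    # Keep only the ends that actually scan for idle connections.
    active = {end: limit for end, limit in limits.items() if limit is not None}
    if not active:
        return None
    end = min(active, key=active.get)
    return end, active[end]


# The misconfigured setup: client scanning disabled, server at 3 minutes.
misconfigured = first_to_close(None, 180)
# A setup where the client limit is shorter (120 is an assumed value).
corrected = first_to_close(120, 180)

print(misconfigured)
print(corrected)
```

With client scanning disabled, the server is always the end that closes first, which is the situation that exposes the stale socket to the client's next request.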

It seems to me that when the close connection message arrives, the client should recognise that the connection has been closed and make its second query to omniNames on a new connection. Or is this why the client idle connection period defaults to a value shorter than the server idle connection period (so that idle connections are always shut down from the client end rather than the server end)?

Is this a bug, a misunderstanding on my part, or a misconfiguration?
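The behaviour suggested here might look roughly like the sketch below. Everything in it is hypothetical: FakeConnection and Client are in-memory stand-ins invented for illustration, not omniORB classes, and no real sockets are involved. The idea is only that the client checks for a pending CloseConnection header before reusing a connection and opens a new one if it finds it.

```python
CLOSE_CONNECTION = 5  # GIOP message type code for CloseConnection


class FakeConnection:
    """Hypothetical in-memory stand-in for a client-side connection."""

    def __init__(self, name):
        self.name = name
        self.inbound = bytearray()  # bytes received but not yet read
        self.sent = []              # payloads "sent" on this connection

    def deliver(self, data):
        """Simulate bytes arriving from the server."""
        self.inbound.extend(data)

    def has_pending_close(self):
        """True if the unread data starts with a CloseConnection header."""
        header = bytes(self.inbound[:12])
        return (len(header) == 12 and header[:4] == b"GIOP"
                and header[7] == CLOSE_CONNECTION)


class Client:
    """Hypothetical client that reconnects when the server has closed."""

    def __init__(self):
        self.opened = 0
        self.conn = self._connect()

    def _connect(self):
        self.opened += 1
        return FakeConnection(f"conn-{self.opened}")

    def send_request(self, payload):
        # Before reusing the connection, see whether the server already
        # announced that it closed it; if so, start a fresh connection.
        if self.conn.has_pending_close():
            self.conn = self._connect()
        self.conn.sent.append(payload)
        return self.conn.name


client = Client()
first = client.send_request(b"query-1")
# The server idles out and sends the 12-byte message seen in the client trace.
client.conn.deliver(bytes.fromhex("47494f500102010500000000"))
second = client.send_request(b"query-2")

print(first, second, client.opened)
```

In this model the second query goes out on a new connection instead of the stale one, so the close connection message is never mistaken for a reply.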

Thanks,
Daniel.

> -----Original Message-----
> From: Daniel Bell 
> Sent: Wednesday, 11 June 2003 12:32 PM
> To: omniorb-list at omniorb-support.com
> Subject: [omniORB] GIOP protocol error.
> 
> 
> Hi,
> 
> I'm using omniORB 4.0.1 on Win2000. Recently, I have noticed 
> that intermittently client processes are printing out the 
> following message while doing a 
> CosNaming::NamingContext::_narrow() operation on a 
> CORBA::Object that references the name service:
> 
> omniORB: To endpoint: giop:tcp:127.0.0.1:3163. Send GIOP 1.0 
> MessageError because a protocol error has been detected. 
> Connection is closed.
> 
> After some debugging, I have found that the client sends a 
> message (representing the _narrow() method) to the name 
> service which is in GIOP 1.0 format. The omniNames trace 
> shows the receipt of this message and a subsequent response, 
> which is also in GIOP 1.0 format. However, the client trace 
> shows that the received message is in GIOP 1.2 format and 
> indicates a "close connection" message rather than "response" 
> message. The protocol error is picked up in 
> src/lib/omniORB/orbcore/giopImpl10.cc:218 (function 
> giopImpl10::inputMessageBegin()).
> 
> Here are the traces:
> 
> Client trace:
> =============
> 
> omniORB: sendChunk: to giop:tcp:127.0.0.1:3163 100 bytes
> omniORB:
> 4749 4f50 0100 0100 5800 0000 0000 0000 GIOP....X.......
> 5c00 0000 01cd cdcd 0b00 0000 4e61 6d65 \...........Name
> 5365 7276 6963 65cd 0600 0000 5f69 735f Service....._is_
> 6100 6500 0000 0000 2800 0000 4944 4c3a a.e.....(...IDL:
> 6f6d 672e 6f72 672f 436f 734e 616d 696e omg.org/CosNamin
> 672f 4e61 6d69 6e67 436f 6e74 6578 743a g/NamingContext:
> 312e 3000                               1.0.
> omniORB: inputMessage: from giop:tcp:127.0.0.1:3163 12 bytes
> omniORB:
> 4749 4f50 0102 0105 0000 0000           GIOP........
> omniORB: To endpoint: giop:tcp:127.0.0.1:3163. Send GIOP 1.0 
> MessageError because a protocol error has been detected. 
> Connection is closed.
> 
> omniNames trace:
> ================
> 
> omniORB: inputMessage: from giop:tcp:127.0.0.1:1462 100 bytes
> omniORB:
> 4749 4f50 0100 0100 5800 0000 0000 0000 GIOP....X.......
> ea07 0000 01cd cdcd 0b00 0000 4e61 6d65 ............Name
> 5365 7276 6963 65cd 0600 0000 5f69 735f Service....._is_
> 6100 6500 0000 0000 2800 0000 4944 4c3a a.e.....(...IDL:
> 6f6d 672e 6f72 672f 436f 734e 616d 696e omg.org/CosNamin
> 672f 4e61 6d69 6e67 436f 6e74 6578 743a g/NamingContext:
> 312e 3000                               1.0.
> omniORB: sendChunk: to giop:tcp:127.0.0.1:1462 25 bytes
> omniORB:
> 4749 4f50 0100 0101 0d00 0000 0000 0000 GIOP............
> ea07 0000 0000 0000 01                  .........
> 
> 
> Debugging this has not been easy as the problem is 
> intermittent. I'm wondering (and this is a complete guess) 
> whether this problem could be caused by a timeout occurring 
> at the client endpoint, and then the client generating a 
> close connection message as a result (to pass back up through 
> its own stack), but generating the wrong GIOP version 
> message? My gut feeling is that this probably isn't the case 
> because omniNames was running, the machine wasn't under any 
> great load, so the message shouldn't have timed out. But I 
> haven't had any better ideas as to what would cause this 
> spurious message to be received.
> 
> Any ideas?
> 
> Thanks,
> Daniel. 
> 
> 
> - Daniel Bell
> - Software Engineer, Colorbus Pty Ltd
> - Email: daniel.bell at colorbus.com.au
> - Phone: 61 3 8574 8035
> - WWW:   http://www.colorbus.com
> 
> 
> 


