[omniORB] Help needed to interpret COMM_FAILURE

Renzo Tomaselli renzo.tomaselli at tecnotp.it
Mon Nov 28 15:18:06 GMT 2005


Duncan,
    thanks for your comments. As you guessed, no multiple threads are
involved. Although the entire environment is fully threaded, in
this case we have a sequential client, acting as a COLD pump against a
colocated server.
It builds a transaction, then sends it out to the server application.
This has been in production for a few years at several customers, without
problems. What's strange in this case is that the trouble seems connected
to overall transaction (i.e. message) size.
Even more complicated: we use our own transport here, since messages are
segmented, padded and encrypted in a way similar to SSL.
From a fresh log, I see that this error comes from a Send, which is
heavily logged except for the low-level sending, i.e. the original omniORB
send(). Thus I will monitor this when it fails, to report the errno value.
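
A minimal sketch of the kind of logging described above, assuming the lowest-level send can be wrapped behind a callable. The names `RawSend`, `SendResult` and `logged_send` are hypothetical and are not part of omniORB or of the custom transport. The point is only that the error code has to be captured immediately after the failing call, before any other library call can overwrite it. On a Windows host, Winsock reports socket errors through WSAGetLastError() rather than errno, so the capture step would need to be adapted there.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <functional>

// Hypothetical shape of the lowest-level send: returns bytes sent, or -1 on failure.
using RawSend = std::function<long(const char*, std::size_t)>;

// What the caller gets back: the raw return value plus the captured error code.
struct SendResult {
    long sent;
    int error;  // 0 when the send succeeded
};

// Calls the raw send and, on failure, records errno before anything else runs.
SendResult logged_send(const RawSend& raw, const char* buf, std::size_t len,
                       std::FILE* log) {
    errno = 0;
    long n = raw(buf, len);
    int err = (n < 0) ? errno : 0;  // capture first; fprintf may change errno
    if (n < 0) {
        std::fprintf(log, "send of %zu bytes failed: errno=%d (%s)\n",
                     len, err, std::strerror(err));
    }
    return {n, err};
}
```

Logging only on the failure path keeps the log small, which matters when full tracing produces very large files.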
Sadly enough, all of this occurs at a Swiss customer site, accessible
only during the night over VPN. Raising the trace level usually yields
*huge* logfiles, so we have to find a suitable threshold for that.
Thanks,

Renzo


Duncan Grisby wrote:

>On Friday 25 November, Renzo Tomaselli wrote:
>
>>    this sounds interesting. We too had an apparently random problem,
>>appearing as a COMM_FAILURE_MarshalArguments instead.
>>This occurs while the involved client manages to send very large
>>messages (e.g. several hundred megabytes) to the server, co-located on
>>the same Win host.
>>We initialize maxMsgSize to be 1 gigabyte, but we have never seen
>>MARSHAL_MessageSizeExceedLimitOnClient as a minor, as one would expect
>>in case of overflow failures.
>
>The main time you can see COMM_FAILURE when a message size is exceeded
>is when a server is returning a message that is larger than its own
>message size limit. In that case, it starts sending the reply message,
>but when it is part way through, it discovers that the message is larger
>than permitted. It's too late to send an exception to the client, so it
>has to drop the connection. The client sees that as a
>COMM_FAILURE_UnmarshalReply.
>
>That isn't directly what is happening to you, but there are two related
>possibilities. If your clients are multi-threaded and you have set the
>oneCallPerConnection parameter to false, several client threads can be
>sharing a connection. In that case, if the server drops the connection
>for one call, other calls will see COMM_FAILUREs, and could see
>COMM_FAILURE_MarshalArguments.
>
>Alternatively, when a client detects that it has exceeded its message
>size on sending, it too closes the connection it is using, so the server
>doesn't sit waiting for the end of a request that will never come.
>Again, if client threads are sharing a connection, you can see
>COMM_FAILUREs in other threads.
>
>I don't think either of those situations is actually what you're
>seeing, so I'm not sure what's going on. Are you able to get traces
>from traceLevel 25 when it goes wrong?
>
>Cheers,
>
>Duncan.
>
>  
>
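
The reply-side failure Duncan describes can be illustrated with a small simulation. This is only a sketch under stated assumptions, not omniORB's actual code: `FakeConnection`, `Outcome` and `stream_message` are hypothetical names, and the message is modelled as a list of fragment sizes. It shows why a sender that checks its size limit while streaming can do nothing but drop the connection once the limit is crossed, since earlier fragments are already on the wire.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a connection; records what the peer would observe.
struct FakeConnection {
    std::size_t bytes_on_wire = 0;
    bool closed = false;
    void write(std::size_t n) { bytes_on_wire += n; }
    void close() { closed = true; }
};

enum class Outcome { Completed, DroppedMidMessage };

// Streams a message fragment by fragment, checking the running total
// against max_msg_size only as each fragment is about to be sent.
Outcome stream_message(FakeConnection& conn,
                       const std::vector<std::size_t>& fragments,
                       std::size_t max_msg_size) {
    std::size_t total = 0;
    for (std::size_t frag : fragments) {
        if (total + frag > max_msg_size) {
            // Part of the message has already been sent, so a clean
            // exception reply is no longer possible: drop the connection.
            conn.close();
            return Outcome::DroppedMidMessage;
        }
        conn.write(frag);
        total += frag;
    }
    return Outcome::Completed;
}
```

With three 400-byte fragments and a 1000-byte limit, the first two fragments are written and the connection is closed before the third. The peer sees a truncated message followed by a closed connection, which is the situation that surfaces as a COMM_FAILURE rather than a MARSHAL exception.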


