[omniORB] omniORB slow transfer when clientCallTimeOutPeriod != 0

Tue Aug 11 13:10:50 BST 2009

Hi,

I know this doesn't help much, but just to let you know I can confirm we
have seen something very similar. Client connections sometimes just hang
like you described.

I looked into a hung process with a debugger, and from what I recall it
seemed like a call to recv() just did not return, even though the server
process had been suddenly terminated, either by a crash or by a call to
exit(). However, recv() not returning would suggest a problem in Winsock,
which we found hard to believe, so we assumed the call stack was not
fully correct because the code was optimized.
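
For comparison, here is a minimal sketch of the behaviour one would
normally expect from a blocking recv() when the peer on localhost goes
away. It is plain Python sockets, not omniORB code; the function name
recv_after_peer_close and the 5 second safety timeout are assumptions
made purely for illustration. Closing the accepted connection stands in
for the server process exiting.

```python
import socket
import threading


def recv_after_peer_close(timeout_s=5.0):
    """Return what recv() yields once the peer closes a localhost TCP connection.

    Returns b"" on an orderly close, or None if recv() did not return
    within timeout_s (which would correspond to the hang described above).
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    def server():
        conn, _ = listener.accept()
        conn.close()  # stand-in for the server process terminating

    worker = threading.Thread(target=server)
    worker.start()

    client = socket.create_connection(("127.0.0.1", port))
    client.settimeout(timeout_s)  # safety net so the sketch itself cannot hang
    try:
        data = client.recv(4096)
    except socket.timeout:
        data = None
    finally:
        client.close()
        worker.join()
        listener.close()
    return data


if __name__ == "__main__":
    print(repr(recv_after_peer_close()))
```

In the normal case recv() returns an empty byte string as soon as the
peer's end of the connection is closed, which is why a recv() that never
returns after the server has died looked so suspicious.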

All in all, we didn't spend all that much time debugging this problem,
as it occurred only once every few days of nonstop running during our
internal tests, and the rare times we caught it, we couldn't really get
anywhere with the debugger. This was with omniORB 4.1.3, client and
server both running on localhost. The OS was most likely Vista64
Ultimate, but I cannot be completely sure now. The application is built
for a 32-bit architecture with VS2008, and omniORB is linked statically.

Best regards,
Sampo

> Hi,
>
> It seems we are indeed seeing a hang in certain cases. Currently, we cannot
> reproduce it every time, but it seems something like this triggers it:
>
> 1) invoke a remote call and block it in the servant (while (1) { sleep... })
> 2) do some other work with the same ORBs from different threads, repeat this
> for a few hours
> 3) kill the server
> 4) the client does not detect COMM_FAILURE (or some other exception) for the
> call from point (1)
>
> We will continue to pinpoint why this occurs. From talking to some other
> guys with omniORB experience, it seems it has something to do with surviving
> interruptions in TCP, or maybe even a feature to support clustered
> environments.
>
>