[omniORB] Bug report: omniORB manifests timeouts with TRANSIENT_ConnectFailed exception

Duncan Grisby duncan at grisby.org
Fri Jul 1 15:56:52 BST 2011


On Thu, 2011-06-16 at 12:23 +0200, Serguei Kolos wrote:

> This happens rarely, due to the known inaccuracy of the timeout
> implementation in the "poll" system call, which varies from 1 to 10
> milliseconds across platforms. The issue is that when the
> doConnect function in the tcpAddress.cc file detects a timeout (lines
> 356-358), it returns 0, just as in the case of a simple connection failure.

You're right, that is a problem...

> I have attached a patch, which provides a simple workaround for
> the issue. It does a 10ms range check in the giopStream::errorOnSend
> function, but I consider it an ugly and improper hack.
> A proper fix would most likely require a change to the communication
> protocol interface (for example, the giopAddress::Connect abstract
> function could be redefined to return an explicit indication of a timeout).

I agree. It's an ugly hack. I therefore don't want to apply it to 4.1.x.
I suggest you continue to use your modification, but I won't put it in
the main codebase.

Instead, I will fix the issue the correct way, which as you say is to
extend giopAddress::Connect so that it can indicate timeout explicitly.
That can't happen in 4.1.x because it would break binary compatibility,
so I'll do it in trunk, which will become 4.2.x.

Thanks,

Duncan.

-- 
 -- Duncan Grisby         --
  -- duncan at grisby.org     --
   -- http://www.grisby.org --
