[omniORB] deadlock with distributed callback application

Lars Immisch lars@ibp.de
Thu, 31 May 2001 12:21:58 +0200


Dear Duncan,

> > I have disabled the LocateRequests via
> > -ORBverifyObjectExistsAndType 0 and have since not run into any
> > problems with deadlocks in my distributed callback application.
>
> Hmm. There's clearly something bad going on. The use of LocateRequest
> shouldn't cause any deadlocks. If possible, I think it would be worth
> your while discovering what the problem is, rather than just masking
> it by turning off LocateRequests. The problem might come back to bite
> you later...

Sorry it took so long, I was on vacation.

I spent two days trying to nail down the problem, but only managed to get
more confused. Mainly, I don't understand how omniORB manages connections when
the operations are oneway, and the standard documentation does not say much
about this topic.

My findings so far are:

Our 'real' system deadlocks immediately when verifyObjectExistsAndType is
enabled. When I look at where it is hanging, both processes are blocked on the
select in tcpSocketStrand::ll_recv, called from _locateRequest inside
omniObjRef::_invoke.

At that point, after some initialisation traffic, process A has sent two
similar oneway requests to two different objects in process B. The calls that
are stuck are the completion for the first invocation (sent by B) and the
second invocation from A.
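
As an illustration only (a toy model, not omniORB code, and not a confirmed
diagnosis): the symptom above, two peers each blocked in a receive while
waiting for a reply, is what one would see if each side sent a request and
then waited for the answer without servicing the other side's request in the
meantime. The Python sketch below models that with two threads and two
queues. The names peer, run and serve_while_waiting are made up for this
example, and the message strings merely borrow the GIOP message names.

```python
import queue
import threading


def peer(name, inbox, outbox, results, serve_while_waiting, timeout=0.2):
    """Send a locate request, then wait for the matching reply.

    Hypothetical model: if serve_while_waiting is False, the peer never
    answers a request that arrives while it is itself waiting for a reply.
    """
    outbox.put("LocateRequest")
    while True:
        try:
            msg = inbox.get(timeout=timeout)
        except queue.Empty:
            # Nothing more arrived in time: treat this as "blocked in recv".
            results[name] = "stuck"
            return
        if msg == "LocateReply":
            results[name] = "ok"
            return
        if msg == "LocateRequest" and serve_while_waiting:
            # Answer the other side's request before going back to waiting.
            outbox.put("LocateReply")
        # Otherwise the incoming request is left unanswered.


def run(serve_while_waiting):
    """Run two peers against each other and report how each one ended."""
    a_to_b, b_to_a = queue.Queue(), queue.Queue()
    results = {}
    threads = [
        threading.Thread(
            target=peer,
            args=("A", b_to_a, a_to_b, results, serve_while_waiting),
        ),
        threading.Thread(
            target=peer,
            args=("B", a_to_b, b_to_a, results, serve_while_waiting),
        ),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run(serve_while_waiting=False))
    print(run(serve_while_waiting=True))
```

In this model, run(False) ends with both peers reported as "stuck", while
run(True) ends with both reported as "ok". Whether omniORB actually behaves
like the first case on a reused connection is exactly the open question.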

My suspicion was that in this case, the LocateRequest is sent over a reused
connection, and the other oneway invocation gets in the way. But I haven't
been able to verify that - mainly because my attempts to recreate the problem
in a simpler environment failed:

First, I tried with a Python client/server. This worked beautifully, but by
putting some debug print statements into omniORB, I found out that the Python
client and server never send a LocateRequest in that setup. I was pretty
surprised, but decided to move on and try it with a C++ system that does the
same thing.

This, to my astonishment, also worked without a deadlock, but it is very slow
- apparently because a new TCP connection is opened for every invocation from
server to client. This is something I don't see happening in our 'real' system.
The Python system is much faster, too. Shrug.

The IDL is attached below.

Can you point me to some more information on how omniORB handles connections?

Will omniORB ever send data in both directions on a TCP connection?

Thanks a lot,

Lars

module Twoway
{
	interface Client
	{
		oneway void unsolicited(in long data);
		oneway void completed(in long data);
	};

	interface Server
	{
		oneway void do_something(in Client c, in long data);
	};
};