[omniORB] Asymmetrical Comm_failure

JHJE (Jan Holst Jensen) jhje at novonordisk.com
Wed Feb 21 10:55:08 GMT 2007


> I start omninames service and process B (in the same PC 
> normally, but I have tried running omninames in a third PC 
> too). (In this PC, TCP communications are available).
> 
> Then, disconnect the network card of PC A, and start process 
> A. This process, every 3 seconds, tests its TCP connection 
> by trying to ping the IP where omninames is running. If the test 
> succeeds (when I reconnect the network card), it creates a thread 
> that goes into orb->run(), and (in another thread) polls process B.
> 
> If the poll reaches process B, in the GUI of process B a green 
> image is shown. Else, the image is red. 
> 
> Process B, when the image turns green, sends a message to 
> process A. The methods are different, so I know neither A nor 
> B is connected to itself.

OK. As I have understood it:

omniNames and process B are running constantly on a PC where the network is always up.

Process A is started on another PC where the network is initially down; process A initializes the orb; waits for the network to come up; looks up B's service in the naming service; invokes a method in B's service; when B receives the request from A, it calls a service in A.

> The case is, the image in B turns green, but when trying to 
> send the request message, process B goes into the catch statement

I assume that when process A calls the service in B, it passes in an object reference to a servant in process A, right? Have a look at that object reference - I think that you will see that the object reference contains an IP address which is localhost.

This is because when you started up process A I assume that you initialized the orb before you began checking for the presence of the network (I am just guessing here, since I haven't seen your code). To the best of my knowledge the orb uses the localhost loopback address as its default endpoint when there is no network. Thus all object references to services in process A will contain localhost as the machine IP address. When process B tries to call the service running in process A the call will fail, since the object reference points to something running on localhost, which from B's point of view is B's own machine.
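
To make the check concrete, here is a minimal sketch of a test you could apply to the host part of an object reference once you have extracted it (for example by decoding the IOR with omniORB's catior utility). The function name is_loopback_host is made up for illustration and is not part of omniORB, and the test is purely textual, so treat it as a quick diagnostic only:

```cpp
#include <string>

// Hypothetical helper, not part of omniORB: returns true when `host`
// looks like the local loopback interface. A reference carrying such a
// host is only usable by clients running on the same machine.
bool is_loopback_host(const std::string& host) {
    // Common loopback spellings: the name "localhost" and IPv6 "::1".
    if (host == "localhost" || host == "::1") {
        return true;
    }
    // IPv4 loopback block 127.0.0.0/8, checked by textual prefix only.
    return host.compare(0, 4, "127.") == 0;
}
```

If this returns true for the host in the reference that A hands to B, then B has no way to call back into A from another machine.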

> And what is incomprehensible to me is that if I run process A 
> with its network card active from the beginning, and process 
> B with it disconnect and some time later I connect it (this 
> process checks TCP connections in the same way B does, and 
> also create the thread which attends client incoming 
> request), the image turns green and the message gets to B

The orb initializes with an endpoint visible to others when the network is up. Thus the reference you pass to B (or register in the naming service) correctly points to the machine where process A is running.

Either have a look at the object reference passed to B or take a look at the trace file of process A (run process A with -ORBtraceLevel 20 or so). You should see that the endpoint of the orb differs in the two cases: no network on startup, and network active on startup. Look for the lines at the start of the trace where omniORB lists the addresses that it has:

  omniORB: My addresses are:

If my guesses are correct the problem should be avoidable by either giving the machine running process A a static IP address (the machine address is then visible to the orb even when the network is down) or deferring initialization of the orb until you have made sure that the network is up.
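
The second option could be structured roughly as below. This is only a sketch under my own assumptions: init_when_network_up and its parameters are hypothetical names, the network probe stands in for whatever test you already run every 3 seconds, and the init callback is where your CORBA::ORB_init(argc, argv) call would go. I have kept it free of omniORB headers so it compiles on its own:

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: polls `network_is_up` and runs `init_orb` exactly
// once, as soon as the probe succeeds. Returns true if `init_orb` was
// called, false if `max_attempts` probes all failed.
bool init_when_network_up(const std::function<bool()>& network_is_up,
                          const std::function<void()>& init_orb,
                          std::chrono::milliseconds poll_interval,
                          int max_attempts) {
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        if (network_is_up()) {
            // Only now is the ORB created, so it can pick up the real
            // network address instead of the loopback one.
            init_orb();
            return true;
        }
        // Do not sleep after the final failed probe.
        if (attempt + 1 < max_attempts) {
            std::this_thread::sleep_for(poll_interval);
        }
    }
    return false;
}
```

In process A you would pass your existing ping test as the probe and a small lambda that calls CORBA::ORB_init and activates your servants as the init step, so that no object reference is created before the network is available.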

Cheers
-- Jan Holst Jensen, Novo Nordisk A/S, Denmark


