[omniORB] How to get the Naming Service to support many simultaneous connections.

souchaud souchaud at labri.fr
Thu May 10 12:20:24 BST 2007


Duncan Grisby wrote:
> On Wednesday 25 April, souchaud wrote:
> 
>> When I launch my program on a cluster and use more than ~200
>> nodes, my application cannot start and the following error appears:
>>
>> ... Failed to resolve NameService ...
>>
>> Here is the portion of the code used to resolve the naming service :
>>
>>   // Obtain a reference of the name service:
>>   COLCOWS_DEBUG(dblTest, "Obtain a reference of the Naming Service");
>>   try {
>>     obj_ref = orbp->resolve_initial_references("NameService");
>>     new_node_servant->_naming_ctxt =
>>       CosNaming::NamingContext::_narrow(obj_ref);
>>   }
>>   catch(...) {
>>     COLCOWS_ERROR("Failed to resolve NameService");
>>   }
> 
> Using catch-all clauses is generally a bad idea since it can mask the
> reasons behind problems.
> 
> In this case, you are probably getting a COMM_FAILURE exception because
> omniNames is unable to service new connections. You can try varying the
> omniORB parameters about whether to use thread per connection or thread
> pool mode. Depending on your platform, you ought to be able to configure
> omniNames to support at least 1000 concurrent connections. If you run
> omniNames with -ORBtraceLevel 25 -ORBtraceThreadId 1, you will probably
> get an idea of why the connections are failing.
> 
> The other thing you can do is to reduce the time omniNames will hold
> open idle connections by setting the inConScanPeriod to a small number
> of seconds, and setting scanGranularity to 1 second rather than the
> default 5. That will mean that it closes idle connections sooner, and is
> able to reuse file descriptors and threads.
> 
> Cheers,
> 
> Duncan.
> 
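
The tuning suggested in the quoted message could be written as an
omniORB configuration fragment for the process running omniNames. This
is only a sketch: the parameter names are omniORB's, but the values are
illustrative assumptions, not figures from this thread. The same
parameters can also be given on the command line with an -ORB prefix,
as with -ORBtraceLevel above.

```
# Illustrative values only; tune for your platform.

# 0 selects thread pool mode, 1 selects thread per connection.
threadPerConnectionPolicy = 0

# Upper bound on pool threads when thread pool mode is used.
maxServerThreadPoolSize = 100

# Close incoming connections that have been idle for this many seconds.
inConScanPeriod = 10

# Check for idle connections every second instead of every 5.
scanGranularity = 1
```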

OK, it works well with 256 nodes now. The problem was that I did not
catch the TRANSIENT exception. Now, if a TRANSIENT or COMM_FAILURE
exception is raised, I sleep for 1 second and try again. With 128 nodes
there are no retries; with 256 nodes some TRANSIENT exceptions are raised.
I did not have to tune the omniNames configuration.

Thanks for your help,
Mathieu
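
The retry described in the message above can be sketched roughly as
follows. This is not the code from the application: the helper
resolve_with_retry and the two exception types are hypothetical
stand-ins so that the example compiles without omniORB. In real code
the catch clauses would name CORBA::TRANSIENT and CORBA::COMM_FAILURE,
and the callback would wrap resolve_initial_references("NameService")
and the _narrow call.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <stdexcept>
#include <thread>

// Stand-ins for CORBA::TRANSIENT and CORBA::COMM_FAILURE (assumption:
// used only so this sketch builds without the omniORB headers).
struct TransientError : std::runtime_error {
  TransientError() : std::runtime_error("TRANSIENT") {}
};
struct CommFailureError : std::runtime_error {
  CommFailureError() : std::runtime_error("COMM_FAILURE") {}
};

// Calls `resolve` until it succeeds or `max_attempts` have been made,
// sleeping `delay` between attempts. Only the two retryable errors are
// retried; any other exception propagates immediately, so unrelated
// failures are not masked. Returns the number of attempts used, and
// rethrows the last retryable error once the attempts are exhausted.
inline int resolve_with_retry(const std::function<void()>& resolve,
                              int max_attempts,
                              std::chrono::milliseconds delay) {
  assert(max_attempts >= 1);
  for (int attempt = 1;; ++attempt) {
    try {
      resolve();
      return attempt;
    } catch (const TransientError&) {
      if (attempt >= max_attempts) throw;
    } catch (const CommFailureError&) {
      if (attempt >= max_attempts) throw;
    }
    std::this_thread::sleep_for(delay);
  }
}
```

With a delay of std::chrono::seconds(1) this matches the behaviour
described above. Bounding the number of attempts is an addition of this
sketch, so that a Naming Service that never comes up is reported as an
error rather than retried forever.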


