[omniORB] RE: OmniORB 4.0.1 server application hangs: endpoint shutdown problem

Kamaldeep Singh Khanuja Kkhanuja at quark.co.in
Fri Oct 1 19:52:26 BST 2004


****Forgot to mention that the following problem was reported on SOLARIS. On
WINDOWS it was not tested.****
Hello All,
 
We are using OmniORB 4.0.1 for our server application, which consists of
various distributed components across the globe. The problem is that our
server hangs after giving the following error:
omniORB: Unrecoverable error for this endpoint: giop:tcp:10.91.201.202:2222,
it will no longer be serviced.
There are no reproducible steps for the above error, but it recurs within a
few hours of operation. However, upon investigation we have found that one
place in the OmniORB code where the above error could be triggered is the
following scenario:
CORBA::Boolean
tcpEndpoint::notifyReadable(SocketHandle_t fd) {
  if (fd == pd_socket) {
    SocketHandle_t sock;
    sock = ::accept(pd_socket,0,0);
    if (sock == RC_SOCKET_ERROR) {
      return 0;
    }
....
...
}
As is clear from the above, whenever the accept system call fails (in our
case accept fails with error ECONNABORTED, which means "Software caused
connection abort"), this routine returns 0 and eventually OmniORB shuts down
the endpoint, e.g. giop:tcp:10.91.201.202:2222 in our case.
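To make the distinction behind this concrete, here is a minimal sketch of how
accept failures might be sorted into "this one connection went wrong" and
"the listening socket itself is unusable". The helper name
isTransientAcceptError and the choice of error codes are assumptions made for
illustration; this is not code from OmniORB.

```cpp
#include <cerrno>

// Hypothetical helper, not part of OmniORB: reports whether an accept()
// failure concerns only one incoming connection (true) or indicates that
// the listening socket itself can no longer be used (false).
bool isTransientAcceptError(int err) {
  switch (err) {
    case ECONNABORTED:  // the client went away before accept() returned
    case EINTR:         // the call was interrupted by a signal
    case EAGAIN:        // non-blocking socket with nothing pending
#if defined(EWOULDBLOCK) && EWOULDBLOCK != EAGAIN
    case EWOULDBLOCK:   // same meaning as EAGAIN where the values differ
#endif
      return true;
    default:
      return false;     // e.g. EBADF, EINVAL, ENOTSOCK
  }
}
```

Under this classification the ECONNABORTED failure we observe would be treated
as transient, so it would not by itself be a reason to stop servicing the
endpoint.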
 
Question 1: Is it desired that whenever such a failure occurs OmniORB should
stop servicing the concerned endpoint? In real-world operation accept could
fail simply because of a network problem on the side of a client that is
connecting to the server.
 
To solve this we have changed giopRendezvouser::execute() in
giopRendezvouser.cc so that it does NOT break from the while loop in case
AcceptAndMonitor returns a NULL pointer, i.e. when accept fails internally.
Please see the following code snippet from the changed
giopRendezvouser::execute() method:
void
giopRendezvouser::execute()
{
....
....
  CORBA::Boolean exit_on_error;
 
  do {
    exit_on_error = 0;
    giopConnection* newconn = 0;
    try {
      newconn = pd_endpoint->AcceptAndMonitor(notifyReadable,this);
      if (newconn) {
         pd_server->notifyRzNewConnection(this,newconn);
      }
      else {
        /******** COMMENTED OUT THE FOLLOWING TWO LINES *********
          exit_on_error = 1;
          break;
        ********************************************************/
      }
    }
....
....
} // end function
 
After making the above change our server still logs the SAME error message,
but resumes and keeps listening on the SAME endpoint, e.g.
giop:tcp:10.91.201.202:2222 in our case.
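 
Note that the change above ignores every accept failure, including ones from
which the listening socket cannot recover, in which case the loop could spin
forever. A narrower variant would retry only on per-connection errors and
still give up otherwise. The following is a rough sketch of that idea, not
OmniORB code: acceptWithRetry, acceptOnce and maxRetries are hypothetical
names, and acceptOnce stands in for a call such as ::accept(pd_socket,0,0).

```cpp
#include <cerrno>
#include <functional>

// Hypothetical sketch, not OmniORB code. acceptOnce must behave like
// accept(): return a descriptor >= 0 on success, or -1 with errno set.
// Returns the accepted descriptor, or -1 if the failure is not a
// per-connection error or if every one of the maxRetries + 1 attempts failed.
int acceptWithRetry(const std::function<int()>& acceptOnce, int maxRetries) {
  for (int attempt = 0; attempt <= maxRetries; ++attempt) {
    int fd = acceptOnce();
    if (fd >= 0) {
      return fd;  // a connection was accepted
    }
    if (errno != ECONNABORTED && errno != EINTR) {
      return -1;  // fatal for the listening socket: do not retry
    }
    // Otherwise only this one connection attempt failed: try again.
  }
  return -1;      // too many consecutive transient failures
}
```

With a bound on retries the caller still gets a chance to shut the endpoint
down if the failures never stop, while a single aborted client connection no
longer takes the endpoint out of service.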
 
Question 2: Is the above fix right, or does it violate the CORBA specs in any
way?
 
Also, in the current scope we cannot use multiple endpoints to keep the
server application available, as that does not solve our problem.
 
Regards,
 
--Kamal


