[omniORB] onmiAsyncWorker bug fix proposal

Raschick hartmut.raschick at keymile.com
Thu Jul 21 20:07:16 BST 2005


[applies to @least omniORB-4.0.2 up to 4.0.6]

Hi,
in our project we use omniORB, currently version 4.0.3. Under situations,
where heavy load was mixed with relatively idle times, our application
would infrequently crash. The core (SIGSEGV/SIGBUS) would always occur in
omniAsyncInvoker::insert. An idle thread was present and would be used, however,
the retrieved omniAsyncWorker appeared to be garbage.
After some study, we found the following issue: in omniAsyncWorker::real_run
a task would be executed. Then - to avoid superF creation/deletion of workers -
this worker would try to execute tasks from the task Q. If that was empty it
would insert itself into the idle thread list and linger for idle_timeout. If no
task would be assigned to the worker (via the above omniAsyncInvoker::insert), 
it would de-Q from the idle thread list and die.
So far, so good.
Unfortunately, the timedwait used to linger can return w/a spurious wake due to
a process signal (func returning 1), as can be seen in posix.cc:
--snip---------------------------------------------
int
omni_condition::timedwait(unsigned long secs, unsigned long nanosecs)
{
     timespec rqts = { secs, nanosecs };

again:
     int rc = ERRNO(pthread_cond_timedwait(&posix_cond,
                       &mutex->posix_mutex, &rqts));
     if (rc == 0)
     return 1;
...
..
.
--snap---------------------------------------------
If this happens, the worker is _not_ de-Q'd and tries again to fetch a task. If
that again fails, because the are no immediate tasks to process it will try to
linger again, a second time inserting itself into the idle thread list. If now 
two tasks occur they will fetch the same worker. "One" worker could even already
be deleted when the second tries to start (or runs). The "least" that could
happen is there'd be a double free. One of those is what happened in our case,
we believe.

Our proposal is to change omniAsyncWorker::real_run (invoker.cc) like this
--snip---------------------------------------------
   void real_run() {
     int yetInserted = 0;

     if (omniORB::trace(10)) {
       omni_tracedmutex_lock sync(*pd_pool->pd_lock);
       omniORB::logger l;
       l << "AsyncInvoker: thread id = " << pd_id
     << " has started. Total threads = " << pd_pool->pd_totalthreads
     << "\n";
     }
     pd_pool->pd_lock->lock();

     while (pd_task || pd_pool->pd_keep_working) {

       if (!pd_task) {
     if ( !omniTaskLink::is_empty(pd_pool->pd_anytime_tq) ) {
       pd_task = (omniTask*)pd_pool->pd_anytime_tq.next;
       pd_task->deq();
     }
     else {
       // if timedwait falls through w/a "1" (spurious wake due to
       // process signal) we don't want to insert "us" multi times
       // mind: code as is will again attempt to wait the _whole_
       // omniAsyncInvoker::idle_timeout
       if (!yetInserted) {
         yetInserted = 1;
         pd_next = pd_pool->pd_idle_threads;
         pd_pool->pd_idle_threads = this;
           }
       unsigned long abs_sec,abs_nanosec;
       omni_thread::get_time(&abs_sec,&abs_nanosec,
                 omniAsyncInvoker::idle_timeout);
       if ( pd_cond->timedwait(abs_sec,abs_nanosec) == 0 && !pd_task) {
         // Has timeout and has not been assigned a task.

         // Remove this thread from the idle queue
         omniAsyncWorker** pp = &pd_pool->pd_idle_threads;
         while (*pp && *pp != this) {
           pp = &((*pp)->pd_next);
         }
         if (*pp) *pp = pd_next;
         pd_next = 0;
         yetInserted = 0; // probably superF
         break;
       }
       // Dequeue by omniAsyncInvoker. // isn't that the comment for the break 
above?!
       continue;
     }
       }

       unsigned int immediate = (pd_task->category() ==
                 omniTask::ImmediateDispatch);

       pd_pool->pd_lock->unlock();
       try {
     pd_task->execute();
       }
       catch(...) {
     omniORB::logs(1, "AsyncInvoker: Warning: unexpected exception "
               "caught while executing a task.");
       }
       pd_task = 0;
       pd_pool->pd_lock->lock();

       if (immediate) {
     pd_pool->pd_nthreads++;
       }

       if (pd_pool->pd_nthreads > pd_pool->pd_maxthreads) {
     // No need to keep this thread
     break;
       }
     }
--snap---------------------------------------------
So far, we have had no further cores.

Please find attached a version of invoker.cc as taken from omniORB-4.0.3
including above patch.

-- 
Hartmut "Hardy" Raschick / R&D Network Management
KEYMILE Gmbh
-------------- next part --------------
A non-text attachment was scrubbed...
Name: omniAsyncWorker-patch.tar.gz
Type: application/x-gzip
Size: 3516 bytes
Desc: not available
Url : http://www.omniorb-support.com/pipermail/omniorb-list/attachments/20050721/44b31e37/omniAsyncWorker-patch.tar.bin


More information about the omniORB-list mailing list