[omniORB] Canceling a blocking function

Mon, 3 Apr 2000 08:32:00 -0400

Thanks for your suggestion; however, see my comments below.

-----Original Message-----
From: Tres Seaver <tseaver@palladion.com>
To: Han Kiliccote <kiliccote@cmu.edu>
Cc: omniorb-list@uk.research.att.com <omniorb-list@uk.research.att.com>
Date: Sunday, April 02, 2000 3:16 PM
Subject: Re: [omniORB] Canceling a blocking function

>Han Kiliccote wrote:
>>
>> At Carnegie Mellon University, we are developing a prototype for a
>> distributed system that contains a very large number of servers (e.g.,
>> 100,000 servers). In this prototype, we need to send a request to a large
>> subset of these servers (e.g., 100).
>>
>> Currently we have a loop that uses a thread pool to attach a thread to a
>> server and each thread calls a function in a different server.
>>
>> When a percentage (e.g., 50) of these functions return, we would like to
>> cancel the operation in the remaining threads which are blocking either
>> because the servers are down/faulty or just about to complete but not yet
>> completed.
>>
>> Currently we don't know how to do this. In each remaining thread, there
>> is a call
>>
>> server[i]->do_function(argument)  // blocked (no reply yet)
>>
>> How can we unblock this? We don't want to wait more than 10 sec for these
>> functions to time out: since the overall function is already deemed
>> completed, another request will arrive soon, and a very large number of
>> threads would then exist in the system at any given point. We don't want
>> to lower the timeout below 10 sec either, because that would cause an
>> early abort in some cases.
>>
>> Your advice and help are greatly appreciated.
>>
>> P.S. Shall we switch to one-way functions?
>
>Consider very carefully using something like CosEvents/CosNotifications
>to manage the NxM communications you need here.  One way to handle your
>scenario:
>
> 1.  Create a notification channel within your "master" server (or in
>     a separate server, perhaps for scalability).
>

I should have been clearer about the research. Our goal is to remove any
central server from the system (or to replace each server with a large,
randomly chosen set of servers). Each client acts like a mini-server. Master
or monolithic servers are exactly what we are trying to replace. The goal of
the research is to show that we can build a large system that still has no
single point of failure due to centralized servers.

> 2.  Create another channel on which to broadcast requests from the
>     master to the slave servers.
>
> 3.  In each slave server, subscribe a pull consumer to the "request"
>     channel.  One thread loops as follows:
>
>       - pull new requests from the request channel and enqueue them
>
>       - pull cancellations from the request channel and mark their
>         requests.
>
>     Another thread pulls requests from the queue, processing each one
>     while checking at intervals to see if it has been cancelled.  On
>     completion, the processing thread pushes the result to the "results"
>     channel.
>
>     This server could perhaps be single-threaded, since you have to
>     break the "work" up into segments to allow checking for
>     cancellation.
>
> 4.  From the master server, BEFORE broadcasting your requests,
>     register a pull consumer on the "results" channel, using a filter
>     for the request ID you are about to broadcast.
>
> 5.  On the master server, push the request onto the "request" channel.
>     Repeatedly pull results from the channel until reaching your
>     desired threshold.  Unsubscribe from the channel (results not
>     yet received will go into the bit bucket).  Broadcast a cancel
>     on the current request.
>
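
A minimal sketch of the slave-side loop described above, to make the
cancellation check concrete. It is written in Python only for brevity and
uses in-process queues as stand-ins for the two channels. The names
request_channel, results_channel, intake_loop, worker_loop and the "stop"
sentinel are assumptions made for this illustration; they are not part of
any CosEvent/CosNotification API.

```python
import queue
import threading

# Hypothetical in-process stand-ins for the "request" and "results" channels;
# a real system would pull from and push to notification-channel proxies.
request_channel = queue.Queue()
results_channel = queue.Queue()

pending = queue.Queue()        # requests accepted but not yet processed
cancelled = set()              # IDs of requests that have been cancelled
cancelled_lock = threading.Lock()


def intake_loop():
    """Pull messages from the request channel; enqueue requests, mark cancels."""
    while True:
        kind, request_id, payload = request_channel.get()
        if kind == "stop":                 # sentinel used only by this sketch
            pending.put(None)
            return
        if kind == "cancel":
            with cancelled_lock:
                cancelled.add(request_id)
        else:                              # kind == "request"
            pending.put((request_id, payload))


def is_cancelled(request_id):
    with cancelled_lock:
        return request_id in cancelled


def worker_loop():
    """Process queued requests in segments, checking for cancellation between them."""
    while True:
        item = pending.get()
        if item is None:
            return
        request_id, segments = item
        total = 0
        for segment in segments:           # each segment is one slice of the "work"
            if is_cancelled(request_id):
                break
            total += sum(segment)
        else:
            # Reached only when the loop was not cut short by a cancellation.
            results_channel.put((request_id, total))


def run_demo():
    """Feed two requests, one cancelled up front, and return the results pushed."""
    request_channel.put(("cancel", 2, None))
    request_channel.put(("request", 1, [[1, 2], [3, 4]]))
    request_channel.put(("request", 2, [[5, 6], [7, 8]]))
    request_channel.put(("stop", None, None))
    threads = [threading.Thread(target=intake_loop),
               threading.Thread(target=worker_loop)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    results = []
    while not results_channel.empty():
        results.append(results_channel.get())
    return results
```

In this sketch a cancelled request simply produces no result, which matches
the "bit bucket" behaviour on the master side. The finer the segments, the
sooner a cancellation takes effect, at the cost of more frequent checks.
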
>One-ways won't help a whole lot here, unless the request-processing time
>is very small.  The new asynchronous message invocation (AMI) spec might
>help, but I imagine that you are truly CPU bound here (else why 1E5
>servers), so the network latency is likely not a big problem.  The
>"scatter-gather" solution I proposed has the advantage of decoupling the
>master and the slaves, which becomes especially critical for issues
>involving large numbers of peers (yours is the largest number I have ever
>seen seriously proposed!)
>

Actually (in our current implementation) we are network bound. Each request
takes 0.1 ms to complete, while 4 ms is wasted on the network.

I thought omniORB does not support AMI. Am I wrong?
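
Independently of AMI support in the ORB, the behaviour we are after can be
sketched on the caller side. The Python sketch below is only an illustration
under stated assumptions: call_server is a hypothetical stand-in for
server[i]->do_function(argument), the delays simulate reply times, and the
give_up event simulates a cancellation hook. A real blocking CORBA call has
no such hook, which is exactly the open question.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed


def call_server(delay, give_up):
    """Hypothetical stand-in for server[i]->do_function(argument).

    `delay` simulates the time until the reply arrives. `give_up` is an
    Event the caller sets once it no longer needs the reply; Event.wait
    returns True as soon as it is set, False if the delay elapsed first.
    """
    if give_up.wait(timeout=delay):
        return None            # abandoned before the reply arrived
    return delay               # "reply" from the simulated server


def gather_quorum(delays, quorum):
    """Invoke every simulated server and return once `quorum` replies are in."""
    give_up = threading.Event()
    replies = []
    pool = ThreadPoolExecutor(max_workers=len(delays))
    futures = [pool.submit(call_server, d, give_up) for d in delays]
    for future in as_completed(futures):
        result = future.result()
        if result is not None:
            replies.append(result)
        if len(replies) >= quorum:
            break
    give_up.set()              # release the threads that are still waiting
    pool.shutdown(wait=True)   # returns quickly because the waiters were released
    return sorted(replies)
```

For example, gather_quorum([0.01, 0.02, 5.0, 5.0], quorum=2) returns as soon
as the two fast replies are in, instead of waiting five seconds for the
stragglers. The threads are freed only because the simulated call cooperates
with give_up; with a real blocking invocation they would stay occupied until
the call timed out.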

>Best,
>
>Tres.
>--
>=========================================================
>Tres Seaver  tseaver@palladion.com