[omniORB] OmniEvents and threading

Mon Jan 13 10:35:02 2003

On Monday 13 January 2003 16:48, bjorn rohde jensen wrote:
>   I have been exploring various ways to minimize
> the impact of misbehaving clients by garbage
> collecting stale connections, connection-specific
> bounded event queues and separate threads for
> each ProxyPullConsumer and ProxyPushSupplier, all
> of which to be configurable. More ideas are very
> welcome:)

I wrote a new event service from scratch, under considerable
pressure at the time. It took me three days, plus another couple
to debug it. I can't show you the code because it belongs to
my ex-employer, but the main characteristic of the code was that
pretty much everything was done by a single thread. I only
supported one channel and only the push model. It came to a
total of about 300 lines of code.

Each ProxyPushSupplierImpl object (one per consumer)
had an associated state: AwaitingConnection, Active,
DeactivationPending or InActive. When a consumer
deregistered, or a push request to that consumer failed, the
state of its proxy moved from Active to DeactivationPending.
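
To make the state handling concrete, here is a minimal sketch of
such a per-proxy state machine. It is not the original code (which
I can't show): ProxyStateMachine and its method names are
hypothetical, and the only transition described above is Active to
DeactivationPending. The other two transitions, and the use of
std::atomic so that ORB threads and the worker thread can both
touch the state, are assumptions.

```cpp
#include <atomic>

// The four states named in the text.
enum class ProxyState { AwaitingConnection, Active, DeactivationPending, InActive };

// Hypothetical helper: holds the state of one proxy and allows only
// the listed transitions. Each method returns true if it changed the state.
struct ProxyStateMachine {
    std::atomic<ProxyState> state{ProxyState::AwaitingConnection};

    // Assumed transition: the consumer has connected to its proxy.
    bool on_connected() {
        return transition(ProxyState::AwaitingConnection, ProxyState::Active);
    }

    // From the text: deregistration or a failed push only marks the proxy.
    bool on_disconnect_or_push_failure() {
        return transition(ProxyState::Active, ProxyState::DeactivationPending);
    }

    // Assumed transition: the worker thread has deactivated the servant.
    bool on_deactivated() {
        return transition(ProxyState::DeactivationPending, ProxyState::InActive);
    }

private:
    // Atomically move from 'from' to 'to'; does nothing if the current
    // state is not 'from'.
    bool transition(ProxyState from, ProxyState to) {
        return state.compare_exchange_strong(from, to);
    }
};
```

The point of routing every change through one compare-and-swap is
that a late push failure on an already-pending proxy is simply
ignored rather than corrupting the state.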

Incoming events were stored in a semaphore-protected queue.
This was the only event queue in the process.
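
For illustration, a single shared queue of this kind might look
roughly like the sketch below. Note the substitution: the original
used a semaphore, whereas this sketch uses a std::mutex plus a
std::condition_variable, which is the more common idiom in standard
C++. The EventQueue name and its interface are hypothetical.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>

// Hypothetical FIFO shared between the threads that receive events
// from suppliers and the single worker thread that forwards them.
template <typename Event>
class EventQueue {
public:
    // Called by whichever thread receives an incoming event.
    void push(Event ev) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            events_.push_back(std::move(ev));
        }
        ready_.notify_one();  // wake the worker if it is waiting
    }

    // Called by the worker: blocks until an event is available,
    // then removes and returns the oldest one.
    Event pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !events_.empty(); });
        Event ev = std::move(events_.front());
        events_.pop_front();
        return ev;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return events_.size();
    }

private:
    mutable std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<Event> events_;
};
```

With only one queue and one consumer of that queue, there is no
per-connection buffering to bound or garbage collect.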

The main loop of the worker thread stepped through each
ProxyPushSupplierImpl object. If the object was active then
the oldest event was pushed to that object. If deactivation
was pending for the object then it was deactivated via the
RootPOA->deactivate_object() routine. The important
point is that changes in the life cycle of the proxy (e.g.
push failure or deregistration) simply changed the state
of the proxy: deactivation was handled by the worker
thread, and the servant was destroyed by the POA when
there were no further references to it.
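
One pass of that worker loop could be sketched as follows. This is
a reconstruction from the description above, not the original code,
and it avoids CORBA so that it stands alone: Proxy, dispatch_event
and the two callbacks are hypothetical stand-ins, with the
deactivate callback taking the place of the POA's
deactivate_object() call.

```cpp
#include <functional>
#include <string>
#include <vector>

enum class ProxyState { AwaitingConnection, Active, DeactivationPending, InActive };

// Hypothetical stand-in for a ProxyPushSupplierImpl servant.
struct Proxy {
    ProxyState state;
    // Delivers one event to the consumer; returns false if the push failed.
    std::function<bool(const std::string&)> push;
};

// One pass of the worker thread over every proxy for a single event.
// 'deactivate' stands in for the deactivate_object() call on the POA.
// Returns the number of proxies deactivated during this pass.
int dispatch_event(std::vector<Proxy>& proxies, const std::string& event,
                   const std::function<void(Proxy&)>& deactivate) {
    int deactivated = 0;
    for (Proxy& p : proxies) {
        if (p.state == ProxyState::Active) {
            // A failed push only marks the proxy; it is cleaned up
            // on the next pass.
            if (!p.push(event)) {
                p.state = ProxyState::DeactivationPending;
            }
        } else if (p.state == ProxyState::DeactivationPending) {
            deactivate(p);
            p.state = ProxyState::InActive;
            ++deactivated;
        }
        // AwaitingConnection and InActive proxies are skipped.
    }
    return deactivated;
}
```

Because only this one thread ever deactivates a servant, the
request-handling threads never race with each other over cleanup.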

I had planned to make it a bit fancier, adding support
for multiple channels and maybe a pool of event
forwarding threads, but this implementation worked
well and was damn fast.

The QoS we were pursuing was this: a badly-behaved consumer
should have minimal effect on other consumers, and should never
cause the event service to fail or hang. It is better to lose good
consumers than to fail or hang the event service. So we would
chuck consumers if the push() operation failed for any reason,
including timeouts (which were kept short). Many event services
seem to focus on the "deliver at any cost" approach, but we
wanted something more like true broadcast than a half-baked
messaging service.

The real key is to build a good test rig. We had scripts
that would run the service along with several event supplier
processes, each of which would pump out a stream of
events. The script would then launch a bunch of consumer
processes. The script would sure-kill some consumers
while other consumers would automatically deregister
after receiving a couple of hundred events. Then the
script would launch the consumers again. We tracked
the size of the event service process to make sure it wasn't
leaking memory (it did of course, until we fixed it).

I don't know if you will find the above comments useful. Maybe
you have already worked through this stuff. I would have liked
to fix omniEvents rather than writing a new service, but I felt
there was less risk in rewriting than in debugging, and on this
occasion I think I was right.

-- 
Bruce Fountain (fountainb@switch.aust.com)
Senior Software Engineer
Union Switch and Signal Pty Ltd
Perth Western Australia
tel: +618 9256 0083