[omniORB] How can I create a hot-standby name service?

Bruce Fountain fountainb@switch.aust.com
Tue Jul 23 04:09:08 2002


On Tuesday 23 July 2002 10:20, zlr wrote:
>     I have tried to set the same OMNINAMES_LOGDIR path on two PCs where I
> have installed the NameService. But I found that when the NameService starts,
> it creates two files (one is *.log, the other is *.bak). Both include the
> PC name in their file names. (E.g. I have two PCs, PC1 as master and PC2 as
> slave in hot-standby, so it creates four files: omninames-PC1.log,
> omninames-PC1.bak and omninames-PC2.log, omninames-PC2.bak.)
>
>     I also tried to synchronize these files. I renamed
> omninames-PC1.log to omninames-PC2.log and omninames-PC1.bak to
> omninames-PC2.bak. But I found the NameService could not be started on PC2.

On the server side, you might find it easier to modify your servers to be
aware of the redundant NS and register twice (once with each NS). On the
hot-standby system I developed, most of the server processes were located on
the same host as the NS, so on failover they would re-register anyway. For
those server processes which were located on other hosts, we had a "process
manager" which would notify server processes in the event of failover. The
server processes would then re-register with the new NS.
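
As a rough illustration of "register twice", here is a minimal sketch. It
is not the code we used, and it deliberately avoids the CORBA headers so
it stands alone: NameServiceTarget and registerWithAll are hypothetical
names, and the bind callable is a stand-in for whatever actually binds the
object in a name service (in a real server, something like
CosNaming::NamingContext::rebind()).

```cpp
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for one name service (master or standby).
// In a real CORBA server, `bind` would wrap the actual bind/rebind call.
struct NameServiceTarget {
    std::string label;                             // e.g. "PC1" or "PC2"
    std::function<void(const std::string&)> bind;  // throws on failure
};

// Try to register `name` with every target. A failure on one target does
// not stop registration with the others. Returns the labels of the targets
// where registration succeeded.
std::vector<std::string> registerWithAll(
        const std::vector<NameServiceTarget>& targets,
        const std::string& name) {
    std::vector<std::string> succeeded;
    for (const auto& target : targets) {
        try {
            target.bind(name);
            succeeded.push_back(target.label);
        } catch (const std::exception&) {
            // This name service is unreachable; carry on with the rest.
        }
    }
    return succeeded;
}
```

The caller can compare the returned list against the full set of targets
and retry the missing ones later, for example when it is told about a
failover.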

>     As you say: In omniORB 3.0, there's no option but to make the
> applications aware of the situation and have them retry if they get
> COMM_FAILUREs.

I wrote a neat C++ template which would wrap an object reference. On
construction you would give the ref a name, which was stored as a member
variable. When a request was made, the templatised object would check to
see if it had a valid reference to the remote object. If not, it would resolve
it using the current NS. If an exception was thrown, the reference would be
cleared and re-resolved from scratch on the next invocation. The key was
using lazy evaluation: the object would only attempt to resolve the name
on demand, rather than all objects attempting to resolve simultaneously.
This worked well with omniORB 3.0.3, although the omniORB 4.0 multiple
ref syntax would obviously be simpler.
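
The original template is not to hand, so the following is only a sketch of
the idea, written without the CORBA headers so that it stands alone.
LazyRef, Resolver and invoke are names made up for this example. The
resolver callable is a stand-in for "look the name up in the current NS
and narrow the result"; in a real client you would probably catch the
specific CORBA system exceptions (such as COMM_FAILURE) rather than
everything.

```cpp
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// T is the remote interface type. Resolver looks the name up in whichever
// name service is current and returns a usable reference (or throws).
template <typename T>
class LazyRef {
public:
    using Resolver = std::function<std::shared_ptr<T>(const std::string&)>;

    // Nothing is resolved here: construction only stores the name.
    LazyRef(std::string name, Resolver resolver)
        : name_(std::move(name)), resolver_(std::move(resolver)) {}

    // Invoke `call` on the remote object, resolving the name first if no
    // reference is cached. If the call throws, the cached reference is
    // cleared so the next invocation re-resolves from scratch, and the
    // exception is passed on to the caller.
    template <typename F>
    auto invoke(F&& call) -> decltype(call(std::declval<T&>())) {
        if (!ref_) {
            ref_ = resolver_(name_);
            if (!ref_) throw std::runtime_error("could not resolve " + name_);
        }
        try {
            return call(*ref_);
        } catch (...) {
            ref_.reset();
            throw;
        }
    }

    bool resolved() const { return static_cast<bool>(ref_); }

private:
    std::string name_;
    Resolver resolver_;
    std::shared_ptr<T> ref_;
};
```

Because resolution happens inside invoke(), a failover does not trigger a
burst of lookups: each reference is refreshed only when it is next used.
Whether to retry the failed call immediately or leave that to the caller
is a design choice; this sketch leaves it to the caller.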

>     But if I register the servers with another NameService (PC2) after
> switching over, do I have to restart all servers (Messaging Server, Routing
> Server ...)?

Not entirely sure what your architecture is here, but our approach was
to run a "process manager" process on each host. Whenever a CORBA
process was launched on that host it would register with the PM. The
PM's job was to keep track of things like failover and notify each of the
registered processes when something changed. The processes could
then re-register with the NS or re-resolve references to other servers.

The PM also looked after restarting failed processes etc. and providing
bootstrap parameters to registering processes.

The PM itself is a single point of failure, so it is important to keep it
as simple and robust as possible.
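
To give a feel for the notification side of the PM, here is a minimal
in-process sketch. It is an illustration only: the real PM talked to
separate processes, and ProcessManager, registerProcess and notifyFailover
are names invented for this example. The callback stands in for whatever
message tells a process which NS is now active.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

class ProcessManager {
public:
    // Called with a label for the name service that is now active.
    using FailoverCallback = std::function<void(const std::string& activeNs)>;

    // A process registers itself when it is launched on this host.
    void registerProcess(const std::string& processName, FailoverCallback cb) {
        processes_[processName] = std::move(cb);
    }

    void unregisterProcess(const std::string& processName) {
        processes_.erase(processName);
    }

    // Tell every registered process that the active NS has changed, so it
    // can re-register or re-resolve. One misbehaving process must not stop
    // the others being told. Returns how many were notified successfully.
    std::size_t notifyFailover(const std::string& activeNs) {
        std::size_t notified = 0;
        for (auto& entry : processes_) {
            try {
                entry.second(activeNs);
                ++notified;
            } catch (...) {
                // Ignore and continue; the PM has to stay up.
            }
        }
        return notified;
    }

private:
    std::map<std::string, FailoverCallback> processes_;
};
```

Keeping the PM down to a registry plus a notification loop like this is
in line with the point above about keeping it simple.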

-- 
Bruce Fountain (fountainb@switch.aust.com)
Senior Software Engineer
Union Switch and Signal Pty Ltd
Perth Western Australia
tel: +618 9256 0083