[omniORB] Wide strings, was re: Reserved word "factory"? / Wide strings

Duncan Grisby dgrisby@uk.research.att.com
Tue, 25 Jul 2000 17:54:38 +0100


On Monday 24 July, Randy Wiser wrote:

> With the upcoming release of a version of Python with unicode
> support, we are becoming more interested in omniORB/omniORBpy
> support for wchar (and wstring).

It's a common misconception that CORBA wchar and wstring mean
"Unicode". Unfortunately, things aren't anything like that simple.
CORBA 2.3 allows both char and wchar to be either single byte or
multi-byte characters, and requires that the ORB knows how to convert
between different character encodings.
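
To make the encoding point concrete, here is a small Python sketch (modern Python 3, not tied to omniORB in any way). The sample text and the three encodings are just illustrative choices. It shows that the same characters occupy a different number of bytes in each code set, which is why an ORB cannot simply copy bytes from one side to the other.

```python
# Illustrative only: the same text has a different byte representation
# in each code set (encoding).
text = "caf\u00e9"  # four characters, the last one is e-acute

latin1 = text.encode("iso-8859-1")  # one byte per character
utf8 = text.encode("utf-8")         # e-acute takes two bytes
utf16 = text.encode("utf-16-be")    # two bytes per character here

print(len(latin1), len(utf8), len(utf16))  # prints: 4 5 8

# Reading bytes with the wrong code set silently produces different text,
# which is the kind of damage code set conversion is meant to prevent.
misread = utf8.decode("iso-8859-1")
print(misread == text)  # prints: False
```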

> 1) Is there a reasonable chance of unicode support happening before
> the end of this year? :-)

Probably not, but you might be lucky. It probably wouldn't take long
to hack together limited support for Unicode, and maybe even make it
interoperate with other ORBs, but it's quite a lot of effort to do the
whole thing. We don't want to do a limited implementation only to have
to throw it away and do it properly. There isn't anyone working on it
at the moment.

> I'm _guessing_ that the 'code set conversion issues' have to do with
> wstring and wchar constants.

Unfortunately not. That's the least of the difficulties. For proper
interoperability you have to do code set negotiation and conversion
when transmitting any kind of string. That's not just wchar and
wstring, but char and string too. If you want to see the whole horror
of it, look at sections 13.7 to 13.9 of the CORBA 2.3 spec. It takes
21 pages.
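
As a rough illustration of what the negotiation involves, the sketch below is a simplified reading of how the spec has the client pick a transmission code set from the native and conversion code sets that each side advertises. It is not omniORB code. The function name, the exception class, and the use of plain strings for code sets are all assumptions made for the example; a real ORB uses numeric OSF registry identifiers carried in the IOR, and has to run this separately for char and wchar data.

```python
class CodeSetIncompatible(Exception):
    """Hypothetical stand-in for the CODESET_INCOMPATIBLE system exception."""


def choose_transmission_code_set(client_native, client_conversion,
                                 server_native, server_conversion,
                                 fallback, compatible=True):
    """Pick a transmission code set (simplified sketch).

    client_conversion / server_conversion are lists of code sets each
    side can convert to and from, in order of preference.
    fallback is UTF-8 for char data and UTF-16 for wchar data.
    compatible stands in for the spec's check that the two native code
    sets share a character set; it is just a flag here.
    """
    if client_native == server_native:
        return client_native          # no conversion needed
    if server_native in client_conversion:
        return server_native          # client converts
    if client_native in server_conversion:
        return client_native          # server converts
    for code_set in server_conversion:
        if code_set in client_conversion:
            return code_set           # both convert, server's preference
    if compatible:
        return fallback
    raise CodeSetIncompatible("no usable transmission code set")


# Example with placeholder code set names.
tcs = choose_transmission_code_set(
    client_native="ISO-8859-1", client_conversion=["UTF-8"],
    server_native="EUC-JP", server_conversion=["UTF-8", "ISO-2022-JP"],
    fallback="UTF-8")
print(tcs)  # prints: UTF-8
```

Even this toy version has to be applied to every string that crosses the wire, which is where most of the real work lies.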

> 2) Has anyone done a patch to omniORB 3.0 / omniORBpy 1.0 to allow
> simply passing wstring and wchar (we may not need wchar or wstring
> constants)?

Not that I know of but, as I say, it shouldn't be that hard to do a
useful subset of the functionality.

> 3) Or, should we try 'sequence of <some 16 bit type>' as a work
> around until unicode support is available?

It really depends on what you're trying to do. If you just want to send
plain UCS-2 encoded text from one machine to another, regardless of
what platform/language the two machines are using, it will _always_ be
the case that the best way to do it is with sequence<unsigned short>.
If you use wstring, it is entirely likely that the string will be
transformed on the way, leading to DATA_CONVERSION exceptions and
other nastiness.
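
A minimal sketch of the sequence<unsigned short> approach, in modern Python 3. It assumes an IDL declaration along the lines of `typedef sequence<unsigned short> UCS2Text;` (a made-up name) and that the Python mapping presents such a sequence as a list of integers. The two helper functions are hypothetical and are not part of omniORBpy.

```python
def text_to_ushorts(text):
    """Encode text as a list of 16-bit UCS-2 code units.

    Only characters in the Basic Multilingual Plane fit in plain UCS-2,
    so anything above U+FFFF is rejected rather than silently mangled.
    """
    units = []
    for ch in text:
        code_point = ord(ch)
        if code_point > 0xFFFF:
            raise ValueError("character U+%X does not fit in UCS-2" % code_point)
        units.append(code_point)
    return units


def ushorts_to_text(units):
    """Decode a list of 16-bit UCS-2 code units back into text."""
    return "".join(chr(unit) for unit in units)


# The list is what would be passed as the sequence<unsigned short> argument.
payload = text_to_ushorts("h\u00e9llo")
print(payload)                   # prints: [104, 233, 108, 108, 111]
print(ushorts_to_text(payload))  # round-trips to the original text
```

Because the ORB sees only numbers, it marshals them (with the usual byte-order handling) and never applies code set conversion, so both ends see exactly the same code units.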

Cheers,

Duncan.

-- 
 -- Duncan Grisby  \  Research Engineer  --
  -- AT&T Laboratories Cambridge          --
   -- http://www.uk.research.att.com/~dpg1 --