[omniORB] wchar/wstring support

Sai-Lai Lo S.Lo@orl.co.uk
03 Feb 1999 11:05:41 +0000


>>>>> Gerald Gutierrez writes:

>> The CORBA spec guarantees variable sizes to be the same on all hosts and
>> architectures for the CORBA::* types.
>> So, in other words, just use CORBA::int which is guaranteed to be 16
>> bit. And a client language with dynamic typing and/or type casting ability.
>> Which is C++ ;)

> For the "int" method to work ( I believe it is a "short" that is 16 bits,
> there is no "int" in CORBA IDL ), you must assume that your codeset is 16
> bit wide and that both client and server know of and use the same codeset.
> In addition, you must manually do all conversion between C++ wchar_t /
> wstring and CORBA int / sequence<int>. For the latter to be trivial, one
> must assume that wchar_t is 16 bits (which excludes practically all UNIX
> based systems), and that the operating system/compiler/runtime uses the
> same codeset as your distributed application. If they are different, you
> must do all codeset conversions.

> You can see why I'm hoping wchar/wstring support will be built into OmniORB
> in the near future.

> So can someone at ORL please let me know whether wchar/wstring support is
> planned?


Gerald,

We did some work on adding wchar/wstring support in the summer. It is not
ready for integrating into the main tree yet. There are a few problems:

1. The on-the-wire representation of wchar has to be 'negotiated' at
   runtime on a per-connection basis. If one side specifies a codeset that
   the other cannot support, both sides fall back to Unicode.
   Personally, I found this unnecessarily complicated. Why not just
   mandate that the on-the-wire representation is Unicode (UTF-8)? With
   the current scheme, the encoding on the wire can be anything from 1 to
   4 (or more) bytes per wchar. It is impossible for something like a
   bridge to remarshal the data without knowing the codeset being used.
   There were some submissions to fix this but I have not followed their
   progress.

   For omniORB2, I'm inclined to just use Unicode all the time.

2. Because of 1, marshalling of wchar and wstring is quite difficult to
   support with the current structure of the marshalling code. We're moving
   to a new marshalling class structure that can simultaneously support
   GIOP 1.1 and GIOP 1.0. A side effect is that wstring and wchar will be
   much easier to do.

3. Forgive our ignorance of wchar support: we couldn't figure out the
   proper way, on Unix systems, of finding the encoding scheme in use
   while the application is running. This is necessary because the ORB is
   supposed to translate from the on-the-wire encoding to the native
   encoding automatically. If we do not know what the native encoding is,
   we do not know how to do the translation. Maybe you can shed some light
   on this?

In summary, we do not have wchar/wstring support but we'll have it eventually.

Regards,

Sai-Lai