[omniORB] omniORBpy, Python 2.0 and unicode

uche.ogbuji@fourthought.com
Fri, 15 Dec 2000 14:28:35 -0700


> I'm cross-posting this to the Python do-sig, since I think it is
> relevant to the standard Python mapping, not just omniORBpy.

Oh.  I'm sorry.  I didn't see this before today because my procmail rules
dumped it into my do-sig folder, and I rarely check that folder because
there's so rarely any mail on that list.

> On Sunday 10 December, uche.ogbuji@fourthought.com wrote:
> 
> > Based on quick and dirty tests, if you pass a Python unicode object
> > in a string-type argument using omniORBpy and omniORB (latest CVS),
> > you get a CORBA BAD_PARAM exception.
> >
> > The Python/CORBA binding doesn't say anything about unicode.
> > Understandable since it predates Python/Unicode, but it seems to
> > make sense that Python unicode strings should be accepted as string
> > data.  For one thing, this would parallel the Java binding.
> > 
> > Am I right that one can't use unicode objects for string parameters
> > in omniORBpy?  Are there any plans to change this?  Using omniORB
> > for XML processing (in 4Suite Server), one runs into many unicode
> > objects and it would be quite a burden to precede every CORBA
> > invocation with code for character encoding.
> 
> At present, parameters described as string in IDL must be Python
> strings. omniORB 3 only supports GIOP 1.0, so the only string data
> which can be transmitted must be ISO 8859-1.
> 
> The next major releases of omniORB and omniORBpy (4.0 and 2.0
> respectively) will fully support CORBA's code set negotiation, and the
> wstring type. What I've implemented at the moment is that wstring maps
> to Python unicode, but string still only maps to Python string.
> Strings can, however, be in any supported code set, not just ISO
> 8859-1. That includes UTF-8, so the whole of the Unicode space (and
> more) can be supported.

It would be nice if a plain Python string could also be passed as a wstring
parameter.  With that, we could change the 4Suite Server IDLs to use wstring
and everything should work.
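
To make the code-set point quoted above concrete, here is a small sketch.
It is written in present-day Python 3 syntax, which is an assumption (under
Python 2.0 the text type is `unicode` and the byte type is `str`).  It uses
only the standard codecs, not omniORBpy, and the helper name `fits_code_set`
is made up for illustration.  It shows why ISO 8859-1 alone cannot carry
arbitrary XML text while UTF-8 can.

```python
def fits_code_set(text, code_set):
    """Return True if every character of text can be encoded in code_set."""
    try:
        text.encode(code_set)
    except UnicodeEncodeError:
        return False
    return True


# Sample strings: ASCII, Latin-1 accented text, the euro sign, and CJK text.
samples = ["plain ASCII", "caf\u00e9", "\u20ac 100", "\u4e2d\u6587"]

# Map each sample to (fits ISO 8859-1, fits UTF-8).
results = {
    s: (fits_code_set(s, "iso-8859-1"), fits_code_set(s, "utf-8"))
    for s in samples
}
```

Every sample fits UTF-8, but only the first two fit ISO 8859-1, which is
the practical limit of a GIOP 1.0 string as described in the quoted text.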

> It would not be much effort to extend omniORBpy so it accepted unicode
> objects when it was expecting strings, but I'm not sure it's a good
> idea. Following the general Python mantra of "explicit is better than
> implicit", I'd lean towards forcing the programmer to convert their
> unicode objects to strings in their chosen encoding, rather than
> having the ORB do it. I think it's analogous to disallowing passing
> floating point values where integers are expected (although integers
> _are_ accepted where floating point values are expected).
> 
> I'm not totally convinced, though. Does anyone else have an opinion on
> the matter?

Tough one.  Of course I would rather not make 4SS users preface every call
to the server with a type check and an encoding call, but I understand your
desire for correctness in the general case.  You are right that explicitness
argues for raising an exception rather than transmogrifying the input in a
way the user perhaps doesn't expect.
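
A minimal sketch of the check-and-encode step discussed here, assuming
Python 3 syntax (text is `str`, encoded data is `bytes`).  The names
`encode_text_args`, `call_with_encoding` and `fake_servant_op` are
hypothetical and not part of omniORBpy; the stand-in operation simply
rejects unencoded text, where a real ORB would raise BAD_PARAM.

```python
def encode_text_args(args, encoding="utf-8"):
    """Explicitly encode every text argument; leave other values untouched."""
    return tuple(a.encode(encoding) if isinstance(a, str) else a for a in args)


def call_with_encoding(operation, *args, encoding="utf-8"):
    """Hypothetical client-side wrapper: encode text first, then invoke."""
    return operation(*encode_text_args(args, encoding))


def fake_servant_op(name, count):
    """Stand-in for a CORBA operation that accepts only encoded strings."""
    if not isinstance(name, bytes):
        # A strict ORB would refuse the call instead of converting silently.
        raise TypeError("string parameter must already be encoded")
    return len(name) * count


# The caller names the encoding once, at the call site.
encoded_length = call_with_encoding(fake_servant_op, "caf\u00e9", 2)
```

A wrapper like this keeps the choice of encoding visible to the caller,
which is the explicit behaviour argued for above, while sparing users the
per-argument boilerplate.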

> PS. The adventurous can try out omniORB 4 and omniORBpy 2 by checking
> out the omni4_0_develop and omnipy2_develop branches from CVS. Be
> warned that there are some known bugs which will bite you, but you
> should be able to try out the code set negotiation stuff. There will
> be some very significant changes to the code base before release.

I might just give this a spin.  Thanks.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python