[omniORB] OmniOrb and CP1252 (Windows Latin 1) vs. ISO-8859-1

Ridgway, Richard (London) Richard_Ridgway at ml.com
Tue Jul 29 08:40:23 BST 2008


In JacORB, this is controlled with -Dfile.encoding=ISO_8859_1 on the command line.

I had to start using that to get JacORB to interoperate with Orbix. I never had any problems with omniORB or TAO, but maybe I didn't run into the same situation.
 
 
Richard
 

-----Original Message-----
From: omniorb-list-bounces at omniorb-support.com [mailto:omniorb-list-bounces at omniorb-support.com] On Behalf Of William Bauder
Sent: 29 July 2008 00:05
To: 'Steven Sauder'; omniorb-list at omniorb-support.com
Subject: RE: [omniORB] OmniOrb and CP1252 (Windows Latin 1) vs. ISO-8859-1


I haven't had to deal with this myself, but it did trigger a memory of something I saw in OrbConstants:
 
    // The CHAR_CODESETS and WCHAR_CODESETS allow the user to override the default
    // connection code sets.  The value should be a comma separated list of OSF
    // registry numbers.  The first number in the list will be the native code
    // set.
    //
    // Number can be specified as hex if preceded by 0x, otherwise they are
    // interpreted as decimal.
    //
    // Code sets that we accept currently (see core/OSFCodeSetRegistry):
    //
    // char/string:
    //
    // ISO8859-1 (Latin-1)     0x00010001
    // ISO646 (ASCII)          0x00010020
    // UTF-8                   0x05010001
    //
    // wchar/string:
    //
    // UTF-16                  0x00010109
    // UCS-2                   0x00010100
    // UTF-8                   0x05010001
    //
    // Note:  The ORB will let you assign any of the above values to
    // either of the following properties, but the above assignments
    // are the only ones that won't get you into trouble.
    public static final String CHAR_CODESETS = SUN_PREFIX + "codeset.charsets";
    public static final String WCHAR_CODESETS = SUN_PREFIX + "codeset.wcharsets";

Assuming that you're using strings, and the problem isn't in their ISO-8859 encoding, you might be able to fix it on the Java side by changing the default code set.
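The comment above describes the value format for CHAR_CODESETS and WCHAR_CODESETS: a comma-separated list of OSF registry numbers, hex when prefixed with 0x, decimal otherwise, with the first entry being the native code set. A minimal Java sketch of that parsing rule follows. It is an illustrative re-implementation, not the Sun ORB's own parser; the class name is hypothetical, and the full property key is not spelled out because the value of SUN_PREFIX isn't shown in the excerpt.

```java
import java.util.ArrayList;
import java.util.List;

public class CodeSetList {
    // Parses a comma-separated list of OSF registry numbers using the rules in the
    // OrbConstants comment: a "0x" prefix means hex, anything else is decimal.
    // Illustrative only; this is not the Sun ORB's own parsing code.
    public static List<Integer> parse(String value) {
        List<Integer> ids = new ArrayList<>();
        for (String raw : value.split(",")) {
            String s = raw.trim();
            if (s.isEmpty()) {
                continue;
            }
            boolean hex = s.startsWith("0x") || s.startsWith("0X");
            ids.add(hex ? Integer.parseUnsignedInt(s.substring(2), 16) : Integer.parseInt(s));
        }
        return ids;
    }

    public static void main(String[] args) {
        // Example value only: UTF-8 listed first (native), ISO-8859-1 second.
        List<Integer> ids = parse("0x05010001, 0x00010001");
        System.out.printf("native char code set: 0x%08X%n", ids.get(0));
        for (int id : ids.subList(1, ids.size())) {
            System.out.printf("also offered:         0x%08X%n", id);
        }
    }
}
```

Whether a different native code set on the Java side actually avoids the "?" depends on what code set negotiation with omniORB ends up choosing; the sketch only shows how such a value is formed.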
 
-Bill

-----Original Message-----
From: omniorb-list-bounces at omniorb-support.com [mailto:omniorb-list-bounces at omniorb-support.com] On Behalf Of Steven Sauder
Sent: Monday, July 28, 2008 5:18 PM
To: omniorb-list at omniorb-support.com
Subject: [omniORB] OmniOrb and CP1252 (Windows Latin 1) vs. ISO-8859-1


Hi all!

We’re a long-time user of omniORB and have had great success with it in our applications, but something has recently come up that is causing problems for our European customers. Our applications all speak the (full) Windows CP1252 (Windows Latin 1) character set, in which Microsoft uses the code point 0x80 to represent the Euro symbol (€). CP1252 and ISO-8859-1 are “almost” the same: CP1252 uses the 0x80 code point for the Euro, whereas ISO-8859-1 assigns no printable character to it.
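The “almost the same” claim can be checked with the two charsets that ship with Java, windows-1252 and ISO-8859-1. This is a small Java sketch using java.nio.charset, not anything omniORB provides, and the class name is made up. It decodes every byte value with both charsets and lists where they disagree; the differences should all fall in the 0x80-0x9F range, starting with 0x80 itself.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Cp1252VsLatin1 {
    static final Charset CP1252 = Charset.forName("windows-1252");
    static final Charset LATIN1 = StandardCharsets.ISO_8859_1;

    // Decodes one byte with the given charset and returns the resulting UCS value.
    public static int decode(int b, Charset cs) {
        return new String(new byte[] {(byte) b}, cs).charAt(0);
    }

    // Returns every byte value that the two charsets decode differently.
    public static List<Integer> differingBytes() {
        List<Integer> diffs = new ArrayList<>();
        for (int b = 0; b < 256; b++) {
            if (decode(b, CP1252) != decode(b, LATIN1)) {
                diffs.add(b);
            }
        }
        return diffs;
    }

    public static void main(String[] args) {
        System.out.printf("0x80 as windows-1252: U+%04X%n", decode(0x80, CP1252)); // U+20AC
        System.out.printf("0x80 as ISO-8859-1:   U+%04X%n", decode(0x80, LATIN1)); // U+0080
        for (int b : differingBytes()) {
            System.out.printf("differs at 0x%02X%n", b);
        }
    }
}
```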

After a bit of investigation, it seems that omniORB uses ISO-8859-1 as the “native” code set by default. I had thought this would mean that the Euro symbol, and a few other “special” characters such as the trademark symbol and the “curly” printer’s quotes, which exist in CP1252 but not in ISO-8859-1, could not be handled by omniORB with its default code set. However, digging into cs-8859-1.cc a little more, it looks like the translation tables ARE passing 0x80 through to UCS as 0x0080, so unless I’m reading this wrong, any omniORB-to-omniORB communication (on Windows) should pass the (Windows-specific) Euro code point 0x80 through without problems. Am I reading this right?
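The pass-through Steve describes can be mimicked in Java. ISO-8859-1 maps byte 0xNN to U+00NN and back, so a byte decoded by one Latin-1 converter and re-encoded by another arrives unchanged. The catch is that the value in transit is U+0080, a C1 control character, not the Euro sign U+20AC; only peers that reinterpret the byte as CP1252 will display €. This sketch mirrors the idea of the identity table in cs-8859-1.cc but is not omniORB's C++ code, and the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1RoundTrip {
    // Decodes wire bytes as ISO-8859-1 into UCS, then encodes them back.
    // ISO-8859-1 maps byte 0xNN to U+00NN, so every byte value survives unchanged.
    public static byte[] roundTrip(byte[] wire) {
        String ucs = new String(wire, StandardCharsets.ISO_8859_1);
        return ucs.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] sent = {(byte) 0x80};  // the CP1252 Euro byte
        String ucs = new String(sent, StandardCharsets.ISO_8859_1);
        System.out.printf("UCS value in transit: U+%04X%n", (int) ucs.charAt(0)); // U+0080, not U+20AC
        System.out.println(Arrays.equals(sent, roundTrip(sent)));                 // true
    }
}
```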

However, the difficulty arises because we have several CORBA components written using the standard Java ORB, which (it appears) does not give the same leeway with this symbol: it insists on transmitting the Euro symbol as its “true” Unicode value (0x20AC), which omniORB’s code set converters end up turning into a “?” when we receive it on the Windows end.
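omniORB's converters are C++, but the same loss is easy to reproduce with Java's own ISO-8859-1 encoder, as an analogy rather than omniORB's actual code path. U+20AC has no ISO-8859-1 byte, so an encoder that substitutes instead of failing emits its replacement character, which for this charset is “?”. The class and method names are hypothetical.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class EuroToLatin1 {
    // Encodes text as ISO-8859-1 the way String.getBytes does: unmappable
    // characters are replaced with the charset's default replacement, '?'.
    public static byte[] toLatin1(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String euro = "\u20AC";  // the UCS value Steve reports the Java ORB sending
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder();
        System.out.println(encoder.canEncode(euro));                                 // false
        System.out.println(new String(toLatin1(euro), StandardCharsets.ISO_8859_1)); // ?
    }
}
```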

Has anyone had any experience with this? From what I’ve read so far, it seems the only viable solution would be to write our own NCS-C implementation that converts the CP1252 Euro symbol (0x80) to Unicode (0x20AC) and back again through translation tables, in the same way cs-8859-1.cc currently does for ISO-8859-1. Is this correct?
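A rough idea of the translation table such an NCS-C would need, sketched in Java for consistency with the Java code quoted in this thread. omniORB's real converters are C++ (as in cs-8859-1.cc), and how a new code set is registered with omniORB is not shown; the class and method names here are hypothetical. Since CP1252 only differs from ISO-8859-1 in 0x80-0x9F, the table needs just those 32 entries, five of which (0x81, 0x8D, 0x8F, 0x90, 0x9D) are undefined.

```java
import java.util.HashMap;
import java.util.Map;

public class Cp1252Table {
    private static final char UNDEFINED = '\uFFFD';

    // UCS values for CP1252 bytes 0x80-0x9F; UNDEFINED marks the unassigned bytes.
    // Every other byte maps to the same value as in ISO-8859-1.
    private static final char[] HIGH = {
        '\u20AC', UNDEFINED, '\u201A', '\u0192', '\u201E', '\u2026', '\u2020', '\u2021',
        '\u02C6', '\u2030', '\u0160', '\u2039', '\u0152', UNDEFINED, '\u017D', UNDEFINED,
        UNDEFINED, '\u2018', '\u2019', '\u201C', '\u201D', '\u2022', '\u2013', '\u2014',
        '\u02DC', '\u2122', '\u0161', '\u203A', '\u0153', UNDEFINED, '\u017E', '\u0178'
    };

    // Reverse table for the defined characters in 0x80-0x9F.
    private static final Map<Character, Integer> REVERSE = new HashMap<>();
    static {
        for (int i = 0; i < HIGH.length; i++) {
            if (HIGH[i] != UNDEFINED) {
                REVERSE.put(HIGH[i], 0x80 + i);
            }
        }
    }

    // CP1252 byte -> UCS value, or -1 if the byte is undefined in CP1252.
    public static int toUcs(int b) {
        b &= 0xFF;
        if (b >= 0x80 && b <= 0x9F) {
            char c = HIGH[b - 0x80];
            return c == UNDEFINED ? -1 : c;
        }
        return b;
    }

    // UCS value -> CP1252 byte, or -1 if CP1252 cannot represent it.
    public static int fromUcs(char c) {
        if (c < 0x80 || (c >= 0xA0 && c <= 0xFF)) {
            return c;
        }
        Integer b = REVERSE.get(c);
        return b == null ? -1 : b;
    }

    public static void main(String[] args) {
        System.out.printf("0x80 -> U+%04X%n", toUcs(0x80));         // U+20AC
        System.out.printf("U+20AC -> 0x%02X%n", fromUcs('\u20AC')); // 0x80
        System.out.println(fromUcs('\u0080'));                      // -1: C1 control, no CP1252 byte
    }
}
```

The reverse direction is a small hash map here only because it is short to write; a switch statement or a sorted array would work just as well.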

Any help would be hugely appreciated!
Thanks
Steve.
-- 
Steve Sauder
Chief Technology Officer
North Plains Systems Corp.