Unicode Encoding Standards

FOCUS supports two Unicode Transformation Formats (UTFs): UTF-8 in ASCII environments and UTF-EBCDIC in EBCDIC environments.

In non-Unicode single-byte encoding standards, such as ASCII, each character is assigned a code that is 1 byte long, which limits the standard to at most 256 distinct characters. Under these standards, it became common to equate a character with a byte of storage. A string of 10 characters needed 10 bytes of storage, and many character manipulation routines expected string lengths to be specified as a number of bytes.
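Under byte semantics, storage length and character count coincide. The following minimal Python sketch illustrates this, using Python's standard "ascii" codec and its "cp037" codec (a single-byte EBCDIC code page) as example single-byte encodings. The function name is hypothetical and is not part of FOCUS.

```python
# Sketch: in a single-byte encoding, one character occupies exactly one byte.
# "ascii" and "cp037" (an EBCDIC code page) are standard Python codecs, used
# here only as examples of single-byte encodings.

def single_byte_storage(text: str, encoding: str) -> int:
    """Return the number of bytes needed to store text in the given encoding."""
    return len(text.encode(encoding))


if __name__ == "__main__":
    name = "ABCDEFGHIJ"  # a string of 10 characters
    for enc in ("ascii", "cp037"):
        print(enc, len(name), "characters,", single_byte_storage(name, enc), "bytes")
```

For both encodings the program reports 10 characters and 10 bytes, which is why byte counts could safely stand in for character counts.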

With Unicode encoding, bytes and characters are no longer equivalent. Each character is represented internally by a varying number of bytes, depending on the character. If you configure FOCUS for Unicode, you define the lengths of strings and alphanumeric fields in characters, not bytes, which simplifies specifying those lengths. Each character is represented internally by up to 3 bytes (4 for EBCDIC Unicode), and FOCUS automatically adjusts for the actual storage length. In reports, each character occupies one position in a report column, regardless of how many bytes it takes up in memory. This character-based processing mode, used in Unicode environments, is called character semantics. The non-Unicode mode is called byte semantics.
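The gap between character length and byte length under UTF-8 can be seen in a short Python sketch. It assumes that a "character" corresponds to one Unicode code point (what Python's len() counts on a str). Only UTF-8 is shown, because the Python standard library has no UTF-EBCDIC codec. The helper names are hypothetical and are not FOCUS functions.

```python
# Sketch: byte length vs character length of the same strings in UTF-8.
# Assumption: one "character" = one Unicode code point, as counted by len().

def char_length(s: str) -> int:
    """Length under character semantics: number of characters."""
    return len(s)


def byte_length(s: str, encoding: str = "utf-8") -> int:
    """Length under byte semantics: bytes needed to store s in the encoding."""
    return len(s.encode(encoding))


def truncate_bytes(s: str, n: int, encoding: str = "utf-8") -> bytes:
    """Byte-oriented truncation: may cut a multibyte character in half."""
    return s.encode(encoding)[:n]


def truncate_chars(s: str, n: int) -> str:
    """Character-oriented truncation: always keeps whole characters."""
    return s[:n]


if __name__ == "__main__":
    for word in ["A", "\u00e9", "\u20ac", "\u65e5", "Z\u00fcrich", "\u6771\u4eac"]:
        print(f"{word!r}: {char_length(word)} chars, {byte_length(word)} bytes")

    # Truncating "Café" to 4 bytes splits the 2-byte "é" ...
    cut = truncate_bytes("Caf\u00e9", 4)
    try:
        cut.decode("utf-8")
    except UnicodeDecodeError:
        print("4-byte cut of 'Caf\u00e9' is not valid UTF-8:", cut)
    # ... while truncating to 4 characters keeps the string intact.
    print(truncate_chars("Caf\u00e9", 4))
```

The output shows 1, 2, 3, and 3 bytes for the four single characters, 7 bytes for the 6-character "Zürich", and 6 bytes for the 2-character "東京". The truncation lines show why lengths expressed in bytes become awkward once characters vary in size, and why specifying them in characters is simpler.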

In most cases, procedures developed using byte semantics continue to work without adjustment when deployed in a Unicode environment.

To compress trailing blanks so that each report column is only as wide as its longest actual data value, issue the SET SQUEEZE=ON command.
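The idea behind squeezing a column can be illustrated with a small Python sketch: strip trailing blanks, then size the column to the longest remaining value, counting one position per character. This is only an illustration under that assumption, not a description of how FOCUS implements SET SQUEEZE=ON, and the function names are hypothetical.

```python
# Illustrative sketch only: trim trailing blanks and size a column to the
# longest remaining value, measured in characters rather than bytes.

def squeezed_width(values: list[str]) -> int:
    """Column width after trailing blanks are removed from every value."""
    return max((len(v.rstrip(" ")) for v in values), default=0)


def render_column(values: list[str]) -> list[str]:
    """Left-justify each trimmed value to the squeezed column width."""
    width = squeezed_width(values)
    return [v.rstrip(" ").ljust(width) for v in values]


if __name__ == "__main__":
    # Values as they might be stored in a 10-character field, padded with blanks.
    city = ["Z\u00fcrich    ", "M\u00fcnchen   ", "Oslo      "]
    print(squeezed_width(city))
    for line in render_column(city):
        print(f"|{line}|")
```

Here the column shrinks from 10 positions to 7, the length of "München", even though "ü" needs 2 bytes in UTF-8.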

The main benefit of Unicode support is the ability to use multiple languages (both European and Asian) in the following FOCUS and Dialogue Manager objects:

For information about configuring FOCUS for Unicode, see the chapter on Configuring FOCUS for National Language Support Services.


Information Builders