Accessing Unicode Data

Reference:

Adapters for the following types of data sources support Unicode:

  • DB2
  • Fixed-format sequential files
  • Microsoft SQL Server
  • MySQL
  • Oracle
  • SAP BW
  • SAP R/3-ECC
  • Sybase ASE
  • Sybase IQ
  • Teradata (CLI)
  • Web Services
  • XBRL
  • XML

Relational adapters in a Unicode environment assume that the DBMS returns character data to FOCUS already converted to Unicode. The relational adapters convert data to the correct DBMS API when writing to a relational data source (for example, Oracle to UTF-8, Microsoft SQL Server to UTF-16, and DB2 on z/OS to UTF-EBCDIC).

XML-based adapters (the Adapter for XML, and the Adapter for XBRL) obtain the code page from the XML declaration of the processed XML document. For more information, see http://www.w3.org/TR/REC-xml#sec-prolog-dtd.

The Adapter for Web Services generates SOAP requests using the UTF-8 code page.

Reference: Unicode Considerations for Oracle

The adapter supports Unicode data in Oracle release 10g or higher databases that have been configured with the NLS_CHARACTERSET parameter set to UTF8. You must set the NLS_LANG environment variable in the edastart file, in a separate shell file, in a database profile, or in a user profile.

Set NLS_LANG using the following syntax

NLS_LANG = language_territory.characterset

where:

language

Is the selected language.

territory

Is the name of the country associated with the selected language.

characterset

Is the value of the NLS_CHARACTERSET variable that is set in the Oracle database. For Unicode, this is always UTF8.

For example, for American English UTF-8, you would use the following setting:

NLS_LANG=American_America.UTF8

For information about data type support, see Relational Adapter Data Type Support for Unicode.

Reference: Unicode Considerations for DB2

Information Builders supports DB2 databases, version 8 and higher. To prepare the DB2 environment for Unicode on:

  • Windows, the database must have been created with the option CODESET UTF-8, and you must add the following variable to the environment using Windows or in the edastart file:
    DB2CODEPAGE=1208
  • UNIX, the database must have been created with the option CODESET UTF-8, and you must set the LANG and NLS_LANG environment variables in the edastart file or in a separate shell file.

    For example, for American English, you would export the following variables:

    export LANG=EN_US.UTF-8
    export NLS_LANG=American_America.UTF8
  • z/OS, the database must have been created with the CCSID UNICODE option, and you must ensure that the DSNAOINI environment variable points to a configuration file containing the following specification:
    CURRENTAPPENSCH=UNICODE

    The adapter supports Unicode only with the CLI interface.

    In a Unicode environment, the Adapter for DB2 requires a BIND command for PREPARE/EXECUTE logic using parameter markers.

    For information about data type support, see Relational Adapter Data Type Support for Unicode.

  • IBM i, Unicode is only supported in DB2 CLI mode. DB2 allows column-by-column specification of CCSID information at CREATE TABLE time. The columns may or may not be explicitly Unicode 1208 and 1200, however, all CCSIDs are transcoded to FOCUS as CSID 1208 UTF8. Thus, any existing table may be used in a Unicode configuration regardless of its underlying CCSID specifications.

Reference: Unicode Considerations for Sybase ASE

The adapter supports Unicode data in Sybase ASE version 15.0 and higher databases that have been created with the CHARACTER SET option set to UTF-8. You must set the LANG and NLS_LANG environment variables in the edastart file or in a separate shell file before starting FOCUS.

For example, for American English, you would export the following variables:

export LANG=EN_US.UTF-8
export NLS_LANG=American_America.UTF8

For information about data type support, see Relational Adapter Data Type Support for Unicode.

Reference: Unicode Considerations for Sybase IQ

Beginning with Sybase IQ version 12.7, the adapter supports Unicode data in Sybase IQ databases that have been created with the UTF-8 character set. You must set the LANG and NLS_LANG environment variables in the edastart file or in a separate shell file before starting FOCUS.

For example, for American English, you would export the following variables:

export LANG=EN_US.UTF-8
export NLS_LANG=American_America.UTF8

For information about data type support, see Relational Adapter Data Type Support for Unicode.

Reference: Unicode Considerations for Microsoft SQL Server

The adapter, using the OLE DB interface, supports Unicode data stored in NCHAR and NVARCHAR fields (where N stands for national). N columns can support data of any language or combination of languages.

For information about data type support, see Relational Adapter Data Type Support for Unicode.

Reference: Unicode Considerations for Teradata (CLI)

The Adapter for Teradata (CLI) supports Unicode UTF-8 format if:

  • The Teradata CLI client components are part of release TTU8.0 or higher.
  • The Teradata database is release V2R6.0 or higher and appropriate language support was enabled during the sysinit process.

Contact your database administrator (DBA) to determine whether international language support has been enabled in your Teradata system and/or consult the Teradata documentation for details about International Character Set support.

Note that, at the present time, when Unicode is enabled the length of a Teradata Column Name and/or TITLE cannot exceed 21 characters (bytes).

For information about data type support, see Relational Adapter Data Type Support for Unicode.

Reference: Unicode Considerations for MySQL

The Adapter for MySQL is implemented using JDBC. This implementation supports Unicode data stored in character fields with CHARACTER SET set to UTF-8.

You must set the LANG environment variable in the edastart file or in a separate shell file before you start FOCUS. For example, for American English you would export the following variable:

export LANG=EN_US.UTF-8

For information about data type support, see Relational Adapter Data Type Support for Unicode.

Reference: Relational Adapter Data Type Support for Unicode

In Unicode databases the information in CHAR(n) columns is stored in a UTF-8 encoding scheme. Most RDBMS Unicode columns of CHAR type specify length in bytes, not characters. The B-modifier in the Actual format denotes that a character column with a fixed byte length might contain a varying number of UTF-8 characters. This is reflected in the AnV Usage format.

DBMS

Column Type

Usage

Actual*

DB2

CHAR(n)

AnV

AnB

GRAPHIC(n)

An

An

VARCHAR(n)

AnV

AnVB

VARGRAPHIC(n)

AnV

AnV

Microsoft SQL Server

CHAR(n) single byte code page

An

An

CHAR(n) double byte code page

AnV

AnV

NCHAR(n)

An

An

VARCHAR(n)

AnV

AnV

NVARCHAR(n)

AnV

AnV

MySQL

CHAR(n)

An

An

VARCHAR (n)

AnV

AnV

Oracle

CHAR(n CHAR)

An

An

CHAR(n BYTE)

AnV

AnB

NCHAR(n)

An

An

VARCHAR(n CHAR)

AnV

AnV

VARCHAR(n BYTE)

AnV

AnVB

NVARCHAR(n)

AnV

AnV

Sybase ASE

CHAR(n)

An

AnB

UNICHAR(n)

An

An

VARCHAR(n)

AnV

AnVB

UNIVARCHAR(n)

AnV

AnV

Sybase IQ **

CHAR(n)

An

AnB

VARCHAR(n)

AnV

AnVB

Teradata

CHAR(n)

An

An

VARCHAR (n)

AnV

AnV

* Note that on EBCDIC platform(s) the ACTUAL size for a B-suffixed format is increased 1.5 times to accommodate the expansion when converting from UTF-8 to UTF-EBCDIC. For example, on z/OS the synonym created for a DB2 CHAR(10) column contains the following: USAGE=A10, ACTUAL=A15B.

**Note the following limitation for Sybase IQ: You cannot use the HOLD FORMAT SYBASE command with the Unicode implementation of Sybase IQ since this command depends on the availability of UNICHAR and UNIVARCHAR data types, which are not supported by Sybase IQ.

Reference: Unicode Considerations for SAP BW and SAP R/3-ECC

SAP uses UTF-16 encoding in its Unicode system. FOCUS uses UTF-8 and handles all conversions between the two encoding schemes. FOCUS may not need to be configured for Unicode when accessing the SAP Unicode system.

NLS settings for FOCUS must be configured in such a way that the FOCUS code page can handle the list of chosen languages. For example, ISO 8859-1 can accommodate most Western European languages. The 8859 family can handle character specifics with the lower set almost being mapped to US ASCII. Therefore, with 8859-1 one could request English, German, French, and Spanish. When a character set requires a code page that takes more than one byte per character (for example, many Asian languages), the only choice for FOCUS is 65001 (UTF-8).

The adapters provide access to Unicode SAP BW and SAP ECC systems, respectively. This extends support of data and metadata in multiple languages to FOCUS, consistent with support by the SAP server. A synonym can be created using one or more languages. Those languages will be used to create titles and descriptions.

  • For SAP BW, the userid and password in the sapserv.cfg file must be able to connect to SAP BW using the enumeration of desired languages.
  • For SAP R/3-ECC, FOCUS logon language is used to retrieve all languages.

Reference: Unicode Considerations for Fixed-Format Sequential Files

When retrieving a fixed-format sequential file, FOCUS attempts to determine the code page the file was meant to be retrieved with by checking the Master File CODEPAGE attribute. If the Master File does not contain the CODEPAGE attribute, FOCUS code page is used to read the file.

In a Unicode configuration, HOLD files in BINARY and ALPHA formats are created using UTF-8 conversion, which assigns each character three bytes of storage in ASCII environments or four bytes in EBCDIC environments. Fields defined in the Master File using the data type A in both the USAGE and ACTUAL attributes are described in terms of characters. Fields defined using any other combination of USAGE and ACTUAL attribute values are described in terms of bytes.

To force a field in a fixed-format sequential file to be described in terms of bytes, add B to the end of the ACTUAL attribute. For example, to specify that a field is stored in 10 bytes, you would specify:

ACTUAL=A10B

The adapter will then read the specified number of bytes from the record and convert their contents to the number of characters specified by the file code page.

Regardless of how much storage a character occupies, it occupies only one space on a report, as always.


Information Builders