Character Functions for DBCS Code Pages

DCTRAN: Translating a Single-Byte or Double-Byte Character to Another

The DCTRAN function translates a single-byte or double-byte character within a character string to another character based on its decimal value. To use DCTRAN, you need to know the decimal equivalent of the characters in internal machine representation.

Syntax: How to Translate a Single-Byte or Double-Byte Character to Another

DCTRAN(length, source_string, inhexchar, outhexchar, output_format)

where:

length

Double

Is the number of characters in the source_string field.

source_string

Alphanumeric

Is the character string to be translated.

inhexchar

Double

Is the ASCII or EBCDIC decimal value of the character to be translated.

outhexchar

Double

Is the ASCII or EBCDIC decimal value of the character to be used as a substitute for inhexchar.

output_format

Alphanumeric

Is the name of the field that contains the result, or the format of the output value enclosed in single quotation marks.
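To make the decimal-value arithmetic concrete, here is a minimal Python model of what DCTRAN does. It assumes Shift_JIS as the code page (Python's built-in `shift_jis` codec stands in for the host DBCS code page, whose values will differ, for example on EBCDIC systems) and assumes that a double-byte character's decimal value is its two bytes read as one big-endian integer, so あ (0x82A0) is 33440 and ア (0x8341) is 33601. The function name `dctran` is hypothetical; this is a sketch of the documented behavior, not the product's implementation.

```python
def dctran(length: int, source: str, in_dec: int, out_dec: int,
           codec: str = "shift_jis") -> str:
    """Hypothetical model of DCTRAN: replace each character whose
    decimal code value equals in_dec with the character whose value
    is out_dec. The codec is an assumed stand-in for the host code page."""

    def value_of(ch: str) -> int:
        # One- or two-byte encoding read as a big-endian integer.
        return int.from_bytes(ch.encode(codec), "big")

    def char_of(value: int) -> str:
        # Values above 255 are treated as double-byte characters.
        nbytes = 1 if value < 0x100 else 2
        return value.to_bytes(nbytes, "big").decode(codec)

    replacement = char_of(out_dec)
    # Only the first `length` characters of the source are processed.
    return "".join(replacement if value_of(c) == in_dec else c
                   for c in source[:length])
```

For example, `dctran(3, "あいあ", 33440, 33601)` returns `"アいア"`, and the same call works for single-byte values, such as 65 (A) and 90 (Z).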

Example: Using DCTRAN to Translate Double-Byte Characters

In the following:

DEDIT: Extracting or Adding Characters

If your configuration uses a DBCS code page, you can use the DEDIT function to extract characters from or add characters to a string.

DEDIT works by comparing the characters in a mask to the characters in a source field. When it encounters a nine (9) in the mask, DEDIT copies the corresponding character from the source field to the new field. When it encounters a dollar sign ($) in the mask, DEDIT ignores the corresponding character in the source field. When it encounters any other character in the mask, DEDIT copies that character to the corresponding position in the new field.

Syntax: How to Extract or Add DBCS or SBCS Characters

DEDIT(inlength, source_string, mask_length, mask, outfield)

where:

inlength

Integer

Is the number of bytes in source_string. The string can have a mixture of DBCS and SBCS characters. Therefore, the number of bytes represents the maximum number of characters possible in the source string.

source_string

Alphanumeric

Is the string to edit enclosed in single quotation marks, or the field containing the string.

mask_length

Integer

Is the number of characters in mask.

mask

Alphanumeric

Is the string of mask characters.

Each nine (9) in the mask causes the corresponding character from the source field to be copied to the new field.

Each dollar sign ($) in the mask causes the corresponding character in the source field to be ignored.

Any other character in the mask is copied to the new field.

outfield

Alphanumeric

Is the field to which the result is returned, or the format of the output value enclosed in single quotation marks.

Example: Adding and Extracting DBCS Characters

The following example copies alternate characters from the source string to the new field, starting with the first character in the source string, and then adds several new characters at the end of the extracted string:

The following example copies alternate characters from the source string to the new field, starting with the second character in the source string, and then adds several new characters at the end of the extracted string:

DSTRIP: Removing a Single-Byte or Double-Byte Character From a String

The DSTRIP function removes all occurrences of a specific single-byte or double-byte character from a string. The resulting character string has the same length as the original string but is padded on the right with spaces.

Syntax: How to Remove a Single-Byte or Double-Byte Character From a String

DSTRIP(length, source_string, char, output_format)

where:

length

Double

Is the number of characters in source_string and in the output field.

source_string

Alphanumeric

Is the string from which the character will be removed.

char

Alphanumeric

Is the character to be removed from the string. If more than one character is provided, the left-most character will be used as the strip character.

Note: To remove a single quotation mark, specify two consecutive single quotation marks and enclose that pair in single quotation marks (that is, '''').

output_format

Alphanumeric

Is the name of the field that contains the result, or the format of the output value enclosed in single quotation marks.
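Here is a minimal Python model of the behavior described above, using the hypothetical name `dstrip`. It counts characters rather than bytes, so removing a double-byte character adds one space of padding in this sketch. That may differ from how a byte-oriented DBCS field is padded.

```python
def dstrip(length: int, source: str, char: str) -> str:
    """Hypothetical model of DSTRIP."""
    strip_char = char[0]  # only the leftmost character of `char` is used
    kept = "".join(c for c in source[:length] if c != strip_char)
    # Pad on the right with spaces so the result keeps the original length.
    return kept.ljust(length)
```

For example, `dstrip(5, "東京・大阪", "・")` returns `"東京大阪 "`. In Python, a single quotation mark needs no doubling: `dstrip(4, "it's", "'")` returns `"its "`.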

Example: Removing a Double-Byte Character From a String

In the following:

DSUBSTR: Extracting a Substring

If your configuration uses a DBCS code page, you can use the DSUBSTR function to extract a substring based on its length and position in the parent string.

Syntax: How to Extract a Substring

DSUBSTR(inlength, parent, start, end, sublength, outfield)

where:

inlength

Integer

Is the length of the parent string in bytes, or a field that contains the length. The string can have a mixture of DBCS and SBCS characters. Therefore, the number of bytes represents the maximum number of characters possible in the parent string.

parent

Alphanumeric

Is the parent string enclosed in single quotation marks, or the field containing the parent string.

start

Integer

Is the starting position (in number of characters) of the substring in the parent string. If this argument is less than one or greater than end, the function returns spaces.

end

Integer

Is the ending position (in number of characters) of the substring. If this argument is less than start or greater than inlength, the function returns spaces.

sublength

Integer

Is the length of the substring in characters (normally end - start + 1). If sublength is longer than end - start + 1, the substring is padded with trailing spaces. If it is shorter, the substring is truncated. This value should be the declared length of outfield. Only sublength characters are processed.

outfield

Alphanumeric

Is the field to which the result is returned, or the format of the output value enclosed in single quotation marks.
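The argument rules above can be captured in a short Python model, using the hypothetical name `dsubstr`. This sketch assumes that positions are counted in characters, so a double-byte character occupies one position. The outfield argument is omitted because the function simply returns a string.

```python
def dsubstr(inlength: int, parent: str, start: int, end: int,
            sublength: int) -> str:
    """Hypothetical model of DSUBSTR's documented argument rules."""
    # Out-of-range positions return spaces.
    if start < 1 or start > end or end > inlength:
        return " " * sublength
    piece = parent[start - 1:end]  # 1-based, inclusive positions
    # Truncate or pad with trailing spaces to exactly sublength characters.
    return piece[:sublength].ljust(sublength)
```

For example, `dsubstr(15, "ABCDEFGHIJKLMNO", 4, 6, 3)` returns `"DEF"`. With a sublength of 5, the result is padded to `"DEF  "`.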

Example: Extracting a Substring

The following example extracts the 3-character substring in positions 4 through 6 from a 15-byte string of characters:

JPTRANS: Converting Japanese Specific Characters

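The JPTRANS function converts Japanese-specific characters, for example between Zenkaku (fullwidth) and Hankaku (halfwidth) forms, between Hiragana and Katakana, and between code pages 930 and 939.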

Syntax: How to Convert Japanese Specific Characters

JPTRANS ('type_of_conversion', length, source_string, 'output_format')

where:

type_of_conversion

Is one of the following options indicating the type of conversion you want to apply to Japanese specific characters. These are the single component input types:

Conversion Type	Description
'UPCASE'	Converts Zenkaku (Fullwidth) letters to Zenkaku uppercase.
'LOCASE'	Converts Zenkaku letters to Zenkaku lowercase.
'HNZNALPHA'	Converts alphanumerics from Hankaku (Halfwidth) to Zenkaku.
'HNZNSIGN'	Converts ASCII symbols from Hankaku to Zenkaku.
'HNZNKANA'	Converts Katakana from Hankaku to Zenkaku.
'HNZNSPACE'	Converts space (blank) from Hankaku to Zenkaku.
'ZNHNALPHA'	Converts alphanumerics from Zenkaku to Hankaku.
'ZNHNSIGN'	Converts ASCII symbols from Zenkaku to Hankaku.
'ZNHNKANA'	Converts Katakana from Zenkaku to Hankaku.
'ZNHNSPACE'	Converts space from Zenkaku to Hankaku.
'HIRAKATA'	Converts Hiragana to Zenkaku Katakana.
'KATAHIRA'	Converts Zenkaku Katakana to Hiragana.
'930TO939'	Converts codepage from 930 to 939.
'939TO930'	Converts codepage from 939 to 930.

length

Integer

Is the number of characters in the source_string.

source_string

Alphanumeric

Is the string to convert.

output_format

Alphanumeric

Is the name of the field that contains the output, or the format enclosed in single quotation marks.
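Several conversion types amount to shifting a Unicode code point by a fixed offset. The fullwidth ASCII-range characters at U+FF01 through U+FF5E sit exactly 0xFEE0 above their ASCII counterparts, and Katakana U+30A1 through U+30F6 sit 0x60 above Hiragana U+3041 through U+3096. The sketch below models those types on Unicode strings. It assumes Unicode text rather than the DBCS code page JPTRANS actually operates on, and it omits the KANA, SIGN, and code page conversions. The names `jptrans` and `CONVERSIONS` are hypothetical.

```python
FW_OFFSET = 0xFEE0  # U+FF01..U+FF5E == U+0021..U+007E + 0xFEE0

def _shift(text: str, ranges: list, offset: int) -> str:
    """Shift every code point that falls in one of `ranges` by `offset`."""
    out = []
    for ch in text:
        cp = ord(ch)
        if any(lo <= cp <= hi for lo, hi in ranges):
            cp += offset
        out.append(chr(cp))
    return "".join(out)

HALF_ALNUM = [(0x30, 0x39), (0x41, 0x5A), (0x61, 0x7A)]  # 0-9, A-Z, a-z
FULL_ALNUM = [(lo + FW_OFFSET, hi + FW_OFFSET) for lo, hi in HALF_ALNUM]

# Hypothetical table of conversion types modeled as offsets or replacements.
CONVERSIONS = {
    "UPCASE": lambda s: _shift(s, [(0xFF41, 0xFF5A)], -0x20),
    "LOCASE": lambda s: _shift(s, [(0xFF21, 0xFF3A)], 0x20),
    "HNZNALPHA": lambda s: _shift(s, HALF_ALNUM, FW_OFFSET),
    "ZNHNALPHA": lambda s: _shift(s, FULL_ALNUM, -FW_OFFSET),
    "HIRAKATA": lambda s: _shift(s, [(0x3041, 0x3096)], 0x60),
    "KATAHIRA": lambda s: _shift(s, [(0x30A1, 0x30F6)], -0x60),
    "HNZNSPACE": lambda s: s.replace("\u0020", "\u3000"),
    "ZNHNSPACE": lambda s: s.replace("\u3000", "\u0020"),
}

def jptrans(type_of_conversion: str, length: int, source: str) -> str:
    """Apply one modeled conversion to the first `length` characters."""
    return CONVERSIONS[type_of_conversion](source[:length])
```

For example, `jptrans("HIRAKATA", 20, "ひらがな")` returns `"ヒラガナ"`. `jptrans("ZNHNALPHA", 20, "ＡＢ１２")` returns `"AB12"`.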

Example: Using the JPTRANS Function

JPTRANS('UPCASE', 20, Alpha_DBCS_Field, 'A20')

JPTRANS('LOCASE', 20, Alpha_DBCS_Field, 'A20')

JPTRANS('HNZNALPHA', 20, Alpha_SBCS_Field, 'A20')

JPTRANS('HNZNSIGN', 20, Symbol_SBCS_Field, 'A20')

JPTRANS('HNZNKANA', 20, Hankaku_Katakana_Field, 'A20')

JPTRANS('HNZNSPACE', 20, Hankaku_Katakana_Field, 'A20')

JPTRANS('ZNHNALPHA', 20, Alpha_DBCS_Field, 'A20')

JPTRANS('ZNHNSIGN', 20, Symbol_DBCS_Field, 'A20')

JPTRANS('ZNHNKANA', 20, Zenkaku_Katakana_Field, 'A20')

JPTRANS('ZNHNSPACE', 20, Zenkaku_Katakana_Field, 'A20')

JPTRANS('HIRAKATA', 20, Hiragana_Field, 'A20')

JPTRANS('KATAHIRA', 20, Zenkaku_Katakana_Field, 'A20')

In the following, codepoints 0x62 0x63 0x64 are converted to 0x81 0x82 0x83, respectively:

JPTRANS('930TO939', 20, CP930_Field, 'A20')

In the following, codepoints 0x59 0x62 0x63 are converted to 0x81 0x82 0x83, respectively:

JPTRANS('939TO930', 20, CP939_Field, 'A20')

Reference: Usage Notes for the JPTRANS Function

HNZNSIGN and ZNHNSIGN focus on the conversion of symbols.
Many symbols have a one-to-one relationship between Japanese fullwidth characters and ASCII symbols, but some have a one-to-many relationship. For example, both the Japanese ideographic comma 、 (U+3001) and the fullwidth comma ， (U+FF0C) are converted to the same comma , (U+002C). The following extra rules handle these special cases:

HNZNSIGN:
- Double Quote " (U+0022) -> Fullwidth Right Double Quote ” (U+201D)
- Single Quote ' (U+0027) -> Fullwidth Right Single Quote ’ (U+2019)
- Comma , (U+002C) -> Fullwidth Ideographic Comma 、 (U+3001)
- Full Stop . (U+002E) -> Fullwidth Ideographic Full Stop 。 (U+3002)
- Backslash \ (U+005C) -> Fullwidth Backslash ＼ (U+FF3C)
- Halfwidth Left Corner Bracket ｢ (U+FF62) -> Fullwidth Left Corner Bracket 「 (U+300C)
- Halfwidth Right Corner Bracket ｣ (U+FF63) -> Fullwidth Right Corner Bracket 」 (U+300D)
- Halfwidth Katakana Middle Dot ･ (U+FF65) -> Fullwidth Middle Dot ・ (U+30FB)

ZNHNSIGN:
- Fullwidth Right Double Quote ” (U+201D) -> Double Quote " (U+0022)
- Fullwidth Left Double Quote “ (U+201C) -> Double Quote " (U+0022)
- Fullwidth Quotation ＂ (U+FF02) -> Double Quote " (U+0022)
- Fullwidth Right Single Quote ’ (U+2019) -> Single Quote ' (U+0027)
- Fullwidth Left Single Quote ‘ (U+2018) -> Single Quote ' (U+0027)
- Fullwidth Single Quote ＇ (U+FF07) -> Single Quote ' (U+0027)
- Fullwidth Ideographic Comma 、 (U+3001) -> Comma , (U+002C)
- Fullwidth Comma ， (U+FF0C) -> Comma , (U+002C)
- Fullwidth Ideographic Full Stop 。 (U+3002) -> Full Stop . (U+002E)
- Fullwidth Full Stop ． (U+FF0E) -> Full Stop . (U+002E)
- Fullwidth Yen Sign ￥ (U+FFE5) -> Yen Sign ¥ (U+00A5)
- Fullwidth Backslash ＼ (U+FF3C) -> Backslash \ (U+005C)
- Fullwidth Left Corner Bracket 「 (U+300C) -> Halfwidth Left Corner Bracket ｢ (U+FF62)
- Fullwidth Right Corner Bracket 」 (U+300D) -> Halfwidth Right Corner Bracket ｣ (U+FF63)
- Fullwidth Middle Dot ・ (U+30FB) -> Halfwidth Katakana Middle Dot ･ (U+FF65)

HNZNKANA and ZNHNKANA focus on the conversion of Katakana.
They convert not only letters but also the punctuation symbols in the following list:
- Fullwidth Ideographic Comma 、 (U+3001) <-> Halfwidth Ideographic Comma ､ (U+FF64)
- Fullwidth Ideographic Full Stop 。 (U+3002) <-> Halfwidth Ideographic Full Stop ｡ (U+FF61)
- Fullwidth Left Corner Bracket 「 (U+300C) <-> Halfwidth Left Corner Bracket ｢ (U+FF62)
- Fullwidth Right Corner Bracket 」 (U+300D) <-> Halfwidth Right Corner Bracket ｣ (U+FF63)
- Fullwidth Middle Dot ・ (U+30FB) <-> Halfwidth Katakana Middle Dot ･ (U+FF65)
- Fullwidth Prolonged Sound ー (U+30FC) <-> Halfwidth Prolonged Sound ｰ (U+FF70)

JPTRANS can be nested to apply multiple conversions.
For example, text data may contain fullwidth digits and fullwidth symbols. In some situations, these should be normalized to ASCII digits and symbols:
```
JPTRANS('ZNHNALPHA', 20, JPTRANS('ZNHNSIGN', 20, Symbol_DBCS_Field, 'A20'), 'A20')
```
HNZNSPACE and ZNHNSPACE focus on the conversion of a space (blank character).
Currently only conversion between U+0020 and U+3000 is supported.
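The SIGN and KANA notes above can be modeled in Python on Unicode text, as a sketch rather than a description of how JPTRANS is implemented. The hypothetical function `znhnsign` applies the listed ZNHNSIGN rules and falls back on an assumed one-to-one offset (U+FF01 through U+FF5E minus 0xFEE0) for other fullwidth symbols. It leaves letters and digits to ZNHNALPHA. The hypothetical function `hnznkana` swaps in Unicode NFKC normalization (Python's `unicodedata.normalize`). NFKC maps halfwidth katakana and the listed halfwidth punctuation to their fullwidth forms and composes voiced marks, so ｶﾞ becomes ガ. The sketch applies NFKC only to runs of halfwidth katakana because NFKC would otherwise also rewrite fullwidth letters and other compatibility characters.

```python
import re
import unicodedata

# Extra ZNHNSIGN rules from the usage notes (many-to-one cases included).
ZNHN_EXTRA = {
    "\u201D": '"', "\u201C": '"', "\uFF02": '"',
    "\u2019": "'", "\u2018": "'", "\uFF07": "'",
    "\u3001": ",", "\uFF0C": ",",
    "\u3002": ".", "\uFF0E": ".",
    "\uFFE5": "\u00A5", "\uFF3C": "\\",
    "\u300C": "\uFF62", "\u300D": "\uFF63", "\u30FB": "\uFF65",
}

def znhnsign(text: str) -> str:
    """Hypothetical model of ZNHNSIGN."""
    out = []
    for ch in text:
        if ch in ZNHN_EXTRA:
            out.append(ZNHN_EXTRA[ch])
        elif 0xFF01 <= ord(ch) <= 0xFF5E and not ch.isalnum():
            # Assumed one-to-one rule: fullwidth symbol -> ASCII symbol.
            out.append(chr(ord(ch) - 0xFEE0))
        else:
            out.append(ch)  # letters, digits, kana, etc. are left alone
    return "".join(out)

# Runs of halfwidth katakana and halfwidth Japanese punctuation.
_HALFWIDTH_KANA_RUN = re.compile("[\uFF61-\uFF9F]+")

def hnznkana(text: str) -> str:
    """Hypothetical model of HNZNKANA via NFKC on halfwidth runs only."""
    return _HALFWIDTH_KANA_RUN.sub(
        lambda m: unicodedata.normalize("NFKC", m.group()), text)
```

For example, `znhnsign("、，。．")` returns `",,.."`. `hnznkana("ｶﾞｷﾞｰ｡")` returns `"ガギー。"`. The reverse KANA direction has no equivalent standard normalization and is not modeled here.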