Describing a Token-Delimited Data Source

In this section:

You can read files in which fields are separated by any type of delimiter including commas, tabs and other characters. Defining a Master File with the SUFFIX=DFIX attribute lets you specify any combination of characters as the field delimiter. Delimiters may consist of printable or non-printable characters, or any combination of printable and non-printable characters.

Two methods of describing delimited files are supported:

Note: SET HOLDLIST is not supported for delimited files.


Top of page

x
Defining a Delimiter in the Master File

How to:

Reference:

Delimiters in the Master File are defined using a special field named DELIMITER. The FILE declaration must include the attribute SUFFIX=DFIX.



x
Syntax: How to Define a File With Delimiters in the Master File

Describe the delimiter characters in a special field or group named DELIMITER. The delimiter characters are specified in the ALIAS attribute of this special field or group.

To use a delimiter that consists of a single non-printable character or of one or more printable characters, the delimiter is defined as a field with the following attributes:

FIELDNAME=DELIMITER, ALIAS=delimiter, USAGE=ufmt, ACTUAL=afmt ,$

To use a delimiter that consists of multiple non-printable characters or a combination of printable and non-printable characters, the delimiter is defined as a group:

GROUP=DELIMITER,      ALIAS=          , USAGE=ufmtg, ACTUAL=afmtg ,$
 FIELDNAME=DELIMITER, ALIAS=delimiter1, USAGE=ufmt1, ACTUAL=afmt1 ,$
  .
  .
  .
 FIELDNAME=DELIMITER, ALIAS=delimitern, USAGE=ufmtn, ACTUAL=afmtn ,$

where:

DELIMITER

Indicates that the field or group is used as the delimiter in the data source.

delimiter

Identifies a delimiter, up to 30 characters long. For one or more printable characters, the value consists of the actual characters. The delimiter must be enclosed in single quotation marks if it includes characters used as delimiters in Master File syntax. For a non-printable character, the value is the decimal equivalent of the EBCDIC or ASCII representation of the character, depending on your operating environment.

ufmt, afmt

Are the USAGE and ACTUAL formats for the delimiter. Possible values are:

Type of delimiter

USAGE

ACTUAL

Printable characters

An where n is the number of characters

An where n is the number of characters

Non-printable character such as Tab

I4
I1

Group (combination of printable and non-printable characters, or multiple non-printable characters)

Sum of the individual USAGE lengths

Sum of the individual ACTUAL lengths



x
Reference: Usage Notes for a Token-Delimited File


Example: Defining a Delimiter in the Master File

The following example shows a one-character alphanumeric delimiter:

FIELDNAME=DELIMITER, ALIAS=',' ,USAGE=A1, ACTUAL=A1  ,$

The following example shows a two-character alphanumeric delimiter:

FIELDNAME=DELIMITER, ALIAS=//  ,USAGE=A2, ACTUAL=A2  ,$

The following example shows how to use the Tab character as a delimiter:

FIELDNAME=DELIMITER, ALIAS=05  ,USAGE=I4, ACTUAL=I1  ,$

The following example shows how to use a blank character described as a numeric delimiter:

FIELDNAME=DELIMITER, ALIAS=64  ,USAGE=I4, ACTUAL=I1  ,$

The following example shows a group delimiter (Tab-slash-Tab combination):

GROUP=DELIMITER,  ALIAS=   ,USAGE=A9, ACTUAL=A3  ,$
 FIELDNAME=DEL1,  ALIAS=05 ,USAGE=I4, ACTUAL=I1  ,$
 FIELDNAME=DEL2,  ALIAS=/  ,USAGE=A1, ACTUAL=A1  ,$
 FIELDNAME=DEL3,  ALIAS=05 ,USAGE=I4, ACTUAL=I1  ,$


Example: Separating Field Values for Missing Data

The following Master File shows the MISSING attribute specified for the CAR field:

FILE=DFIXF01  ,SUFFIX=DFIX
 SEGNAME=SEG1  ,SEGTYPE=S0
  FIELDNAME=COUNTRY   ,ALIAS=F1  ,USAGE=A10 ,ACTUAL=A10 ,$
  FIELDNAME=CAR       ,ALIAS=F2  ,USAGE=A16 ,ACTUAL=A16 ,MISSING=ON, $
  FIELDNAME=NUMBER    ,ALIAS=F3  ,USAGE=P10 ,ACTUAL=Z10 ,$
  FIELDNAME=DELIMITER ,ALIAS=',' ,USAGE=A1  ,ACTUAL=A1  ,$

In the source file, two consecutive comma delimiters indicate missing values for CAR:

GERMANY,VOLKSWAGEN,1111
GERMANY,BMW,
USA,CADILLAC,22222
USA,FORD
USA,,44444
JAPAN
ENGLAND,
FRANCE

The output is:

COUNTRY            CAR              NUMBER 
-------            ---              ------
GERMANY            VOLKSWAGEN         1111
GERMANY            BMW                   0
USA                CADILLAC          22222
USA                FORD                  0
USA                .                 44444
JAPAN              .                     0
ENGLAND                                  0
FRANCE             .                     0

Top of page

x
Defining a Delimiter in the Access File

Reference:

The Master File has the standard attributes for any sequential file. The SUFFIX value is DFIX. All of the delimiter information is in the Access File.

In addition, you can use the HOLD FORMAT DFIX command to create this type of Master and Access File for a token-delimited file. For information on HOLD formats, see the Creating Reports manual.



x
Reference: Access File Attributes for a Delimited Sequential File
DELIMITER = delimiter, [ENCLOSURE = character,] [HEADER = {YES|NO}],$

where:

delimiter

Is the delimiter sequence consisting of up to 30 printable or non-printable non-null characters. For a non-printable character, enter the hexadecimal value that represents the character. If you use a mixture of printable and non-printable characters, you must enter them all as hexadecimal values. To create a tab-delimited file, you can specify the delimiter value as TAB or as its hexadecimal equivalent (0x09 on ASCII platforms or 0x05 on EBCDIC platforms).

Note that numeric digits and symbols used in numbers, such as a period (.), plus sign (+), or minus sign (-) cannot be used in the delimiter sequence.

character

Is the enclosure sequence. It can be up to four printable characters used to enclose each alphanumeric value in the file. Most alphanumeric characters can be used as all or part of the enclosure sequence. However, numeric digits and symbols used in numbers, such as a period (.), plus sign (+), or minus sign (-) cannot be used in the enclosure sequence. Also note that, in order to specify a single quotation mark as the enclosure character, you must enter four consecutive single quotation marks. The most common enclosure is one double quotation mark.

HEADER = {YES|NO}

Specifies whether to include a header record that contains the names of the fields in the delimited sequential file generated by the request. NO is the default value.



Example: Master and Access File for a Pipe Delimited File

The pipe delimited file named PIPE1 contains the following data in which each data value is delimited by a pipe character (|). Note that you can create a delimited file as output from a request using the HOLD FORMAT DFIX command in a request:

EAST|2000|3907|1145655.77    
EAST|2001|495922|127004359.88
EAST|2002|543678|137470917.05
NORTH|2001|337168|85750735.54
NORTH|2002|370031|92609802.80
SOUTH|2000|3141|852550.45    
SOUTH|2001|393155|99822662.88
SOUTH|2002|431575|107858412.0
WEST|2001|155252|39167974.18 
WEST|2002|170421|42339953.45 

The PIPE1 Master File is:

FILENAME=PIPE1   , SUFFIX=DFIX    , $                           
  SEGMENT=PIPE1, SEGTYPE=S2, $                                  
    FIELDNAME=REGION, ALIAS=E01, USAGE=A5, ACTUAL=A05, $        
    FIELDNAME=YEAR, ALIAS=E02, USAGE=YY, ACTUAL=A04, $
    FIELDNAME=QUANTITY, ALIAS=E03, USAGE=I8C, ACTUAL=A08, $     
    FIELDNAME=LINEPRICE, ALIAS=E04, USAGE=D12.2MC, ACTUAL=A12, $

The PIPE1 Access File is:

SEGNAME=PIPE1, DELIMITER=|, HEADER=NO, $

In the following version of the PIPE1 delimited file, each alphanumeric value is enclosed in double quotation marks:

"EAST"|2000|3907|1145655.77     
"EAST"|2001|495922|127004359.88 
"EAST"|2002|543678|137470917.05 
"NORTH"|2001|337168|85750735.54 
"NORTH"|2002|370031|92609802.80 
"SOUTH"|2000|3141|852550.45     
"SOUTH"|2001|393155|99822662.88 
"SOUTH"|2002|431575|107858412.01
"WEST"|2001|155252|39167974.18  
"WEST"|2002|170421|42339953.45  

The Master File does not change, but the Access File now specifies the enclosure character:

SEGNAME=PIPE1, DELIMITER=|, ENCLOSURE=", $

In this version of the PIPE1 delimited file, the first record in the file specifies the name of each field, and each alphanumeric value is enclosed in double quotation marks:

"REGION"|"YEAR"|"QUANTITY"|"LINEPRICE"
"EAST"|2000|3907|1145655.77           
"EAST"|2001|495922|127004359.88       
"EAST"|2002|543678|137470917.05       
"NORTH"|2001|337168|85750735.54       
"NORTH"|2002|370031|92609802.80       
"SOUTH"|2000|3141|852550.45           
"SOUTH"|2001|393155|99822662.88       
"SOUTH"|2002|431575|107858412.01      
"WEST"|2001|155252|39167974.18        
"WEST"|2002|170421|42339953.45        

The Master File remains the same. The Access File now specifies that there is a header record in the data file:

SEGNAME=PIPE1, DELIMITER=|, ENCLOSURE=", HEADER=YES, $

Information Builders