Regular Expressions

The syntax for regular expressions in iWay DQC follows the rules for regular expressions in Java, as described in the Java API documentation for the Pattern class (java.util.regex.Pattern).

The following topics describe regular expression usage extensions in iWay DQC.


@" Syntax (Single Escaping)

When writing regular expressions, keep in mind that the regular expression is handled as a Java string. In a Java string literal, the backslash is an escape character, so the literal \\ denotes a single backslash. In regular expressions, the backslash is also an escape character, so the regular expression \\ matches a single backslash. Written as a Java string literal, that regular expression becomes \\\\.
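The two levels of escaping described above can be checked in plain Java. The following is a minimal sketch, not iWay DQC code; the class and method names are hypothetical and chosen only for illustration.

```java
import java.util.regex.Pattern;

class BackslashEscapingDemo {
    // The source literal "\\\\" is the two-character string \\ at run time.
    // Read as a regular expression, \\ matches exactly one backslash.
    static final Pattern SINGLE_BACKSLASH = Pattern.compile("\\\\");

    // Returns true only if the whole input is exactly one backslash.
    static boolean isSingleBackslash(String input) {
        return SINGLE_BACKSLASH.matcher(input).matches();
    }

    public static void main(String[] args) {
        String oneBackslash = "\\"; // a string holding one backslash character
        System.out.println(oneBackslash.length());
        System.out.println(isSingleBackslash(oneBackslash));
    }
}
```

Here the four backslashes in the source literal shrink to two in the string, and those two form the regular expression that matches one backslash in the input.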

To avoid double escaping, prefix the quoted string with @. The text between @" and the closing " is then taken literally, and no character is treated as an escape character at the Java string level.

Example: To replace every occurrence of the characters ^ and ] with x in the string "ab[^]" (producing the string "ab[xx"), write:

substituteAll("[\\^\\]]","x","ab[^]")

Using the @" syntax, write:

substituteAll(@"[\^\]]","x","ab[^]")
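For comparison, the same substitution can be reproduced with the standard Java regex engine. This is a rough sketch under the assumption that substituteAll behaves like String.replaceAll with the arguments in the order (pattern, replacement, input) shown in the example above; the helper below is a hypothetical stand-in, not the iWay DQC implementation.

```java
class SubstituteAllSketch {
    // Hypothetical stand-in for the DQC expression substituteAll,
    // using the same argument order as the example: pattern, replacement, input.
    static String substituteAll(String pattern, String replacement, String input) {
        return input.replaceAll(pattern, replacement);
    }

    public static void main(String[] args) {
        // In Java source the regex [\^\]] must be double escaped, as in the first form above.
        String result = substituteAll("[\\^\\]]", "x", "ab[^]");
        System.out.println(result); // prints ab[xx
    }
}
```

The Java literal "[\\^\\]]" and the @"[\^\]]" form denote the same regular expression, a character class containing ^ and ].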

Capturing Groups

Regular expressions are matched against the input expression string (the string that results from applying the expression to the input). Parts of the pattern enclosed in parentheses, called capturing groups, mark the sections of the input string that are kept for further use in creating the output. These capturing groups can be referenced by using back-references, as shown in the example at the end of this topic.

In the case of a match, the matched data from the input is sent to predefined output columns. Each output column has a substitution property, which defines the value that is sent to the output. This property can contain back-references; the example below uses the form ${2} to refer to the second capturing group.

Capturing groups can be used in the substituteAll and substituteMany expressions and in the Regex Matching step.

For example, to replace every letter-digit pair with just the digit from the pair (so that the input string "a1b2c3d4e5" produces the output "12345"), write:

substituteAll("([a-z])([0-9])","${2}","a1b2c3d4e5")
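The effect of the back-reference can be seen with the standard Java regex engine as well. The sketch below is an illustration only, not iWay DQC code: it assumes the DQC back-reference ${2} corresponds to the second capturing group, which Java's own replacement syntax writes as $2. The class and method names are hypothetical.

```java
class CapturingGroupSketch {
    // Keeps only the digit (group 2) from each letter-digit pair.
    // Group 1 is the letter [a-z], group 2 is the digit [0-9].
    static String keepDigits(String input) {
        return input.replaceAll("([a-z])([0-9])", "$2");
    }

    public static void main(String[] args) {
        System.out.println(keepDigits("a1b2c3d4e5")); // prints 12345
    }
}
```

Each match consumes one letter and one digit, and the replacement writes back only the digit, so the letters disappear from the result.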

iWay Software