A regular expression (regex) is a small language with its own syntax. It describes a pattern among characters, so it can simplify parsing a string or a longer text. Often a single pattern can do in one pass what would otherwise take several manual checks.
Character Classes
A character class is a set of characters enclosed within square brackets, for instance [sz]. It specifies that exactly one character from the set may be matched at that position.
Class types:
Simple: [agfd] matches exactly one of a, g, f or d
Range: [a-f0-7] matches exactly one character from the range a to f or from the range 0 to 7 (both ends included)
Negation: [^123k-m] matches exactly one character that is not 1, 2, 3 or in the range k to m (both ends included)
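The three class types above can be tried out with a small sketch. The class name CharClassDemo and the helper matchesClass are made up for illustration; Pattern.matches is the standard java.util.regex call and returns true only when the whole input matches the pattern.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // True when the whole input is exactly one character accepted by the class.
    static boolean matchesClass(String charClass, String input) {
        return Pattern.matches(charClass, input);
    }

    public static void main(String[] args) {
        System.out.println(matchesClass("[agfd]", "g"));    // simple class: true
        System.out.println(matchesClass("[agfd]", "b"));    // false, b is not in the set
        System.out.println(matchesClass("[a-f0-7]", "5"));  // range: true
        System.out.println(matchesClass("[a-f0-7]", "8"));  // false, 8 is outside 0-7
        System.out.println(matchesClass("[^123k-m]", "z")); // negation: true
        System.out.println(matchesClass("[^123k-m]", "l")); // false, l is in k-m
    }
}
```

Note that a class always consumes exactly one character, so matchesClass("[agfd]", "ag") would be false.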
Predefined Character Classes
Java's regex engine supports predefined character classes for your convenience.
\d: a digit [0-9]
\D: a non-digit [^0-9]
\s: a whitespace character [space, \t (tab), \n (new line), \x0B (vertical tab), \f (form feed), \r (carriage return)]
\S: a non-whitespace character [^\s]
\w: a word character [a-zA-Z_0-9] (note that the underscore is included)
\W: a non-word character [^\w]
\t: a tab
\n: a new line
.: wildcard matching any character (by default not a line terminator, unless the DOTALL flag is set)
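Here is a minimal sketch of the predefined classes in use. The class PredefinedClassDemo and its helper methods are hypothetical names chosen for this example. Keep in mind that inside a Java string literal the backslash itself must be escaped, so the regex \d is written "\\d".

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PredefinedClassDemo {
    // Removes every non-digit (\D) and keeps only the digits.
    static String digitsOnly(String input) {
        return input.replaceAll("\\D", "");
    }

    // Splits on runs of whitespace (\s+): spaces, tabs, new lines and so on.
    static String[] splitOnWhitespace(String input) {
        return input.trim().split("\\s+");
    }

    // True when the input consists only of word characters (\w).
    static boolean isWord(String input) {
        return Pattern.matches("\\w+", input);
    }

    // True when the wildcard matches a new line, optionally with DOTALL set.
    static boolean dotMatchesNewline(boolean dotAll) {
        Pattern p = dotAll ? Pattern.compile(".", Pattern.DOTALL) : Pattern.compile(".");
        return p.matcher("\n").matches();
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("Tel: 555-0199"));                              // 5550199
        System.out.println(Arrays.toString(splitOnWhitespace("one\ttwo  three\n"))); // [one, two, three]
        System.out.println(isWord("snake_case1"));    // true, underscore is a word character
        System.out.println(isWord("two words"));      // false, the space is not
        System.out.println(dotMatchesNewline(false)); // false
        System.out.println(dotMatchesNewline(true));  // true
    }
}
```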
Boundary Matcher
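^: matches the beginning of a line, e.g. ^dog$ matches a line that contains only the single word 'dog'. In Java, ^ and $ refer to the whole input by default; set the MULTILINE flag to make them match at every line.
$: matches the end of a line.
\b: a word boundary, e.g. \bdog\b matches dog only as a whole word
\B: a non-word boundary, e.g. \bdog\B matches the dog within doggie, but not the standalone word dog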
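The boundary matchers can be checked with a short sketch like the one below. BoundaryDemo and the found helper are names invented for this example. It uses Matcher.find, which looks for the pattern anywhere in the input, so the boundaries are what restrict the match.

```java
import java.util.regex.Pattern;

class BoundaryDemo {
    // True when the pattern occurs somewhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // Same as found, but ^ and $ match at every line, not only at the input's ends.
    static boolean foundMultiline(String regex, String input) {
        return Pattern.compile(regex, Pattern.MULTILINE).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("^dog$", "dog"));               // true
        System.out.println(found("^dog$", "hotdog"));            // false
        System.out.println(found("\\bdog\\b", "my dog barks"));  // true, whole word
        System.out.println(found("\\bdog\\b", "doggie"));        // false, dog is followed by g
        System.out.println(found("\\bdog\\B", "doggie"));        // true, dog inside a longer word
        System.out.println(found("\\bdog\\B", "my dog barks"));  // false, dog ends at a boundary
        System.out.println(found("^dog$", "cat\ndog\nbird"));          // false without MULTILINE
        System.out.println(foundMultiline("^dog$", "cat\ndog\nbird")); // true with MULTILINE
    }
}
```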
Quantifiers
X?: matches X 0 or 1 time
X+: matches X 1 or more times
X*: matches X 0 or more times
X{3}: matches X exactly 3 times
X{1,3}: matches X 1 to 3 times
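To see how the repetition counts above behave, here is a small sketch. QuantifierDemo and the helper whole are made-up names; the helper requires the entire input to match, which makes the counts easy to verify.

```java
import java.util.regex.Pattern;

class QuantifierDemo {
    // True when the entire input matches the pattern.
    static boolean whole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(whole("colou?r", "color"));   // true, u appears 0 times
        System.out.println(whole("colou?r", "colour"));  // true, u appears once
        System.out.println(whole("colou?r", "colouur")); // false, u appears twice
        System.out.println(whole("a+", ""));             // false, + needs at least one a
        System.out.println(whole("a*", ""));             // true, * accepts zero
        System.out.println(whole("\\d{3}", "123"));      // true, exactly three digits
        System.out.println(whole("\\d{3}", "12"));       // false
        System.out.println(whole("\\d{1,3}", "12"));     // true, between one and three digits
        System.out.println(whole("\\d{1,3}", "1234"));   // false, four digits is too many
    }
}
```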
Logic
X|Y: logical OR; matches either X or Y
XY: the pattern X followed by the pattern Y
(X): captures X as a group
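The logical operators combine naturally. The sketch below parses a key=value pair using sequencing, alternation and two capturing groups. The key=value format, the class GroupDemo and the parse method are assumptions made up for this example, not something from a particular library. The pattern is compiled once into a constant and reused for every call.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Group 1: the key (word characters). Group 2: digits, or the word true or false.
    private static final Pattern KEY_VALUE =
            Pattern.compile("(\\w+)=(\\d+|true|false)");

    // Returns {key, value}, or null when the input does not match the pattern.
    static String[] parse(String input) {
        Matcher m = KEY_VALUE.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] pair = parse("retries=3");
        System.out.println(pair[0] + " -> " + pair[1]); // retries -> 3
        System.out.println(parse("debug=true")[1]);     // true
        System.out.println(parse("debug=maybe"));       // null, maybe is not an allowed value
    }
}
```

Group numbers start at 1 and follow the order of the opening parentheses; group(0) is the whole match.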