Friday, 22 January 2021

Regular expressions


A regular expression (regex) is a small language with its own syntax for describing a pattern in a sequence of characters. It can simplify parsing a string or a larger text: you describe the pattern once and let the engine find the matches, rather than repeating hand-written checks many times.
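As a minimal sketch of the "describe once, reuse many times" idea, the example below compiles one pattern and reuses it to pull every run of digits out of a text. The class name ExtractNumbersDemo and the method extractNumbers are made up for illustration; only Pattern and Matcher come from java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtractNumbersDemo {
    // Compiled once; a Pattern is immutable and can be reused for many inputs.
    private static final Pattern DIGITS = Pattern.compile("[0-9]+");

    // Hypothetical helper: collects every run of digits found in the text.
    static List<String> extractNumbers(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = DIGITS.matcher(text);
        while (m.find()) {          // advance to the next match, if any
            found.add(m.group());   // the text matched by the whole pattern
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(extractNumbers("order 66 shipped 3 items on day 21"));
    }
}
```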

Character Classes 
A character class is a set of characters enclosed in square brackets, for instance [sz]. It matches exactly one character from the set.

Class types:
Simple [agfd]: matches exactly one of a, g, f or d
Range [a-f0-7]: matches exactly one character in the range a to f or 0 to 7 (bounds included)
Negation [^123k-m]: matches exactly one character that is not 1, 2, 3 or in the range k to m (bounds included)
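The three class types above can be tried out with a short sketch like the following. The class CharacterClassDemo and the helper matchesOne are hypothetical names; the helper uses Matcher.matches(), which requires the whole input to match, so each test input is a single character.

```java
import java.util.regex.Pattern;

class CharacterClassDemo {
    static final Pattern SIMPLE = Pattern.compile("[agfd]");
    static final Pattern RANGE = Pattern.compile("[a-f0-7]");
    static final Pattern NEGATION = Pattern.compile("[^123k-m]");

    // True only if the entire input is one character accepted by the class.
    static boolean matchesOne(Pattern p, String input) {
        return p.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesOne(SIMPLE, "g"));    // g is in the set
        System.out.println(matchesOne(SIMPLE, "z"));    // z is not in the set
        System.out.println(matchesOne(RANGE, "7"));     // upper bound is included
        System.out.println(matchesOne(RANGE, "8"));     // just outside 0-7
        System.out.println(matchesOne(NEGATION, "l"));  // l lies in k-m, so rejected
        System.out.println(matchesOne(NEGATION, "x"));  // anything else is accepted
    }
}
```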

Predefined Character Classes
Java's regex engine supports predefined character classes for your convenience.

\d: a digit [0-9]
\D: a non-digit [^0-9]
\s: a whitespace character [ \t\n\x0B\f\r] (space, tab, new line, vertical tab, form feed, carriage return)
\S: a non-whitespace character [^\s]
\w: a word character [a-zA-Z_0-9] (note that the underscore is included)
\W: a non-word character [^\w]
\t: a tab
\n: a new line
. : a wildcard matching any character except a line terminator (it also matches line terminators when DOTALL mode is enabled)
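A short sketch of how these predefined classes behave in Java follows. Inside a Java string literal each backslash must itself be escaped, so \d is written "\\d". The class PredefinedClassDemo and its two helpers are illustrative names, not part of any library.

```java
import java.util.regex.Pattern;

class PredefinedClassDemo {
    // True if the whole input matches the regex (default flags).
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Same check, but with DOTALL so that '.' also matches line terminators.
    static boolean dotAllMatch(String regex, String input) {
        return Pattern.compile(regex, Pattern.DOTALL).matcher(input).matches();
    }

    public static void main(String[] args) {
        String verticalTab = String.valueOf((char) 0x0B);
        System.out.println(fullMatch("\\d\\d", "42"));         // two digits
        System.out.println(fullMatch("\\w+", "snake_case9"));  // underscore is a word character
        System.out.println(fullMatch("\\s", verticalTab));     // \x0B counts as whitespace
        System.out.println(fullMatch("a.c", "a\nc"));          // '.' skips the newline by default
        System.out.println(dotAllMatch("a.c", "a\nc"));        // DOTALL lets '.' match it
    }
}
```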

Boundary Matcher
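^: matches the beginning of a line. In Java this is the beginning of the whole input unless MULTILINE mode is enabled. E.g. ^dog$ matches a line that consists of the single word 'dog'.
$: matches the end of a line (the end of the whole input unless MULTILINE mode is enabled).
\b: a word boundary. E.g. \bdog\b matches dog only as a whole word.
\B: a non-word boundary. E.g. \bdog\B matches the dog inside doggie, but not the standalone word dog.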
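The boundary matchers can be explored with a sketch along these lines. It uses Matcher.find(), which searches for the pattern anywhere in the input, so the boundaries are what constrain the match. BoundaryDemo, found and the constant names are hypothetical.

```java
import java.util.regex.Pattern;

class BoundaryDemo {
    static final Pattern WHOLE_WORD = Pattern.compile("\\bdog\\b");
    static final Pattern WORD_PREFIX = Pattern.compile("\\bdog\\B");
    // Without flags, ^ and $ anchor to the start and end of the whole input.
    static final Pattern WHOLE_INPUT = Pattern.compile("^dog$");
    // With MULTILINE, ^ and $ anchor to the start and end of each line.
    static final Pattern WHOLE_LINE = Pattern.compile("^dog$", Pattern.MULTILINE);

    // True if the pattern occurs somewhere in the input.
    static boolean found(Pattern p, String input) {
        return p.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found(WHOLE_WORD, "hot dog stand"));   // dog stands alone
        System.out.println(found(WHOLE_WORD, "doggie"));          // dog is followed by a word character
        System.out.println(found(WORD_PREFIX, "doggie"));         // dog starts a longer word
        System.out.println(found(WHOLE_INPUT, "cat\ndog\nbird")); // input as a whole is not "dog"
        System.out.println(found(WHOLE_LINE, "cat\ndog\nbird"));  // second line is exactly "dog"
    }
}
```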

Quantifier
X?: matches X 0 or 1 time
X+: matches X 1 or more times
X*: matches X 0 or more times
X{3}: matches X exactly 3 times
X{1,3}: matches X 1 to 3 times
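The quantifiers above can be compared side by side in a small sketch. It relies on Pattern.matches(regex, input), which requires the entire input to match; QuantifierDemo and fullMatch are illustrative names.

```java
import java.util.regex.Pattern;

class QuantifierDemo {
    // True if the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("ab?c", "ac"));     // b is optional
        System.out.println(fullMatch("ab?c", "abbc"));   // but at most one b is allowed
        System.out.println(fullMatch("ab+c", "ac"));     // + needs at least one b
        System.out.println(fullMatch("ab*c", "abbbc"));  // * accepts any number of b
        System.out.println(fullMatch("a{3}", "aaaa"));   // exactly three, so four fails
        System.out.println(fullMatch("a{1,3}", "aa"));   // between one and three
    }
}
```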

Logic
X|Y: logical OR; matches X or Y
XY: matches X followed by Y
(X): matches X and captures it as a group
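The sketch below combines the three operators in one pattern: an alternation inside a capturing group, followed by a literal hyphen, followed by a second group. The tag format "cat-12" or "dog-42" and the names GroupDemo and parseTag are invented purely for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Group 1: cat or dog. Then a literal '-'. Group 2: one or more digits.
    private static final Pattern TAG = Pattern.compile("(cat|dog)-(\\d+)");

    // Hypothetical helper: returns {animal, number}, or null if the input does not match.
    static String[] parseTag(String input) {
        Matcher m = TAG.matcher(input);
        if (!m.matches()) {
            return null;
        }
        // group(0) is the whole match; numbered groups start at 1.
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = parseTag("dog-42");
        System.out.println(parts[0] + " / " + parts[1]);
        System.out.println(parseTag("bird-7") == null);
    }
}
```

Groups are numbered by the position of their opening parenthesis, counting from left to right.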

