More on character classes
We looked at some basic regular expressions that
included character classes: a "choice" of character to match placed
inside square brackets. For example, [Tt] will match against either
T or t. On this page we'll look at some more possibilities with
character classes.
Character ranges
A useful feature is that we can specify a range of characters by
placing a hyphen between the start and end characters. For example,
to match any lower case letter, we can write:
[a-z]
Similarly, to match a digit, we can write:
[0-9]
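To see these two ranges in action, here is a minimal sketch using java.util.regex. The class and method names (CharacterRanges, isLowerCaseLetter, isDigit) are hypothetical, chosen just for illustration. Note that Matcher.matches() requires the whole input to match, so each check only succeeds for a single character.

```java
import java.util.regex.Pattern;

class CharacterRanges {
    // [a-z]: any single lower case letter from a to z
    static final Pattern LOWER = Pattern.compile("[a-z]");
    // [0-9]: any single digit from 0 to 9
    static final Pattern DIGIT = Pattern.compile("[0-9]");

    // True only if s is exactly one lower case letter
    static boolean isLowerCaseLetter(String s) {
        return LOWER.matcher(s).matches();
    }

    // True only if s is exactly one digit
    static boolean isDigit(String s) {
        return DIGIT.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isLowerCaseLetter("q")); // true
        System.out.println(isLowerCaseLetter("Q")); // false: upper case is outside a-z
        System.out.println(isDigit("7"));           // true
        System.out.println(isDigit("42"));          // false: two characters, but [0-9] matches one
    }
}
```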
We can combine single characters and ranges, and/or combine multiple ranges:
Expression | Meaning |
[a-zA-Z] | A lower or upper case letter (a-z or A-Z). |
[0-9A-F] | A hexadecimal digit (0-9 or A-F), upper case only. |
[0-9A-Fa-f] | A hexadecimal digit, either upper or lower case. |
[ 0-9] | A space or a digit. |
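As an illustration of combining ranges, the following sketch checks whether a whole string is made up of hexadecimal digits in either case. It assumes the + quantifier (meaning "one or more of the preceding item"), which isn't covered on this page; the names HexCheck and isHex are hypothetical.

```java
import java.util.regex.Pattern;

class HexCheck {
    // One or more characters, each of which must be 0-9, A-F or a-f
    private static final Pattern HEX = Pattern.compile("[0-9A-Fa-f]+");

    static boolean isHex(String s) {
        return HEX.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isHex("1F3a")); // true: mixes digits, upper and lower case
        System.out.println(isHex("12G4")); // false: G is outside A-F
        System.out.println(isHex(""));     // false: + needs at least one character
    }
}
```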
Negation
To say "not in the range...", we put a hat (caret) symbol ^
immediately after the opening square bracket of the character class. So, for example, to say "not a digit",
we would write the following:
[^0-9]
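A common use of a negated class is to strip out everything that isn't wanted. This minimal sketch (the names NotADigit and digitsOnly are hypothetical) uses String.replaceAll() to replace every character matched by [^0-9] with the empty string, leaving only the digits.

```java
class NotADigit {
    // [^0-9] matches any single character that is NOT a digit;
    // replacing each such character with "" keeps only the digits
    static String digitsOnly(String s) {
        return s.replaceAll("[^0-9]", "");
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("Item 12-34 (x5)")); // 12345
    }
}
```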
Intersection
An operation called intersection means "in this class AND in that one". It is particularly useful when combined with negation, to say "in this class BUT NOT in that one". Intersection is written as two ampersands (&&) between the classes. Here is the syntax:
[0-9&&[^5]]
[a-z&&[^aeiouy]]
The first of these says "a digit except 5"; the second says
"any lower case letter except the vowels a, e, i, o, u and y".
Note that a single ampersand (&) on its own simply matches that character.
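The two intersection examples above can be tried out with the following sketch. The class and method names (Intersection, stripConsonants) are hypothetical; the last line also shows that a lone & inside a class is treated as a literal character.

```java
import java.util.regex.Pattern;

class Intersection {
    // [0-9&&[^5]]: in 0-9 AND in "not 5", i.e. any digit except 5
    static final Pattern DIGIT_EXCEPT_5 = Pattern.compile("[0-9&&[^5]]");
    // [a-z&&[^aeiouy]]: in a-z AND not one of a, e, i, o, u, y
    static final Pattern CONSONANT = Pattern.compile("[a-z&&[^aeiouy]]");

    // Deletes every character matched by CONSONANT; vowels, spaces etc. are kept
    static String stripConsonants(String s) {
        return CONSONANT.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(DIGIT_EXCEPT_5.matcher("4").matches()); // true
        System.out.println(DIGIT_EXCEPT_5.matcher("5").matches()); // false
        System.out.println(stripConsonants("rhythm and blues"));   // y a ue
        // A single & is just a literal character inside a class
        System.out.println("&".matches("[a&b]"));                  // true
    }
}
```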
Named character classes
Some 'shortcuts' exist for common character classes (such as [0-9]) in
the form of named character classes.
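As one example of such a shortcut, Java's named class \d stands for a digit and, by default, is equivalent to [0-9] (it only matches non-ASCII digits if the UNICODE_CHARACTER_CLASS flag is set). The sketch below, with hypothetical names NamedClasses, allDigitsNamed and allDigitsExplicit, compares the two forms. Note that in a Java string literal the backslash has to be doubled, so "\\d" is how the regex \d is written.

```java
import java.util.regex.Pattern;

class NamedClasses {
    // "\\d+" in source code is the regex \d+ (one or more digits)
    private static final Pattern NAMED = Pattern.compile("\\d+");
    // The same thing written with an explicit range
    private static final Pattern EXPLICIT = Pattern.compile("[0-9]+");

    static boolean allDigitsNamed(String s) {
        return NAMED.matcher(s).matches();
    }

    static boolean allDigitsExplicit(String s) {
        return EXPLICIT.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"2021", "20x1"}) {
            // With default flags, both patterns give the same answer
            System.out.println(s + ": " + allDigitsNamed(s) + " " + allDigitsExplicit(s));
        }
    }
}
```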
Next...
On the next page, we'll look at a special character class: the dot.
Editorial page content written by Neil Coffey. Copyright © Javamex UK 2021. All rights reserved.