Search and replace with regular expressions (2)
Many simple search and replace operations can be performed using the
String.replaceAll() method. Sometimes, more
flexibility is required: for example, if not every instance of the expression
needs replacing, or if the replacement string is not fixed. In this case,
instances of Matcher provide a find() method, which we will look
at here. The idiom introduced here can also be used simply to find and
process instances of a pattern in a string, without necessarily appending
anything to another string.
The Matcher.find() method
Using Matcher.find() is similar in some ways to using Matcher.matches(). We first need to compile a Pattern representing our regular expression
and then from this construct a Matcher around the string that we want to
process.
But unlike when we use matches(), our expression is now the pattern that we
want to find as a portion of the string, rather than as the whole string.
And since the pattern can occur multiple times in the string being matched, we will
sit in a loop calling the find() method. The find() method
will return true as long as there's another match.
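To make the difference from matches() concrete, here is a minimal sketch of the find() loop on its own. The class name FindLoopSketch, the helper findAll() and the sample string are made up for illustration; only Pattern, Matcher, find() and group() come from the standard API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindLoopSketch {
    // Hypothetical helper: collects every portion of 'input' that matches 'regex', in order.
    static List<String> findAll(String regex, CharSequence input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> found = new ArrayList<>();
        while (m.find()) {          // true for as long as there is another match
            found.add(m.group());   // the text matched by this call to find()
        }
        return found;
    }

    public static void main(String[] args) {
        String s = "10 apples and 25 pears";
        // matches() asks whether the WHOLE string is a run of digits
        System.out.println(Pattern.compile("\\d+").matcher(s).matches()); // prints false
        // find() locates each run of digits inside the string
        System.out.println(findAll("\\d+", s)); // prints [10, 25]
    }
}
```

Each call to find() resumes searching from where the previous match ended, which is why the loop terminates once the string is exhausted.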
To perform the "replacement", as we go along, we actually build up a
new StringBuffer that will contain the new version of the
string with the replacements made. A couple of methods of the Matcher object
will help us with this.
So keeping with our example of removing HTML 'bold' tags, the code now looks like this:
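import java.util.regex.Matcher;
import java.util.regex.Pattern;

public String removeBoldTags(CharSequence htmlString) {
    // Match a bold element, capturing the text between the tags as group 1
    Pattern patt = Pattern.compile("<b>([^<]*)</b>");
    Matcher m = patt.matcher(htmlString);
    StringBuffer sb = new StringBuffer(htmlString.length());
    while (m.find()) {
        String text = m.group(1);
        // ... possibly process 'text' ...
        // Appends the text between the previous match and this one, then the
        // replacement; quoteReplacement() stops '$' and '\' being treated specially
        m.appendReplacement(sb, Matcher.quoteReplacement(text));
    }
    // Append whatever follows the last match
    m.appendTail(sb);
    return sb.toString();
}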
You'll notice that the parameter passed in is not specifically a String
but actually just any old CharSequence. The CharSequence
interface introduced in Java 1.4 is implemented by String and by a few
other classes (such as StringBuffer and CharBuffer) that can
hold a 'sequence of characters'. On the other hand, the appendReplacement()
and appendTail() methods were originally written to work only with
StringBuffers. It would have been nice if they'd worked with any old
Appendable, but the latter interface did not exist when the regular
expressions API was added (in Java 1.4; Appendable was added in Java 5).
From Java 9 onwards, both methods also have overloads that accept a
StringBuilder.
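As a sketch of the "replacement string is not fixed" case, the variation below fills in the "possibly process 'text'" step by upper-casing the text inside each bold tag. The class and method names (BoldToUpperSketch, boldToUpperCase) and the sample strings are invented for this example. It also shows that any CharSequence, such as a StringBuilder, can be passed in.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoldToUpperSketch {
    // Hypothetical variation: strips the bold tags and upper-cases their contents.
    static String boldToUpperCase(CharSequence html) {
        Pattern patt = Pattern.compile("<b>([^<]*)</b>");
        Matcher m = patt.matcher(html);
        StringBuffer sb = new StringBuffer(html.length());
        while (m.find()) {
            // The replacement is computed separately for each match
            String text = m.group(1).toUpperCase(Locale.ROOT);
            // Without quoteReplacement(), "$5" would be read as a reference to
            // capturing group 5, which this pattern does not have
            m.appendReplacement(sb, Matcher.quoteReplacement(text));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // A plain String argument
        System.out.println(boldToUpperCase("Pay <b>$5</b> now, <b>not</b> later"));
        // prints Pay $5 now, NOT later

        // Any other CharSequence works too, for example a StringBuilder
        System.out.println(boldToUpperCase(new StringBuilder("<b>hi</b>!")));
        // prints HI!
    }
}
```

Because the replacement is built inside the loop, it can depend on the matched text in ways that a single fixed replacement string passed to replaceAll() cannot.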
Group 0
You may recall from our discussion of capturing groups
that there is always a group 0, which refers to the entire string
when using the matches() method. When using the find() method,
group 0 refers to the entire portion of the string found to match
the expression on the most recent call to find().
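A small sketch may help to show the difference between group 0 and group 1 when using find(). The names GroupZeroSketch and describeMatches() are hypothetical; the pattern is the same bold-tag expression used above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupZeroSketch {
    // Hypothetical helper: for each match, records "group 0 -> group 1".
    static List<String> describeMatches(CharSequence html) {
        Matcher m = Pattern.compile("<b>([^<]*)</b>").matcher(html);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            // group(0) is the whole portion just matched, tags included;
            // group(1) is only the text captured between the tags
            out.add(m.group(0) + " -> " + m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(describeMatches("a <b>one</b> b <b>two</b>"));
        // prints [<b>one</b> -> one, <b>two</b> -> two]
    }
}
```

Calling group() with no argument is equivalent to calling group(0).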
Find without the replace
Of course, you can use Matcher.find() without performing any
replacement at all. This is useful, for example, if you just want to count or process
instances of a particular pattern within a string.
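For instance, counting matches or noting where they occur might look something like the sketch below. The class CountMatchesSketch and its two helper methods are illustrative names only; start() is a standard Matcher method that gives the index at which the most recent match began.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CountMatchesSketch {
    // Hypothetical helper: counts how many times 'patt' occurs in 'input'.
    static int countMatches(Pattern patt, CharSequence input) {
        Matcher m = patt.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Hypothetical helper: records the start index of each match.
    static List<Integer> matchStarts(Pattern patt, CharSequence input) {
        Matcher m = patt.matcher(input);
        List<Integer> starts = new ArrayList<>();
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        Pattern bold = Pattern.compile("<b>([^<]*)</b>");
        String html = "<b>a</b> and <b>b</b>";
        System.out.println(countMatches(bold, html)); // prints 2
        System.out.println(matchStarts(bold, html));  // prints [0, 13]
    }
}
```

No StringBuffer is needed here, since nothing is being appended: the loop simply inspects each match in turn.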
Editorial page content written by Neil Coffey. Copyright © Javamex UK 2021. All rights reserved.