The m Operator (part 2)


If there are examples missing from these pages which you think should be included, please use this mailer to send them to me so that I can see if the current implementation can be improved (thanks in advance! c.w.d.).

Searching for Zero, One or More Occurrences
There are three repetition characters (one of which we used already, the plus sign) which need discussion, and now is as good a time as any to do that:
operator  meaning
+         one or more of the preceding character
*         zero or more of the preceding character
?         zero or one of the preceding character
searched string             search pattern  Explanation
This is an expression.      ression         matches the characters `ression'
This is an expression.      res?            matches the characters `res'
This is an expression.      res?i           doesn't match the characters `ressi'
This is an expression.      ress?i          matches the characters `ressi'
This is an expresion.       ress?i          matches the characters `resi'
This is an expression.      res*i           matches the characters `ressi'
This is an expression.      res+i           matches the characters `ressi'
This is an expressssssion.  res+i           matches the characters `ressssssi'
This is an expressssssion.  res*i           matches the characters `ressssssi'
Here, res? and res?i deserve a second look. The `s?' acts as a single unit, which is either zero or one occurrence of the letter `s'. So res? matches `re' followed by at most one `s', and res?i fails because in `expression' two `s' characters stand between `re' and `i'.
When using these, it is important to keep the `one or more', `zero or more', and `zero or one' distinctions clearly in mind. Even with a specific search target in mind, working out what a pattern will actually match remains difficult, and great care is required to avoid both false matches and false `no match' results.
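The behaviour of the three repetition characters can be checked directly in Perl. The following is only a minimal sketch: first_match is a hypothetical helper name introduced here for illustration, not something from these pages, and the patterns are the ones discussed above.

```perl
use strict;
use warnings;

# first_match() is a hypothetical helper: it returns the text that
# $pattern matched in $string, or undef when there is no match.
sub first_match {
    my ($string, $pattern) = @_;
    return $string =~ m/($pattern)/ ? $1 : undef;
}

my $text = 'This is an expression.';

for my $pattern ('res?', 'res?i', 'ress?i', 'res*i', 'res+i') {
    my $found   = first_match($text, $pattern);
    my $verdict = defined $found
        ? "matches `" . $found . "'"
        : "doesn't match";
    print "$pattern\t$verdict\n";
}
```

Only res?i fails here, since `s?' allows at most one `s' between `re' and `i'.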
searched string  search pattern  Explanation
expressssssion.  s{6}            matches the characters `ssssss'
expressssssion.  s{7}            doesn't match the characters `ssssss' (there are six of them)
expressssssion.  s{7,}           doesn't match the characters `ssssss' (there are six of them)
expressssssion.  s{6,}           matches the characters `ssssss' (there are six of them), the call asks for at least 6
expressssssion.  s{5,}           matches the characters `ssssss' (there are six of them), the call asks for at least 5
expressssssion.  s{2,7}          matches the characters `ssssss' (there are six of them), the call asks for at least 2 and not more than 7
NOTE: Appending a `?' to the end of any of these, e.g. {2,7}?, results in `non-greedy' matching under multiple calls, which we cannot emulate here.
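The counted forms {n}, {n,} and {n,m} from the table can be tried out in the same way. This is a sketch only: the hash name %matched is an assumption made for the example, and each entry holds the matched text, or undef when the pattern fails.

```perl
use strict;
use warnings;

my $text = 'expressssssion.';    # six consecutive `s' characters

# Map each pattern to the text it matched (undef means no match).
my %matched;
for my $pattern ('s{6}', 's{7}', 's{7,}', 's{6,}', 's{5,}', 's{2,7}') {
    $matched{$pattern} = $text =~ m/($pattern)/ ? $1 : undef;
    my $verdict = defined $matched{$pattern}
        ? "matches " . length($matched{$pattern}) . " characters"
        : "doesn't match";
    print "$pattern\t$verdict\n";
}
```

Whenever the count allows it, the match takes all six `s' characters; s{7} and s{7,} fail because a seventh `s' is not available.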

Searching for Specially Defined Character Types
We can search (as seen in the previous page) for word-like, alphanumeric, characters using \w, and for whitespace by using \s, and now we introduce searching for digits using \d.
searched string        search pattern  Explanation
The number is 2.71828  \d.\d           matches the characters `2.7'
The number is 2.71828  \d\d\d          matches the characters `718'
The number is 2.71828  \d{5}           matches the characters `71828'
The number is 2.71828  \d{6}           doesn't match the characters `71828'
The number is 2.71828  \s\d            matches the characters ` 2', NOTE the whitespace!
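A short Perl sketch of the \d examples follows. The pattern \d\.\d is added here as an assumption for comparison: an unescaped `.' matches any character, while `\.' matches only a literal dot. The hash name %matched is likewise just illustrative.

```perl
use strict;
use warnings;

my $text = 'The number is 2.71828';

# Single-quoted strings keep the backslashes intact for the regex engine.
my @patterns = ('\d.\d', '\d\.\d', '\d\d\d', '\d{5}', '\d{6}', '\s\d');

# Map each pattern to the text it matched (undef means no match).
my %matched;
for my $pattern (@patterns) {
    $matched{$pattern} = $text =~ m/($pattern)/ ? $1 : undef;
    my $verdict = defined $matched{$pattern}
        ? "matches `" . $matched{$pattern} . "'"
        : "doesn't match";
    print "$pattern\t$verdict\n";
}
```

Note that \d\d\d skips the `2', because the `2' is followed by a dot and not by a second digit.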

Searching for "Negated or Inverted or Reversed" Character Types
searched string        search pattern  Explanation
The number is 2.71828  \S\d.\d         doesn't match the characters ` 2.7', since \S means non-whitespace, but matches `.718'
The number is 2.71828  \S\d\.\d        doesn't match the characters ` 2.7'
The number is 2.71828  \w\s\d\.\d      matches the characters `s 2.7'
The number is 2.71828  \w\W\d\.\d      matches the characters `s 2.7'
The number is 2.71828  \D\W\d\.\d      matches the characters `s 2.7'
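Here, \W means a non-alphanumeric character; in these examples it matches the whitespace, just as \s does, but the two are not equivalent in general (\W also matches punctuation, for instance). \S means a non-whitespace character. \D means a non-digit.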
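The negated types can be tested with a sketch like the one below. The string 'a-b' and the variable names are assumptions introduced for illustration; they show a case where \W matches and \s does not.

```perl
use strict;
use warnings;

my $text = 'The number is 2.71828';

# Map each pattern to the text it matched (undef means no match).
my %matched;
for my $pattern ('\S\d.\d', '\S\d\.\d', '\w\s\d\.\d', '\w\W\d\.\d', '\D\W\d\.\d') {
    $matched{$pattern} = $text =~ m/($pattern)/ ? $1 : undef;
    my $verdict = defined $matched{$pattern}
        ? "matches `" . $matched{$pattern} . "'"
        : "doesn't match";
    print "$pattern\t$verdict\n";
}

# \W and \s overlap on whitespace but are not the same class:
my $hyphen_is_W = 'a-b' =~ m/a\Wb/ ? 1 : 0;    # `-' is non-alphanumeric
my $hyphen_is_s = 'a-b' =~ m/a\sb/ ? 1 : 0;    # `-' is not whitespace
print "a\\Wb on a-b: $hyphen_is_W, a\\sb on a-b: $hyphen_is_s\n";
```

With \S\d.\d the match cannot start at the `2', because the character after it is a dot and not a digit, so the first match begins at the dot itself.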

Footnote (greedy)
The quantifiers *, +, ?, {n}, {n,} and {n,m} are greedy in the sense that they will match as many times as possible. To match the minimum number of times instead, append a `?' to the quantifier. This is called `changing the gravity' of the match.
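The difference between greedy and minimal matching can be seen in a small Perl sketch. The variable names are assumptions made for the example; the last pattern shows that a minimal quantifier still takes as many characters as the rest of the pattern requires.

```perl
use strict;
use warnings;

my $text = 'expressssssion.';    # six consecutive `s' characters

# In list context a successful match returns its captured groups.
my ($greedy)     = $text =~ m/(s+)/;         # as many `s' as possible
my ($lazy)       = $text =~ m/(s+?)/;        # as few `s' as possible
my ($lazy_range) = $text =~ m/(s{2,7}?)/;    # the minimum the range allows
my ($forced)     = $text =~ m/(res+?i)/;     # must still reach the `i'

print "s+      matches `", $greedy,     "'\n";
print "s+?     matches `", $lazy,       "'\n";
print "s{2,7}? matches `", $lazy_range, "'\n";
print "res+?i  matches `", $forced,     "'\n";
```

Here s+ takes all six `s' characters, s+? takes one, s{2,7}? takes two, and res+?i takes all six again because nothing shorter is followed by `i'.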