Aug 04, 2009 01:03 AM EDT

Regular Expressions and negative lookahead assertion

I consider myself fairly good at regular expressions, which are a way to match specific patterns within a string of text. Today, however, I learned a new little nugget of information about pattern matching.

We needed to match any filename that ended in html or php but didn’t start with the word “form”. The regex syntax that I learned today was the zero-width negative lookahead assertion, which takes the form (?!pattern). It succeeds only if the subexpression inside the parentheses does not match the text immediately to the right of the current position, and it consumes no characters either way. Therefore (?!form) succeeds as long as “form” isn’t at that position.
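To see what “zero-width” means in practice, here is a small sketch using Python’s re module. The choice of Python and the sample strings are my own assumptions for illustration; the original problem didn’t depend on a particular language.

```python
import re

# (?!form) succeeds only where the next characters are NOT "form".
# It consumes nothing, so \w+ starts matching at the same position.
pattern = re.compile(r"(?!form)\w+")

# The lookahead passes at position 0, and \w+ matches the whole word.
print(pattern.match("index").group())       # index

# match() only tries position 0, where the lookahead fails.
print(pattern.match("formmail"))            # None

# search() may move on: at position 1 the text is "ormmail", not "form".
print(pattern.search("formmail").group())   # ormmail
```

The last case is why the full expression needs an anchor: without one, the engine simply slides past the spot where the lookahead fails.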

The whole regular expression was: ^(?!form).*\.(php|html)$

The leading caret ^ denotes the start of the string, then our negative lookahead makes sure it doesn’t start with the word “form”. After that, the dot star matches any run of characters, and \. matches a literal period (the backslash, or escape character, makes the dot match a period instead of its default “any character”). Then it looks for php or html (the pipe | represents a boolean OR), and finally the $ indicates the end of the string. It does a lot in just a few characters.
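As a quick check of the whole expression, here is a minimal sketch in Python. The helper name is_wanted and the filenames are hypothetical, made up only to exercise each part of the pattern.

```python
import re

# The expression from the post, compiled once.
FILENAME_RE = re.compile(r"^(?!form).*\.(php|html)$")

def is_wanted(filename):
    """Return True for names ending in .php or .html that do not start with "form"."""
    return FILENAME_RE.match(filename) is not None

# Hypothetical filenames, chosen only for illustration.
for name in ["index.html", "form.php", "formmail.html", "contact_form.php", "notes.txt"]:
    print(name, is_wanted(name))
# index.html True
# form.php False
# formmail.html False
# contact_form.php True
# notes.txt False
```

Note that the lookahead rejects any name that begins with the letters “form”, so formmail.html is excluded too, while contact_form.php passes because “form” isn’t at the start.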

Incidentally, (?=pattern) is a zero-width positive lookahead, which means that the pattern must match at that position in the overall regular expression. (?!pattern) is the zero-width negative lookahead. (?<=pattern) is a zero-width positive lookbehind assertion, and (?>pattern) is a nonbacktracking subexpression, or so-called greedy subexpression. So far, I haven’t needed to make things specifically greedy; usually it’s the opposite, and I’m forcing a match to not be greedy. Greedy means that it’ll match the largest part of the string that it can. For example, if you used the regex /.*\.txt/ and the string was some.txt.txt, a greedy expression would return “some.txt.txt”, while an ungreedy expression (/.*?\.txt/) would return “some.txt”. It’s a minor detail that can really throw you off on occasion.
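The greedy and ungreedy cases, plus the lookahead and lookbehind, can be tried out with a short Python sketch. The variable names are my own, and I’ve left the nonbacktracking group out because Python’s re module only gained it in version 3.11.

```python
import re

text = "some.txt.txt"

# Greedy: .* grabs as much as it can, then backs off only as far as needed.
greedy = re.match(r".*\.txt", text).group()
print(greedy)   # some.txt.txt

# Ungreedy (lazy): .*? grabs as little as it can.
lazy = re.match(r".*?\.txt", text).group()
print(lazy)     # some.txt

# Positive lookahead: ".txt" must follow, but is not part of the match.
ahead = re.match(r"\w+(?=\.txt)", text).group()
print(ahead)    # some

# Positive lookbehind: "txt" only where a "." sits immediately before it.
behind = re.findall(r"(?<=\.)txt", text)
print(behind)   # ['txt', 'txt']
```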

Darren regular expressions