Today I learnt how to use the \B gadget in Python regular expressions. I've previously talked about the usefulness of \b, but sometimes there's a big benefit to using \B too.
What \b does is match a word boundary: the position between a word character (a letter, digit or underscore) and anything else. It allows you to find "peter" in "peter bengtsson" but not "peter" in "nickname: peterbe". In other words, the letters have to be preceded and followed by a word boundary such as the start or end of the string, whitespace (including a newline) or a non-alphanumeric character like (.
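As a quick check of that behaviour, here is a minimal sketch using the same example strings. The variable name `pattern` and the third input string are just mine, for illustration:

```python
import re

# \b on both sides means "peter" must stand alone as a whole word.
pattern = re.compile(r'\bpeter\b')

print(pattern.findall('peter bengtsson'))    # ['peter']  followed by a space
print(pattern.findall('nickname: peterbe'))  # []         followed by "be", no boundary
print(pattern.findall('(peter)'))            # ['peter']  wrapped in parentheses
```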
What \b does for finding alphanumerics, \B does for finding non-alphanumerics: it matches at any position that is not a word boundary, such as between a space and a +. Example:
>>> import re
>>> re.compile(r'\bX\b').findall('X + Y')
['X'] # it can find 'X'
>>> re.compile(r'\b\+\b').findall('X + Y')
[] # same technique can't find '+'
>>> re.compile(r'\B\+\B').findall('X + Y')
['+'] # better to use \B when finding '+'
>>> re.compile(r'\BX\B').findall('X + Y')
[] # and use \B only for non-alphanumerics
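One caveat worth checking for yourself: \B is not really "a boundary for symbols", it just asserts that the position is not a word boundary. The sketch below reuses the pattern from the example above with a couple of input strings I made up, and suggests that the + is only found when the characters next to it are also non-alphanumeric:

```python
import re

plus = re.compile(r'\B\+\B')

print(plus.findall('X + Y'))  # ['+']  spaces on both sides of the +
print(plus.findall('X+Y'))    # []     X|+ and +|Y are both word boundaries
print(plus.findall('(+)'))    # ['+']  parentheses are non-alphanumeric too
```

So if the operators in your input might be written without surrounding spaces, \B on both sides is probably not what you want.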
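The lesson is: \b is a really useful tool but it's limited to finding alphanumerics (letters, digits and the underscore). \B is what you have to use for finding non-alphanumerics such as +, at least when they are surrounded by whitespace or other non-alphanumerics.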
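If you don't know in advance whether the thing you're searching for starts or ends with an alphanumeric, one way to apply the lesson is to pick the gadget per edge. This is only a rough sketch: `whole_token_pattern` and `WORD_CHAR` are hypothetical names of my own, not something the re module provides.

```python
import re

# Matches a single "word" character (letter, digit or underscore).
WORD_CHAR = re.compile(r'\w')


def whole_token_pattern(token):
    # Hypothetical helper (not part of the re module): put \b next to an
    # alphanumeric edge of the token and \B next to a non-alphanumeric edge.
    if not token:
        raise ValueError('token must not be empty')
    start = r'\b' if WORD_CHAR.match(token[0]) else r'\B'
    end = r'\b' if WORD_CHAR.match(token[-1]) else r'\B'
    # re.escape() keeps characters like + from being read as operators.
    return re.compile(start + re.escape(token) + end)


print(whole_token_pattern('X').findall('X + Y'))               # ['X']
print(whole_token_pattern('+').findall('X + Y'))               # ['+']
print(whole_token_pattern('C++').findall('I like C++ a lot'))  # ['C++']
```

For 'X' this builds the same \bX\b pattern as before, and for '+' it builds \B\+\B, so it inherits the same limitation: a + squeezed directly between two letters still won't be found.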