Subpackages
PRegEx’s modules are divided into two subpackages, pregex.core and pregex.meta. The former predominantly contains modules whose classes each represent a fundamental RegEx operator, whereas the latter is a collection of classes that build upon the core modules to provide ready-made patterns that can be used “straight out of the box”.
pregex.core
To better understand core modules, consider for example pregex.core.quantifiers, each class of which corresponds to a distinct RegEx quantifier:
from pregex.core.quantifiers import *
Optional # Represents quantifier '?'
Indefinite # Represents quantifier '*'
OneOrMore # Represents quantifier '+'
Exactly # Represents quantifier '{n}'
AtLeast # Represents quantifier '{n,}'
AtMost # Represents quantifier '{,n}'
AtLeastAtMost # Represents quantifier '{n,m}'
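To make the correspondence concrete, the sketch below uses only Python's standard re module, not PRegEx itself, to check how many repetitions of 'a' each raw quantifier accepts. The helper accepted_lengths and the choice of n = 2 and m = 3 are illustrative assumptions, not part of the library.

```python
import re

def accepted_lengths(pattern, max_len=5):
    """Return every n in 0..max_len for which 'a' * n fully matches pattern."""
    return [n for n in range(max_len + 1) if re.fullmatch(pattern, "a" * n)]

# Raw quantifier syntax, keyed by the name of the class listed above.
# n = 2 and m = 3 are arbitrary values picked for this illustration.
raw_quantifiers = {
    "Optional": r"a?",
    "Indefinite": r"a*",
    "OneOrMore": r"a+",
    "Exactly": r"a{2}",
    "AtLeast": r"a{2,}",
    "AtMost": r"a{,2}",
    "AtLeastAtMost": r"a{2,3}",
}

for name, pattern in raw_quantifiers.items():
    print(f"{name:<14}{pattern:<8}{accepted_lengths(pattern)}")
```

Each printed list shows the repetition counts, up to five, that the quantifier allows.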
However, not all core modules contain classes that represent a specific RegEx operator. The pregex.core.tokens module contains classes that act as wrappers for various single-character patterns. These exist either to protect you from character-escape-related issues that may arise when using raw strings containing backslashes, or to save you the trouble of looking up a specific symbol’s Unicode code point, provided of course that there is a corresponding Token class for that symbol.
from pregex.core.tokens import Newline, Copyright
# Both of these calls return True.
Newline().is_exact_match('\n')
Copyright().is_exact_match('©')
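For comparison, here is a sketch of the bookkeeping that such Token classes take off your hands, written with only the standard re module. It does not use PRegEx; it merely illustrates the escaping and code-point details mentioned above.

```python
import re

# A literal backslash must itself be escaped in the pattern,
# even when the pattern is written as a raw string.
backslash = "\\"                       # a one-character string
assert len(backslash) == 1
assert re.fullmatch(r"\\", backslash)

# Metacharacters such as '.' need escaping too; re.escape handles that.
escaped_dot = re.escape(".")
assert re.fullmatch(escaped_dot, ".")
assert not re.fullmatch(escaped_dot, "x")

# Without a wrapper, a symbol has to be typed directly or looked up by
# its Unicode code point (U+00A9 for the copyright sign).
copyright_sign = "\u00a9"
assert copyright_sign == "©"
assert re.fullmatch(copyright_sign, "©")
```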
Lastly, there is the pregex.core.classes module, which offers not only a number of commonly used RegEx character classes, but also a complete framework for working with these classes as if they were regular sets.
from pregex.core.classes import AnyLetter, AnyDigit
letter = AnyLetter() # Represents '[A-Za-z]'
digit_but_five = AnyDigit() - '5' # Represents '[0-46-9]'
letter_or_digit_but_five = letter | digit_but_five # Represents '[A-Za-z0-46-9]'
any_but_letter_or_digit_but_five = ~ letter_or_digit_but_five # Represents '[^A-Za-z0-46-9]'
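The set analogy can be checked with nothing but the standard library. The sketch below does not use PRegEx; it compares the raw patterns from the comments above against ordinary Python sets. The helper members is hypothetical, and restricting the universe to printable ASCII characters is an assumption made to keep the complement finite.

```python
import re
import string

# Finite "universe" of characters used when taking the complement.
universe = set(string.printable)

def members(pattern):
    """Return the characters of the universe matched by a one-character pattern."""
    return {c for c in universe if re.fullmatch(pattern, c)}

letters = set(string.ascii_letters)
digits_but_five = set(string.digits) - {"5"}

# Difference, union and complement on sets mirror '-', '|' and '~' above.
assert members(r"[A-Za-z]") == letters
assert members(r"[0-46-9]") == digits_but_five
assert members(r"[A-Za-z0-46-9]") == letters | digits_but_five
assert members(r"[^A-Za-z0-46-9]") == universe - (letters | digits_but_five)
```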
pregex.meta
Unlike core modules, whose classes are all independent of each other, meta modules contain classes that combine various Pregex instances to form complex patterns that you can then use. Consider for example Integer, which enables you to match any integer within a specified range.
from pregex.meta.essentials import Integer
text = "1 5 11 23 77 117 512 789 1011"
pre = Integer(start=50, end=1000)
print(pre.get_matches(text)) # This prints "['77', '117', '512', '789']"
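To see why such a pattern is tedious to write by hand, here is one possible hand-rolled equivalent for the range 50 to 1000 using the standard re module. This is an illustrative sketch, not the pattern that Integer generates, and the name HAND_WRITTEN is hypothetical.

```python
import re

# One alternative per group of integers with the same number of digits.
HAND_WRITTEN = re.compile(
    r"\b(?:"
    r"[5-9]\d"       # 50 to 99
    r"|[1-9]\d{2}"   # 100 to 999
    r"|1000"         # the upper bound itself
    r")\b"
)

text = "1 5 11 23 77 117 512 789 1011"
print(HAND_WRITTEN.findall(text))  # ['77', '117', '512', '789']
```

Changing either bound means working out every alternative again, which is the kind of effort a ready-made class saves.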
Classes in meta modules therefore offer various patterns that are useful but at the same time hard to build. And remember, no matter how complex a pattern is, it remains a Pregex instance, and as such it can always be extended even further!
from pregex.core.classes import AnyLetter
from pregex.meta.essentials import Integer
pre = AnyLetter() + Integer(start=50, end=1000, is_extensible=True)
text = "a1 b5 c11 d23 e77 f117 g512 h789 i1011"
print(pre.get_matches(text)) # This prints "['e77', 'f117', 'g512', 'h789']"
Just don’t forget to set the is_extensible parameter to True. This prevents some additional assertions from being applied to the pattern. Although these assertions are essential for the pattern to match what it is supposed to on its own, they may introduce complications when the pattern serves as a building block of a larger pattern.
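The effect can be illustrated with plain re and hand-written patterns. This sketch assumes, purely for illustration, that the extra assertions behave like the word boundaries below; the assertions PRegEx actually applies may differ, and the names in_range, standalone and extensible are hypothetical.

```python
import re

in_range = r"(?:[5-9]\d|[1-9]\d{2}|1000)"   # integers 50 to 1000

# Standalone form: boundaries on both sides keep it from matching
# inside a longer run of word characters.
standalone = r"\b" + in_range + r"\b"

# Building-block form: no leading assertion, only a check that the
# number is not followed by another digit.
extensible = in_range + r"(?!\d)"

text = "a1 b5 c11 d23 e77 f117 g512 h789 i1011"

# The leading \b can never hold between a letter and a digit,
# so the composed pattern finds nothing.
print(re.findall(r"[A-Za-z]" + standalone, text))   # []

# Without the leading assertion the composition works as intended.
print(re.findall(r"[A-Za-z]" + extensible, text))   # ['e77', 'f117', 'g512', 'h789']
```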