Subpackages

PRegEx’s modules are divided into two subpackages, pregex.core and pregex.meta. The former mostly contains modules whose classes each represent a fundamental RegEx operator, whereas the latter collects classes that build upon the core modules in order to provide ready-made patterns that can be used “straight out of the box”.

pregex.core

To better understand core modules, consider for example pregex.core.quantifiers, each class of which corresponds to a distinct RegEx quantifier:

from pregex.core.quantifiers import *

Optional # Represents quantifier '?'
Indefinite # Represents quantifier '*'
OneOrMore # Represents quantifier '+'
Exactly # Represents quantifier '{n}'
AtLeast # Represents quantifier '{n,}'
AtMost # Represents quantifier '{,n}'
AtLeastAtMost # Represents quantifier '{n,m}'
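To make the mapping concrete, the following sketch exercises the seven raw quantifiers listed above using only Python's standard re module rather than PRegEx itself. The cases table and the check helper are illustrative names chosen for this example and are not part of PRegEx.

```python
import re

# For each raw quantifier applied to 'a': strings that should match in
# full, followed by strings that should not.
cases = {
    "a?":     (["", "a"],        ["aa"]),
    "a*":     (["", "a", "aaa"], ["b"]),
    "a+":     (["a", "aaa"],     [""]),
    "a{3}":   (["aaa"],          ["aa", "aaaa"]),
    "a{2,}":  (["aa", "aaaa"],   ["a"]),
    "a{,2}":  (["", "a", "aa"],  ["aaa"]),
    "a{2,3}": (["aa", "aaa"],    ["a", "aaaa"]),
}

def check(pattern, accepted, rejected):
    """True if pattern fully matches every accepted string and no rejected one."""
    return (all(re.fullmatch(pattern, s) for s in accepted)
            and not any(re.fullmatch(pattern, s) for s in rejected))

results = {pattern: check(pattern, *expected) for pattern, expected in cases.items()}
print(results)
```

Every entry in results should come out True, which confirms the meaning of each quantifier symbol that the corresponding class is said to represent.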

However, not all core modules contain classes that represent a specific RegEx operator. The pregex.core.tokens module, for instance, contains classes that act as wrappers for various single-character patterns. These wrappers either protect you from character-escape issues that may arise when using raw strings containing backslashes, or save you the trouble of looking up a specific symbol’s Unicode code point, provided of course that there is a corresponding Token class for that symbol.

from pregex.core.tokens import Newline, Copyright

# Both of these expressions evaluate to True.
Newline().is_exact_match('\n')
Copyright().is_exact_match('©')
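The escaping pitfalls mentioned above can be reproduced with the standard re module alone. This sketch does not use PRegEx; it only illustrates the kind of manual work (escaping special characters, doubling backslashes, writing out code points) that a token class is meant to spare you. All variable names are illustrative.

```python
import re

# An unescaped '.' is a wildcard, so it matches more than a literal dot.
unescaped_matches_x = re.fullmatch(".", "x") is not None
escaped_matches_x = re.fullmatch(re.escape("."), "x") is not None
escaped_matches_dot = re.fullmatch(re.escape("."), ".") is not None

# Without a raw string, the backslash of a regex escape must be doubled.
same_pattern = "\\n" == r"\n"
newline_matches = re.fullmatch(r"\n", "\n") is not None

# The copyright sign written through its Unicode code point, U+00A9.
copyright_matches = re.fullmatch("\u00a9", "©") is not None

print(unescaped_matches_x, escaped_matches_x, escaped_matches_dot)
print(same_pattern, newline_matches, copyright_matches)
```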

Lastly, there is the pregex.core.classes module, which offers not only a number of commonly used RegEx character classes, but also a complete framework for working with these classes as if they were regular sets.

from pregex.core.classes import AnyLetter, AnyDigit

letter = AnyLetter() # Represents '[A-Za-z]'
digit_but_five = AnyDigit() - '5' # Represents '[0-46-9]'
letter_or_digit_but_five = letter | digit_but_five # Represents '[A-Za-z0-46-9]'
any_but_letter_or_digit_but_five = ~ letter_or_digit_but_five # Represents '[^A-Za-z0-46-9]'
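The set analogy can be checked independently of PRegEx. The sketch below builds the same combinations with ordinary Python sets and compares them against the raw character-class patterns quoted in the comments above, using only the standard library. The variable names are illustrative, and the sample string is an arbitrary choice.

```python
import re
import string

# Plain Python sets mirroring the difference and union shown above.
letters = set(string.ascii_letters)
digits_but_five = set(string.digits) - {"5"}
letters_or_digits_but_five = letters | digits_but_five

# The raw patterns quoted in the comments above.
pattern = re.compile(r"[A-Za-z0-46-9]")
negated = re.compile(r"[^A-Za-z0-46-9]")

# Compare set membership with regex matching, one character at a time.
sample = string.ascii_letters + string.digits + " _-!"
class_agrees = all(
    (pattern.fullmatch(c) is not None) == (c in letters_or_digits_but_five)
    for c in sample
)
negation_agrees = all(
    (negated.fullmatch(c) is not None) == (c not in letters_or_digits_but_five)
    for c in sample
)
print(class_agrees, negation_agrees)
```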


pregex.meta

Unlike core modules, whose classes are all independent of each other, meta modules contain classes that combine various Pregex instances into complex patterns that are ready to use. Consider, for example, Integer, which lets you match any integer within a specified range.

from pregex.meta.essentials import Integer

text = "1 5 11 23 77 117 512 789 1011"

pre = Integer(start=50, end=1000)

print(pre.get_matches(text)) # This prints "['77', '117', '512', '789']"
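For comparison, here is a rough standard-library sketch that produces the same result for this input. It is not how Integer works internally: instead of encoding the range in the pattern itself, it matches every run of digits and filters the values in Python. The helper name integers_in_range is hypothetical, and treating both bounds as inclusive is an assumption.

```python
import re

text = "1 5 11 23 77 117 512 789 1011"

def integers_in_range(source, start, end):
    """Return the digit runs in source whose value lies in [start, end]."""
    return [m for m in re.findall(r"\d+", source) if start <= int(m) <= end]

matches = integers_in_range(text, 50, 1000)
print(matches)  # ['77', '117', '512', '789']
```

Expressing the range check inside a single regular expression is considerably more involved, which is the kind of effort a ready-made pattern saves.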

Classes in meta modules therefore offer patterns that are useful but hard to build by hand. And remember, no matter how complex a pattern is, it remains a Pregex instance, and as such it can always be extended even further!

from pregex.core.classes import AnyLetter
from pregex.meta.essentials import Integer

pre = AnyLetter() + Integer(start=50, end=1000, is_extensible=True)
text = "a1 b5 c11 d23 e77 f117 g512 h789 i1011"

print(pre.get_matches(text)) # This prints "['e77', 'f117', 'g512', 'h789']"

Just don’t forget to set the is_extensible parameter to True. Doing so prevents some additional assertions from being applied to the pattern. These assertions are essential for the pattern to match what it is supposed to when used on its own, but they may cause complications when the pattern serves as a building block of a larger pattern.
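To see why boundary assertions can get in the way of composition, consider the following standard-library sketch. The text above does not say which assertions PRegEx applies, so the word-boundary assertions used here are purely an assumption for illustration, and the patterns are hypothetical stand-ins rather than what Integer actually generates.

```python
import re

text = "a1 b5 c11 d23 e77 f117 g512 h789 i1011"

# Hypothetical stand-alone pattern: two or three digits fenced by
# word-boundary assertions so that it only matches whole numbers.
standalone = r"\b\d{2,3}\b"
# The same digits without the assertions.
bare = r"\d{2,3}"

# On its own, the asserted pattern behaves as intended.
standalone_hits = re.findall(standalone, "11 23 77")

# Prefixed with a letter, it can never match: there is no word boundary
# between a letter and the digit that follows it.
with_assertions = re.findall("[a-z]" + standalone, text)
without_assertions = re.findall("[a-z]" + bare, text)

print(standalone_hits)
print(with_assertions)
print(without_assertions)
```

Under this assumption, the asserted version finds nothing once a letter is placed in front of it, while the version without assertions composes freely.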
