pregex.core.pre

This module contains a single class, namely Pregex, which constitutes the base class for every other class within pregex.

Classes & methods

Below are listed all classes within pregex.core.pre along with any possible methods they may possess.

class pregex.core.pre.Pregex(pattern: str = '', escape: bool = True)

Wraps the provided pattern within an instance of this class.

Parameters
  • pattern (str) – The pattern that is to be wrapped within an instance of this class. Defaults to the empty string ''.

  • escape (bool) – Determines whether to escape the provided pattern or not. Defaults to True.

Raises

InvalidArgumentTypeException – Parameter pattern is not a string.

Note

This class constitutes the base class for every other class within the pregex package.
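The effect of the escape parameter can be illustrated with the standard library alone. The sketch below uses re.escape as a stand-in for what escape=True is assumed to do; whether Pregex relies on re.escape internally is not stated here, so treat this as an analogy rather than the library's implementation.

```python
import re

raw = "a.b"                 # "." is a regex metacharacter
escaped = re.escape(raw)    # stand-in for the assumed effect of escape=True

# Escaped: the dot only matches a literal dot.
literal_hit = re.fullmatch(escaped, "a.b") is not None
literal_miss = re.fullmatch(escaped, "axb") is None

# Unescaped: the dot matches any single character.
wildcard_hit = re.fullmatch(raw, "axb") is not None
```

Leaving a pattern unescaped is presumably only useful when the string is already a valid RegEx pattern.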

at_least(n: int, is_greedy: bool = True) -> Pregex

Applies quantifier {n,} to this instance’s underlying pattern and returns the result as a Pregex instance.

Parameters
  • n (int) – The minimum number of times that the pattern is to be matched.

  • is_greedy (bool) – Determines whether to declare this quantifier as greedy. When declared as such, the regex engine will try to match the expression as many times as possible. Defaults to True.

Raises
  • InvalidArgumentTypeException – Parameter n is not an integer.

  • InvalidArgumentValueException – Parameter n has a value of less than zero.

  • CannotBeRepeatedException – This instance represents a non-repeatable pattern.

at_least_at_most(n: int, m: Optional[int], is_greedy: bool = True) -> Pregex

Applies quantifier {n,m} to this instance’s underlying pattern and returns the result as a Pregex instance.

Parameters
  • n (int) – The minimum number of times that the pattern is to be matched.

  • m (int) – The maximum number of times that the pattern is to be matched.

  • is_greedy (bool) – Determines whether to declare this quantifier as greedy. When declared as such, the regex engine will try to match the expression as many times as possible. Defaults to True.

Raises
  • InvalidArgumentTypeException

    • Parameter n is not an integer.

    • Parameter m is neither an integer nor None.

  • InvalidArgumentValueException

    • Either parameter n or m has a value of less than zero.

    • Parameter n has a greater value than that of parameter m.

  • CannotBeRepeatedException – Parameter m has a value of greater than one, while this instance represents a non-repeatable pattern.

Note
  • Parameter is_greedy has no effect in the case that n equals m.

  • Setting m equal to None indicates that there is no upper limit to the number of times the pattern is to be repeated.
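The behaviour described in these notes mirrors the raw {n,m} quantifier. The following sketch uses Python's re module directly, not Pregex itself, to show greedy versus lazy matching, the open-ended form, and why greediness is irrelevant when n equals m.

```python
import re

text = "aaaaa"

# Greedy: take as many repetitions as allowed (up to m = 4).
greedy = re.match(r"a{2,4}", text).group()

# Lazy: stop as soon as the minimum (n = 2) is satisfied.
lazy = re.match(r"a{2,4}?", text).group()

# No upper bound, which is what m = None is documented to mean.
unbounded = re.match(r"a{2,}", text).group()

# With n == m there is only one possible length, so greediness cannot matter.
fixed_greedy = re.match(r"a{3}", text).group()
fixed_lazy = re.match(r"a{3}?", text).group()
```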

at_most(n: Optional[int], is_greedy: bool = True) -> Pregex

Applies quantifier {,n} to this instance’s underlying pattern and returns the result as a Pregex instance.

Parameters
  • n (int) – The maximum number of times that the pattern is to be matched.

  • is_greedy (bool) – Determines whether to declare this quantifier as greedy. When declared as such, the regex engine will try to match the expression as many times as possible. Defaults to True.

Raises
  • InvalidArgumentTypeException – Parameter n is neither an integer nor None.

  • InvalidArgumentValueException – Parameter n has a value of less than zero.

  • CannotBeRepeatedException – Parameter n has a value of greater than one, while this instance represents a non-repeatable pattern.

Note

Setting n equal to None indicates that there is no upper limit to the number of times the pattern is to be repeated.

capture(name: Optional[str] = None) -> Pregex

Creates a capturing group out of this instance’s underlying pattern and returns the result as a Pregex instance.

Parameters
  • name (str) – The name that is assigned to the captured group for backreference purposes. A value of None indicates that no name is to be assigned to the group. Defaults to None.

Raises
  • InvalidArgumentTypeException – Parameter name is neither a string nor None.

  • InvalidCapturingGroupNameException – Parameter name is not a valid capturing group name. Such name must contain word characters only and start with a non-digit character.

Note
  • Creating a capturing group out of a capturing group does nothing.

  • Creating a capturing group out of a non-capturing group converts it into a capturing group, except if any flags have been applied to it, in which case, the non-capturing group is wrapped within a capturing group as a whole.

  • Creating a named capturing group out of an unnamed capturing group assigns a name to it.

  • Creating a named capturing group out of a named capturing group changes the group’s name.
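For readers less familiar with the underlying syntax, the sketch below shows unnamed and named capturing groups in plain re. It illustrates the constructs that capture is documented to produce; it does not call Pregex.

```python
import re

text = "released 2024-05"

# Unnamed capturing groups: ( ... )
unnamed = re.search(r"(\d{4})-(\d{2})", text)
plain_groups = unnamed.groups()

# Named capturing groups: (?P<name> ... )
named = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", text)
by_name = named.groupdict()

# A named group is still reachable by its index.
by_index = named.group(1)
```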

compile() -> None

Compiles the underlying RegEx pattern. After this method has been invoked, any further attempt at matching a string will use the compiled RegEx pattern.

concat(pre: Union[Pregex, str], on_right: bool = True) -> Pregex

Concatenates the provided pattern to this instance’s underlying pattern and returns the resulting pattern as a Pregex instance.

Parameters
  • pre (Pregex | str) – Either a string or a Pregex instance representing the pattern that is to take part in the concatenation.

  • on_right (bool) – If True, then places the provided pattern on the right side of the concatenation, else on the left. Defaults to True.

Raises

InvalidArgumentTypeException – Parameter pre is neither a Pregex instance nor a string.

either(pre: Union[Pregex, str], on_right: bool = True) -> Pregex

Applies the alternation operator | between the provided pattern and this instance’s underlying pattern, and returns the resulting pattern as a Pregex instance.

Parameters
  • pre (Pregex | str) – Either a string or a Pregex instance representing the pattern that is to take part in the alternation.

  • on_right (bool) – If True, then places the provided pattern on the right side of the alternation, else on the left. Defaults to True.

Raises

InvalidArgumentTypeException – Parameter pre is neither a Pregex instance nor a string.
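The side on which an alternative is placed can change the result, because Python's regex engine tries alternatives from left to right and takes the first one that succeeds. A minimal sketch with plain re (the example words are arbitrary):

```python
import re

text = "category"

# "cat" is tried first and succeeds, so the longer alternative is never used.
short_first = re.findall(r"cat|category", text)

# With the longer alternative on the left, the whole word is matched.
long_first = re.findall(r"category|cat", text)
```

This is the kind of situation in which on_right would be expected to matter.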

enclose(pre: Union[Pregex, str]) -> Pregex

Concatenates the provided pattern to both sides of this instance’s underlying pattern, and returns the resulting pattern as a Pregex instance.

Parameters

pre (Pregex | str) – Either a string or a Pregex instance representing the “enclosing” pattern.

Raises

InvalidArgumentTypeException – Parameter pre is neither a Pregex instance nor a string.

enclosed_by(pre: Union[Pregex, str]) -> Pregex

Applies both positive lookahead assertion (?=<PRE>) and positive lookbehind assertion (?<=<PRE>), where <PRE> corresponds to the provided pattern, to this instance’s underlying pattern and returns the resulting pattern as a Pregex instance.

Parameters

pre (str | Pregex) – A Pregex instance or string representing the “assertion” pattern.

Raises
  • InvalidArgumentTypeException – The provided argument is neither a Pregex instance nor a string.

  • NonFixedWidthPatternException – A non-fixed-width pattern is provided in place of parameter pre.

Note

The resulting pattern cannot have a repeating quantifier applied to it.
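The combination of both assertions, and the reason a fixed-width pattern is required, can be seen with plain re. This is a sketch of the underlying regex behaviour, with an arbitrary "*" delimiter chosen for illustration.

```python
import re

text = "*12* 34 *56*"

# Lookbehind (?<=\*) and lookahead (?=\*) around the digits:
# the asterisks are required but are not part of the match.
enclosed = re.findall(r"(?<=\*)\d+(?=\*)", text)

# Python's engine rejects variable-width lookbehind patterns outright.
try:
    re.compile(r"(?<=\*+)\d+")
    variable_width_rejected = False
except re.error:
    variable_width_rejected = True
```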

exactly(n: int) -> Pregex

Applies quantifier {n} to this instance’s underlying pattern and returns the result as a Pregex instance.

Parameters

n (int) – The exact number of times that the pattern is to be matched.

Raises
  • InvalidArgumentTypeException – Parameter n is not an integer.

  • InvalidArgumentValueException – Parameter n has a value of less than zero.

  • CannotBeRepeatedException – Parameter n has a value of greater than one, while this instance represents a non-repeatable pattern.

followed_by(pre: Union[Pregex, str]) -> Pregex

Applies positive lookahead assertion (?=<PRE>), where <PRE> corresponds to the provided pattern, to this instance’s underlying pattern and returns the resulting pattern as a Pregex instance.

Parameters

pre (str | Pregex) – A Pregex instance or string representing the “assertion” pattern.

Raises

InvalidArgumentTypeException – The provided argument is neither a Pregex instance nor a string.

Note

The resulting pattern cannot have a repeating quantifier applied to it.

get_captures(source: str, include_empty: bool = True, is_path: bool = False) -> list[tuple[str]]

Returns a list of tuples, one tuple per match, where each tuple contains all of its corresponding match’s captured groups.

Parameters
  • source (str) – The text that is to be examined.

  • include_empty (bool) – Determines whether to include empty captures into the results. Defaults to True.

  • is_path (bool) – If set to True, then parameter source is considered to be a local path pointing to the file from which the text is to be read. Defaults to False.

Note

In case there exists an optional capturing group within the pattern, that has not been captured by a match, then that capture’s corresponding value will be None.
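The None value for an unmatched optional group follows the behaviour of Python's match objects. A rough equivalent in plain re, assuming one tuple of groups per match as described above:

```python
import re

text = "10px 20"

# Group 2 is optional; the second match ("20") does not capture it.
captures = [m.groups() for m in re.finditer(r"(\d+)(px)?", text)]
```

Note that re.findall would report the missing group as an empty string instead, which is why the sketch uses finditer together with groups().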

get_captures_and_pos(source: str, include_empty: bool = True, relative_to_match: bool = False, is_path: bool = False) -> list[list[tuple[str, int, int]]]

Returns a list containing lists of tuples, one list per match, where each tuple contains one of its corresponding match’s captured groups along with its exact position within the text.

Parameters
  • source (str) – The text that is to be examined.

  • include_empty (bool) – Determines whether to include empty captures into the results. Defaults to True.

  • relative_to_match (bool) – If True, then each group’s position-indices are calculated relative to the group’s corresponding match, not to the whole string. Defaults to False.

  • is_path (bool) – If set to True, then parameter source is considered to be a local path pointing to the file from which the text is to be read. Defaults to False.

Note

In case there exists an optional capturing group within the pattern, that has not been captured by a match, then that capture’s corresponding tuple will be (None, -1, -1).
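The result shape and the effect of relative_to_match can be approximated with re. The helper captures_and_pos below is hypothetical and written only for this illustration; it is not Pregex's implementation.

```python
import re

def captures_and_pos(pattern, text, relative_to_match=False):
    """Hypothetical helper: one list of (group, start, end) tuples per match."""
    result = []
    for m in re.finditer(pattern, text):
        # Positions are shifted by the match start when made relative.
        offset = m.start() if relative_to_match else 0
        row = []
        for i in range(1, m.re.groups + 1):
            start, end = m.span(i)
            if start == -1:
                # Group did not take part in the match.
                row.append((None, -1, -1))
            else:
                row.append((m.group(i), start - offset, end - offset))
        result.append(row)
    return result

text = "w=10px h=20"
absolute = captures_and_pos(r"(\d+)(px)?", text)
relative = captures_and_pos(r"(\d+)(px)?", text, relative_to_match=True)
```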

get_compiled_pattern(discard_after: bool = True) -> Pattern

Returns this instance’s underlying RegEx pattern as a re.Pattern instance.

Parameters

discard_after (bool) – Determines whether the compiled pattern is to be discarded after the program has exited from this method, or to be retained so that any further attempt at matching a string will use the compiled pattern instead of the regular one. Defaults to True.

get_matches(source: str, is_path: bool = False) -> list[str]

Returns a list containing any possible matches found within the provided text.

Parameters
  • source (str) – The text that is to be examined.

  • is_path (bool) – If set to True, then parameter source is considered to be a local path pointing to the file from which the text is to be read. Defaults to False.

get_matches_and_pos(source: str, is_path: bool = False) -> list[tuple[str, int, int]]

Returns a list containing any possible matches found within the provided text along with their exact position.

Parameters
  • source (str) – The text that is to be examined.

  • is_path (bool) – If set to True, then parameter source is considered to be a local path pointing to the file from which the text is to be read. Defaults to False.

get_matches_with_context(source: str, n_left: int = 5, n_right: int = 5, is_path: bool = False) -> list[str]

Returns a list containing any possible matches found within the provided text, along with any of its surrounding context, the exact length of which can be configured through this method’s parameters.

Parameters
  • source (str) – The text that is to be examined.

  • n_left (int) – The number of characters representing the context on the left side of the match. Defaults to 5.

  • n_right (int) – The number of characters representing the context on the right side of the match. Defaults to 5.

  • is_path (bool) – If set to True, then parameter source is considered to be a local path pointing to the file from which the text is to be read. Defaults to False.

Raises
  • InvalidArgumentTypeException – Either parameter n_left or n_right is not an integer.

  • InvalidArgumentValueException – Either parameter n_left or n_right has a value of less than zero.
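One way to picture the context window is as a slice around each match. The helper matches_with_context below is a hypothetical sketch built on re; in particular, clipping the window at the edges of the text is an assumption about how the method behaves.

```python
import re

def matches_with_context(pattern, text, n_left=5, n_right=5):
    """Hypothetical helper: each match plus up to n_left/n_right characters."""
    out = []
    for m in re.finditer(pattern, text):
        start = max(m.start() - n_left, 0)        # clip at the start of the text
        end = min(m.end() + n_right, len(text))   # clip at the end of the text
        out.append(text[start:end])
    return out

text = "the quick brown fox"
narrow = matches_with_context(r"brown", text, n_left=3, n_right=2)
at_edge = matches_with_context(r"the", text)
```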

get_named_captures(source: str, include_empty: bool = True, is_path: bool = False) -> list[dict[str, str]]

Returns a list of dictionaries, one dictionary per match, where each dictionary contains key-value pairs of any named captured groups that belong to its corresponding match, with each key being the name of the captured group, whereas its corresponding value will be the actual captured text.

Parameters
  • source (str) – The text that is to be examined.

  • include_empty (bool) – Determines whether to include empty captures into the results. Defaults to True.

  • is_path (bool) – If set to True, then parameter source is considered to be a local path pointing to the file from which the text is to be read. Defaults to False.

Note

In case there exists an optional capturing group within the pattern, that has not been captured by a match, then that capture’s corresponding key-value pair will be name --> None.
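In plain re, a comparable structure is obtained from groupdict(). The sketch below assumes the same one-dictionary-per-match layout; the group names key and val are arbitrary.

```python
import re

text = "a=1 b="

# The "val" group is optional and is missing from the second match.
named = [
    m.groupdict()
    for m in re.finditer(r"(?P<key>\w+)=(?P<val>\d+)?", text)
]
```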

get_named_captures_and_pos(source: str, include_empty: bool = True, relative_to_match: bool = False, is_path: bool = False) -> list[dict[str, tuple[str, int, int]]]

Returns a list of dictionaries, one dictionary per match, where each dictionary contains key-value pairs of any named captured groups that belong to its corresponding match, with each key being the name of the captured group, whereas its corresponding value will be a tuple containing the actual captured group along with its exact position within the text.

Parameters
  • source (str) – The text that is to be examined.

  • include_empty (bool) – Determines whether to include empty captures into the results. Defaults to True.

  • relative_to_match (bool) – If True, then each group’s position-indices are calculated relative to the group’s corresponding match, not to the whole string. Defaults to False.

  • is_path (bool) – If set to True, then parameter source is considered to be a local path pointing to the file from which the text is to be read. Defaults to False.

Note

In case there exists an optional capturing group within the pattern, that has not been captured by a match, then that capture’s corresponding key-value pair will be name --> (None, -1, -1).

get_pattern(include_flags: bool = False) -> str

Returns this instance’s underlying RegEx pattern as a string.

Parameters

include_flags (bool) – Determines whether to display the used RegEx flags along with the pattern. Defaults to False.

Note

This method is to be preferred over str() when one needs to display this instance’s underlying RegEx pattern.

group(is_case_insensitive: bool = False) -> Pregex

Creates a non-capturing group out of this instance’s underlying pattern and returns the result as a Pregex instance.

Parameters

is_case_insensitive (bool) – If True, then the “case insensitive” flag is applied to the group so that the pattern within it ignores case when it comes to matching. Defaults to False.

Note
  • Creating a non-capturing group out of a non-capturing group does nothing, except for resetting its flags, e.g. is_case_insensitive, if it has any.

  • Creating a non-capturing group out of a capturing group converts it into a non-capturing group.
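In raw regex syntax, a non-capturing group is written (?:...) and a case-insensitive one (?i:...), with the flag scoped to the group. The sketch below shows that scoping with plain re; that group emits exactly this syntax is an assumption.

```python
import re

# The "i" flag applies only inside the group, not to "def".
scoped = r"(?i:abc)def"

inside_ignores_case = re.fullmatch(scoped, "ABCdef") is not None
outside_is_strict = re.fullmatch(scoped, "ABCDEF") is None

# A non-capturing group contributes nothing to the captured groups.
no_captures = re.fullmatch(r"(?:abc)def", "abcdef").groups()
```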

has_match(source: str, is_path: bool = False) -> bool

Returns True if at least one match is found within the provided text.

Parameters
  • source (str) – The text that is to be examined.

  • is_path (bool) – If set to True, then parameter source is considered to be a local path pointing to the file from which the text is to be read. Defaults to False.

indefinite(is_greedy: bool = True) -> Pregex

Applies quantifier * to this instance’s underlying pattern and returns the result as a Pregex instance.

Parameters

is_greedy (bool) – Determines whether to declare this quantifier as greedy. When declared as such, the regex engine will try to match the expression as many times as possible. Defaults to True.

Raises

CannotBeRepeatedException – This instance represents a non-repeatable pattern.

is_exact_match(source: str, is_path: bool = False) -> bool

Returns True only if the provided text matches this pattern exactly.

Parameters
  • source (str) – The text that is to be examined.

  • is_path (bool) – If set to True, then parameter source is considered to be a local path pointing to the file from which the text is to be read. Defaults to False.
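The difference between has_match and is_exact_match is comparable to the difference between re.search and re.fullmatch. The sketch below shows that distinction in plain re; mapping the two methods onto these functions is an assumption made for illustration.

```python
import re

pattern = r"\d{3}"
text = "id: 123!"

# A match somewhere inside the text is enough for a "has match" check.
found_somewhere = re.search(pattern, text) is not None

# An exact match requires the whole text to match the pattern.
whole_text_matches = re.fullmatch(pattern, text) is not None
digits_only_match = re.fullmatch(pattern, "123") is not None
```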

iterate_captures(source: str, include_empty: bool = True, is_path: bool = False) -> Iterator[tuple[str]]

Generates tuples, one tuple per match, where each tuple contains all of its corresponding match’s captured groups.

Parameters
  • source (str) – The text that is to be examined.

  • include_empty (bool) – Determines whether to include empty captures into the results. Defaults to True.

  • is_path (bool) – If set to True, then parameter source is considered to be a local path pointing to the file from which the text is to be read. Defaults to False.

Note

In case there exists an optional capturing group within the pattern, that has not been captured by a match, then that capture’s corresponding value will be None.

iterate_captures_and_pos(source: str, include_empty: bool = True, relative_to_match: bool = False, is_path: bool = False) -> Iterator[list[tuple[str, int, int]]]

Generates lists of tuples, one list per match, where each tuple contains one of its corresponding match’s captured groups along with its exact position within the text.

Parameters
  • source (str) – The text that is to be examined.

  • include_empty (bool) – Determines whether to include empty captures into the results. Defaults to True.

  • relative_to_match (bool) – If True, then each group’s position-indices are calculated relative to the group’s corresponding match, not to the whole string. Defaults to False.

  • is_path (bool) – If set to True, then parameter source is considered to be a local path pointing to the file from which the text is to be read. Defaults to False.

Note

In case there exists an optional capturing group within the pattern, that has not been captured by a match, then that capture’s corresponding tuple will be (None, -1, -1).

iterate_matches(source: str, is_path: bool = False) -> Iterator[str]

Generates any possible matches found within the provided text.

Parameters
  • source (str) – The text that is to be examined.

  • is_path (bool) – If set to True, then parameter source is considered to be a local path pointing to the file from which the text is to be read. Defaults to False.
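The iterate_* methods differ from their get_* counterparts in returning an iterator rather than a list. The practical consequence, lazy consumption, can be sketched with re.finditer; this shows the general generator pattern and makes no claim about Pregex internals.

```python
import re
from itertools import islice

text = "a1 b22 c333 d4444"

# Generator: matches are produced one at a time, on demand.
lazy_matches = (m.group() for m in re.finditer(r"\d+", text))

first_two = list(islice(lazy_matches, 2))   # consumes only two matches
remaining = list(lazy_matches)              # the iterator resumes where it stopped

# Eager alternative: everything is collected up front.
all_at_once = re.findall(r"\d+", text)
```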

iterate_matches_and_pos(source: str, is_path: bool = False) -> Iterator[tuple[str, int, int]]

Generates any possible matches found within the provided text along with their exact position.

Parameters
  • source (str) – The text that is to be examined.

  • is_path (bool) – If set to True, then parameter source is considered to be a local path pointing to the file from which the text is to be read. Defaults to False.

iterate_matches_with_context(source: str, n_left: int = 5, n_right: int = 5, is_path: bool = False) -> Iterator[str]

Generates any possible matches found within the provided text, along with any of its surrounding context, the exact length of which can be configured through this method’s parameters.

Parameters
  • source (str) – The text that is to be examined.

  • n_left (int) – The number of characters representing the context on the left side of the match. Defaults to 5.

  • n_right (int) – The number of characters representing the context on the right side of the match. Defaults to 5.

  • is_path (bool) – If set to True, then parameter source is considered to be a local path pointing to the file from which the text is to be read. Defaults to False.

Raises
  • InvalidArgumentTypeException – Either parameter n_left or n_right is not an integer.

  • InvalidArgumentValueException – Either parameter n_left or n_right has a value of less than zero.

iterate_named_captures(source: str, include_empty: bool = True, is_path: bool = False) -> Iterator[dict[str, str]]

Generates dictionaries, one dictionary per match, where each dictionary contains key-value pairs of any named captured groups that belong to its corresponding match, with each key being the name of the captured group, whereas its corresponding value will be the actual captured text.

Parameters
  • source (str) – The text that is to be examined.

  • include_empty (bool) – Determines whether to include empty captures into the results. Defaults to True.

  • is_path (bool) – If set to True, then parameter source is considered to be a local path pointing to the file from which the text is to be read. Defaults to False.

Note

In case there exists an optional capturing group within the pattern, that has not been captured by a match, then that capture’s corresponding key-value pair will be name --> None.

iterate_named_captures_and_pos(source: str, include_empty: bool = True, relative_to_match: bool = False, is_path: bool = False) -> Iterator[dict[str, tuple[str, int, int]]]

Generates dictionaries, one dictionary per match, where each dictionary contains key-value pairs of any named captured groups that belong to its corresponding match, with each key being the name of the captured group, whereas its corresponding value will be a tuple containing the actual captured group along with its exact position within the text.

Parameters
  • source (str) – The text that is to be examined.

  • include_empty (bool) – Determines whether to include empty captures into the results. Defaults to True.

  • relative_to_match (bool) – If True, then each group’s position-indices are calculated relative to the group’s corresponding match, not to the whole string. Defaults to False.

  • is_path (bool) – If set to True, then parameter source is considered to be a local path pointing to the file from which the text is to be read. Defaults to False.

Note

In case there exists an optional capturing group within the pattern, that has not been captured by a match, then that capture’s corresponding key-value pair will be name --> (None, -1, -1).

match_at_end() -> Pregex

Applies assertion \Z to this instance’s underlying pattern so that it only matches if it is found at the end of a string, and returns the resulting pattern as a Pregex instance.

Note

The resulting pattern cannot have a repeating quantifier applied to it.

match_at_line_end() -> Pregex

Applies assertion $ to this instance’s underlying pattern so that it only matches if it is found at the end of a line, and returns the resulting pattern as a Pregex instance.

Note
  • The resulting pattern cannot have a repeating quantifier applied to it.

  • Uses meta character $ since the MULTILINE flag is considered on.

match_at_line_start() -> Pregex

Applies assertion ^ to this instance’s underlying pattern so that it only matches if it is found at the start of a line, and returns the resulting pattern as a Pregex instance.

Note
  • The resulting pattern cannot have a repeating quantifier applied to it.

  • Uses meta character ^ since the MULTILINE flag is considered on.

match_at_start() -> Pregex

Applies assertion \A to this instance’s underlying pattern so that it only matches if it is found at the start of a string, and returns the resulting pattern as a Pregex instance.

Note

The resulting pattern cannot have a repeating quantifier applied to it.
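The four anchor methods differ in whether they anchor to a line or to the whole string. With the MULTILINE flag on, as the notes above say it is, the distinction looks like this in plain re:

```python
import re

text = "ab\ncd"

# ^ and $ anchor at every line when MULTILINE is on.
line_starts = re.findall(r"^\w+", text, re.MULTILINE)
line_ends = re.findall(r"\w+$", text, re.MULTILINE)

# \A and \Z anchor at the string boundaries regardless of the flag.
string_start = re.findall(r"\A\w+", text, re.MULTILINE)
string_end = re.findall(r"\w+\Z", text, re.MULTILINE)
```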

not_enclosed_by(pre: Union[Pregex, str]) -> Pregex

Applies both negative lookahead assertion (?!<PRE>) and negative lookbehind assertion (?<!<PRE>), where <PRE> corresponds to the provided pattern, to this instance’s underlying pattern and returns the resulting pattern as a Pregex instance.

Parameters

pre (Pregex | str) – Either a string or a Pregex instance representing the “assertion” pattern.

Raises
  • InvalidArgumentTypeException – The provided argument is neither a Pregex instance nor a string.

  • EmptyNegativeAssertionException – The provided assertion pattern is the empty-string pattern.

  • NonFixedWidthPatternException – The provided assertion pattern does not have a fixed width.

not_followed_by(pre: Union[Pregex, str]) -> Pregex

Applies negative lookahead assertion (?!<PRE>), where <PRE> corresponds to the provided pattern, to this instance’s underlying pattern and returns the resulting pattern as a Pregex instance.

Parameters

pre (Pregex | str) – Either a string or a Pregex instance representing the “assertion” pattern.

Raises
  • InvalidArgumentTypeException – The provided argument is neither a Pregex instance nor a string.

  • EmptyNegativeAssertionException – The provided assertion pattern is the empty-string pattern.

not_preceded_by(pre: Union[Pregex, str]) -> Pregex

Applies negative lookbehind assertion (?<!<PRE>), where <PRE> corresponds to the provided pattern, to this instance’s underlying pattern and returns the resulting pattern as a Pregex instance.

Parameters

pre (Pregex | str) – Either a string or a Pregex instance representing the “assertion” pattern.

Raises
  • InvalidArgumentTypeException – The provided argument is neither a Pregex instance nor a string.

  • EmptyNegativeAssertionException – The provided assertion pattern is the empty-string pattern.

  • NonFixedWidthPatternException – The provided assertion pattern does not have a fixed width.
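Negative assertions are easy to get subtly wrong, so a short sketch in plain re may help. It also shows why an empty negative assertion is presumably rejected: in raw regex it can never succeed. The sample strings and the use of \b are choices made for this example only.

```python
import re

# Negative lookbehind: numbers that are not preceded by "$".
# \b stops the engine from matching the tail of "$10" (the "0").
no_dollar = re.findall(r"(?<!\$)\b\d+", "$10 20 $30 40")

# Negative lookahead: whole numbers that are not followed by "%".
no_percent = re.findall(r"\b\d+\b(?!%)", "50% 60 70%")

# An empty negative lookahead always fails, so the pattern can never match.
empty_never_matches = re.search(r"a(?!)", "a") is None
```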

one_or_more(is_greedy: bool = True) -> Pregex

Applies quantifier + to this instance’s underlying pattern and returns the result as a Pregex instance.

Parameters

is_greedy (bool) – Determines whether to declare this quantifier as greedy. When declared as such, the regex engine will try to match the expression as many times as possible. Defaults to True.

Raises

CannotBeRepeatedException – This instance represents a non-repeatable pattern.
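The is_greedy parameter shared by the +, * and ? quantifiers corresponds to the choice between the greedy and lazy forms of each quantifier. A minimal comparison in plain re:

```python
import re

tags = "<a><b>"

# Greedy "+" runs to the last ">" it can reach; lazy "+?" stops at the first.
greedy_plus = re.match(r"<.+>", tags).group()
lazy_plus = re.match(r"<.+?>", tags).group()

# Greedy "*" consumes every "a"; lazy "*?" is satisfied with none.
greedy_star = re.match(r"a*", "aaa").group()
lazy_star = re.match(r"a*?", "aaa").group()

# Greedy "?" takes the optional character; lazy "??" skips it.
greedy_opt = re.match(r"a?", "a").group()
lazy_opt = re.match(r"a??", "a").group()
```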

optional(is_greedy: bool = True) -> Pregex

Applies quantifier ? to this instance’s underlying pattern and returns the result as a Pregex instance.

Parameters

is_greedy (bool) – Determines whether to declare this quantifier as greedy. When declared as such, the regex engine will try to match the expression as many times as possible. Defaults to True.

preceded_by(pre: Union[Pregex, str]) -> Pregex

Applies positive lookbehind assertion (?<=<PRE>), where <PRE> corresponds to the provided pattern, to this instance’s underlying pattern and returns the resulting pattern as a Pregex instance.

Parameters

pre (str | Pregex) – A Pregex instance or string representing the “assertion” pattern.

Raises
  • InvalidArgumentTypeException – The provided argument is neither a Pregex instance nor a string.

  • NonFixedWidthPatternException – A non-fixed-width pattern is provided in place of parameter pre.

Note

The resulting pattern cannot have a repeating quantifier applied to it.

print_pattern(include_flags: bool = False) -> None

Prints this instance’s underlying RegEx pattern.

Parameters

include_flags (bool) – Determines whether to display the used RegEx flags along with the pattern. Defaults to False.

static purge() -> None

Clears the regular expression caches.

replace(source: str, repl: str, count: int = 0, is_path: bool = False) -> str

Replaces all or some of the occurring matches with repl and returns the resulting string. If there are no matches, then this method will return the provided text without modifying it.

Parameters
  • source (str) – The text that is to be matched and modified.

  • repl (str) – The string that is to replace any matches.

  • count (int) – The number of matches that are to be replaced, starting from left to right. A value of 0 indicates that all matches must be replaced. Defaults to 0.

  • is_path (bool) – If set to True, then parameter source is considered to be a local path pointing to the file from which the text is to be read. Defaults to False.

Raises

InvalidArgumentValueException – Parameter count has a value of less than zero.
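The count semantics described above are the same as those of re.sub, where 0 means "replace everything". A short sketch with plain re, assuming replace behaves analogously:

```python
import re

text = "1 22 333"

# count=0 replaces every match.
replace_all = re.sub(r"\d+", "#", text, count=0)

# count=2 replaces only the two leftmost matches.
replace_two = re.sub(r"\d+", "#", text, count=2)

# No match: the text comes back unchanged.
unchanged = re.sub(r"[a-z]+", "#", text)
```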

split_by_capture(source: str, include_empty: bool = True, is_path: bool = False) -> list[str]

Splits the provided text based on any occurring captures and returns the result as a list containing each individual part of the text after the split.

Parameters
  • source (str) – The piece of text that is to be matched and split.

  • include_empty (bool) – Determines whether to include empty groups into the results. Defaults to True.

  • is_path (bool) – If set to True, then parameter source is considered to be a local path pointing to the file from which the text is to be read. Defaults to False.

split_by_match(source: str, is_path: bool = False) -> list[str]

Splits the provided text based on any occurring matches and returns the result as a list containing each individual part of the text after the split.

Parameters
  • source (str) – The text that is to be matched and split.

  • is_path (bool) – If set to True, then parameter source is considered to be a local path pointing to the file from which the text is to be read. Defaults to False.
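Splitting on matches is conceptually close to re.split. The sketch below shows that operation in plain re; whether split_by_match treats empty parts in the same way is not stated above, so the example avoids that case.

```python
import re

text = "a, b,c"

# Every match of the separator pattern is removed and the text is cut there.
parts = re.split(r",\s*", text)
```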