parser._parser

This module offers a generic date/time string parser which is able to parse most known formats to represent a date and/or time.

This module attempts to be forgiving with regards to unlikely input formats, returning a datetime object even for dates which are ambiguous. If an element of a date/time stamp is omitted, the following rules are applied:

  • If AM or PM is left unspecified, a 24-hour clock is assumed; however, an hour on a 12-hour clock (0 <= hour <= 12) must be specified if AM or PM is specified.
  • If a time zone is omitted, a timezone-naive datetime is returned.

If any other elements are missing, they are taken from the datetime.datetime object passed to the parameter default. If this results in a day number exceeding the valid number of days per month, the value falls back to the end of the month.
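The fallback rule for missing elements can be illustrated with a minimal standard-library sketch. The helper name fill_from_default and its keyword-argument interface are hypothetical and not part of this module; the sketch only mirrors the behaviour described above.

```python
from calendar import monthrange
from datetime import datetime


def fill_from_default(default, **parsed):
    """Overlay parsed fields on `default` (hypothetical helper, not the library's code)."""
    # Keep only the elements that were actually found in the string.
    fields = {name: value for name, value in parsed.items() if value is not None}
    if "day" not in fields:
        # The day is inherited from `default`; clamp it to the last valid
        # day of the resulting year/month.
        year = fields.get("year", default.year)
        month = fields.get("month", default.month)
        last_day = monthrange(year, month)[1]
        if default.day > last_day:
            fields["day"] = last_day
    return default.replace(**fields)


default = datetime(2003, 1, 31)
print(fill_from_default(default, month=2))             # 2003-02-28 00:00:00
print(fill_from_default(default, hour=10, minute=36))  # 2003-01-31 10:36:00
```

Only an inherited day is clamped in this sketch; a day that was given explicitly is passed through unchanged.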

Additional resources about date/time string formats can be found below:

Module Contents

Classes

_timelex(self,instream)
_resultbase(self)
parserinfo(self,dayfirst=False,yearfirst=False) Class which handles what inputs are accepted. Subclass this to customize the language and acceptable values for each parameter.
_ymd(self,*args,**kwargs)
parser(self,info=None)
_tzparser()
UnknownTimezoneWarning() Raised when the parser finds a timezone it cannot parse into a tzinfo

Functions

parse(timestr,parserinfo=None,**kwargs) Parse a string in one of the supported formats, using the parserinfo parameters.
_parsetz(tzstr)
class _timelex(instream)
__init__(instream)
get_token()

This function breaks the time string into lexical units (tokens), which can be parsed by the parser. Lexical units are demarcated by changes in the character set, so any continuous string of letters is considered one unit, as is any continuous string of numbers.

The main complication arises from the fact that dots (‘.’) can be used both as separators (e.g. “Sep.20.2009”) and as decimal points (e.g. “4:30:21.447”). It is therefore necessary to read the full context of any dot-separated string before breaking it into tokens, so this function maintains a “token stack” for cases where the ambiguous context demands that multiple tokens be parsed at once.
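The dot ambiguity described above can be sketched with regular expressions. This is a simplified illustration, not the lexer's actual state machine: the function name split_tokens and the three patterns are assumptions, and whitespace runs are kept as single tokens for brevity.

```python
import re

# A chunk is a dot-joined run of letters/digits, a whitespace run, or any
# other single character.
_CHUNK = re.compile(r"[A-Za-z0-9]+(?:\.[A-Za-z0-9]+)*|\s+|.")
# Within a chunk, pieces are letter runs, digit runs, whitespace, or one character.
_PIECE = re.compile(r"[A-Za-z]+|[0-9]+|\s+|.")
# A chunk made of digits with exactly one dot is treated as a decimal number.
_DECIMAL = re.compile(r"[0-9]+\.[0-9]+")


def split_tokens(timestr):
    """Split a date/time string into tokens (hypothetical, simplified sketch)."""
    tokens = []
    for chunk in _CHUNK.findall(timestr):
        if _DECIMAL.fullmatch(chunk):
            # Digits, one dot, digits: keep the dot as a decimal point.
            tokens.append(chunk)
        else:
            # Letters or several dots involved: the dots are separators.
            tokens.extend(_PIECE.findall(chunk))
    return tokens


print(split_tokens("Sep.20.2009"))  # ['Sep', '.', '20', '.', '2009']
print(split_tokens("4:30:21.447"))  # ['4', ':', '30', ':', '21.447']
```

The sketch looks at the whole dot-joined chunk before deciding how to split it, which is the role the text assigns to the token stack.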

__iter__()
__next__()
next()
split(s)
isword(nextchar)

Whether or not the next character is part of a word

isnum(nextchar)

Whether the next character is part of a number

isspace(nextchar)

Whether the next character is whitespace

class _resultbase
__init__()
_repr(classname)
__len__()
__repr__()
class parserinfo(dayfirst=False, yearfirst=False)

Class which handles what inputs are accepted. Subclass this to customize the language and acceptable values for each parameter.

Parameters:
  • dayfirst – Whether to interpret the first value in an ambiguous 3-integer date (e.g. 01/05/09) as the day (True) or month (False). If yearfirst is set to True, this distinguishes between YDM and YMD. Default is False.
  • yearfirst – Whether to interpret the first value in an ambiguous 3-integer date (e.g. 01/05/09) as the year. If True, the first number is taken to be the year, otherwise the last number is taken to be the year. Default is False.
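How the two flags combine for an ambiguous 3-integer date can be sketched as follows. The function order_ymd is hypothetical; it covers only the fully ambiguous case and assumes none of the library's extra handling for values that cannot be a month or a day.

```python
def order_ymd(values, dayfirst=False, yearfirst=False):
    """Return (year, month, day) for three ambiguous integers (illustrative only)."""
    a, b, c = values
    if yearfirst:
        # The first number is the year; dayfirst chooses YDM over YMD.
        year = a
        day, month = (b, c) if dayfirst else (c, b)
    else:
        # The last number is the year; dayfirst chooses DMY over MDY.
        year = c
        day, month = (a, b) if dayfirst else (b, a)
    return year, month, day


# The numbers from "01/05/09", before any two-digit year conversion:
print(order_ymd((1, 5, 9)))                                # (9, 1, 5)  MDY
print(order_ymd((1, 5, 9), dayfirst=True))                 # (9, 5, 1)  DMY
print(order_ymd((1, 5, 9), yearfirst=True))                # (1, 5, 9)  YMD
print(order_ymd((1, 5, 9), dayfirst=True, yearfirst=True)) # (1, 9, 5)  YDM
```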
__init__(dayfirst=False, yearfirst=False)
_convert(lst)
jump(name)
weekday(name)
month(name)
hms(name)
ampm(name)
pertain(name)
utczone(name)
tzoffset(name)
convertyear(year, century_specified=False)

Converts a two-digit year to a full year within [-50, 49] years of self._year (the current year in local time).
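The windowing rule can be sketched as a standalone function. The name convert_two_digit_year and the explicit reference_year argument (standing in for self._year) are assumptions made so the example is deterministic.

```python
def convert_two_digit_year(year, reference_year, century_specified=False):
    """Map a two-digit year into [reference_year - 50, reference_year + 49] (sketch)."""
    if year < 100 and not century_specified:
        # Start in the reference year's century.
        year += reference_year // 100 * 100
        if year >= reference_year + 50:
            year -= 100  # too far in the future
        elif year < reference_year - 50:
            year += 100  # too far in the past
    return year


print(convert_two_digit_year(99, 2024))  # 1999
print(convert_two_digit_year(30, 2024))  # 2030
print(convert_two_digit_year(5, 1999))   # 2005
```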

validate(res)
class _ymd(*args, **kwargs)
__init__(*args, **kwargs)
has_year()
has_month()
has_day()
could_be_day(value)
append(val, label=None)
_resolve_from_stridxs(strids)

Try to resolve the identities of year/month/day elements using ystridx, mstridx, and dstridx, if enough of these are specified.

resolve_ymd(yearfirst, dayfirst)
class parser(info=None)
__init__(info=None)
parse(timestr, default=None, ignoretz=False, tzinfos=None, **kwargs)

Parse the date/time string into a datetime.datetime object.

Parameters:
  • timestr – Any date/time string using the supported formats.
  • default – The default datetime object. If this is a datetime object and not None, elements specified in timestr replace elements in the default object.
  • ignoretz – If set to True, time zones in parsed strings are ignored and a naive datetime.datetime object is returned.
  • tzinfos

    Additional time zone names / aliases which may be present in the string. This argument maps time zone names (and optionally offsets from those time zones) to time zones. This parameter can be a dictionary mapping time zone names (aliases) to time zones, or a function taking two parameters (tzname and tzoffset) and returning a time zone.

    Each time zone to which a name is mapped can be either an integer offset from UTC in seconds or a tzinfo object.

    This parameter is ignored if ignoretz is set.

  • **kwargs – Keyword arguments as passed to _parse().
Returns:

Returns a datetime.datetime object or, if the fuzzy_with_tokens option is True, returns a tuple, the first element being a datetime.datetime object, the second a tuple containing the fuzzy tokens.

Raises:
  • ValueError – Raised for invalid or unknown string format, if the provided tzinfo is not in a valid format, or if an invalid date would be created.
  • TypeError – Raised for non-string or character stream input.
  • OverflowError – Raised if the parsed date exceeds the largest valid C integer on your system.
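The two accepted shapes of tzinfos (a dictionary or a two-argument function) and the two accepted value types (integer seconds or a tzinfo object) can be sketched with the standard library. The helper resolve_tz is hypothetical, and datetime.timezone is used here as a stand-in for whatever fixed-offset class the library builds internally.

```python
from datetime import timedelta, timezone, tzinfo


def resolve_tz(tzinfos, tzname, tzoffset=None):
    """Look up a time zone for `tzname` (illustrative sketch only)."""
    # A callable receives (tzname, tzoffset); a mapping is indexed by name.
    data = tzinfos(tzname, tzoffset) if callable(tzinfos) else tzinfos.get(tzname)
    if data is None or isinstance(data, tzinfo):
        return data
    if isinstance(data, int):
        # Integer values are offsets from UTC in seconds.
        return timezone(timedelta(seconds=data), tzname)
    raise TypeError("unsupported time zone value: %r" % (data,))


brst = resolve_tz({"BRST": -7200}, "BRST")
print(brst.utcoffset(None))                                   # -1 day, 22:00:00
print(resolve_tz(lambda name, offset: timezone.utc, "ZULU"))  # UTC
```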
class _result
_parse(timestr, dayfirst=None, yearfirst=None, fuzzy=False, fuzzy_with_tokens=False)

Private method which performs the heavy lifting of parsing, called from parse(), which passes on its kwargs to this function.

Parameters:
  • timestr – The string to parse.
  • dayfirst – Whether to interpret the first value in an ambiguous 3-integer date (e.g. 01/05/09) as the day (True) or month (False). If yearfirst is set to True, this distinguishes between YDM and YMD. If set to None, this value is retrieved from the current parserinfo object (which itself defaults to False).
  • yearfirst – Whether to interpret the first value in an ambiguous 3-integer date (e.g. 01/05/09) as the year. If True, the first number is taken to be the year, otherwise the last number is taken to be the year. If this is set to None, the value is retrieved from the current parserinfo object (which itself defaults to False).
  • fuzzy – Whether to allow fuzzy parsing, allowing for strings like “Today is January 1, 2047 at 8:21:00AM”.
  • fuzzy_with_tokens – If True, fuzzy is automatically set to True, and the parser will return a tuple where the first element is the parsed datetime.datetime object and the second element is a tuple containing the portions of the string which were ignored.
_parse_numeric_token(tokens, idx, info, ymd, res, fuzzy)
_find_hms_idx(idx, tokens, info, allow_jump)
_assign_hms(res, value_repr, hms)
_could_be_tzname(hour, tzname, tzoffset, token)
_ampm_valid(hour, ampm, fuzzy)

For fuzzy parsing, ‘a’ or ‘am’ (both valid English words) may erroneously trigger the AM/PM flag. Deal with that here.
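A minimal sketch of the idea: an AM/PM marker only counts when a plausible 12-hour value precedes it. The function ampm_flag_is_valid is hypothetical and deliberately simpler than the method it illustrates; it assumes strict parsing should raise and fuzzy parsing should silently treat the word as ordinary text.

```python
def ampm_flag_is_valid(hour, fuzzy):
    """Decide whether a token such as 'a' or 'am' is really an AM/PM flag (sketch)."""
    if hour is not None and 0 <= hour <= 12:
        return True
    if fuzzy:
        # In fuzzy mode the token is probably just an English word.
        return False
    if hour is None:
        raise ValueError("No hour specified with AM or PM flag.")
    raise ValueError("Invalid hour specified for 12-hour clock.")


print(ampm_flag_is_valid(8, fuzzy=False))    # True
print(ampm_flag_is_valid(None, fuzzy=True))  # False
print(ampm_flag_is_valid(13, fuzzy=True))    # False
```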

_adjust_ampm(hour, ampm)
_parse_min_sec(value)
_parsems(value)

Parse an I[.F] seconds value into (seconds, microseconds).
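The I[.F] conversion can be sketched as a standalone function. The name parse_seconds is an assumption, as is the choice to truncate (rather than round) fractions longer than six digits.

```python
def parse_seconds(value):
    """Split an 'I[.F]' string into (seconds, microseconds) (illustrative sketch)."""
    if "." not in value:
        return int(value), 0
    whole, fraction = value.split(".")
    # Pad or truncate the fraction to exactly six digits (microseconds).
    return int(whole), int(fraction.ljust(6, "0")[:6])


print(parse_seconds("21"))         # (21, 0)
print(parse_seconds("21.447"))     # (21, 447000)
print(parse_seconds("5.1234567"))  # (5, 123456)
```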

_parse_hms(idx, tokens, info, hms_idx)
_recombine_skipped(tokens, skipped_idxs)
>>> tokens = ["foo", " ", "bar", " ", "19June2000", "baz"]
>>> skipped_idxs = [0, 1, 2, 5]
>>> _recombine_skipped(tokens, skipped_idxs)
["foo bar", "baz"]
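One way to obtain the result in the example above is to merge tokens whose indices are consecutive. The following is a sketch under that assumption, written as a free function named recombine_skipped rather than as the private method itself.

```python
def recombine_skipped(tokens, skipped_idxs):
    """Join skipped tokens that were adjacent in the original token list (sketch)."""
    idxs = sorted(skipped_idxs)
    recombined = []
    for pos, idx in enumerate(idxs):
        if pos > 0 and idx - 1 == idxs[pos - 1]:
            # Directly follows the previous skipped token: extend that string.
            recombined[-1] += tokens[idx]
        else:
            # A gap in the indices starts a new string.
            recombined.append(tokens[idx])
    return recombined


tokens = ["foo", " ", "bar", " ", "19June2000", "baz"]
print(recombine_skipped(tokens, [0, 1, 2, 5]))  # ['foo bar', 'baz']
```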
_build_tzinfo(tzinfos, tzname, tzoffset)
_build_tzaware(naive, res, tzinfos)
_build_naive(res, default)
_assign_tzname(dt, tzname)
_to_decimal(val)
parse(timestr, parserinfo=None, **kwargs)

Parse a string in one of the supported formats, using the parserinfo parameters.

Parameters:
  • timestr – A string containing a date/time stamp.
  • parserinfo – A parserinfo object containing parameters for the parser. If None, the default arguments to the parserinfo constructor are used.

The **kwargs parameter takes the following keyword arguments:

Parameters:
  • default – The default datetime object. If this is a datetime object and not None, elements specified in timestr replace elements in the default object.
  • ignoretz – If set to True, time zones in parsed strings are ignored and a naive datetime object is returned.
  • tzinfos

    Additional time zone names / aliases which may be present in the string. This argument maps time zone names (and optionally offsets from those time zones) to time zones. This parameter can be a dictionary mapping time zone names (aliases) to time zones, or a function taking two parameters (tzname and tzoffset) and returning a time zone.

    Each time zone to which a name is mapped can be either an integer offset from UTC in seconds or a tzinfo object.

    This parameter is ignored if ignoretz is set.

  • dayfirst – Whether to interpret the first value in an ambiguous 3-integer date (e.g. 01/05/09) as the day (True) or month (False). If yearfirst is set to True, this distinguishes between YDM and YMD. If set to None, this value is retrieved from the current parserinfo object (which itself defaults to False).
  • yearfirst – Whether to interpret the first value in an ambiguous 3-integer date (e.g. 01/05/09) as the year. If True, the first number is taken to be the year, otherwise the last number is taken to be the year. If this is set to None, the value is retrieved from the current parserinfo object (which itself defaults to False).
  • fuzzy – Whether to allow fuzzy parsing, allowing for strings like “Today is January 1, 2047 at 8:21:00AM”.
  • fuzzy_with_tokens – If True, fuzzy is automatically set to True, and the parser will return a tuple where the first element is the parsed datetime.datetime object and the second element is a tuple containing the portions of the string which were ignored.
Returns:

Returns a datetime.datetime object or, if the fuzzy_with_tokens option is True, returns a tuple, the first element being a datetime.datetime object, the second a tuple containing the fuzzy tokens.

Raises:
  • ValueError – Raised for invalid or unknown string format, if the provided tzinfo is not in a valid format, or if an invalid date would be created.
  • OverflowError – Raised if the parsed date exceeds the largest valid C integer on your system.
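A short usage sketch of the keyword arguments described above. It assumes the python-dateutil package is installed and that this function is importable as dateutil.parser.parse; the input strings are illustrative.

```python
from datetime import datetime, timedelta

from dateutil.parser import parse  # assumes python-dateutil is installed

# Ambiguous 3-integer date under the different flags.
plain = parse("01/05/09")                        # month first: 2009-01-05
day_first = parse("01/05/09", dayfirst=True)     # day first:   2009-05-01
year_first = parse("01/05/09", yearfirst=True)   # year first:  2001-05-09

# Missing elements are taken from `default`.
with_default = parse("10:36", default=datetime(2003, 9, 25))

# A time zone alias mapped to an integer offset in seconds.
aware = parse("2012-01-19 17:21:00 BRST", tzinfos={"BRST": -7200})

# Fuzzy parsing also returns the ignored portions of the string.
parsed, skipped = parse("Today is January 1, 2047 at 8:21:00AM",
                        fuzzy_with_tokens=True)

print(plain, day_first, year_first)
print(with_default)
print(aware.utcoffset())
print(parsed, skipped)
```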
class _tzparser
class _result
class _attr
__repr__()
__init__()
parse(tzstr)
_parsetz(tzstr)
class UnknownTimezoneWarning

Raised when the parser finds a timezone it cannot parse into a tzinfo