lineparser

Parser for fixed column formatted files.


View the Project on GitHub jkarns275/lineparser

lineparser.parse()

Attempts to parse the lines from filename using the field specifications supplied in pyfields.

Parameters
  • pyfields (list of Field) – This list describes the fixed-width file format. The Fields in the list ought to be in the same order that they appear in the file.

  • filename (str or bytes) – The filename or path which points to the fixed-width formatted file. If filename is a str, it must be UTF-8 encoded.

Returns

A list of numpy arrays and lists, in the same order as the fields supplied in pyfields. For String fields it will be a list of str, and for numeric fields such as Float64 and Int64 it will be a numpy array.

Return type

list of iterable

Raises
  • LineParsingError – If a line is malformed (wrong length) or a field fails to parse

  • OSError – If this function fails to open filename

  • FieldError – If there are zero fields provided, or if the provided fields are not all of type Field

  • MemoryError – If there is not enough memory to read the input file and allocate field containers.

Examples

>>> from lineparser import parse, Field
>>> fields = [Field(int, 3),
...           Field(int, 4),
...           Field(str, 6)]
>>> file = open("test.lines", "w")
>>> file.write(" 15 255   dog\n")
14
>>> file.write("146  12 horse\n")
14
>>> file.close()
>>> parse(fields, "test.lines")
[array([ 15, 146]),
 array([255,  12]),
 [b'   dog', b' horse']]
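To make the column layout concrete, here is a pure-Python approximation of what parse() does with a list of field widths. It is a sketch, not lineparser's implementation: split_fixed_width and parse_columns are hypothetical helpers, plain Python callables (int, str) stand in for Field types, and columns come back as lists rather than numpy arrays.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

# Hypothetical spec: (converter, width) pairs standing in for Field objects.
Spec = Sequence[Tuple[Callable[[str], object], int]]


def split_fixed_width(line: str, widths: Sequence[int]) -> List[str]:
    """Cut one line (newline already removed) into consecutive slices of the given widths."""
    if len(line) != sum(widths):
        raise ValueError(f"expected {sum(widths)} characters, got {len(line)}")
    pieces, start = [], 0
    for width in widths:
        pieces.append(line[start:start + width])
        start += width
    return pieces


def parse_columns(lines: Iterable[str], spec: Spec) -> List[list]:
    """Convert every slice and collect one list per field, in field order."""
    widths = [width for _, width in spec]
    columns: List[list] = [[] for _ in spec]
    for line in lines:
        pieces = split_fixed_width(line.rstrip("\n"), widths)
        for column, (convert, _), text in zip(columns, spec, pieces):
            column.append(convert(text))  # int(" 15") == 15; str keeps padding
    return columns
```

Collecting values column by column, rather than row by row, mirrors how parse() returns one container per field. With the file from the example above, parse_columns([" 15 255   dog\n", "146  12 horse\n"], [(int, 3), (int, 4), (str, 6)]) returns [[15, 146], [255, 12], ['   dog', ' horse']].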
lineparser.named_parse()

Attempts to parse the lines from filename using the field specifications supplied in named_fields. Then, a dict is created where the keys are the supplied names and the values are the results of parsing.

Parameters
  • named_fields (list of NamedField) – This list describes the fixed-width file format. The NamedFields in the list ought to be in the same order that they appear in the file.

  • filename (str or bytes) – The filename or path which points to the fixed-width formatted file. If filename is a str, it must be UTF-8 encoded.

Returns

A dict mapping each name to the result of parsing that field, where each result is either a list or a numpy array. For String fields it will be a list of str, and for numeric fields such as Float64 and Int64 it will be a numpy array.

Return type

dict of str to iterable

Raises
  • LineParsingError – If there is a bad line (wrong length), or a bad field (failed to parse)

  • OSError – If this function fails to open filename

  • FieldError – If there are zero fields provided, or if the provided fields are not all of type NamedField

  • MemoryError – If there is not enough memory to read the input file and allocate field containers.

  • DuplicateFieldNameError – If two or more of the NamedFields in named_fields have the same name.

Examples

>>> from lineparser import named_parse, NamedField
>>> fields = [NamedField("a", int, 3),
...           NamedField("b", int, 4),
...           NamedField("c", str, 6)]
>>> file = open("test.lines", "w")
>>> file.write(" 15 255   dog\n")
14
>>> file.write("146  12 horse\n")
14
>>> file.close()
>>> named_parse(fields, "test.lines")
{ 'a': array([ 15, 146]),
  'b': array([255,  12]),
  'c': [b'   dog', b' horse']
}
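The relationship between named_parse() and parse() can be sketched as pairing each name with its parsed column after rejecting duplicate names. This is an assumption about the behavior, not lineparser's code; name_columns and DuplicateNameError are hypothetical stand-ins (the latter for DuplicateFieldNameError).

```python
from collections import Counter
from typing import Dict, List, Sequence


class DuplicateNameError(ValueError):
    """Hypothetical stand-in for lineparser's DuplicateFieldNameError."""


def name_columns(names: Sequence[str], columns: Sequence[list]) -> Dict[str, list]:
    """Map each field name to its column, refusing repeated names."""
    duplicates: List[str] = sorted(n for n, count in Counter(names).items() if count > 1)
    if duplicates:
        raise DuplicateNameError(f"duplicate field names: {duplicates}")
    if len(names) != len(columns):
        raise ValueError("need exactly one name per column")
    return dict(zip(names, columns))
```

Checking names before building the dict matters: dict(zip(...)) on its own would silently keep only the last column for a repeated name.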
class lineparser.Ty

An enumeration for the valid field types.

  • Float types hold real numbers.

  • Int types hold signed integers.

  • The String type holds text.

  • The Phantom type holds … nothing. If there is a field in a file you don’t need, use the Phantom type instead of parsing it and wasting time and memory. This completely ignores the field’s contents.
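One way to see why Phantom fields cost nothing is to build the slicing plan up front and leave phantom fields out of it, so their text is never extracted or converted. This is a hypothetical sketch: None marks a phantom field, slice_plan and parse_line are illustrative names, and in this sketch phantom fields produce no output column.

```python
from typing import Callable, List, Optional, Sequence, Tuple

PHANTOM = None  # marks a phantom field in this sketch (stand-in for Ty.Phantom)


def slice_plan(fields: Sequence[Tuple[Optional[Callable[[str], object]], int]]
               ) -> List[Tuple[int, int, Callable[[str], object]]]:
    """Return (start, stop, converter) for every non-phantom field."""
    plan = []
    start = 0
    for convert, width in fields:
        if convert is not PHANTOM:
            plan.append((start, start + width, convert))
        start += width  # a phantom field's width still advances the cursor
    return plan


def parse_line(line: str, plan) -> list:
    """Extract and convert only the planned slices."""
    return [convert(line[a:b]) for a, b, convert in plan]
```

For example, slice_plan([(int, 3), (None, 4), (str, 6)]) skips the middle column, and parse_line(" 15 255   dog", plan) then yields [15, '   dog'].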

When choosing a data type, it is important to ensure that the numbers you will be reading fit into the data type. For example, Int8 can hold numbers from -128 to 127. If your field has numbers between -1000 and 5000, then Int8 is the wrong data type. Int16, Int32, and Int64 would all be acceptable choices, but Int16 may be considered optimal since it consumes the least memory.

For more information about data type ranges and capacities, refer to the numpy data types documentation.
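The range check described above follows from the fact that a signed n-bit integer holds values from -2^(n-1) to 2^(n-1) - 1. The helpers below are hypothetical (not part of lineparser) and pick the narrowest Int type for a known value range.

```python
from typing import Tuple


def signed_range(bits: int) -> Tuple[int, int]:
    """Smallest and largest value of a signed two's-complement integer of `bits` bits."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def smallest_int_type(lo: int, hi: int) -> str:
    """Name of the narrowest Int type whose range covers every value in [lo, hi]."""
    for bits in (8, 16, 32, 64):
        low, high = signed_range(bits)
        if low <= lo and hi <= high:
            return f"Int{bits}"
    raise OverflowError(f"[{lo}, {hi}] does not fit in a signed 64-bit integer")
```

For the example above, signed_range(8) gives (-128, 127) and smallest_int_type(-1000, 5000) returns 'Int16'.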

Bytes = 8
Float32 = 1
Float64 = 0
Int16 = 4
Int32 = 3
Int64 = 2
Int8 = 5
Phantom = 7
String = 6
class lineparser.Field

Creates a new Field, and verifies that the supplied ty and length parameters are of the appropriate type.

Parameters
  • ty (Ty or int or type) – The field type. This can be a member of Ty, the integer value of a Ty member, or one of the type classes ‘int’, ‘float’, and ‘str’.

  • length (int) – The length of the field. This must be a positive integer.

Examples

>>> import lineparser
>>> lineparser.Field(str, 5)
Field(String, 5)
>>> lineparser.Field(int, 10)
Field(Int64, 10)
>>> lineparser.Field(float, 6)
Field(Float64, 6)
>>> lineparser.Field(lineparser.Float64, 14)
Field(Float64, 14)
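The type normalization described for Field can be sketched as follows. Ty here is a local IntEnum that copies the values listed under lineparser.Ty, normalize_field is a hypothetical helper, and ValueError stands in for FieldError; the real class may validate differently.

```python
from enum import IntEnum
from typing import Tuple


class Ty(IntEnum):
    """Local copy of the values listed under lineparser.Ty (assumption for this sketch)."""
    Float64 = 0
    Float32 = 1
    Int64 = 2
    Int32 = 3
    Int16 = 4
    Int8 = 5
    String = 6
    Phantom = 7
    Bytes = 8


# Python type classes accepted in place of a Ty member, per the parameter description.
PY_TYPE_TO_TY = {int: Ty.Int64, float: Ty.Float64, str: Ty.String}


def normalize_field(ty, length) -> Tuple[Ty, int]:
    """Resolve `ty` to a Ty member and check that `length` is a positive int."""
    if isinstance(ty, type):
        if ty not in PY_TYPE_TO_TY:
            raise ValueError(f"unsupported type class: {ty.__name__}")
        ty = PY_TYPE_TO_TY[ty]
    else:
        try:
            ty = Ty(ty)  # accepts a Ty member or its integer value
        except ValueError:
            raise ValueError(f"invalid field type: {ty!r}") from None
    if isinstance(length, bool) or not isinstance(length, int) or length <= 0:
        raise ValueError("field length must be a positive integer")
    return ty, length
```

This reproduces the mappings shown in the examples: str becomes String, int becomes Int64, and float becomes Float64.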
class lineparser.NamedField

The NamedField class is just like the Field class, except it has a name.

Parameters
  • name (str) – The name of the field. This name should be unique.

  • ty (Ty or int or type) – The type of the field. This can be an instance of the Ty enumeration, or one of the type literals ‘int’, ‘float’, and ‘str’.

  • length (int) – The length of the field. This must be a positive integer.

Examples

>>> from lineparser import NamedField
>>> NamedField("f1", float, 10)
NamedField('f1', Float64, 10)
>>> NamedField("f2", str, 4)
NamedField('f2', String, 4)
>>> NamedField("f3", int, 8)
NamedField('f3', Int64, 8)
>>> p = NamedField("p", str, 25)
>>> p.field
Field(String, 25)
>>> p.name
'p'
>>> p.field.len
25
>>> p.field.ty
6
exception lineparser.FieldError

A FieldError is raised when there is something wrong with a Field or NamedField. The error just contains a short description of what the problem was. Possible errors include:

  • Invalid field type.

  • Invalid field length (length must be > 0)

  • Passing an object that is not of type Field in the field list

  • Having zero fields
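The list-level checks above can be sketched as a single validation pass. FieldSpec and FieldListError are hypothetical stand-ins for Field and FieldError; the returned total width is the length each line (without its newline) would have to match.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical stand-in for lineparser.Field."""
    ty: int
    length: int


class FieldListError(Exception):
    """Hypothetical stand-in for lineparser.FieldError."""


def check_fields(fields: Sequence[object]) -> int:
    """Validate a field list and return the expected line width."""
    if len(fields) == 0:
        raise FieldListError("at least one field is required")
    for index, field in enumerate(fields):
        if not isinstance(field, FieldSpec):
            raise FieldListError(f"item {index} is {type(field).__name__}, not a field")
        if field.length <= 0:
            raise FieldListError(f"item {index} has non-positive length {field.length}")
    return sum(field.length for field in fields)
```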

exception lineparser.LineParsingError

This error occurs when something goes wrong during parsing. This includes the following:

  • A malformed line: a line is either too long or too short.

  • Premature end of file: the last line is shorter than expected.

  • Failure to parse a real number.

  • Failure to parse an integer.

Examples

>>> from lineparser import NamedField, named_parse, \
...                        LineParsingError
>>> fields = [NamedField("a", int, 3),
...           NamedField("b", int, 4),
...           NamedField("c", str, 6)]
>>> file = open("test.lines", "w")
>>> file.write(" 15 255 dog\n")  # last field is 4 characters wide instead of 6
12
>>> file.write("146  12 horse\n")
14
>>> file.close()
>>> try:
...     named_parse(fields, "test.lines")
... except LineParsingError as e:
...     print(f"Encountered error: {e}")
...
Encountered error: test.lines:2:0: Encountered a malformed line.
>>>
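The malformed-line and premature-end-of-file cases both come down to a width check: each line, without its newline, must be exactly as long as the sum of the field lengths. Below is a hypothetical sketch that collects offending lines instead of raising; it numbers lines from 1 and does not attempt to reproduce lineparser's error message format.

```python
from typing import Iterable, List, Sequence, Tuple


def find_malformed_lines(lines: Iterable[str], widths: Sequence[int]) -> List[Tuple[int, int]]:
    """Return (1-based line number, actual width) for every line of the wrong width."""
    expected = sum(widths)
    bad = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")  # the newline is not part of any field
        if len(text) != expected:
            bad.append((number, len(text)))
    return bad
```

On the two lines written in the example, find_malformed_lines([" 15 255 dog\n", "146  12 horse\n"], [3, 4, 6]) returns [(1, 11)]: the first line is two characters short of the expected 13.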