API Reference

Select

class squint.Select(objs=None, *args, **kwds)

A class to quickly load and select tabular data. The given objs, *args, and **kwds, can be any values supported by get_reader(). Additionally, objs can be a list of supported objects or a string with shell-style wildcards. If objs is already a reader-like object, it will be used as is.

Load a single file:

select = squint.Select('myfile.csv')

Load a reader-like iterable:

select = squint.Select([
    ['A', 'B'],
    ['x', 100],
    ['y', 200],
    ['z', 300],
])

Load multiple files:

select = squint.Select(['myfile1.csv', 'myfile2.csv'])

Load multiple files using a shell-style wildcard:

select = squint.Select('*.csv')

When multiple sources are loaded into a single Select, data is aligned by fieldname and missing fields receive empty strings.
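
The alignment rule can be illustrated without squint. The following is a minimal sketch in plain Python, not squint's implementation; the helper name align_rows and the sample tables are hypothetical:

```python
def align_rows(sources):
    """Merge header-plus-rows tables into one table aligned by fieldname."""
    # Collect fieldnames in first-seen order across all sources.
    fieldnames = []
    for rows in sources:
        for name in rows[0]:
            if name not in fieldnames:
                fieldnames.append(name)

    merged = [fieldnames]
    for rows in sources:
        header = rows[0]
        for row in rows[1:]:
            record = dict(zip(header, row))
            # Fields that a source does not have are filled with empty strings.
            merged.append([record.get(name, '') for name in fieldnames])
    return merged


file1 = [['A', 'B'], ['x', '100'], ['y', '200']]  # hypothetical first source
file2 = [['A', 'C'], ['z', 'foo']]                # hypothetical second source
combined = align_rows([file1, file2])
```

Here the combined table has the fields A, B, and C; rows from the first source get an empty string for C and the row from the second source gets an empty string for B.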

load_data(objs, *args, **kwds)

Load data from one or more objects into the Select. The given objs, *args, and **kwds, can be any values supported by the Select class initialization.

Load a single file into an empty Select:

select = squint.Select()  # <- Empty Select.
select.load_data('myfile.csv')

Add a single file to an already-populated Select:

select = squint.Select('myfile1.csv')
select.load_data('myfile2.xlsx', worksheet='Sheet2')

Add multiple files to an already-populated Select:

select = squint.Select('myfile1.csv')
select.load_data(['myfile2.csv', 'myfile3.csv'])

fieldnames

A list of field names used by the data source.

__call__(columns=None, **where)

After a Select has been created, it can be called like a function to select fields and return an associated Query object.

The columns argument serves as a template to define the values and data types selected. All column selections will be wrapped in an outer container. When a container is unspecified, a list is used as the default:

select = squint.Select('example.csv')
query = select('A')  # <- selects a list of values from 'A'

When columns specifies an outer container, it must hold only one field—if a given container holds multiple fields, it is assumed to be an inner container (which gets wrapped in the default outer container):

query = select(('A', 'B'))  # <- selects a list of tuple
                            #    values from 'A' and 'B'

When columns is a dict, values are grouped by key:

query = select({'A': 'B'})  # <- selects a dict with
                            #    keys from 'A' and
                            #    values from 'B'

When columns is omitted, the object’s fieldnames are used instead.

Optional where keywords can narrow the selected data to matching rows. A key must specify the field to check and a value must be a predicate object (see Predicate for details). Rows where the predicate is a match are selected and rows where it doesn’t match are excluded:

select = squint.Select('example.csv')
query = select({'A'}, B='foo')  # <- selects only the rows
                                #    where 'B' equals 'foo'
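
To make the result shapes concrete, the sketch below reproduces each columns template by hand over a small list of rows. It is plain Python written for illustration, not squint code, and the sample rows are hypothetical:

```python
# Hypothetical rows standing in for the contents of 'example.csv'.
rows = [
    {'A': 'x', 'B': 'foo'},
    {'A': 'y', 'B': 'bar'},
    {'A': 'x', 'B': 'baz'},
]

# Shape of select('A'): a list of values from 'A'.
as_list = [row['A'] for row in rows]

# Shape of select(('A', 'B')): a list of tuples from 'A' and 'B'.
as_tuples = [(row['A'], row['B']) for row in rows]

# Shape of select({'A': 'B'}): values from 'B' grouped by keys from 'A'.
as_dict = {}
for row in rows:
    as_dict.setdefault(row['A'], []).append(row['B'])

# Shape of select({'A'}, B='foo'): a set of 'A' values, keeping only
# the rows where 'B' equals 'foo'.
as_filtered_set = {row['A'] for row in rows if row['B'] == 'foo'}
```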

See the Making Selections tutorial for step-by-step examples.

create_index(*columns)

Create an index for the specified columns, which can speed up testing in many cases.

If you repeatedly use the same few columns to group or filter results, then you can often improve performance by adding an index for these columns:

select.create_index('town')

Using two or more columns creates a multi-column index:

select.create_index('town', 'postal_code')

Calling the function multiple times will create multiple indexes:

select.create_index('town')
select.create_index('postal_code')
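
The difference between a single-column and a multi-column index can be shown with Python's sqlite3 module. This sketch assumes, purely for illustration, a SQL-style table; this reference does not state how squint stores its indexes, and the table and index names are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE places (town TEXT, postal_code TEXT)')

# A single-column index helps lookups that filter or group on 'town' alone.
conn.execute('CREATE INDEX idx_town ON places (town)')

# A multi-column index helps lookups that use 'town' and 'postal_code' together.
conn.execute('CREATE INDEX idx_town_postal ON places (town, postal_code)')

# Each row of PRAGMA index_list holds the index name in its second column.
index_names = sorted(row[1] for row in conn.execute('PRAGMA index_list(places)'))
conn.close()
```

Every index has to be built and stored, which is one reason to add them selectively.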

Note

Indexes should be added with discretion to tune a test suite’s overall performance. Creating several indexes before testing even begins could lead to longer run times, so use indexes with care.

Query

class squint.Query(columns, **where)
class squint.Query(select, columns, **where)

A class to query data from a source object. Queries can be created, modified, and passed around without actually computing the result—computation doesn’t occur until the query object itself or its fetch() method is called.

The given columns and where arguments can be any values supported by Select.__call__().

Although Query objects are usually created by calling an existing Select, it’s possible to create them independent of any single data source:

query = Query('A')

classmethod from_object(obj)

Creates a query and associates it with the given object.

mylist = [1, 2, 3, 4]
query = Query.from_object(mylist)

If obj is a Query itself, a copy of the original query is created.

AGGREGATE METHODS

Aggregate methods operate on a collection of elements and produce a single result.

sum()

Get the sum of non-None elements.

avg()

Get the average of non-None elements. Strings and other objects that do not look like numbers are interpreted as 0.

min()

Get the minimum value from elements.

max()

Get the maximum value from elements.

count()

Get the count of non-None elements.

FUNCTIONAL METHODS

Functional methods take a user-provided function and use it to perform a specified procedure.

apply(function)

Apply function to the entire group, keeping the resulting data. If the element is not iterable, it will be wrapped as a single-item list.

map(function)

Apply function to each element, keeping the results. If the group of data is a set type, it will be converted to a list (as the results may not be distinct or hashable).

filter(predicate=True)

Filter elements, keeping only those values that match the given predicate. When predicate is True, this method keeps all elements for which bool returns True (see Predicate for details).
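
The behavior described for map() and filter() can be approximated with two small helpers. This is a sketch of the documented semantics in plain Python, not squint's code; map_elements and filter_elements are hypothetical names, and the sketch only handles callable predicates and the default True:

```python
def map_elements(function, elements):
    # Results are collected in a list; for a set input this matters because
    # the mapped values may repeat or be unhashable.
    return [function(x) for x in elements]


def filter_elements(elements, predicate=True):
    # With the default predicate of True, keep elements for which bool() is True.
    if predicate is True:
        predicate = bool
    return [x for x in elements if predicate(x)]


remainders = sorted(map_elements(lambda x: x % 2, {1, 2, 3}))
truthy_only = filter_elements([0, 1, '', 'a', None])
```

Mapping the three-item set yields three results even though two of them are equal, which a set could not hold.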

reduce(function, initializer_factory=None)

Reduce elements to a single value by applying a function of two arguments cumulatively to all elements from left to right. If the optional initializer_factory is present, it is called without arguments to provide a value that is placed before the items of the sequence in the calculation, and serves as a default when the sequence is empty. If initializer_factory is not given and the sequence contains only one item, the first item is returned.
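
The role of initializer_factory can be sketched with functools.reduce from the standard library. The helper reduce_elements is a hypothetical stand-in for the method, written only to show the described behavior:

```python
from functools import reduce


def reduce_elements(function, elements, initializer_factory=None):
    if initializer_factory is not None:
        # The factory is called without arguments to build the starting value.
        return reduce(function, elements, initializer_factory())
    return reduce(function, elements)


total = reduce_elements(lambda x, y: x + y, [1, 2, 3])
single = reduce_elements(lambda x, y: x + y, [5])
empty_default = reduce_elements(lambda x, y: x + y, [], initializer_factory=int)
collected = reduce_elements(lambda acc, x: acc + [x], ['a', 'b'],
                            initializer_factory=list)
```

Passing a factory such as list rather than a ready-made value means each evaluation starts from a fresh object, which is presumably why the argument is a callable.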

starmap(function)

DATA HANDLING METHODS

Data handling methods operate on a collection of elements by reshaping or otherwise reformatting the data.

distinct()

Filter elements, removing duplicate values.

flatten()

Flatten dictionary into list of tuple rows. If data is not a dictionary, the original values are returned unchanged.
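
One plausible reading of this behavior is sketched below in plain Python. It assumes that each list value contributes one row per item and that tuple keys or items are spliced into the row; these details are assumptions, not confirmed by this reference, and the helper names are hypothetical:

```python
def as_tuple(value):
    # Leave tuples as they are and wrap anything else in a one-item tuple.
    return value if isinstance(value, tuple) else (value,)


def flatten(data):
    if not isinstance(data, dict):
        return data  # Non-dictionary data is returned unchanged.
    rows = []
    for key, value in data.items():
        items = value if isinstance(value, list) else [value]
        for item in items:
            rows.append(as_tuple(key) + as_tuple(item))
    return rows


flat = flatten({'x': [1, 2], 'y': 3})
unchanged = flatten([1, 2])
```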

unwrap()

Unwrap single-item sequences or sets.

DATA OUTPUT METHODS

Data output methods evaluate the query and return its results.

execute(source=None, optimize=True)

A Query can be executed to return a single value or an iterable Result appropriate for lazy evaluation:

query = select('A')
result = query.execute()  # <- Returns Result (iterator)

Setting optimize to False turns off query optimization.

fetch()

Executes query and returns an eagerly evaluated result.

to_csv(file, fieldnames=None, **fmtparams)

Execute the query and write the results as a CSV file (dictionaries and other mappings will be serialized).

The given file can be a path or file-like object; fieldnames will be printed as a header row; and fmtparams can be any values supported by csv.writer().

When fieldnames are not provided, names from the query’s original columns argument will be used if the number of selected columns matches the number of resulting columns.
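
How fieldnames and fmtparams reach csv.writer() can be sketched with the standard library alone. The helper write_rows is hypothetical and only handles file-like objects holding rows that are already tuples; it is not the to_csv() implementation:

```python
import csv
import io


def write_rows(file, rows, fieldnames=None, **fmtparams):
    # Formatting keywords are passed straight through to csv.writer().
    writer = csv.writer(file, **fmtparams)
    if fieldnames:
        writer.writerow(fieldnames)  # Header row.
    writer.writerows(rows)


buffer = io.StringIO()
write_rows(buffer, [('x', 100), ('y', 200)], fieldnames=['A', 'B'],
           delimiter=';', lineterminator='\n')
csv_text = buffer.getvalue()
```

Any keyword that csv.writer() accepts, such as delimiter or quoting, can be supplied in the same way.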

Result

class squint.Result(iterable, evaltype, closefunc=None)

A simple iterator that wraps the results of Query execution to facilitate lazy evaluation of the resulting data.

Although Result objects are usually constructed automatically, it’s possible to create them directly:

iterable = iter([...])
result = Result(iterable, evaltype=list)

Warning

When iterated over, the iterable must yield only those values necessary for constructing an object of the given evaltype and no more. For example, when the evaltype is a set, the iterable must not contain duplicate or unhashable values. When the evaltype is a dict or other mapping, the iterable must contain unique key-value pairs or a mapping.

evaltype

The type of instance returned by the fetch method.

fetch()

Evaluate the entire iterator and return its result:

result = Result(iter([...]), evaltype=set)
result_set = result.fetch()  # <- Returns a set of values.

When evaluating a dict or other mapping type, any values that are, themselves, Result objects will also be evaluated.
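
The idea of a lazy wrapper that is evaluated into its evaltype can be sketched as follows. SketchResult is a hypothetical, simplified class written for illustration; it assumes mapping results are given as key-value pairs and it omits closefunc:

```python
class SketchResult:
    """Simplified stand-in for an iterator tagged with an evaluation type."""

    def __init__(self, iterable, evaltype):
        self._iterator = iter(iterable)
        self.evaltype = evaltype

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._iterator)

    def fetch(self):
        if issubclass(self.evaltype, dict):
            # Values that are themselves results are evaluated as well.
            return self.evaltype(
                (key, value.fetch() if isinstance(value, SketchResult) else value)
                for key, value in self._iterator
            )
        return self.evaltype(self._iterator)


inner = SketchResult(iter([1, 2, 2]), evaltype=list)
outer = SketchResult(iter([('a', inner), ('b', 3)]), evaltype=dict)
fetched = outer.fetch()
unique = SketchResult(iter(['x', 'y']), evaltype=set).fetch()
```

Nothing is consumed from the underlying iterators until fetch() is called, and a fetched result cannot be fetched again because its iterator is exhausted.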

__wrapped__

The underlying iterator—useful when introspecting or rewrapping.

Predicate

Squint can use Predicate objects for narrowing and filtering selections.

class squint.Predicate(obj, name=None)

A Predicate is used like a function of one argument that returns True when applied to a matching value and False when applied to a non-matching value. The criteria for matching is determined by the obj type used to define the predicate:

obj type                  matches when
------------------------  ---------------------------------------------
function                  the result of function(value) tests as True
type                      value is an instance of the type
re.compile(pattern)       value matches the regular expression pattern
True                      value is truthy (bool(value) returns True)
False                     value is falsy (bool(value) returns False)
str or non-container      value is equal to the object
set                       value is a member of the set
tuple of predicates       tuple of values satisfies corresponding tuple
                          of predicates, each according to its type
... (Ellipsis literal)    (used as a wildcard, matches any value)

Example matches:

obj                         example value   matches
--------------------------  --------------  -------
def iseven(x):              4               Yes
    return x % 2 == 0       9               No
float                       1.0             Yes
                            1               No
re.compile('[bc]ake')       'bake'          Yes
                            'cake'          Yes
                            'fake'          No
True                        'x'             Yes
                            ''              No
False                       ''              Yes
                            'x'             No
'foo'                       'foo'           Yes
                            'bar'           No
{'A', 'B'}                  'A'             Yes
                            'C'             No
('A', float)                ('A', 1.0)      Yes
                            ('A', 2)        No
('A', ...)                  ('A', 'X')      Yes
(uses ellipsis wildcard)    ('A', 'Y')      Yes
                            ('B', 'X')      No
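
The matching rules in the tables above can be restated as a single function. This is a sketch in plain Python, not the Predicate implementation; the name matches is hypothetical, and the use of search() for regular expressions is an assumption:

```python
import re


def matches(obj, value):
    if obj is Ellipsis:
        return True  # Wildcard: matches any value.
    if obj is True:
        return bool(value)
    if obj is False:
        return not bool(value)
    if isinstance(obj, type):
        return isinstance(value, obj)
    if isinstance(obj, re.Pattern):
        # Assumption: a match anywhere in the string counts.
        return isinstance(value, str) and obj.search(value) is not None
    if isinstance(obj, (set, frozenset)):
        try:
            return value in obj
        except TypeError:  # Unhashable values cannot be members.
            return False
    if isinstance(obj, tuple):
        # Each item is checked against the predicate in the same position.
        return (isinstance(value, tuple)
                and len(value) == len(obj)
                and all(matches(o, v) for o, v in zip(obj, value)))
    if callable(obj):
        return bool(obj(value))
    return obj == value


def iseven(x):
    return x % 2 == 0
```

The checks for True and False use identity so that values such as 1 and 0 are compared by equality rather than treated as truthiness tests.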

Example code:

>>> pred = Predicate({'A', 'B'})
>>> pred('A')
True
>>> pred('C')
False

Predicate matching behavior can also be inverted with the inversion operator (~). Inverted Predicates return False when applied to a matching value and True when applied to a non-matching value:

>>> pred = ~Predicate({'A', 'B'})
>>> pred('A')
False
>>> pred('C')
True
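
In Python, support for the ~ operator comes from the __invert__ special method. The sketch below shows one way an invertible predicate could be built; SketchPredicate is a hypothetical class, not squint's Predicate:

```python
class SketchPredicate:
    def __init__(self, function, inverted=False):
        self._function = function
        self._inverted = inverted

    def __call__(self, value):
        result = bool(self._function(value))
        return not result if self._inverted else result

    def __invert__(self):
        # ~pred returns a new predicate with the opposite outcome.
        return SketchPredicate(self._function, inverted=not self._inverted)


a_or_b = SketchPredicate(lambda value: value in {'A', 'B'})
not_a_or_b = ~a_or_b
```

Inverting twice in this sketch restores the original matching behavior.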

If the name argument is given, a __name__ attribute is defined using the given value:

>>> pred = Predicate({'A', 'B'}, name='a_or_b')
>>> pred.__name__
'a_or_b'

If the name argument is omitted, the object will not have a __name__ attribute:

>>> pred = Predicate({'A', 'B'})
>>> pred.__name__
Traceback (most recent call last):
  File "<input>", line 1, in <module>
    pred.__name__
AttributeError: 'Predicate' object has no attribute '__name__'