Building Queries

The following examples demonstrate squint’s Query class. This document builds on the Making Selections tutorial.

Get Started

We will get started the same way we did in the first tutorial. Begin by starting the Python interactive prompt in the same directory as the example.csv file. Once you are at the >>> prompt, import squint and load the data:

>>> import squint
>>> select = squint.Select('example.csv')

Creating a Query Object

In the Making Selections tutorial, we created several Query objects—each call to a Select object returns a Query.

By selecting a list of elements from column C, we get a Query object in return:

>>> select('C')
Query(<squint.Select object at 0x7ffa625b>, ['C'])
---- preview ----
['20', '30', '10', '20', '10', '10']

We can also create Queries directly using the following syntax (although it’s rarely necessary to do so):

>>> squint.Query(select, 'C')
Query(<squint.Select object at 0x7ffa625b>, ['C'])
---- preview ----
['20', '30', '10', '20', '10', '10']

Once a Query has been created, we can perform additional operations on it using the methods described below.

Aggregate Methods

Aggregate methods operate on a collection of elements and produce a single result. The Query class provides several aggregate methods: sum(), avg(), min(), max(), and count(). For more information see the aggregate methods reference documentation.

Use the sum() method to sum the elements in column C:

>>> select('C').sum()
Query(<squint.Select object at 0x7ffa625b>, ['C']).sum()
---- preview ----
100

When an aggregate method is called on a dict or other mapping, the groups—the dictionary values—are operated on separately.

Use the sum() method to sum each group of elements:

>>> select({'A': 'C'}).sum()
Query(<squint.Select object at 0x7ffa625b>, {'A': ['C']}).sum()
---- preview ----
{'x': 50, 'y': 30, 'z': 20}

Type Conversion

The Query class contains two methods that perform automatic type conversion:

In the example above, column C contains str elements. These strings are automatically converted to float values. The other functional methods do not do this—use map() to convert values explicitly.

Functional Methods

Functional methods take a user-provided function and use it to perform a specified procedure. The Query class provides the following functional methods: map(), filter(), reduce(), apply(), etc. For more information see the functional methods reference documentation.

Use the map() method to apply a function to each element:

>>> def uppercase(value):
...     return value.upper()
...
>>> select('B').map(uppercase)
Query(<squint.Select object at 0x7ffa625b>, ['B']).map(uppercase)
---- preview ----
['FOO', 'FOO', 'FOO', 'BAR', 'BAR', 'BAR']

Use the filter() method to narrow the selection to items for which the function returns True:

>>> def not_bar(value):
...     return value != 'bar'
...
>>> select('B').filter(not_bar)
Query(<squint.Select object at 0x7ffa625b>, ['B']).filter(not_bar)
---- preview ----
['foo', 'foo', 'foo']

Element-Wise vs Group-Wise Methods

The map(), filter(), and reduce() methods perform element-wise procedures—they call their user-provided functions for each element and do something with the result. The apply() method, however, performs a group-wise procedure. Rather than calling its user-provided function for each element, it calls the function once per container of elements.

Use the apply() method to apply a function to an entire container of elements:

>>> def join_strings(container):
...     return '-'.join(container)
...
>>> select('B').apply(join_strings)
Query(<squint.Select object at 0x7ffa625b>, ['B']).apply(join_strings)
---- preview ----
'foo-foo-foo-bar-bar-bar'

Like the aggregate methods, when apply() is called on a dict or other mapping, the groups—the dictionary values—are operated on separately.

Use the apply() method to apply a function for each container of elements:

>>> select({'A': 'B'}).apply(join_strings)
Query(<squint.Select object at 0x7ffa625b>, {'A': ['B']}).apply(join_strings)
---- preview ----
{'x': 'foo-foo', 'y': 'foo-bar', 'z': 'bar-bar'}

Data Handling Methods

Data handling methods operate on a collection of elements by reshaping or otherwise reformatting the data. The Query class provides the following data handling methods: flatten(), unwrap(), and distinct(). For more information see the data handling methods reference documentation.

The flatten() method serializes a dict or other mapping into list of tuple rows. Let’s start by observing the structure of a selected dictionary {'B': 'C'}:

>>> select({'B': 'C'})
Query(<squint.Select object at 0x7ffa625b>, {'B': ['C']})
---- preview ----
{'foo': ['20', '30', '10'],
 'bar': ['20', '10', '10']}

Now, use the flatten() method to serialize this same selection ({'B': 'C'}) into a list of tuples:

>>> select({'B': 'C'}).flatten()
Query(<squint.Select object at 0x7ffa625b>, {'B': ['C']}).flatten()
---- preview ----
[('foo', '20'), ('foo', '30'), ('foo', '10'),
 ('bar', '20'), ('bar', '10'), ('bar', '10')]

The unwrap() method unwraps single-element containers and returns the element itself. Multi-element containers are untouched. Observe the structure of the following preview, {('A', 'B'): 'C'}:

>>> select({('A', 'B'): 'C'})
Query(<squint.Select object at 0x7ffa625b>, {('A', 'B'): ['C']})
---- preview ----
{('x', 'foo'): ['20', '30'],
 ('y', 'bar'): ['20'],
 ('y', 'foo'): ['10'],
 ('z', 'bar'): ['10', '10']}

Use the unwrap() method to unwrap ['20'] and ['10'] but leave the multi-element lists untouched:

>>> select({('A', 'B'): 'C'}).unwrap()
Query(<squint.Select object at 0x7ffa625b>, {('A', 'B'): ['C']}).unwrap()
---- preview ----
{('x', 'foo'): ['20', '30'],
 ('y', 'bar'): '20',
 ('y', 'foo'): '10',
 ('z', 'bar'): ['10', '10']}

Data Output Methods

Data output methods evaluate the query and return its results. The Query class provides the following data output methods: fetch(), execute() and to_csv(). For more information see the data output methods reference documentation.

Use the fetch() method to eagerly evaluate the query and return its results:

>>> select('A').fetch()
['x', 'x', 'y', 'y', 'z', 'z']

Use the execute() method to lazily evaluate the query by returning a Result object:

>>> select('A').execute()
<Result object (evaltype=list) at 0x7fa32d16>

Eager vs Lazy Evaluation

When a query is eagerly evaluated, its elements are all loaded into memory at the same time. But when a query is lazily evaluated, its individual elements are computed one-at-a-time. See the Using Results tutorial for more information about eager and lazy evaluation.

Use the to_csv() method to save the query results into a CSV file:

>>> select('A').to_csv('myresults.csv')

Method Chaining

You can build increasingly complex queries by chaining methods together as needed:

>>> def not_z(value):
...     return value != 'z'
...
>>> def uppercase(value):
...     return str(value).upper()
...
>>> select('A').filter(not_z).map(uppercase).fetch()
['X', 'X', 'Y', 'Y']

In the example above, the filter(), map(), and fetch() methods are chained together to perform multiple operations within a single statement and then output the data.

Method Order

The order of most Query methods can be mixed and matched as needed. But the data output methods—like fetch(), execute(), and to_csv()—can only appear at the end of a chain, not in the middle of one.