Building Queries¶
The following examples demonstrate squint’s Query
class.
This document builds on the Making Selections tutorial.
Get Started¶
We will get started the same way we did in the first tutorial. Begin
by starting the Python interactive prompt in the same directory as the
example.csv
file. Once you are at
the >>>
prompt, import squint and load the data:
>>> import squint
>>> select = squint.Select('example.csv')
Creating a Query Object¶
In the Making Selections tutorial, we created several Query
objects—each call to a Select
object returns a Query.
By selecting a list of elements from column C, we get a
Query
object in return:
>>> select('C')
Query(<squint.Select object at 0x7ffa625b>, ['C'])
---- preview ----
['20', '30', '10', '20', '10', '10']
We can also create Queries directly using the following syntax (although it’s rarely necessary to do so):
>>> squint.Query(select, 'C')
Query(<squint.Select object at 0x7ffa625b>, ['C'])
---- preview ----
['20', '30', '10', '20', '10', '10']
Once a Query has been created, we can perform additional operations on it using the methods described below.
Aggregate Methods¶
Aggregate methods operate on a collection of elements and produce
a single result. The Query class provides several aggregate methods:
sum()
, avg()
,
min()
, max()
, and
count()
. For more information see the
aggregate methods reference documentation.
Use the sum()
method to sum the elements in
column C:
>>> select('C').sum()
Query(<squint.Select object at 0x7ffa625b>, ['C']).sum()
---- preview ----
100
When an aggregate method is called on a dict
or other
mapping, the groups—the dictionary values—are operated on
separately.
Use the sum()
method to sum each group of
elements:
>>> select({'A': 'C'}).sum()
Query(<squint.Select object at 0x7ffa625b>, {'A': ['C']}).sum()
---- preview ----
{'x': 50, 'y': 30, 'z': 20}
Functional Methods¶
Functional methods take a user-provided function and use it
to perform a specified procedure. The Query class provides
the following functional methods: map()
,
filter()
, reduce()
,
apply()
, etc. For more information see the
functional methods reference
documentation.
Use the map()
method to apply a function
to each element:
>>> def uppercase(value):
... return value.upper()
...
>>> select('B').map(uppercase)
Query(<squint.Select object at 0x7ffa625b>, ['B']).map(uppercase)
---- preview ----
['FOO', 'FOO', 'FOO', 'BAR', 'BAR', 'BAR']
Use the filter()
method to narrow the selection
to items for which the function returns True:
>>> def not_bar(value):
... return value != 'bar'
...
>>> select('B').filter(not_bar)
Query(<squint.Select object at 0x7ffa625b>, ['B']).filter(not_bar)
---- preview ----
['foo', 'foo', 'foo']
Element-Wise vs Group-Wise Methods
The map()
, filter()
, and
reduce()
methods perform element-wise
procedures—they call their user-provided functions for each
element and do something with the result. The apply()
method, however, performs a group-wise procedure. Rather
than calling its user-provided function for each element, it calls the
function once per container of elements.
Use the apply()
method to apply a function
to an entire container of elements:
>>> def join_strings(container):
... return '-'.join(container)
...
>>> select('B').apply(join_strings)
Query(<squint.Select object at 0x7ffa625b>, ['B']).apply(join_strings)
---- preview ----
'foo-foo-foo-bar-bar-bar'
Like the aggregate methods, when apply()
is
called on a dict
or other mapping, the groups—the
dictionary values—are operated on separately.
Use the apply()
method to apply a function
for each container of elements:
>>> select({'A': 'B'}).apply(join_strings)
Query(<squint.Select object at 0x7ffa625b>, {'A': ['B']}).apply(join_strings)
---- preview ----
{'x': 'foo-foo', 'y': 'foo-bar', 'z': 'bar-bar'}
Data Handling Methods¶
Data handling methods operate on a collection of elements by reshaping
or otherwise reformatting the data. The Query class provides the
following data handling methods: flatten()
,
unwrap()
, and distinct()
.
For more information see the data handling methods reference documentation.
The flatten()
method serializes a dict
or other mapping into list of tuple rows. Let’s start by observing the
structure of a selected dictionary {'B': 'C'}
:
>>> select({'B': 'C'})
Query(<squint.Select object at 0x7ffa625b>, {'B': ['C']})
---- preview ----
{'foo': ['20', '30', '10'],
'bar': ['20', '10', '10']}
Now, use the flatten()
method to serialize this
same selection ({'B': 'C'}
) into a list of tuples:
>>> select({'B': 'C'}).flatten()
Query(<squint.Select object at 0x7ffa625b>, {'B': ['C']}).flatten()
---- preview ----
[('foo', '20'), ('foo', '30'), ('foo', '10'),
('bar', '20'), ('bar', '10'), ('bar', '10')]
The unwrap()
method unwraps single-element
containers and returns the element itself. Multi-element containers
are untouched. Observe the structure of the
following preview, {('A', 'B'): 'C'}
:
>>> select({('A', 'B'): 'C'})
Query(<squint.Select object at 0x7ffa625b>, {('A', 'B'): ['C']})
---- preview ----
{('x', 'foo'): ['20', '30'],
('y', 'bar'): ['20'],
('y', 'foo'): ['10'],
('z', 'bar'): ['10', '10']}
Use the unwrap()
method to unwrap ['20']
and ['10']
but leave the multi-element lists untouched:
>>> select({('A', 'B'): 'C'}).unwrap()
Query(<squint.Select object at 0x7ffa625b>, {('A', 'B'): ['C']}).unwrap()
---- preview ----
{('x', 'foo'): ['20', '30'],
('y', 'bar'): '20',
('y', 'foo'): '10',
('z', 'bar'): ['10', '10']}
Data Output Methods¶
Data output methods evaluate the query and return its results.
The Query class provides the following data output methods:
fetch()
, execute()
and to_csv()
. For more information see
the data output methods reference
documentation.
Use the fetch()
method to eagerly evaluate
the query and return its results:
>>> select('A').fetch()
['x', 'x', 'y', 'y', 'z', 'z']
Use the execute()
method to lazily evaluate the
query by returning a Result
object:
>>> select('A').execute()
<Result object (evaltype=list) at 0x7fa32d16>
Eager vs Lazy Evaluation
When a query is eagerly evaluated, its elements are all loaded into memory at the same time. But when a query is lazily evaluated, its individual elements are computed one-at-a-time. See the Using Results tutorial for more information about eager and lazy evaluation.
Use the to_csv()
method to save the
query results into a CSV file:
>>> select('A').to_csv('myresults.csv')
Method Chaining¶
You can build increasingly complex queries by chaining methods together as needed:
>>> def not_z(value):
... return value != 'z'
...
>>> def uppercase(value):
... return str(value).upper()
...
>>> select('A').filter(not_z).map(uppercase).fetch()
['X', 'X', 'Y', 'Y']
In the example above, the filter()
,
map()
, and fetch()
methods are chained together to perform multiple operations
within a single statement and then output the data.