Querying data

Query statements like SELECT scan one or more data tables registered with BlazingContext and return results of the engine run. In this section we describe the most fundamental SQL functionality: selecting rows.

Selecting data

SELECT function

Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE TIMESTAMP, VARCHAR, BOOLEAN

Returns rows and columns from a table.

Examples

SELECT *

Selects all the data from a table, including all the columns and all the rows.

SELECT *
FROM <table_name>
SELECT expr

Allows selecting one or a subset of columns from a table. It also supports aliasing columns with different names using the AS qualifier.

SELECT <col_1>
    , <col_2> AS <bar>
FROM <table_name>

Tables can also be aliased; BlazingSQL supports selecting all

SELECT t.*
FROM <table_name> AS t

or a subset of columns from an aliased table.

SELECT T.<col_1>
    , T.<col_2> AS <bar>
FROM <table_name> AS T

The SELECT expr further allows for including unary and binary functions such as mathematical, string, timestamp, windowing, and conditional.

LIMIT

Returns only a specified number of rows, normally from the top of the table, or from the top of the table in the first partition in case of a distributed table.

SELECT *
FROM <table_name>
LIMIT <number_of_rows>
SELECT ALL

Returns all the rows from a table, including duplicated ones. Thus, SELECT and SELECT ALL are equivalents.

SELECT DISTINCT function

Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE TIMESTAMP, VARCHAR, BOOLEAN

Returns unique rows from a table.

Examples

SELECT DISTINCT *

Selects all the unique rows from the table. Uniqueness is assessed across all the columns.

SELECT DISTINCT *
FROM <table_name>
SELECT DISTINCT expr

Select all the non-duplicate rows as defined by a single column or multiple columns.

SELECT DISTINCT <col_1>
    , <col_2>
FROM <table_name>

The SELECT DISTINCT expr suports unary and binary functions such as mathematical, string, timestamp, windowing, and conditional.

WHERE function

Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE TIMESTAMP, VARCHAR, BOOLEAN

Filter returned rows from a table. The following operators are supported in WHERE statements: comparison , logical, and IN operators and IS operators.

Examples

Comparison operators

These include operators like less than or equal to. Refer to comparison operators for a full list of supported syntax.

SELECT *
FROM <table_name>
WHERE <col_1> = <number>
SELECT *
FROM <table_name>
WHERE <col_1> <= <number>

Selecting rows can also be done by comparing values in columns with comparable data types: numeric with numeric, timestamp with timestamp , string with string (only = is supported).

SELECT *
FROM <table_name>
WHERE <col_1> >= <col_2>
Logical operators

When using the WHERE statement rows are selected where the predicate for the row evaluates to TRUE: in first the example above only rows where col_1 are equal to some number will be selected.

These statements can be further chained to create more complex logical statements.

SELECT *
FROM <table_name>
WHERE <col_1> => <number_1>
    AND <col_2> <= <number_2>
SELECT *
FROM <table_name>
WHERE <col_1> BETWEEN <number_1> AND <number_2>
    OR <col_2> <= <number_3>

Refer to logical operators for a full list of supported syntax.

IN operators

Filtering data to some subset of allowed values can be achieved with IN operators.

SELECT *
FROM <table_name>
WHERE <col_1_string> IN (<string_1>, <string_2>)

The above example is an equivalent to:

SELECT *
FROM <table_name>
WHERE <col_1_string> = <string_1>
    OR <col_1_string> = <string_2>

To select all the rows but those that are equal to string_1 or string_2, the IN statement can be negated:

SELECT *
FROM <table_name>
WHERE <col_1_string> NOT IN (<string_1>, <string_2>)

If the list of strings is long a subquery can be included within the parentheses:

SELECT *
FROM <table_1>
WHERE <col_1_string> IN (
    SELECT <col_2_string>
    FROM <table_2>
    WHERE <col_1> >= <number>
)

Refer to IN operators for a full list of supported syntax.

IS operators

Filtering data using BOOLEAN columns can be achieved by simply passing the column itself (or its negation) as a predicate:

SELECT *
FROM <table_name>
WHERE <col_1>
SELECT *
FROM <table_name>
WHERE NOT <col_1>

The first example above code is equivalent to

SELECT *
FROM <table_name>
WHERE <col_1> IS TRUE

or

SELECT *
FROM <table_name>
WHERE <col_1> IS NOT FALSE

Refer to IS operators for a full list of supported syntax.

ORDER BY function

Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE TIMESTAMP, VARCHAR, BOOLEAN

Sort the returned rows in a specified order.

Examples

ORDER BY

Ordering by columns using ORDER BY <col>

SELECT *
FROM <foo>
ORDER BY <col_1>
SELECT *
FROM <foo>
ORDER BY <col_1>, <col_2>
Monotonicity qualifiers

Ordering ascending or descending using ASC and DESC qualifiers.

SELECT *
FROM <foo>
ORDER BY <col_1> ASC
SELECT *
FROM <foo>
ORDER BY <col_1> DESC

These qualifiers can also be mixed

SELECT *
FROM <foo>
ORDER BY <col_1> ASC, <col_2> DESC
NULL values sorting

If the columns contain NULL values, by default, the rows with NULL values will be placed last. This behavior can be changed explicitly using NULLS FIRST or NULLS LAST qualifiers:

SELECT *
FROM <foo>
ORDER BY <col_1> NULLS LAST -- default
SELECT *
FROM <foo>
ORDER BY <col_1> NULLS FIRST

These qualifiers are also supported for multi-column sorts with monotonicity qualifiers:

SELECT *
FROM <foo>
ORDER BY <col_1> ASC NULLS FIRST
    , <col_2> DESC NULLS LAST

Nested queries

BlazingSQL supports nested queries that might first apply predicates in the subquery and and later join it with another table or subquery.

SELECT A.<col_1>
    , B.<col_3>
FROM (
    SELECT <col_1>
        , <col_2>
    FROM <table_1>
    WHERE <col_2> < 0
) AS A
INNER JOIN <table_2> AS B
    ON A.<col_2> = B.<col_2>

WITH function

Complex queries with multiple nested subqueries can quickly become unreadable. BlazingSQL supports organizing complex queries with the WITH qualifier.

WITH <subquery_1> AS (
    SELECT <col_1>
        , <col_2>
    FROM <table_1>
    WHERE <col_2> < 0
), <subquery_2> AS (
    SELECT B.<col_1>
        , A.<col_3>
    FROM <table_2> AS A
    INNER JOIN <subquery_1> AS B
        ON A.<col_2> = B.<col_2>
)
SELECT *
FROM <subquery_2>