Data Structures


One of the main data structure used in the BlazingSQL engine is the BlazingTable. It is always used as a std::unique_ptr<BlazingTable>, since its designed to own the data. One simple way of thinking of a BlazingTable is as a wrapper for a cudf table that owns the data and that has names for all the columns, but its actually a little more complicated than that. A BlazingTable is actually a vector of BlazingColumn. BlazingColumn is an interface that has two implementations: - BlazingColumnOwner: A simple wrapper around a cudf column. - BlazingColumnView: A simple wrapper around a cudf column_view. Since a BlazingTable is a vector of BlazingColumn, then it can have a combination of columns that it owns or columns it does not.

In general, it is assumed that a BlazingTable always owns it columns and therefore it always contains BlazingColumnOwner. In the logic, the code always asumes that the data is owned by BlazingTable and when a BlazingTable object goes out of scope, the memory is freed. The exception to this is when the data it contains comes from a cudf DataFrame that is owned by the user. This happens when a user creates a table from a cudf DataFrame. In this case, the GPU data is owned by the user on the python layer. Only a view of that data is passed to the C++ layer. Rather than making a copy, we create a BlazingTable around that cudf DataFrame view and pretend that we own it. Additionally this ownership and/or lackthereof is defined at the column level and not the table level, so that you can perform a Project operation on that table that only modifies certain columns, then you can still keep some of the original data intact.


A BlazingTableView is a wrapper around a cudf table_view that also has column names. It is used for holding a non-owning view of a BlazingTable.