BlazingSQL is an open-source, highly scalable, distributed analytical SQL engine that uses NVIDIA GPUs to deliver very high performance. BlazingSQL is part of the RAPIDS stack and therefore integrates well with the other RAPIDS libraries. This documentation explains how the BlazingSQL engine works and describes its tech stack.
Python User Interface
The main user interface for BlazingSQL is its Python library, blazingsql. Through Python, users can create tables, register filesystems, configure the engine, and run queries. BlazingSQL returns query results as cuDF DataFrames, or as dask-cudf DataFrames when running in distributed mode. You can learn more about the Python side of BlazingSQL here.
When a user runs a SQL query, BlazingSQL converts it into relational algebra using Apache Calcite. This relational algebra is then sent to the BlazingSQL Core engine for execution. You can learn more about Apache Calcite and the relational algebra it produces here.
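To make "relational algebra" more concrete, here is a minimal, hypothetical sketch of the kind of logical plan tree that a query such as `SELECT name FROM people WHERE age > 30` becomes. The operator names mimic Calcite's `LogicalProject`, `LogicalFilter`, and `LogicalTableScan` and its `$n` column-index notation, but the `PlanNode` class, its `render` method, the `people` table, and the `main` schema name are illustrative assumptions, not Calcite or BlazingSQL code. In practice the plan is produced by Calcite itself.

```python
# Illustrative only: a toy plan tree, not Calcite's API or BlazingSQL's plan format.
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    """One relational operator in a (hypothetical) logical plan tree."""
    op: str                                   # operator name, e.g. "LogicalFilter"
    args: str                                 # operator arguments in Calcite-like text form
    inputs: List["PlanNode"] = field(default_factory=list)

    def render(self, depth: int = 0) -> str:
        """Render the tree top-down, indenting each child by two spaces."""
        lines = ["  " * depth + f"{self.op}({self.args})"]
        for child in self.inputs:
            lines.append(child.render(depth + 1))
        return "\n".join(lines)


# SELECT name FROM people WHERE age > 30
# Assumed schema: people(name, age), so name is column $0 and age is column $1.
scan = PlanNode("LogicalTableScan", "table=[[main, people]]")
filt = PlanNode("LogicalFilter", "condition=[>($1, 30)]", [scan])
proj = PlanNode("LogicalProject", "name=[$0]", [filt])

print(proj.render())
# LogicalProject(name=[$0])
#   LogicalFilter(condition=[>($1, 30)])
#     LogicalTableScan(table=[[main, people]])
```

Reading the tree bottom-up gives the execution order: scan the table, filter the rows, then project the requested column.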
In the BlazingSQL Core engine, the relational algebra produced by Apache Calcite is turned into a physical relational algebra plan, which in turn becomes a directed acyclic graph (DAG) in which each node is a kernel and each edge connecting two nodes is a cache. Each kernel takes input data and generates a task to process it, and that task is run by the task executor. You can learn more about the BlazingSQL Core engine here.
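The kernel/cache/task structure can be illustrated with a small, hypothetical Python sketch. It models a linear DAG (a filter kernel followed by a project kernel) over batches of Python dicts. `Cache`, `Kernel`, and `run_pipeline` are made-up names, not BlazingSQL's C++ classes, and this toy executor drains one kernel at a time in topological order, whereas the real engine runs kernels concurrently on GPU data.

```python
# Illustrative sketch only: hypothetical stand-ins, not BlazingSQL's kernels or caches.
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional

Batch = List[dict]  # a batch of rows; stands in for a GPU table partition


class Cache:
    """DAG edge: a FIFO buffer of batches between two kernels."""

    def __init__(self) -> None:
        self._batches: deque = deque()

    def push(self, batch: Batch) -> None:
        self._batches.append(batch)  # deque.append is thread-safe in CPython

    def pull(self) -> Optional[Batch]:
        return self._batches.popleft() if self._batches else None


class Kernel:
    """DAG node: turns each input batch into a task that fills the output cache."""

    def __init__(self, name: str, fn: Callable[[Batch], Batch],
                 inp: Cache, out: Cache) -> None:
        self.name, self.fn, self.inp, self.out = name, fn, inp, out

    def next_task(self) -> Optional[Callable[[], None]]:
        batch = self.inp.pull()
        if batch is None:
            return None
        # The task captures its own batch; running it pushes the result downstream.
        return lambda: self.out.push(self.fn(batch))


def run_pipeline(kernels: List[Kernel], workers: int = 4) -> None:
    """Toy task executor: runs each kernel's tasks on a thread pool, kernel by kernel."""
    with ThreadPoolExecutor(max_workers=workers) as executor:
        for kernel in kernels:  # kernels are assumed to be in topological order
            futures = []
            while (task := kernel.next_task()) is not None:
                futures.append(executor.submit(task))
            for future in futures:
                future.result()  # wait so the downstream kernel sees all of its input


# Two pre-filled batches stand in for the output of a table-scan kernel.
scan_out, filter_out, project_out = Cache(), Cache(), Cache()
scan_out.push([{"name": "ana", "age": 34}, {"name": "bo", "age": 25}])
scan_out.push([{"name": "cy", "age": 41}])

kernels = [
    Kernel("filter", lambda b: [r for r in b if r["age"] > 30], scan_out, filter_out),
    Kernel("project", lambda b: [{"name": r["name"]} for r in b], filter_out, project_out),
]
run_pipeline(kernels)

names = []
while (batch := project_out.pull()) is not None:
    names.extend(r["name"] for r in batch)
print(sorted(names))  # ['ana', 'cy']
```

Because tasks may finish in any order, the result is sorted before printing; batch order across a cache is not guaranteed in this sketch.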
The BlazingSQL engine has several other important components that contribute to its extensibility, flexibility, and performance:
Memory management features
Communication library that enables high-performance node-to-node communication over either TCP or UCX.
Interops: BlazingSQL’s own row-based operations engine.
I/O module that supports various file formats (delimited text, Apache ORC, Apache Parquet, JSON) and filesystems (local, HDFS, AWS S3, GCS).
Data structures that underpin all of the above.
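The list describes interops only briefly, as a row-based operations engine. As a rough, CPU-only illustration of evaluating an expression row by row, the sketch below assumes the expression has already been flattened into postfix (reverse Polish) order and evaluates it for each row with a small value stack. The postfix encoding and the `eval_row`/`eval_rows` helpers are hypothetical stand-ins chosen for clarity; they are not interops' actual instruction format, which runs on the GPU.

```python
# Illustrative sketch only: a stack-based postfix evaluator, not BlazingSQL's interops format.
import operator
from typing import Dict, List, Union

Token = Union[str, float, int]  # column name, operator symbol, or numeric literal

BINARY_OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    ">": operator.gt,
    "<": operator.lt,
}


def eval_row(program: List[Token], row: Dict[str, float]):
    """Evaluate one postfix expression against a single row using a value stack."""
    stack = []
    for tok in program:
        if isinstance(tok, str) and tok in BINARY_OPS:
            right = stack.pop()
            left = stack.pop()
            stack.append(BINARY_OPS[tok](left, right))
        elif isinstance(tok, str):
            stack.append(row[tok])  # column reference
        else:
            stack.append(tok)       # numeric literal
    if len(stack) != 1:
        raise ValueError("malformed postfix program")
    return stack[0]


def eval_rows(program: List[Token], rows: List[Dict[str, float]]) -> list:
    """Apply the same program to every row; a GPU engine would process rows in parallel."""
    return [eval_row(program, row) for row in rows]


# price * quantity > 100, written in postfix order
program = ["price", "quantity", "*", 100, ">"]
rows = [{"price": 20.0, "quantity": 6}, {"price": 15.0, "quantity": 4}]
print(eval_rows(program, rows))  # [True, False]
```

Evaluating the whole expression per row, rather than materializing an intermediate column for each operator, is the general idea behind row-based expression engines; how interops schedules and stores intermediate values is not covered here.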