I’ve been working on a project for the last few months with a client who has chosen to implement Dremio in Azure. Dremio is a data lake engine that creates a semantic layer and supports interactive queries.
It uses Apache Arrow, Gandiva, and Parquet files under the hood. It runs on either Linux VMs or Kubernetes containers. Like most big data systems, there is at least one coordinator node and one or more executor nodes. These nodes communicate and are managed using Apache Zookeeper. Client applications connect to Dremio via ODBC, JDBC, REST APIs, or Arrow Flight. Dremio can read from storage accounts, external databases, and a few other sources.
Read on for good and bad aspects of the product.