Press "Enter" to skip to content

Data Virtualization in Azure SQL Managed Instance

Mladen Andzic has an announcement:

We are excited to announce the general availability (GA) of data virtualization capabilities in Azure SQL Managed Instance, with improved query performance and managed identity as a new supported option for authenticating to storage accounts.  

The data virtualization enables you to execute Transact-SQL (T-SQL) queries on files storing data in common data formats in Azure Data Lake Storage Gen2 or Azure Blob Storage and combine it with relational data stored locally in the managed instance using logical joins. This way you can transparently access external data while keeping it in its original format and location. There is no data duplication or need to run and maintain ETL processes, which means that you can extract and deliver insights faster. The supported file formats are Parquet, CSV, and JSON.

This is similar to PolyBase in SQL Server 2019 but is a different underlying technology. In SQL Managed Instance, it looks like we only get API-based data virtualization, not the ODBC-based PolyBase we saw in SQL Server 2019.