Press "Enter" to skip to content

Data Virtualization with Azure SQL Managed Instance

Mladen Andzic announces data virtualization in Azure SQL Managed Instance:

Data virtualization capabilities, now in preview in Azure SQL Managed Instance, enable you to execute Transact-SQL (T-SQL) queries against data from files stored in Azure Data Lake Storage Gen2 or Azure Blob Storage and combine it with relational data stored locally in the managed instance using logical joins. This way you can transparently access external data while keeping it in its original format and location. There is no data duplication or need to run and maintain ETL processes, which means that you can extract and deliver insights faster. Currently supported file formats are Parquet, CSV, and JSON.

I’m going to start calling it PolyBase Duck Typing: it’s not actually PolyBase but the syntax is the same and the outcome is the same and the method to enable it is the same and “PolyBase” is a lot easier to say than “data virtualization.” So even though it’s not PolyBase, I’m going to call it PolyBase until there’s a meaningful split.