James Serra has a wrapup of Microsoft Connect(); announcements around the data platform space:
Microsoft Connect(); is a developer event from Nov 15-17, where plenty of announcements are made. Here is a summary of the data platform related announcements:
-
Azure Databricks: In preview, this is a fast, easy, and collaborative Apache Spark based analytics platform optimized for Azure. It delivers one-click set up, streamlined workflows, and an interactive workspace all integrated with Azure SQL Data Warehouse, Azure Storage, Azure Cosmos DB, Azure Active Directory, and Power BI. More info
-
Azure Cosmos DB with Apache Cassandra API: In preview, this enables Cassandra developers to simply use the Cassandra API in Azure Cosmos DB and enjoy the benefits of Azure Cosmos DB with the familiarity of the Cassandra SDKs and tools, with no code changes to their application. More info. See all Cosmos DB announcements
-
Microsoft joins the MariaDB Foundation: Microsoft is a platinum sponsor – MariaDB is a community of the MySQL relational database management system and Microsoft will be actively contributing to MariaDB and the MariaDB community. More info
Click through for more. And if you want more info on Azure Databricks, Matei Zaharia and Peter Carlin have more information:
So how is Azure Databricks put together? At a high level, the service launches and manages worker nodes in each Azure customer’s subscription, letting customers leverage existing management tools within their account.
Specifically, when a customer launches a cluster via Databricks, a “Databricks appliance” is deployed as an Azure resource in the customer’s subscription. The customer specifies the types of VMs to use and how many, but Databricks manages all other aspects. In addition to this appliance, a managed resource group is deployed into the customer’s subscription that we populate with a VNet, a security group, and a storage account. These are concepts Azure users are familiar with. Once these services are ready, users can manage the Databricks cluster through the Azure Databricks UI or through features such as autoscaling. All metadata (such as scheduled jobs) is stored in an Azure Database with geo-replication for fault tolerance.
I’ve been a huge fan of the Databricks Community Edition. We’ll see if there will be a Community Edition version for Azure as well.