Cloud – Page 63 – Curated SQL

Low-Code Churn Prediction with Synapse Analytics

Published 2022-05-18 by Kevin Feasel

Gavita Regunath shows off a capability in Azure Synapse Analytics:

We will build a machine learning solution to predict churn using Azure Synapse Analytics and Azure Machine Learning.
Azure Synapse Analytics is Microsoft’s limitless analytics platform that combines enterprise data warehousing and big data analytics. In simple terms, it is a one-stop-shop that allows you to ingest, prepare, and manage data that can then be used for machine learning and business intelligence, all from a single place. It provides a unified platform and encourages collaboration between data and machine learning professionals.
This article will show you how to build an end-to-end solution to train a machine learning model from Azure Synapse Analytics using AutoML functionality within Azure Machine Learning. Using the T-SQL PREDICT function, we can then use the trained machine learning model to make predictions against the churn dataset stored in the SQL pool table. One of the key benefits of working from within Azure Synapse is that all the steps required to train the model and make predictions with it can be done from a single platform, Azure Synapse.

Click through for the three-step process and a demonstration.

Replacing Common Table Expressions in ADF Dataflows

Published 2022-05-16 by Kevin Feasel

Jeet Kainth needs an alternative:

At the time of writing, it is not possible to write a query using a CTE in the source of a dataflow. However, there are a few options to deal with this limitation:
– rewrite the query using subqueries instead of CTEs
– use a stored procedure that contains the query and reference the stored proc in the source of the dataflow
– write the query as a view and reference the view in the source of the dataflow (this is my preferred method and the one I will demo here)

Jeet focuses on the third alternative. I'd lean toward the second or third alternative myself, probably the second (stored procedures), but both let me create an interface between ADF and the database. That way, underlying table changes will be less likely to require code changes in ADF.
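
To make the subquery and view options concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for the source database. ADF is not involved, the `orders` table and `big_spenders` view are hypothetical, and the stored procedure option is left out because SQLite has no stored procedures. The sketch runs one aggregation written as a CTE, as a derived-table subquery, and as a view wrapping the CTE, then checks that all three return the same rows.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for the real source database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO orders VALUES (1, 10.0), (1, 15.0), (2, 7.5), (3, 40.0);
""")

# Original form: a CTE, which the dataflow source will not accept.
cte_query = """
    WITH totals AS (
        SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id
    )
    SELECT customer_id, total FROM totals WHERE total > 20 ORDER BY customer_id
"""

# Option 1: the same logic rewritten as a derived-table subquery.
subquery_query = """
    SELECT customer_id, total
    FROM (
        SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id
    ) AS totals
    WHERE total > 20
    ORDER BY customer_id
"""

# Option 3: keep the CTE, but hide it inside a view (no ORDER BY in the view,
# since SQL Server rejects that without TOP); the dataflow selects from the view.
conn.execute("""
    CREATE VIEW big_spenders AS
    WITH totals AS (
        SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id
    )
    SELECT customer_id, total FROM totals WHERE total > 20
""")
view_query = "SELECT customer_id, total FROM big_spenders ORDER BY customer_id"

results = [conn.execute(q).fetchall() for q in (cte_query, subquery_query, view_query)]
print(results[0])  # [(1, 25.0), (3, 40.0)]
```

The view keeps the CTE readable for whoever maintains the SQL, while the dataflow only ever sees a plain `SELECT ... FROM big_spenders`.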

Azure Shared Disk with Zone-Redundant Storage

Published 2022-05-12 by Kevin Feasel

Dave Bermingham runs some tests:

What makes this interesting is that you can now build shared-storage-based failover cluster instances that span Availability Zones (AZs). With cluster nodes residing in different AZs, users can now qualify for the 99.99% availability SLA. Prior to support for ZRS, Azure Shared Disks only supported Locally Redundant Storage (LRS), limiting cluster deployments to a single AZ and leaving users susceptible to outages should an AZ go offline.
There are, however, a few limitations to be aware of when deploying an Azure Shared Disk with ZRS.

Dave also checks how ZRS disk performance compares with locally redundant storage.
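
To put the 99.99% figure in perspective, here is a quick back-of-the-envelope calculation of how much downtime an availability percentage allows. It is plain arithmetic, not an Azure API; the 365-day year and 30-day month are simplifying assumptions, and Azure SLAs are typically measured per month, so the monthly number is the more practical one.

```python
MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600; ignores leap years
MINUTES_PER_MONTH = 30 * 24 * 60   # 43,200; assumes a 30-day month


def allowed_downtime_minutes(sla_percent: float, period_minutes: int = MINUTES_PER_YEAR) -> float:
    """Maximum downtime an availability percentage permits over a period."""
    return period_minutes * (1 - sla_percent / 100)


print(round(allowed_downtime_minutes(99.99), 2))                     # 52.56 minutes per year
print(round(allowed_downtime_minutes(99.99, MINUTES_PER_MONTH), 2))  # 4.32 minutes per month
```

Roughly four minutes a month is not much headroom, which is why being able to spread cluster nodes across zones matters for that SLA tier.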

Partial Update Operations in Cosmos DB

Published 2022-05-10 by Kevin Feasel

Hasan Savran partially deflates the partial update bubble:

Partial Update was one of the most requested features among Cosmos DB customers. In a regular update operation, you need to send the whole JSON document to Cosmos DB. This can be silly if your data model is large and you want to update one field in it. With a regular update, your request object will be large because you need to send the whole data model, so a regular update operation needs more client/SDK resources and network bandwidth.
You might think that partial updates would cost fewer request units. Unfortunately, this is not the case, because Cosmos DB still needs to open the JSON document, change the necessary properties, and save the data. Cosmos DB uses almost the same amount of CPU and memory for a partial update as for a regular update.

The fact that a partial update costs just about as much as a full write reduces its value. Still, there is some value in reducing bandwidth requirements or in making changes when you don't know the entire contents of the document up-front.
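
To illustrate where the bandwidth savings come from, here is a sketch that compares the request body sizes of a full replace and a partial update for the same one-field change. The order document is hypothetical, and the `{"operations": [{"op": "set", ...}]}` shape is an assumption modeled on the JSON Patch style Cosmos DB's partial document update uses; check the SDK or REST documentation for the exact format. Request-unit cost is not modeled at all, since per the excerpt it barely changes.

```python
import json

# Hypothetical order document with a bulky line-item array.
document = {
    "id": "order-1001",
    "customerId": "c-42",
    "status": "pending",
    "lineItems": [
        {"sku": f"SKU-{i:04d}", "quantity": 1, "description": "x" * 200}
        for i in range(100)
    ],
}

# Regular update: change one field locally, then send the entire document.
updated = dict(document, status="shipped")
full_payload = json.dumps(updated).encode("utf-8")

# Partial update: send only the operation (payload shape assumed, JSON Patch style).
patch_payload = json.dumps(
    {"operations": [{"op": "set", "path": "/status", "value": "shipped"}]}
).encode("utf-8")

print(len(full_payload), len(patch_payload))  # tens of kilobytes vs. under 100 bytes
```

The server still reads and rewrites the whole stored document either way; the savings are on the wire and in not needing the full document on the client.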

Exporting from Azure SQL Managed Instance to SQL Server

Published 2022-05-10 by Kevin Feasel

Eric Rouach gets straight to the point:

1) EXPORT a database from an Azure SQL Managed Instance by creating a .bacpac file using SqlPackage.exe:

Click through for a sample which creates the bacpac file from your Managed Instance and then imports it into a local SQL Server instance.
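
For a rough idea of what the two SqlPackage steps look like, here is a sketch that builds the export and import command lines as argument lists. It is not Eric's script: the connection string, server name, database name, and file path are hypothetical placeholders, and the helper names `export_command` and `import_command` are made up for illustration. The SqlPackage parameters used (`/Action`, `/SourceConnectionString`, `/TargetFile`, `/SourceFile`, `/TargetServerName`, `/TargetDatabaseName`) are standard ones, but verify them against your SqlPackage version.

```python
def export_command(source_connection_string: str, bacpac_path: str) -> list[str]:
    """Hypothetical helper: SqlPackage export from the Managed Instance to a .bacpac."""
    return [
        "SqlPackage.exe",
        "/Action:Export",
        f"/SourceConnectionString:{source_connection_string}",
        f"/TargetFile:{bacpac_path}",
    ]


def import_command(target_server: str, target_database: str, bacpac_path: str) -> list[str]:
    """Hypothetical helper: SqlPackage import of the .bacpac into a local instance.

    With no user name supplied, this assumes Windows authentication locally.
    """
    return [
        "SqlPackage.exe",
        "/Action:Import",
        f"/SourceFile:{bacpac_path}",
        f"/TargetServerName:{target_server}",
        f"/TargetDatabaseName:{target_database}",
    ]


# Hypothetical values; substitute your own instance, credentials, and paths.
mi_conn = "Server=tcp:example-mi.abc123.database.windows.net;Database=SalesDb;User ID=admin_user;Password=<secret>;"
print(export_command(mi_conn, "SalesDb.bacpac"))
print(import_command("localhost", "SalesDb", "SalesDb.bacpac"))
```

Passing each list to `subprocess.run(..., check=True)` would execute the steps in order without shell quoting trouble around the semicolons in the connection string. Recent SqlPackage releases encrypt connections by default, so a local instance with a self-signed certificate may also need `/TargetTrustServerCertificate:True`.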

Azure Redis Cache Geo-Replication

Published 2022-05-09 by Kevin Feasel

Arun Sirpal shows how to set up geo-replication in Azure Redis Cache:

The concept of a geo-replicated partnership between a primary and secondary node is very similar to something you may have seen with Azure SQL DB, where the primary handles all reads and writes and the changes are then pushed to the secondary asynchronously. This is no different with Redis.

Read on to see what limitations exist and how you can set up geo-replication.
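
The practical consequence of asynchronous replication is that the secondary can lag behind the primary. Here is a toy model of that behavior; it is not the Redis or Azure API, the class name `ToyGeoReplicatedCache` is hypothetical, and a real link replicates continuously rather than when you call `replicate()`.

```python
from __future__ import annotations

from collections import deque


class ToyGeoReplicatedCache:
    """Toy primary/secondary pair with asynchronous replication (not the Redis API)."""

    def __init__(self) -> None:
        self.primary: dict[str, str] = {}
        self.secondary: dict[str, str] = {}
        self._pending: deque[tuple[str, str]] = deque()

    def write(self, key: str, value: str) -> None:
        # All writes go to the primary, which acknowledges immediately;
        # the change is only queued for the secondary.
        self.primary[key] = value
        self._pending.append((key, value))

    def read_secondary(self, key: str) -> str | None:
        # Reads from the secondary may be stale until replication catches up.
        return self.secondary.get(key)

    def replicate(self) -> None:
        # Apply queued changes to the secondary in the order they were written.
        while self._pending:
            key, value = self._pending.popleft()
            self.secondary[key] = value


cache = ToyGeoReplicatedCache()
cache.write("session:1", "alice")
print(cache.read_secondary("session:1"))  # None: not replicated yet
cache.replicate()
print(cache.read_secondary("session:1"))  # alice
```

That window between the write and the replication step is also what you can lose if the primary region fails before the secondary catches up.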

Azure SQL DB ARM Template Conflicts with Azure AD Administration

Published 2022-05-06 by Kevin Feasel

Joao Antunes points out a potential timing issue around combining Azure Active Directory administration with Azure SQL Database ARM templates:

ARM templates are widely used when we need to repeatedly deploy solutions/infrastructures in the cloud. Leveraging the concept of infrastructure as code, ARM templates are a powerful resource to ease our daily job; however, we might face some challenges when using them.
When we are creating several resources within the same template – using JSON or Bicep – it's crucial to make sure that all resources are created in the right order, ensuring that all dependent resources are fully provisioned before you move to the next operation.
Errors (internal server errors) and conflicts can occur during our ARM template deployment, and it can be difficult to troubleshoot them or understand their root cause.

Read on for one annoying error and its fix.
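
Resource Manager deploys resources that do not depend on each other in parallel, and `dependsOn` is what forces an order. Here is a sketch of that idea using Python's `graphlib` to group a set of resources into deployment "waves". The logical resource names and their dependencies are hypothetical, and this does not reproduce the specific error in Joao's post; it only shows which operations could run concurrently.

```python
from graphlib import TopologicalSorter

# Hypothetical resources mapped to their dependsOn lists, as in an ARM template
# (think Microsoft.Sql/servers plus its administrators, databases, firewallRules children).
depends_on = {
    "sqlServer": [],
    "sqlServerAadAdmin": ["sqlServer"],
    "sqlDatabase": ["sqlServer"],
    "firewallRule": ["sqlServer"],
}


def deployment_waves(graph: dict[str, list[str]]) -> list[list[str]]:
    """Group resources into batches whose dependencies are all satisfied."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything deployable right now
        waves.append(ready)
        sorter.done(*ready)                 # mark the batch as provisioned
    return waves


print(deployment_waves(depends_on))
# [['sqlServer'], ['firewallRule', 'sqlDatabase', 'sqlServerAadAdmin']]
```

Everything in the second wave may hit the same server at the same time. When two such sibling operations conflict, adding an explicit `dependsOn` between them serializes them; that is a general remedy for ordering conflicts and not necessarily the fix Joao describes.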

Data Products in Data Mesh

Published 2022-05-06 by Kevin Feasel

Paul Andrew takes us through a thought process:

In the context of an idealistic data mesh architecture, establishing a working definition of a data product seems to be a very real problem for most. What constitutes a data product seems to be very subjective, circumstantial in terms of requirements, and interlaced with platform technical maturity. AKA, a 'minefield' to navigate in definitional terms.
To help get my thoughts in order (as always), here is my current thinking and definition for a data mesh data product.

Read on for Paul’s thoughts.

Finding Azure SQL DB Backup History

Published 2022-05-05 by Kevin Feasel

Taiob Ali takes us through a new DMV:

There is a new DMV currently in preview which returns information about backups of Azure SQL databases, except for the Hyperscale tier. Microsoft's official documentation is here.
If you run the example query from the above documentation as-is, some of the columns do not make sense.

Taiob includes a better query which provides the type of information you’re used to in on-premises SQL Server.

Comparing Databricks to Synapse Spark Pools

Published 2022-05-04 by Kevin Feasel

Corrinna Peters makes comparisons:

There are different cases for using each, depending on specific needs and requirements. Synapse and Databricks are similar, but each has its own areas of specialty, or rather areas where it is ahead of the other.
Data Lake – they both allow you to query data in the data lake. Synapse uses either the SQL on-demand pool or Spark, and Databricks uses the Databricks workspace once you have mounted the data lake. If you are predominantly a SQL user and prefer the code and the BI developer feel, then Synapse would be the correct choice, whereas if you are a data scientist and prefer to code in Python or R, then you would feel more at home in Databricks.

Read on for a nuanced take. My less nuanced take is, Databricks beats the pants off of Synapse Spark pools in terms of performance. Synapse has a much better overall ecosystem, expanding beyond Spark and into T-SQL (in two flavors) and log/event analytics with KQL. If you’re spending 100% of your time in Spark and don’t care about the rest, use Databricks; if Spark is a relatively small part of your warehousing work, use Synapse.

Category: Cloud