Cloud – Page 80 – Curated SQL

This principle is very broad, so I want to break down the theory vs practice as before. Self-service is always a goal in any data platform, and the normal thing for analytics is to focus on it within the context of data consumption, whereby a semantic layer technology can be used in a friendly, business-orientated, drag-and-drop environment to create dashboards or whatever.
However, my interpretation of ‘self-serve’ for a data mesh architecture goes further than just the dashboard creation use case. It should apply not just at the data consumption layer but at all layers within the solution and, for clarity, not just to the data itself. Hence the term in this principle: ‘data infrastructure as a platform’. This then unlocks the deeper implication of this serving for a data product: all abstracts of the platform can be consumed in a self-service manner from a series of predefined assets. Let’s think about this serving more like an internal marketplace or catalogue of assets for delivering everything the data product needs to enable a new node within the wider data mesh.

Read on for some deep thoughts on the topic.

Comments closed

A Free Power BI Sandbox

Published 2022-01-31 by Kevin Feasel

Reza Rad has the right price in mind:

A question I often get from many students is: “How can I practice Power BI service features if I do not have a Power BI account?” There are many reasons you might not have a Power BI account: your company might have closed this option so that sign-up is channeled only through a specific internal process, or you may not have permission to create one. Not having an account makes it difficult to practice Power BI service options such as workspaces, datasets, dashboards, dataflows, apps, and many other features. On the other hand, even if you have a Power BI service account, in most organizations you are not the service administrator, so you cannot practice tenant-settings configurations in the service.
Fortunately, there is a way to create your own Power BI sandbox, which means an environment just for yourself, with 25 accounts. You will be the administrator of your environment. The environment will be up for at least 90 days, and you can practice whatever you want for the Power BI service there. Best of all, it is FREE. You don’t have to pay a cent for it. Credit card details are not needed. What better could you wish for?

Read on to see how.

Multivariate Time Series Anomaly Detection in Azure

Published 2022-01-28 by Kevin Feasel

Louise Han announces an update to the anomaly detection service:

We are excited to announce that we are adding more powerful capabilities to Microsoft Azure Multivariate Anomaly Detector (MVAD) today. In the latest version (v1.1-preview.1) of this API, we implemented a new inference API that works in a synchronous manner, which means you get the anomaly detection results immediately once you call it. This synchronous inference API is a substantial change compared with the previous inference process and will be more intuitive and easier to use.
Also, we added a new field named ‘interpretation’ to give more explanation of an anomaly, such as which variables have large correlation changes and cause the anomaly. These updates will help you better leverage MVAD and get more useful information to analyze and act on.

Click through for some more details.

Go/No-Go Indicators for Oracle Migrations to Azure

Published 2022-01-27 by Kevin Feasel

Kellyn Pot’vin-Gorman lays out some guidance on Oracle to Azure migrations:

When migrating an Oracle database to another platform, the common indicators and discussion topics are PL/SQL conversions, data types, application rewrites, etc., as roadblocks to refactoring, but being successful also has to do with the SIZE of the workload coming from Oracle. I find this is often dismissed, even though it is one of the quickest ways to identify whether an ENTIRE Oracle database (not just a schema or a subset of the Oracle database) can run on a Platform as a Service (PaaS) solution.

Click through for more information on PaaS limits for Oracle databases in Azure.

Using the Azure Form Recognizer

Published 2022-01-26 by Kevin Feasel

Cem Ayberkin shows off the Azure Form Recognizer:

Shopping malls are facing strong competition, and effective loyalty programs boost customer retention. The primary goal of the loyalty scheme is to promote loyalty at the mall and increase footfall whilst understanding shopping habits. With a large number of stores and various receipt formats in a mall, the manual process in place for checking and verifying the submitted data did enable rewards to be issued, but proved slow, expensive, inconsistent, and non-scalable. It did not include the valuable line item/product information the mall needed to understand shopping habits. Therefore, one of the largest shopping malls used Azure Form Recognizer to automate receipt scanning and data extraction and to feed the data as rewards points into the customer’s loyalty program, which greatly improved the customer shopping experience.

I was pleasantly surprised with how the Form Recognizer works. It’s not perfect but it is useful.

Automating Pipeline Migration to Synapse via Azure DevOps

Published 2022-01-26 by Kevin Feasel

Kevin Chant deploys some Synapse pipelines:

In this post I want to cover how you can automate a pipeline migration to a Synapse workspace using Azure DevOps, as a follow-up to a previous post I did about one way to copy an Azure Data Factory pipeline to Synapse Studio.
Even though that post is good, it deserves a follow-up showing an automated way of doing it. I wanted to show that it can be done more gracefully.

And we all want to be graceful, right?

Databricks Delta Sharing for Azure

Published 2022-01-25 by Kevin Feasel

Will Girten, et al., announce Delta Sharing on Azure:

Included in this release is a new and improved API for listing all the tables under all schemas in a share. The new API supports pagination similar to other APIs.
For example, to list all the tables in the Delta share my_share, you can simply send a GET request to the /shares/{share_name}/all-tables endpoint on the sharing server.

Prior to that, you might want to read up on Delta Sharing.
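
As a rough illustration of the paginated all-tables call described in the quote, here is a minimal Python sketch. The endpoint path comes from the excerpt; the `maxResults` and `pageToken` query parameters, the `items` and `nextPageToken` response fields, and the bearer-token header reflect my understanding of the Delta Sharing protocol and should be checked against the protocol documentation. The server URL and the helper names (`all_tables_url`, `http_get_json`, `list_all_tables`) are hypothetical.

```python
import json
from typing import Callable, Dict, Iterator, Optional
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen


def all_tables_url(endpoint: str, share: str,
                   page_token: Optional[str] = None,
                   max_results: Optional[int] = None) -> str:
    """Build the URL for one page of the all-tables listing."""
    query = {}
    if max_results is not None:
        query["maxResults"] = max_results   # assumed parameter name
    if page_token:
        query["pageToken"] = page_token     # assumed parameter name
    url = f"{endpoint.rstrip('/')}/shares/{quote(share, safe='')}/all-tables"
    return f"{url}?{urlencode(query)}" if query else url


def http_get_json(url: str, bearer_token: str) -> Dict:
    """Send a GET request with a bearer token and parse the JSON body."""
    request = Request(url, headers={"Authorization": f"Bearer {bearer_token}"})
    with urlopen(request) as response:
        return json.load(response)


def list_all_tables(endpoint: str, share: str,
                    fetch: Callable[[str], Dict]) -> Iterator[Dict]:
    """Yield every table in the share, following page tokens until none is returned."""
    page_token = None
    while True:
        page = fetch(all_tables_url(endpoint, share, page_token))
        yield from page.get("items", [])          # assumed response field
        page_token = page.get("nextPageToken")    # assumed response field
        if not page_token:
            break
```

Against a real sharing server, you might call `list_all_tables(endpoint, "my_share", lambda url: http_get_json(url, token))`, with `endpoint` and `token` taken from your own sharing profile. Passing the fetch function in keeps the pagination loop easy to test without a network connection.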

Using Synapse Link for Cosmos DB

Published 2022-01-25 by Kevin Feasel

I have a post combining Synapse Link for Cosmos DB and the Spark to Synapse SQL Connector:

In this post, we saw how to enable Cosmos DB’s Analytical store, access data using Synapse Link for Cosmos DB, and use the Spark to Synapse SQL Connector to move that data into a dedicated SQL pool. We saw how to do this in a workspace using a managed virtual network with data exfiltration protection enabled, meaning this is the largest number of steps necessary.

Click through for product descriptions and step-by-step instructions.

Scheduling Azure ML Compute Instance Start-Up and Shut-Down

Published 2022-01-24 by Kevin Feasel

I have a post correcting a statement I made before:

The single biggest problem I have with compute instances is that there is no auto-stop functionality to them. This is really frustrating because you’re paying for that virtual machine like you would any other, so if you forget to turn it off when you go home for the weekend, it’ll cost you. I wish there were a built-in option to shut off a compute instance after a certain amount of inactivity. Instead, you’ll need to start and stop them manually.
It turns out that you can and so I wanted to write a post to correct the record.

Click through to see how you can do this. You can bet that I’ve got it enabled now.

Azure Synapse Analytics Integration Points

Published 2022-01-21 by Kevin Feasel

Warner Chaves takes us through several integration points with Azure Synapse Analytics:

Azure Stream Analytics allows for in-flight querying of streaming data from Blob storage, Data Lake Storage, IoT Hub or Event Hubs. The querying is done through an easily adoptable SQL language and it really speeds up the development of a streaming solution.
The nice thing here is that Stream Analytics allows the use of a Synapse SQL Pool table as the target for the results of the streaming query. So, this is another way to do near real-time analytics by passing data from a streaming source through a Stream Analytics job and into a Synapse table. You could do this to pre-aggregate data on the fly, score data in real-time, perform real-time calculations over specific time or event windows, etc.

Click through for several examples of this.
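
To make the idea of pre-aggregating over time windows concrete, here is a small conceptual sketch in plain Python. It is not Stream Analytics code (a real job would use the service's SQL-like query language and its windowing functions, such as TumblingWindow); the event shape and the function name `tumbling_window_avg` are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical event shape: (timestamp in seconds, device key, reading)
Event = Tuple[float, str, float]


def tumbling_window_avg(events: Iterable[Event],
                        window_seconds: int) -> Dict[Tuple[int, str], float]:
    """Average readings per key within fixed, non-overlapping time windows."""
    sums: Dict[Tuple[int, str], float] = defaultdict(float)
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for timestamp, key, value in events:
        # Each event falls into exactly one window, identified by its start time.
        window_start = int(timestamp // window_seconds) * window_seconds
        sums[(window_start, key)] += value
        counts[(window_start, key)] += 1
    return {bucket: sums[bucket] / counts[bucket] for bucket in sums}
```

Each (window start, key) pair in the result corresponds to one pre-aggregated row that a streaming job could write to a target table instead of storing every raw event.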
