Press "Enter" to skip to content

Author: Kevin Feasel

Synapse and Azure ML Pipelines

Santosh Thomas integrates two Azure products:

As more customers standardize on the Synapse data platform, enabling machine learning workflows through Azure Machine Learning (Azure ML) becomes particularly interesting. This is especially true as more customers look to bring their data engineering and data science practices together and mature capabilities on both sides.

The goal of this blog post is to highlight how Synapse and Azure ML can work well together to deliver key insights. This is motivated by a scenario where a customer modernized their data platform on Azure Synapse but was looking to improve their data science practices through Azure ML. The focus of this blog is to expose existing functionality, and it is not a “hardened” solution with security or other cloud best practice implementations. The workflow steps also assume some level of comfort with Python and working with the Azure Python SDKs.

There was a time in which Microsoft wanted us to remain in Synapse for machine learning tasks, but that time is gone: the emphasis is definitely to do machine learning tasks in Azure ML, regardless of where the data lives…unless there’s a Spark job involved, in which case things get all weird again.

Comments closed

Kafka Control and Data Planes

Sanjay Garde explains how the architecture of Apache Kafka solutions has expanded over time:

With the advent of service mesh and containerized applications, the idea of the control and data plane has become popular. A part of your application infrastructure, such as a proxy or sidecar, is dedicated to aspects such controlling traffic, access, governance, security, and monitoring and is referred to as the control plane. Another part of your application infrastructure that is used purely for processing your business transactions is referred to as the data plane.

Read on to see how the concept works at an architectural level.

Comments closed

Principles of Synapse Security

Liliam Leme provides an overview of security options in Azure Synapse Analytics:

This blog post will provide an overview of the Synapse security environment focused on Dedicated SQL Pool, Serverless SQL Pool, and Spark.

Security has many layers and frequently it will determine how you build your process. I start this post by reviewing several important security considerations which you can later apply to your Synapse environment. 

This is a fairly lengthy post and it still only covers a moderate amount of what you’d want to do for Azure Synapse Analytics. This is the downside to having a complex interplay of several products: there’s a lot to secure and a lot to think about along the way.

Comments closed

Building an Event Calendar from T-SQL and CSS

Aaron Bertrand builds some HTML:

In my last tip, I showed how to use T-SQL to generate HTML for fancy calendar visuals overlaid with event data from another table. As an extension of that tip, let’s now look at simplifying parts of that query by caching the date information in a calendar table to streamline the outer queries and avoid complications caused by different DATEFIRST settings.

This is the follow-on from a prior post (linked in the lede here), so it would make sense to read that one first if you haven’t already.

Comments closed

ADX Dashboards Now Generally Available

Michal Bar provides an overview of Azure Data Explorer functionality now generally available :

Each ADX dashboard is a collection of tiles, optionally organized in pages, where each tile has an underlying query and a visual representation. Using the web UI, you can natively export Kusto Query Language (KQL) queries to a dashboard as visuals and later modify their underlying queries and visual formatting as needed. In addition to ease of data exploration, this fully integrated Azure Data Explorer dashboard experience provides improved query and visualization performance.

Read on to learn more.

Comments closed

Object Ownership in Postgres

Ryan Booz divvies up ownership rights:

In the first security article, PostgreSQL Basics: Roles and Privileges, I discussed how roles (users and groups) are created and managed in PostgreSQL Depending on your background with permissions, particularly in other database products, some of those nuances in how permissions work may have been surprising.

Understanding how roles and privileges work in Postgres is key to understanding the next, and often more confusing part of security, object ownership. Although there are many privileges that can be assigned in Postgres, object ownership comes with a specific level of privilege that cannot be transferred to other roles. Understanding the implications of this is essential to the management of your database schema, and access to the objects it contains, over time.

Click through to learn more about how Postgres handles object ownership and defines who gets access to which objects.

Comments closed

Matrix Multiplication in R with DuckDB and SQLite

Karsten Weinert compares two databases:

On my laptop with 16 GB RAM, I would like to perform a matrix-vector multiplication with a sparse matrix of around 10 million columns and 2500 rows. The matrix has approximately only 2% non-zero entries, but this are still 500 million numbers and the column/row information, too large to work comfortably in-memory.

A while ago, I tried using sqlite for this task. It kind of worked, but was too slow to be useful. This weekend, I revisited the problem and tried using duckdb.

Read on for the results. I’ve heard enough positives about DuckDB over the past few weeks that it makes me want to try it out. H/T R-Bloggers.

Comments closed

Halloween Protection and Non-Clustered Indexes

Jared Poche digs further into Halloween protection:

I find myself talking about the Halloween Problem a lot and wanted to fill in some more details on the subject. In short, the Halloween Problem is a case where an INSERT\UPDATE\DELETE\MERGE operates on a row more than once, or tries to and fails. In the first recorded case, an UPDATE changed multiple rows in the table more than once.

So let’s take a look at an example using a publicly available database, WideWorldImporters.

Read on for a case of Jared starting from the known and moving into the unknown.

Comments closed

Checking if a TCP Port Is in Use

Tom Collins knocks on doors:

Question: SQL Server won’t start , so checked Event Viewer and getting the following message. 

Server TCP provider failed to listen on [ ‘any’ <ipv4> 50010]. Tcp port is already in use.

How can I check if the port is already in use and which other process or service  has locked the port and therefore not allowing SQL Server to start on designated port ?

Read on to see how you can use Powershell to find the answer on Windows.

Comments closed

The Let Operator in KQL

Robert Cain continues a series on KQL:

Let me tell you about let, my favorite operator in the Kusto Query Language. Why my favorite?

It is extremely flexible. It lets you create constants, variables, datasets, and even reusable functions. Let me tell you, it’s very powerful.

My big problem with let, specifically with variable creation, is that the variables do not persist between batches. You can use variables between statements but only if you execute all relevant statements in one batch. This makes it harder for exploratory query building.

Comments closed