
Curated SQL Posts

ML Updates in Azure Synapse Analytics

Aria Jelinek and Nellie Gustafsson have some announcements for us:

Announced last week at Ignite 2021, data teams now have a handful of new opportunities to drive value with machine learning built directly into their Apache Spark pools in Azure Synapse Analytics.

With the general availability of our machine learning library for Apache Spark on Azure Synapse, data teams now have expanded access to both code-first and code-free ML tools for forecasting, model training, and pre-built AI. This library combines familiar open-source tools such as LightGBM with proprietary solutions to offer a comprehensive, streamlined approach to ML workloads. Updates include PREDICT, a new keyword that supports scoring AzureML and MLflow models directly in Azure Synapse, and integration with Azure Cognitive Services, now generally available.

Click through for all of the announcements.


The Importance of Data Governance

Rob Farley riffs on another T-SQL Tuesday topic:

But the checks that we do are more about things that the database can allow, but are business scenarios that should never happen.

Plenty of businesses seem to recognise these scenarios all too well, and can point them out when they come across them. You hear phrases like “Oh, we know that’s not right, it should be XYZ instead”. And they become reasons why they don’t really trust their data. It’s a data quality issue, and every time someone comes across a data quality issue, they trust the data a little less.

Click through for Rob’s thoughts.


Azure Network Gateway Logging

Denny Cherry walks us through gateway logging in Azure:

If you’ve ever set up an Azure Network Gateway for Site to Site or Point to Site VPNing you’ve probably wanted to be able to see logging from the gateway. In the Azure portal, you can see a Logs option, but all it does is tell you to set up log analytics and the link that it gives you is … less than helpful.

Denny, however, has helpful instructions, so check it out.


Show All Merge Replication Articles

Steve Stedman prods the demons of merge replication:

At Stedman Solutions, we do a lot of work with SQL Server replication, mostly transactional and merge replication.

The other day I needed a query to show all the merge replication publications on a SQL Server, not just in a single database, but across all databases on the SQL Server.

Here is the query that I came up with.

Merge replication can be really great if you know what you’re doing. But it can also turn into a train wreck easily, and it’s really tough to get a good understanding of why something’s going wrong or how long it will take to be fixed (if at all).


Thoughts on the New STRING_SPLIT

Ronen Ariely has mixed feelings on updates to the STRING_SPLIT function:

The main issue with this function, is that it returns a SET of rows with no specific order.

As you must know by now, a TABLE is a SET of rows (a rowstore table, which is the more common kind in SQL Server) or columns (a columnstore table). The rows in the table are not stored in a specific order: even with a clustered index, the rows can physically be stored in different locations on the disk, not necessarily contiguously one after the other. In addition, the server might read the rows in parallel and not necessarily in the order of the index. As a result, the order in which rows are returned in a result set is not guaranteed unless an ORDER BY clause is specified.

And this is the main issue with the STRING_SPLIT… until today

Read on to see how this update makes STRING_SPLIT() much better, and also how it could be even better still.
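
To make the ordering problem concrete, here is a small Python analogy (not T-SQL, and not Ronen's code). It treats the split result as an unordered set and attaches a 1-based position to each element, which is roughly the idea behind the optional `enable_ordinal` argument that newer versions of STRING_SPLIT accept. The function names `split_with_ordinal` and `in_original_order` are hypothetical, made up for this sketch.

```python
from typing import List, Set, Tuple


def split_with_ordinal(text: str, separator: str) -> Set[Tuple[int, str]]:
    """Split text and return an unordered set of (ordinal, value) pairs.

    A Python set has no guaranteed iteration order, much like a result
    set without ORDER BY, so the ordinal is the only reliable record of
    where each value sat in the input string.
    """
    return {(i, part) for i, part in enumerate(text.split(separator), start=1)}


def in_original_order(rows: Set[Tuple[int, str]]) -> List[str]:
    """Recover the input order by sorting on the ordinal (like ORDER BY ordinal)."""
    return [value for _, value in sorted(rows)]


rows = split_with_ordinal("c,a,b", ",")
ordered = in_original_order(rows)
print(ordered)  # ['c', 'a', 'b']
```

Without the ordinal, there would be nothing to sort on, which is the gap the quoted passage describes.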


Show Data as Table and Security

Chris Webb explains that hiding a column isn’t the same thing as preventing access to a column:

In the last few months the following issue has been escalated up to the Power BI CAT team several times: customers have deployed reports into production and then found that users are able to see data they should not be allowed to see by using the “Show data point as a table” feature. The question is: is this a security hole? It isn’t, and in this blog post I’ll explain why and how you should think about security as something that happens on the dataset and not in the report.

My official response is “Hmm…” I don’t disagree with Chris, but I do understand how people might not know this and get blindsided because they think they’ve prevented someone from seeing a sensitive column. I think part of my reaction is that this functionality isn’t blaringly obvious to report developers, and so there’s a little bit of “How could you know this could happen?”


Resource Governor and Scheduler Domination

Forrest McDaniel shares an example of Resource Governor blocking query activity:

Ok, I get it, scheduling queries can be complicated. See this and this and maybe this and this too but only if you have insomnia. I still thought I kinda understood it. Then I started seeing hundreds of query timeouts on a quiet server, where the queries were waiting on…what? CPU? The server’s at 20%, why would a query time out after a minute, while only getting 5s of CPU, and not being blocked?

It was Resource Governor messing up one of my servers recently in a way I never expected.

Click through for the story. I’m not sure I’ve experienced this when running Resource Governor, but Forrest has an easy demo which replicates the problem.


Reducing Image Sizes with Docker-Slim

Evan Seabrook puts Docker images on a diet:

If you’ve ever worked with Docker, there’s likely been at least one time when it started taking up significant storage space on your computer. For example, some of your images took a long time to download in a CI/CD pipeline. Some common approaches to this problem are to:

– Swap out the base image for something lighter

– Reduce the number of RUN statements in your Dockerfile

– Remove cached package manager artifacts as part of your Dockerfile

These steps, while helpful, can take up a significant amount of time and effort. Thankfully, there are open source tools that can automatically minify an existing Docker image. Enter docker-slim.

Looks like it can slim things down considerably. I haven’t tried this, but might give it a go and see how it works.


Azure Synapse Analytics Database Templates

Santosh Balasubramanian shows off database templates in Azure Synapse Analytics:

Azure Synapse Analytics is a limitless analytics service that brings together data integration, enterprise data warehousing, and big data analytics. It gives you the freedom to query data on your terms, using either serverless or dedicated resources—at scale. Azure Synapse brings these worlds together with a unified experience to ingest, explore, prepare, manage, and serve data for immediate BI and machine learning needs.

One of the challenges that users in key industry areas face is how to describe and shape the mass of data that they are gathering. Most of this data is currently stored in data lakes or in application-specific data silos. The challenge is to bring all this data together in a standardized format enabling it to be more easily analyzed and understood and for ML and AI to be applied to it.

Azure Synapse solves this problem by introducing industry-specific templates for your data, providing a standardized way to store and shape data. These templates provide schemas for predefined business areas, enabling data to be loaded into a database in a structured way.

Read on to see what they can do, and try them out in a Synapse workspace.


Lessons Learned from the Serverless SQL Pool

Teo Lachev has some thoughts for us:

I’ve architected and am currently implementing a solution that uses Synapse (my last newsletter has the details, plus the architecture diagram). Synapse Serverless is the Microsoft answer to Amazon Athena but instead of using open-source tools like Presto, it’s built on SQL Server. In this project we extract many tables from 1,500 on-prem SQL Server databases and stage them in ADLS.

Read on for Teo’s notes on the topic.
