Curated SQL – Page 576 – A Fine Slice Of SQL Server

The SQL Server transaction log backup chain aka log chain is the series of sequential transaction log backups related to a database. The log backups are related to each other and are represented through LSN. Breaking the transaction log chain will limit the restore point of the backups.

Click through for four such reasons as well as a scenario explaining how it could happen.
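
To make the LSN relationship concrete, here is a minimal sketch (in Python, not T-SQL) of how a break in the chain could be detected. It assumes the usual rule that each log backup's first LSN equals the previous log backup's last LSN, which is how msdb.dbo.backupset records them. The LogBackup class, the find_chain_breaks function, the file names, and the small integer LSNs are all hypothetical and made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogBackup:
    name: str       # hypothetical backup file name
    first_lsn: int  # made-up small integers; real LSNs are much larger
    last_lsn: int


def find_chain_breaks(backups):
    """Return (previous, current) name pairs where the LSN sequence has a gap."""
    ordered = sorted(backups, key=lambda b: b.first_lsn)
    breaks = []
    for prev, cur in zip(ordered, ordered[1:]):
        # In an intact chain, each backup starts where the previous one ended.
        if cur.first_lsn != prev.last_lsn:
            breaks.append((prev.name, cur.name))
    return breaks


backups = [
    LogBackup("log_01.trn", 100, 200),
    LogBackup("log_02.trn", 200, 350),
    LogBackup("log_04.trn", 500, 650),  # log_03.trn is missing
]
print(find_chain_breaks(backups))  # [('log_02.trn', 'log_04.trn')]
```

In this toy data, a restore could roll forward only through log_02.trn, which is the "limited restore point" the excerpt describes.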


Preventing Data Exfiltration from Managed Instances

Published 2022-08-05 by Kevin Feasel

Niko Neugebauer wants to hang on to that data:

Data exfiltration is a technique that is also sometimes described as data theft or data extrusion, that describes the unauthorized extraction of data from the original source. This unauthorized extraction can be executed either manually or automatically by the malicious attacker.
As part of your Network Infrastructure, you might have tightened your security to make sure you have all the bells and whistles to lock down your Azure SQL Managed Instance to be accessed only by your application and not exposed to the Internet or any other traffic. However, this doesn’t stop a malicious admin from taking a backup or creating a linked server to another resource outside your enterprise subscription for extracting the data. This action would be data exfiltration. In a typical on-premises infrastructure, you can lock down network access completely to make sure that the data never leaves your network. However, in a cloud setup, there is a possibility that someone with elevated privileges can export data or perform some other malicious activity targeting their own resources outside your organization, compromising your enterprise data. Hence, it is very important to understand the different data exfiltration scenarios and make sure that you are taking the right steps to monitor for and prevent such activities.

Click through for a table which shows common exfiltration scenarios and things you can do to reduce the risk of exfiltration. With access, though, there’s always going to be a risk of exfiltration: even in a SCIF, you can get away with shoving records into your pants if you’re famous enough.


KQL Parse

Published 2022-08-04 by Kevin Feasel

Robert Cain continues a series on KQL:

The previous post in this series Fun With KQL – Extract, showed how we can use the extract operator to pull part of a string using regular expressions. I think you’d agree though, using regular expressions can be a bit tricky.
If you have a string that is well formatted with recurring text you can count on, and want to pull one or more strings from it into their own columns, Kusto provides a much easier to use operator: parse.

Robert includes a series of examples, including examples of things you cannot do.
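
As a rough analogy only (this is Python, not KQL, and it does not reproduce the parse operator's actual syntax or options), the idea of anchoring on recurring literal text and pulling the pieces in between into named columns might be sketched like this. The parse_template function, its {name} placeholder syntax, and the sample string are assumptions made up for illustration.

```python
import re


def parse_template(text, template):
    """Extract named fields from text using literal anchors in the template.

    The template mixes literal text with {column} placeholders,
    e.g. "Event: {event}; Code={code}". Returns a dict, or None if
    the literal anchors do not line up with the text.
    """
    # re.split with a capture group yields: literal, name, literal, name, ..., literal
    tokens = re.split(r"\{(\w+)\}", template)
    pattern = ""
    for i, token in enumerate(tokens):
        if i % 2 == 0:
            pattern += re.escape(token)         # literal anchor text
        else:
            pattern += f"(?P<{token}>.*?)"      # named column between anchors
    match = re.fullmatch(pattern, text)
    return match.groupdict() if match else None


line = "Event: Started; ResourceName=PipelineScheduler; TotalSlices=27"
template = "Event: {event}; ResourceName={resource}; TotalSlices={slices}"
print(parse_template(line, template))
```

The regular expression is still there, but it is generated from the recurring text rather than written by hand, which is roughly the convenience the excerpt attributes to parse.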


Data Mesh at Netflix

Published 2022-08-04 by Kevin Feasel

Bo Lei, et al, describe their Data Mesh architecture:

Realtime processing technologies (A.K.A stream processing) is one of the key factors that enable Netflix to maintain its leading position in the competition of entertaining our users. Our previous generation of streaming pipeline solution Keystone has a proven track record of serving multiple of our key business needs. However, as we expand our offerings and try out new ideas, there’s a growing need to unlock other emerging use cases that were not yet covered by Keystone. After evaluating the options, the team has decided to create Data Mesh as our next generation data pipeline solution.

Click through for a high-level overview of the architecture.


Feeding Synapse Spark Info to On-Prem Kafka Clusters

Published 2022-08-04 by Kevin Feasel

Bhadreshkumar Shiyal finds a solution:

Microsoft’s official documentation for Azure Data Factory contains a tutorial which explains how to access an On-Premises SQL Server from Azure Data Factory which is inside a Managed Vnet. You can go through that article here: Access on-premises SQL Server from Data Factory Managed Vnet using Private Endpoint – Azure Data Fac….
Although based upon the article’s solution, to meet our requirements we needed to substitute On-Prem Apache Kafka for On-Prem SQL Server and instead of an Azure Data Factory inside a Managed Vnet, we used a Synapse Workspace inside a Managed Vnet. The “Forwarding Vnet” concept explained in the above tutorial remains as-is in our approach.

As soon as you turn on Data Exfiltration Protection (DEP), the lockdown is real. Click through to see what the process of exfiltrating data through an approved mechanism looks like.


Monitoring Log Shipping with T-SQL

Published 2022-08-04 by Kevin Feasel

Lori Brown tracks log shipping operations:

For Log Shipping, some information is only available on the primary or only on the secondary. That means that I had to set up a linked server on the primary to connect to the secondary. I do not want to create any OPENROWSET queries for this since that would require that AdHoc Distributed Queries be enabled. I am not a fan of opening that up for the following reasons:
1) It can allow buffer overflow bugs to compromise systems.
2) It can allow a compromised server to connect to a non-compromised server.

Read on to see what Lori prefers instead.


Safe and Unsafe Simple Parameterization

Published 2022-08-04 by Kevin Feasel

Paul White puts on the safety glasses:

When a statement passes the earlier parser and decoder checks, it arrives at the trivial plan stage as a prepared (parameterized) statement. The query processor now needs to decide if the parameterization attempt is safe.
Parameterization is considered safe if the query processor would generate the same plan for all possible future parameter values. This might seem like a complex determination to make, but SQL Server takes a practical approach.

Read on to learn more about the process.


Concurrency Control and VACUUM in Postgres

Published 2022-08-04 by Kevin Feasel

Paul Randal explains how multi-version concurrency control works in Postgres:

PostgreSQL uses an optimistic isolation system known as Multi-Version Concurrency Control (MVCC). MVCC ensures transactions writing data to the database don’t block concurrent transactions needing to read the data being modified. This works through the magic of row-versioning—PostgreSQL creates versions of rows in the database tables to minimize blocking from concurrent access. As more and more versions are generated, a garbage control mechanism called VACUUM must be used to ensure the tables are properly maintained. In this article, I’ll explain how all this works via a series of examples.

This is quite similar to Read Committed Snapshot Isolation in SQL Server but with a couple of twists, including the need to vacuum tuples.
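
For a feel of the row-versioning idea, here is a heavily simplified toy model in Python. It is a sketch under loud assumptions, not how PostgreSQL is implemented: the table holds a single logical row, every transaction commits instantly, and visibility is reduced to comparing transaction IDs. The xmin and xmax names mirror PostgreSQL's tuple header fields, but ToyTable and its methods are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowVersion:
    value: str
    xmin: int                   # transaction that created this version
    xmax: Optional[int] = None  # transaction that replaced it, if any


class ToyTable:
    """One logical row, stored as a growing list of versions."""

    def __init__(self):
        self.versions = []

    def insert(self, txid, value):
        self.versions.append(RowVersion(value, xmin=txid))

    def update(self, txid, value):
        # Writers never overwrite in place: mark the live version dead
        # and append a new version alongside it.
        for version in self.versions:
            if version.xmax is None:
                version.xmax = txid
        self.versions.append(RowVersion(value, xmin=txid))

    def read(self, snapshot):
        # Visible if created at or before the snapshot and not yet
        # replaced as of the snapshot.
        for version in self.versions:
            if version.xmin <= snapshot and (version.xmax is None or version.xmax > snapshot):
                return version.value
        return None

    def vacuum(self, oldest_snapshot):
        # Remove dead versions that no remaining snapshot can still see.
        before = len(self.versions)
        self.versions = [
            v for v in self.versions
            if v.xmax is None or v.xmax > oldest_snapshot
        ]
        return before - len(self.versions)


table = ToyTable()
table.insert(txid=1, value="v1")
table.update(txid=5, value="v2")
print(table.read(snapshot=3))           # v1 (an older reader is not blocked)
print(table.read(snapshot=6))           # v2
print(table.vacuum(oldest_snapshot=6))  # 1 dead version removed
```

The reader at snapshot 3 keeps seeing the old version while the writer moves on, and the dead version lingers until the vacuum step removes it, which is the twist relative to SQL Server's version store.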


From Azure Data Explorer to Excel

Published 2022-08-04 by Kevin Feasel

Dany Hoter views data in Excel:

In a previous article Direct Query from Excel to Azure Data Explorer (microsoft.com) I described a way to mimic Direct Query access ala Power BI in Excel.
The method used in that article allows the user to filter the imported data using values entered into cells in the grid.
In this article I would like to describe a way to really query Kusto data in real time without importing any data and without any volume limitations.

Read on to see how, though there’s a pretty big intermediate step.


“Expensive” Queries

Published 2022-08-04 by Kevin Feasel

Erik Darling asks, what’s in a name?

When we talk about finding queries to tune, there’s an unfortunate term that gets thrown around: Expensive Queries.
Why is it unfortunate? Well, it reinforces the wrong mindset when it comes to query tuning, and leads people down the wrong path, looking at the wrong metrics.

I disagree on the “bad name” bit but agree on the substance. The term “expensive query” has a very useful connotation: this is a query which requires a significant amount of resources. Where I fully agree with Erik is that “query cost” from the optimizer does not do a great job of describing “significant amount of resources.” There is also a relevant point that expensive queries may not be the most important ones to look at. Reasons why can include:

The query runs at a time when there’s little load on the system, so it does not impact anybody else.
The query runs within acceptable performance boundaries for customers: it may take 10 minutes to run but it’s a batch process and the relevant business unit might only need it within an hour.
The amount of work that the query is doing is such that further optimizations are either not possible at all or they are only possible with a significant restructure that the business is unwilling to accept.

Even so, the term “expensive query” is still very useful. So is “expensive query relative to what it could be,” although we do tend to conflate the latter with the former. But now we’re getting deep into semantics and I forgot my waders.

