Press "Enter" to skip to content

Month: December 2022

Partial Database Projects

Olivier Van Steenlandt doesn’t get the whole cookie:

In this blog post, I will describe how you can get a database in source control partially. You might be wondering why you would do that. Well, let’s start by explaining the use case.

A couple of years ago, I was working for a company where a third-party vendor owned the OLTP system. At that point in time, we were not allowed to change any existing objects or create any new objects in the existing schemas. Though, we were required to be able to transfer the data from the OLTP system to the staging environment of our Data Warehouse. To do so, the third-party vendor created a schema in the database where we were allowed to create views and stored procedures to be able to get the data we needed.

Read on for an example of how this might work, as well as important database project settings you’ll want to change in that case.

Comments closed

Encryption and Regulatory Requirements

Matthew McGiffen talks regulation:

One of the reasons you may be considering encryption is due to the relevant data protection regulation: either because the regulation specifies that data should be encrypted or because of the large potential penalties where there is a data breach. Some US companies have been hit by fines in the hundreds of millions of dollars following data breaches, so we are talking large sums of money. In Europe the largest fines so far (under the GDPR) have been related to misuse of personal data or consent (750 million euros is the highest I am aware of), but there have been fines of up to 30 million euros for data breaches. In the case of a breach, you could also be sued by individuals whose data has been accessed or by class action.

Read on for more thoughts on the topic.

Comments closed

A Crash Course on Synapse Studio

Kevin Chant wants six minutes of your time:

In this post I want to do a six-minute crash course about Synapse Studio. I wanted to do this follow-up post for a couple of reasons.

First reason is because a while ago somebody who was fairly new to Azure Data Engineering Services mentioned that they thought a lot of my posts were for advanced users. So, I showed them a previous post which was a five-minute crash course about Synapse Studio.

Whilst showing them that post I realized that some of the screenshots were out of date. With this in mind I thought I would do an updated version of the crash course for Synapse Studio. Which also allows me to highlight where to find some features.

Start your timers and get reading.

Comments closed

The Benefits of Stacking Pull Requests

Vivian Qu explains why stacking pull requests can make sense:

Becoming proficient in version control and change management is a necessary part of any software engineer’s job. However, I think that basic proficiency alone is not sufficient to be truly effective when working on complex production-ready software with a team of engineers. Stacking pull requests (PRs) is a key skill that should be learned early in a junior engineer’s career.

Stacking PRs is an advanced git technique that allows an engineer to break down one large change into a series of dependent changes that can be turned into smaller pull requests and reviewed separately.

Read on to learn more. It’s a skill I definitely don’t have, so time to add that to my to-learn list.

Comments closed

Installing ML Services on SQL Server 2022

Tomaz Kastrun notices a change to the SQL Server installer:

Machine Learning Services and language extensions is available under Database Engine Services, and if you want to use any of these languages, check this feature. During the installation process, the R, Python or Java will not be installed (nor asked for permissions), but you will install your own runtime after the installation. This will bring you more convenience with the installation of different R/Python/Java runtimes.

Read on to see how you can install and work with languages like R, Python, and Java in SQL Server 2022.

Comments closed

The Purpose of Data Encryption

Matthew McGiffen thinks through the benefits of encryption:

On the face of it, this is a very obvious question with a very obvious answer. We want to prevent data from falling into the wrong hands. In practice, it gets a little more complicated.

Exactly what types of attacks do you wish to be protected against? It’s good if we make sure our data is encrypted where it is stored on the disk, but that doesn’t help us if an attacker gains direct access to write queries against the database. We might encrypt data held in columns, but does that still protect us if the unencrypted data is being passed back across the network to our application and an attacker is intercepting our network traffic?

I did a Ctrl-F for “compliance” and didn’t see anything. Nor for checking boxes to keep regulators off our backs. It seems Matthew is going for the good answers here.

Comments closed

Solving Common CALCULATE Filter Argument Errors

Marco Russo and Alberto Ferrari catalog some errors:

The expression contains columns from multiple tables, but only columns from a single table can be used in a True/False expression that is used as a table filter expression.

This error is seen when the predicate includes column references from more than one table. For example, if we need a measure that returns the sales made to customers living in the same country as the store, we could try to write the following measure:

Read on for several examples and solid guidance on how to resolve these common issues.

Comments closed

Roll Your Own Row-Level Security for the Serverless SQL Pool

Randheer Parmar wants row-level security:

Row Level Security is a very key requirement for most database or data lake applications. Most of the databases are having natively build row-level security but Synapse serverless SQL pool doesn’t support this inbuilt functionality. In this article, we will see how to implement it.

Row-level security has always seemed to me to be a great idea but not one I can implement because its performance cost is always too high.

Comments closed

Distinct Counts in KQL

Robert Cain continues a series on KQL:

In an earlier post in this series, Fun With KQL – Count, you saw how to use the count operator to count the number of rows in a dataset.

Then we learned about another operator, distinct, in the post Fun With KQL – Distinct. This showed how to get a list of distinct values from a table.

While we could combine these, it would be logical to have a single command that returns a distinct count in one operation. As you may have guessed by the title of this post, such an operator exists: dcount.

Read on to see how you can use dcount in queries, including how you can perform speed versus accuracy trade-offs.

Comments closed