Press "Enter" to skip to content


Quantifier Predicates in SQL

Joe Celko takes us through quantifier predicates:

SQL is based on set theory and logic. The only bad news is that many programmers have never had a class on either of those topics. They muddle through using the Boolean operators in their programming language and think that’s all there is to formal logic.

Let’s flash back to the early days of logic and play catch-up. We need to start with syllogisms. Syllogisms are logical forms made up of combinations of two statements about classes of things that lead to a conclusion. They were invented by the Greeks and written up by Aristotle in Prior Analytics. You might have run into them if you had a philosophy class that included lectures on formal fallacies. The three forms of statements allowed are:

Click through to receive a brief primer on formal logic and learn more about how SQL implements these concepts.
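As a quick illustration of where these ideas land in the language, SQL’s quantified comparison predicates ALL and ANY/SOME express universal and existential quantification over a subquery’s rows. This is a minimal sketch with invented table and column names, not an excerpt from Joe’s article:

    -- Universal quantifier: people paid at least as much as ALL trainees.
    SELECT P1.emp_name
    FROM Personnel AS P1
    WHERE P1.salary >= ALL (SELECT P2.salary
                            FROM Personnel AS P2
                            WHERE P2.job_title = 'Trainee');

    -- Existential quantifier: people paid more than ANY (at least one) trainee.
    SELECT P1.emp_name
    FROM Personnel AS P1
    WHERE P1.salary > ANY (SELECT P2.salary
                           FROM Personnel AS P2
                           WHERE P2.job_title = 'Trainee');

One classic gotcha: when the subquery returns no rows, the ALL predicate is vacuously true for every row, and NULLs in the subquery can make either predicate evaluate to UNKNOWN.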


Data Processing in Data Explorer Pools

Tsuyoshi Matsuzaki shows us how Data Explorer pools work in Azure Synapse Analytics:

At Microsoft Ignite 2021, the new Data Explorer (DX) pool in Azure Synapse Analytics was released in preview. You might wonder which one to choose among the three different analytical pools – Spark pool, Dedicated SQL pool, and DX pool.

In this post, I’ll briefly summarize how data is processed in Data Explorer (Kusto) – both Azure Data Explorer (ADX) and the Azure Synapse Data Explorer (DX) pool. I hope this will give you a hint about the optimal analytical platform for your workload.

Read on for this explanation.


Updating SQL Server Container Memory Limits

Andrew Pruski doesn’t have time to restart containers:

When running multiple SQL Server containers on a Docker host we should always be setting CPU and Memory limits for each container (see the flags for memory and cpus here). This helps prevent the whole noisy neighbour situation, where one container takes all the host’s resources and starves the other containers.

But what if we forget to set those limits? Well, no worries…we can update them on the fly!

Click through to see how you can change the memory limits on a running container.
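As a minimal sketch of the idea (the container name, password, and limit values are hypothetical), the same limits you set at creation time with docker run can later be adjusted with docker update:

    # Create a SQL Server container capped at 2 CPUs and 2GB of RAM.
    docker run -d --name sql1 --cpus=2 --memory=2g \
        -e "ACCEPT_EULA=Y" -e "MSSQL_SA_PASSWORD=Str0ngP@ssw0rd1" \
        -p 1433:1433 mcr.microsoft.com/mssql/server:2019-latest

    # Raise the memory cap on the running container; no restart required.
    # If a swap limit is set, --memory-swap must be raised along with --memory.
    docker update --memory=4g --memory-swap=4g sql1

Keep in mind that the container ceiling is separate from SQL Server’s own settings; you may still want to adjust max server memory inside the instance to match.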


AWS RDS for SQL Server Notes and Limitations

Tom Collins summarizes places where Amazon Relational Database Service (RDS) for SQL Server differs from the box product:

Some AWS RDS SQL Server limitations

– Some ports are reserved for Amazon RDS, and you can’t use them when you create a DB instance.

– Amazon RDS for SQL Server doesn’t support importing data into the msdb database.

– You can’t rename databases on a DB instance in a SQL Server Multi-AZ deployment.

– AWS RDS doesn’t support Data Quality Services or Master Data Services on the RDS instance itself; you need to spin up an EC2 instance and run those services from another server.

Read on to see more limitations, notes on how security is different, and notes on feature support. Do keep in mind, though, that some of these may change over time—a few years back, the number of limitations was much greater.


Row-Level Security in Serverless SQL Pools

Jovan Popovic has a method of implementing poor man’s row-level security in Azure Synapse Analytics serverless SQL pools:

Serverless Synapse SQL pools enable you to read Parquet/CSV files or Cosmos DB collections and return their content as a set of rows. In some scenarios, you would need to ensure that a reader cannot access some rows in the underlying data source. This way, you are limiting the result set that will be returned to the users based on some security rules. In this scenario, called Row-level security, you would like to return a subset of data depending on the reader’s identity or role.

Row-level security is supported in dedicated SQL pools, but it is not supported in serverless pools (you can propose this feature on the Azure feedback site). In some cases, you can implement your own custom row-level security rules using standard T-SQL code.

Read on to see how.
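As one hedged possibility (not necessarily the exact approach in the post; the storage path, view name, and logins are all invented), a view over OPENROWSET can filter rows by the caller’s identity:

    CREATE VIEW dbo.SalesFiltered
    AS
    SELECT s.*
    FROM OPENROWSET(
             BULK 'https://myaccount.dfs.core.windows.net/data/sales/*.parquet',
             FORMAT = 'PARQUET'
         ) AS s
    WHERE s.Region = CASE SUSER_SNAME()  -- the caller's login name
                         WHEN 'reader.east@contoso.com' THEN 'East'
                         WHEN 'reader.west@contoso.com' THEN 'West'
                     END;

Readers query the view instead of the files, and each login sees only its mapped region; a login with no mapping gets no rows at all.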


Selecting Columns Based on Data Types in Power Query

Sandeep Pawar solves an interesting problem:

It’s not uncommon to have a table with hundreds of columns of different types, such as numerical, text, date, percentage, etc. You can select columns by using the UI option “Choose Columns” and then selecting which columns to keep. But if you have tens of columns of the same type, you will have to go through the list manually and pick the columns to keep. As far as I know, there is no direct way to keep only, for example, the numerical columns. In Pandas, you can use the select_dtypes method and pass which columns to choose based on their data types. No such option is available in the Power Query UI.

In this example, I will show how I achieved it. You can customize it or turn it into a function based on your use case.

Read on for two classes of technique which do the trick.
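For a taste of the sort of thing involved (this may or may not match Sandeep’s techniques), the M library does expose Table.ColumnsOfType, which returns the names of columns matching a list of types; Table.SelectColumns can then keep just those. Here is a self-contained sketch with a made-up table:

    let
        // Hypothetical input: a small table with mixed column types.
        Source = #table(
            type table [Name = text, Sales = number, Units = Int64.Type],
            {{"A", 100.5, 10}, {"B", 200.25, 20}}
        ),
        // Names of the columns typed as decimal or whole number.
        NumericColumns = Table.ColumnsOfType(Source, {type number, Int64.Type}),
        // Keep only those columns.
        NumericOnly = Table.SelectColumns(Source, NumericColumns)
    in
        NumericOnly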


Data Paging using Common Table Expressions

Steve Stedman takes us through one method of generating pages of data:

I can remember the first time that I worked on data paging code. I had to page through web site search results 20 at a time on a given web page. My task was to understand how it had been written and to do some bug fixing. After reviewing about 2,000 lines of code, and reviewing the seven different variables that were being used to know the current page, the next page, the previous page, the page size, the row at the top of the page, the row at the bottom of the page, and the number of pages, I finally understood what the code was intended to do. What a mess that was, but beyond the mess, the performance was horrible. The way the page worked was that based on the page you were on, all of the rows would be queried, then a loop would read through all of the rows before the current page, then loop through the rows on the current page displaying them on the page, and finally it would ignore the results after the current page. So page 1 was slow, page 2 was slower than page 1, page 3 was slower than page 2 and on and on.

Since that point I have implemented several different data paging algorithms myself, all better than the original implementation but none as elegant as the CTE way of doing data paging. I used to look at data paging as a painful task, but thanks to the SQL Server implementation of CTEs there is no more pain.

Steve also covers OFFSET and FETCH. This technique won’t be great with enormous data sets, but it works for moderate-sized data sets which query quickly. Paging is one area which is quite painful, and the best (and wackiest) solution I’ve come up with in SQL Server, for cases where the initial query is quite expensive, is to create tables with random names to store results and row numbers: populate a table the first time a query runs, query that table on subsequent runs, and use the RETURN value of a stored procedure to pass along the name of the table to access. Granted, that solution works best with static data, and you’d want a method to clean up those tables once they’re no longer in use (like storing a list of those tables with their last access dates and times). So it’s a mess.
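For reference, here is a minimal sketch of both styles against a hypothetical dbo.Orders table, with 1-based page numbering:

    DECLARE @PageNumber int = 3, @PageSize int = 20;

    -- CTE + ROW_NUMBER() paging (SQL Server 2005 and later).
    WITH NumberedOrders AS
    (
        SELECT OrderID, OrderDate,
               ROW_NUMBER() OVER (ORDER BY OrderDate, OrderID) AS RowNum
        FROM dbo.Orders
    )
    SELECT OrderID, OrderDate
    FROM NumberedOrders
    WHERE RowNum BETWEEN ((@PageNumber - 1) * @PageSize) + 1
                     AND @PageNumber * @PageSize;

    -- The same page with OFFSET/FETCH (SQL Server 2012 and later).
    SELECT OrderID, OrderDate
    FROM dbo.Orders
    ORDER BY OrderDate, OrderID
    OFFSET (@PageNumber - 1) * @PageSize ROWS
    FETCH NEXT @PageSize ROWS ONLY;

Note that both forms still have to work through all of the earlier rows to reach a deep page, which is why the expensive-first-query caching hack above exists.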


Memory-Optimized Table Variables and tempdb Contention

Erik Darling notes that memory-optimized table variables can be useful in specific circumstances:

First, yes, they do help relieve tempdb contention if you have code that executes under both high concurrency and frequency. And by high, I mean REALLY HIGH.

Like, Snoop Dogg high.

Because you can’t get rid of in-memory stuff, I’m creating a separate database to test in.

Been there. When tempdb object creation causes massive contention, this certainly alleviates the stress.

As Erik notes, there are some tradeoffs to this, meaning that you have a real decision to make rather than simply using memory-optimized user-defined table types as a starting point.
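If you want to try it, a minimal sketch looks like this (the type name and bucket count are arbitrary, and the database must already have a MEMORY_OPTIMIZED_DATA filegroup):

    -- One-time setup: a memory-optimized user-defined table type.
    CREATE TYPE dbo.SessionKeyList AS TABLE
    (
        SessionKey int NOT NULL,
        INDEX ix_SessionKey NONCLUSTERED HASH (SessionKey) WITH (BUCKET_COUNT = 1024)
    )
    WITH (MEMORY_OPTIMIZED = ON);
    GO

    -- Table variables of this type live in memory rather than in tempdb.
    DECLARE @keys dbo.SessionKeyList;
    INSERT INTO @keys (SessionKey) VALUES (1), (2), (3);
    SELECT SessionKey FROM @keys;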


Generating Alerts from Power Automate

Ed Hansberry shows how to create a Power Automate alert off of SQL Server data:

I’m going to show you how to do this in Power Automate with just a few steps. Let’s get started. In my example, I am going to return a table when a customer has placed an order where the order quantity will not divide evenly into the case pack. So if they order 100 units and the cases contain 24 each, I want to alert the order entry person to tell them the customer has effectively ordered 4.1667 cases, which isn’t allowed. They will need to order either 96 units or 120 units to get 4 or 5 cases.

Read on to see how.
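The detection query on the SQL side could be as simple as this hedged sketch (table and column names invented): flag any order line whose quantity is not an even multiple of its case pack, and have the flow send the alert only when rows come back.

    SELECT OrderID, Quantity, CasePack,
           CAST(Quantity AS decimal(9, 4)) / CasePack AS EffectiveCases
    FROM dbo.OrderLines
    WHERE Quantity % CasePack <> 0;  -- e.g., 100 units in 24-packs = 4.1667 cases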


Contrasting Data Warehouses with Power BI Dataflows

Reza Rad makes a comparison:

Dataflow is the data transformation service in Power BI, as well as in some other Power Platform services. Data Warehouse is the cloud storage and also the compute engine for data. I often get this question: “Now that we have dataflow in Power BI, should we not use the data warehouse? What are the differences? Which is better? When to use what?” This article and video explain the answers to these questions.

I’m probably a bit lower on self-service BI compared to others. When I see something like Dataflows, it reminds me too much of a mess of Excel spreadsheets on shared drives. There’s a lot of relevant business knowledge embedded in those dispersed locations, and bringing it together becomes as much a forensic exercise as it is architectural.
