
Month: September 2023

Row-Level Security Performance and Troubleshooting

Ben Johnston digs into row-level security:

There are two main areas where RLS can impact performance. The first is the user or authentication lookup. Some kind of lookup must be performed in the access predicate to determine either the user name, group membership, or specific values in the session context. Considering that RLS is non-prescriptive, the lookup isn’t confined to these methods, but they are very easy methods to use and implement and are standard based on implementations I’ve seen.

The second area is the authorization lookup. The authorization lookup, checking if a user has access to particular rows, can have a much bigger impact on performance. This is also in the access predicate. Following the basic rules for performance and keeping lookups simple goes a long way to minimizing the impact of RLS on performance. The goal is to keep performance levels as close as possible to a table without RLS. If indexes and predicates are correct, RLS can improve performance in some situations due to the automatic filtering that happens.

Read on for Ben’s thoughts on the topic.
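To make those two lookups concrete, here is a minimal T-SQL sketch of the common pattern: a schema-bound inline table-valued function as the access predicate, bound to a table with a security policy. The Security schema, dbo.Orders table, and SalesRep column are hypothetical stand-ins, not Ben's example.

-- Access predicate: the USER_NAME() call is the authentication lookup,
-- and comparing it to the row's SalesRep column is the authorization lookup.
CREATE FUNCTION Security.fn_OrderAccessPredicate (@SalesRep AS sysname)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
    SELECT 1 AS fn_result
    WHERE @SalesRep = USER_NAME();
GO

-- Bind the predicate to the table; SQL Server applies the filter automatically.
CREATE SECURITY POLICY Security.OrderFilter
    ADD FILTER PREDICATE Security.fn_OrderAccessPredicate(SalesRep)
    ON dbo.Orders
    WITH (STATE = ON);

Keeping that predicate a simple, index-friendly comparison is what lets performance stay close to the no-RLS baseline.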


SQL Shades for SSMS

Peter Schott prefers dark mode:

For those of us who’ve worked with SQL Server for some time, we’ve regularly used SQL Server Management Studio (SSMS). In recent years, we’ve seen an increase in websites and applications offering a Dark Mode. Azure Data Studio has one built in as it’s based on the VS Code engine. SSMS is a form of the full Visual Studio IDE and offers some ability to skin, but attempts in the past to make a true dark mode have been only partially successful. Microsoft has not given us that option in SSMS as it’s been more work than they can commit to with a broad platform. So our options have been partial dark modes, which leave big portions of the interface a bright white, or just use the defaults with everything being light.

Click through to see how SQL Shades does in Peter’s estimation. As for me, I’ll stick with my light mode.


Thoughts on Third-Party Power BI Tools

Chris Webb shares some thoughts:

Rather than blog about the tool itself – there’s no point repeating Nikola’s post – I thought it would be good to answer a question someone asked me later that day about Tabular Editor and which I’m definitely going to be asked about DAX Optimizer, namely:

This looks great, but it’s expensive and it’s hard for me to get sign-off to use third-party tools like this. Why doesn’t Microsoft give me something like this for free?

Chris shares his personal opinions on the matter. My opinion on it, as someone who has worked with Microsoft products for a long time and never for Microsoft, is that Microsoft needs to play a balancing act. They build products and tools with the intention of third parties extending them, whether by opening up APIs or creating an explicit extensions marketplace (like we see in Azure Data Studio and Visual Studio Code). If they take the best bits of those third-party products for themselves, then that third-party marketplace dries up quickly. On the other side of the coin, depending on third parties doesn’t always cut it. For example, Azure Data Studio used to have an awful execution plan viewer and the answer was "use SentryOne Plan Explorer instead." That wasn’t a great solution for Azure Data Studio either (and today, I don’t know if the extension is even still around), so the pushback was firm: a good execution plan reader needs to be a core part of any first-class SQL Server developer tool from Microsoft.

Chris has plenty to say on the topic as well.


Finding Omitted Variables in Logistic Regression

John Mount picks up on a prior post:

For this note, let’s work out how to directly try and overcome the omitted variable bias by solving for the hidden or unobserved detailed data. We will work our example in R. We will derive some deep results out of a simple set-up. We show how to “un-marginalize” or “un-summarize” data.

This is an interesting dive into a common problem, and something which we can easily work around in linear regression, but not in logistic regression.
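A back-of-the-envelope way to see the asymmetry (my gloss, not John's derivation): with a linear model, taking expectations over an omitted variable z still leaves a linear model, since

$$E[y \mid x] = x^\top \beta + E[z \mid x]^\top \gamma,$$

so when $E[z \mid x]$ is itself (approximately) linear in x, the omitted term gets absorbed into the fitted coefficients and intercept. The logistic link $\sigma(t) = 1/(1 + e^{-t})$ is nonlinear, so expectation does not pass through it:

$$E[y \mid x] = E\left[\,\sigma(x^\top \beta + z^\top \gamma) \mid x\,\right] \neq \sigma\!\left(x^\top \beta + E[z \mid x]^\top \gamma\right),$$

which is why marginalizing out a variable changes the model family itself rather than just shifting the coefficients.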


Manual Stats Updates in SQL Server

Matthew McGiffen can’t wait for SQL Server to update those stats:

Having up to date statistics is vital for getting the best performance out of your queries. Even though SQL Server automatically updates statistics in the background for you (When do statistics get updated?), you may find there are times when you want to manage updating them yourself.

Click through to learn several techniques for stats updates.
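As a refresher on the manual options, here is a T-SQL sketch; dbo.Orders and the IX_Orders_OrderDate statistic are hypothetical names, not Matthew's examples.

-- Update every statistic on one table, reading all rows
UPDATE STATISTICS dbo.Orders WITH FULLSCAN;

-- Update a single statistic with a sampled scan
UPDATE STATISTICS dbo.Orders IX_Orders_OrderDate WITH SAMPLE 25 PERCENT;

-- Sweep all tables in the current database
EXEC sys.sp_updatestats;

-- Check how stale a table's statistics are before deciding
SELECT s.name, sp.last_updated, sp.rows, sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID('dbo.Orders');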


Reviewing Postgres SQL Statement History

Muhammad Ali does some performance testing:

pg_stat_statements is a built-in PostgreSQL extension that keeps track of SQL statements executed by the database server. It records details about query execution counts, total execution time, and I/O-related info. By capturing this information, database administrators can identify bottlenecks and optimize queries for better overall performance. In this blog, we’ll take a look at how the pg_stat_statements module can be utilized to enhance the performance of your PostgreSQL database.

Read on to see how you can enable this extension, configure it, and use it to find the worst performers in the bunch.
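As a minimal sketch of the setup and a typical "worst performers" query (the column names below match PostgreSQL 13 and later, where total_time was split into total_exec_time and friends):

-- In postgresql.conf, then restart the server:
--   shared_preload_libraries = 'pg_stat_statements'

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 10 statements by cumulative execution time
SELECT query,
       calls,
       total_exec_time,
       mean_exec_time,
       rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;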


Creating a Function in Snowflake

Kevin Wilkie creates a function:

Sometimes, you’ll need to create functions that do a particular thing. They’re not always pretty. Some of them look like they should have been thrown out with the bathwater.

Unlike SQL Server, user-defined functions in Snowflake can be done in several different languages. They can be done in Java, JavaScript, Python, Scala, or plain-old SQL.

That means Postgres users will be a bit more comfortable here than SQL Server users.
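For a flavor of the range, here are two tiny sketches based on Snowflake's documented syntax; the function names and bodies are arbitrary illustrations, not Kevin's.

-- Plain SQL UDF
CREATE OR REPLACE FUNCTION area_of_circle(radius FLOAT)
RETURNS FLOAT
AS
$$
    pi() * radius * radius
$$;

-- Python UDF: the handler function inside the body does the work
CREATE OR REPLACE FUNCTION add_one(i INT)
RETURNS INT
LANGUAGE PYTHON
RUNTIME_VERSION = '3.8'
HANDLER = 'add_one_py'
AS
$$
def add_one_py(i):
    return i + 1
$$;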


Choosing a Data Serialization Format

Rathish Kumar says more than “JSON and Parquet”:

In the world of software, we often work with different types of data like lists, tables, and more. These data structures are designed to be fast and efficient when our computer programs use them. However, sometimes we need to move this data out of our computer’s memory, like when we want to save it to a file or send it over the internet. To do this, we have to change the data into a special format made up of 0s and 1s, which is quite different from data structures. This process is what we call encoding or serialization. 

In this article, we’ll explore the world of encoding and decoding, which is the reverse process of turning that special format back into usable data. We’ll also take a look at different ways to do encoding and decoding, as well as important things to think about when choosing the right method for your software projects.

Sadly, ORC (Optimized Row Columnar) doesn’t make the cut, as Parquet ended up taking over that market.
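For a database-flavored illustration of the encode/decode round trip, here is a small PostgreSQL sketch with hypothetical values: row_to_json serializes a row into a JSON document, and the ->> operator parses a field back out.

-- Encode: turn a row into JSON text
SELECT row_to_json(t) AS encoded
FROM (SELECT 1 AS id, 'Alice' AS name) AS t;
-- {"id":1,"name":"Alice"}

-- Decode: parse JSON text and extract a field
SELECT ('{"id": 1, "name": "Alice"}'::jsonb) ->> 'name' AS decoded_name;
-- Alice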


A Primer on Database Constraints in MySQL

Robert Sheldon creates some keys:

MySQL supports six basic types of constraints for ensuring data integrity: PRIMARY KEY, NOT NULL, DEFAULT, CHECK, UNIQUE, and FOREIGN KEY. In this article, I introduce you to each constraint type and provide examples for how they work. The examples include a series of CREATE TABLE statements that demonstrate different ways to incorporate constraints into your table definitions. If you’re not familiar with the CREATE TABLE statement or how to create tables in a MySQL database, refer to an earlier article in this series that introduces you to the statement.

In short, they support the same set that SQL Server users are used to. But do read on to see the nuances behind each of these.
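As a quick illustration of all six in one place, here is a MySQL sketch with hypothetical tables, not Robert's examples (note that CHECK is only enforced as of MySQL 8.0.16; older versions parse and ignore it):

-- Parent table for the FOREIGN KEY below
CREATE TABLE customers (
    customer_id INT PRIMARY KEY AUTO_INCREMENT,
    email VARCHAR(255) NOT NULL UNIQUE
);

CREATE TABLE orders (
    order_id INT PRIMARY KEY AUTO_INCREMENT,          -- PRIMARY KEY
    customer_id INT NOT NULL,                         -- NOT NULL
    status VARCHAR(20) NOT NULL DEFAULT 'pending',    -- DEFAULT
    quantity INT CHECK (quantity > 0),                -- CHECK
    tracking_code VARCHAR(32) UNIQUE,                 -- UNIQUE
    FOREIGN KEY (customer_id) REFERENCES customers (customer_id)  -- FOREIGN KEY
);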


Don’t Try These with SQL MI and Private Endpoints

Zoran Rilak wraps up a series on Azure SQL Managed Instance and its support for private endpoints:

The first two installments of this mini-series discussed a couple of basic and advanced scenarios involving private endpoints. Today we’ll look at some ways private endpoints cannot be used to implement scenarios where one might expect otherwise.

Read on for four of these in total, laying out things you cannot do via private endpoint to a SQL Managed Instance. In fairness, Zoran also provides what I would consider reasonable workarounds for each of them: have a VM jumpbox in the same virtual network for DAC connections, peer your virtual networks for replication, and so on.
