
Month: August 2020

Transaction Modes in SQL Server

I have a video and blog post out:

What I want to do in today’s post is to cover the different sorts of transaction modes and get into the debate about whether you should use explicit transactions or rely on auto-committed transactions for data modification in SQL Server. This came from an interesting discussion at work, where some of the more recent database engineers were curious about our company policy around transaction modes and understanding the whys behind it. I didn’t come up with the policy, but my thinking isn’t too far off from the people who did.

But before I get too far off course, let’s briefly lay out some of the basics around transactions.

Read on for a good deal of info on the different transaction modes, including a bit on why implicit transactions (as opposed to autocommit transactions) are a bad thing in SQL Server.
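To make the distinction concrete, here's a minimal sketch of the three modes; the dbo.Accounts table is made up for the example.

-- Autocommit (the default): each statement is its own transaction.
UPDATE dbo.Accounts SET Balance = Balance - 50 WHERE AccountID = 1;

-- Explicit transaction: you define the boundaries yourself, so both
-- updates succeed or fail together.
BEGIN TRANSACTION;
    UPDATE dbo.Accounts SET Balance = Balance - 50 WHERE AccountID = 1;
    UPDATE dbo.Accounts SET Balance = Balance + 50 WHERE AccountID = 2;
COMMIT TRANSACTION;

-- Implicit transactions: the first statement quietly opens a transaction
-- which stays open (holding locks) until an explicit COMMIT or ROLLBACK.
-- Forgetting that COMMIT is the classic failure mode.
SET IMPLICIT_TRANSACTIONS ON;
UPDATE dbo.Accounts SET Balance = Balance - 50 WHERE AccountID = 1;
COMMIT;
SET IMPLICIT_TRANSACTIONS OFF;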


Transaction Isolation Levels in SQL Server

Dan Jackson walks us through the different transaction isolation levels in SQL Server and what they mean for us:

We will start with a definition and then evolve it: the isolation level specifies how much one transaction must be protected from resource or data modifications made by other transactions.

Consider the case where user A is trying to read a list of products out of a table; meanwhile, user B comes along and changes some of the product information in the table. As part of the same transaction, user A comes back to try and read the product table, but it has changed. Do you want user A to read the new information or not?

Isolation levels allow you to decide what would happen in scenarios like the one I’ve just described and so it should come as no surprise that they are described in terms of which concurrency side effects they allow.

Read on for a description of typically-undesirable side effects and the isolation levels which prevent them.
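To see the scenario in code, here's a minimal sketch; dbo.Products is hypothetical, and what user A's second read returns depends on the isolation level in effect.

-- User A, under the default READ COMMITTED level:
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
BEGIN TRANSACTION;
    SELECT ProductName, Price FROM dbo.Products WHERE ProductID = 42;
    -- If user B updates the row and commits at this point, the next
    -- read returns the new values: a non-repeatable read.
    SELECT ProductName, Price FROM dbo.Products WHERE ProductID = 42;
COMMIT;

-- Under REPEATABLE READ, shared locks are held until the transaction
-- ends, so user B is blocked and both reads return the same data.
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;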


TOP and Ordering

Erik Darling is in the middle of a back-to-basics series on performance tuning:

And you see, once you set up a query to return the TOP N rows, there’s an expectation that users get to choose the order they start seeing rows in. As long as we stick to columns whose ordering is supported by an index, things will be pretty stable.

Once we go outside that, a TOP can be rough on a query.

Read on for an example of what happens when that type of thing goes wrong.
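For a hypothetical illustration of the difference (dbo.Posts and the index are invented for the example):

-- With a supporting index, TOP reads the first ten rows straight off
-- the index in order, with no sort operator.
CREATE INDEX IX_Posts_Score ON dbo.Posts (Score DESC);

SELECT TOP (10) Id, Score
FROM dbo.Posts
ORDER BY Score DESC;

-- Order by a column with no supporting index and the engine must sort
-- everything before returning ten rows. This is where TOP gets rough.
SELECT TOP (10) Id, LastActivityDate
FROM dbo.Posts
ORDER BY LastActivityDate DESC;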


So You Want to Fail Over a SQL Managed Instance

Danimir Ljepava takes us through user-initiated failover of SQL Managed Instances:

In August 2020, we released a new feature, user-initiated manual failover, which allows you to manually trigger a failover on SQL Managed Instance using PowerShell or CLI commands, or by invoking an API call.

A manually initiated failover on a managed instance is equivalent to the automated failover that the service initiates for high availability and software patches. Manually invoking a failover on MI helps test end-to-end applications for fault resiliency against automatic failovers from planned or unplanned events before deploying to production. In addition to testing how failover impacts existing database sessions, it can also help verify whether failover changes end-to-end performance due to changes in network latency. In some cases, if performance issues are encountered on SQL MI, manually invoking a failover to a new node can help mitigate the issue.

Read on to see how you can perform failover and how you can confirm that it worked.
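One simple confirmation technique (my suggestion, not necessarily what the post uses) is to check when the engine last started: after a failover, the start time should be very recent.

-- Existing sessions are dropped by the failover, so reconnect first.
SELECT sqlserver_start_time
FROM sys.dm_os_sys_info;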


Learning from a Hadoop Outage

Sandhya Ramu and Vasanth Rajamani have an after-action report:

For companies and organizations, failure tends to be far more illuminating than success, and the lingering effects of a failure can be harmful if the team moves too quickly and does not resolve the issue in a thorough and transparent manner. We recently ran into a large incident involving data loss in our big data ecosystem, and by reflecting on our diagnosis and response, we hope that our learnings from this impactful incident will be insightful.

Here’s what happened: roughly 2% of the machines across a handful of racks were inadvertently reimaged. This was caused by procedural gaps in our Hadoop infrastructure’s host life cycle management. Compounding our woes, the incident happened on our business-critical production cluster.

Read on to understand what happened and why. It's a lesson in the importance of having a disaster recovery plan and testing it.


An Introduction to Spark Streaming

Sarfaraz Hussain has started a series on Spark Streaming. The first post gives an introduction to the topic:

The philosophy behind the development of Structured Streaming is:

“We as end users should not have to reason about streaming.”

What that means is that we, as end users, should only write batch-like queries, and it's Spark's job to figure out how to run them on a continuous stream of data and continuously update the result as new data flows in.

Sarfaraz then follows this up with a bit on the structure of a streaming query:

So, as new data comes in, Spark breaks it into micro-batches (based on the Processing Trigger), processes each batch, and writes it out to the Parquet file.

It is Spark's job to figure out whether the query we have written executes on batch data or streaming data. Since, in this case, we are reading data from a Kafka topic, Spark will automatically figure out how to run the query incrementally on the streaming data.

Check them both out.


Semantic Search with FileTable

Haroon Ashraf continues a series on semantic search with Windows and SQL Server:

The focus of the article is on documents: on the one hand, storing them on the Windows file system, and on the other, the comparative analysis that can be performed on them with Semantic Search in SQL Server.

Additionally, readers will learn how to store unstructured data by exploring FileTable and creating MS Word documents on the fly (instantly) to be consumed by Semantic Search.

This part of the article relates to the use of Semantic Search on unstructured data for the extraction of basic, business-crucial information, provided standard naming is in place.

Click through for the article.
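For a sense of the moving parts, here's a rough sketch. FILESTREAM and semantic language statistics must already be configured, and every name below is hypothetical rather than taken from the article.

-- Documents copied into the FileTable's Windows share show up as rows.
CREATE TABLE dbo.Documents AS FILETABLE
WITH
(
    FILETABLE_DIRECTORY = 'Documents',
    FILETABLE_STREAMID_UNIQUE_CONSTRAINT_NAME = UQ_Documents_StreamID
);

CREATE FULLTEXT CATALOG DocumentsCatalog;

-- STATISTICAL_SEMANTICS is what enables semantic search on the contents.
CREATE FULLTEXT INDEX ON dbo.Documents
    (file_stream TYPE COLUMN file_type LANGUAGE 1033 STATISTICAL_SEMANTICS)
    KEY INDEX UQ_Documents_StreamID ON DocumentsCatalog;

-- Key phrases per document, the raw material for comparing documents:
SELECT d.name, kp.keyphrase, kp.score
FROM dbo.Documents AS d
CROSS APPLY SEMANTICKEYPHRASETABLE(dbo.Documents, file_stream, d.stream_id) AS kp
ORDER BY d.name, kp.score DESC;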


DAX Patterns: Second Edition

Marco Russo announces a second edition of DAX Patterns:

Great news! Just one year after releasing the second edition of The Definitive Guide to DAX, we just published a new website, a new book, and a new collection of videos: the second edition of DAX Patterns!

DAX Patterns is a collection of patterns in DAX for Power BI, Analysis Services Tabular, and Power Pivot for Excel. The first edition of DAX Patterns dates back to the end of 2014, and it was based on Power Pivot for Excel. Since then, DAX has evolved with many useful features. Most importantly, Power BI hit the market, and the number of users adopting DAX grew at an exponential rate. When we published the first edition of this book, Power BI had not even been announced yet. Today, most DAX users create a Power BI solution. The new edition of DAX Patterns is thus based on the tool you love: Power BI.

The book looks to be quite useful, and you can get an idea of whether this content is right for you from the DAX Patterns website. What's crazy is that they're offering everything in the book on the website for free, but I'd suggest that if you pick up enough good info from the site, you give back by buying a copy of the book or the videos.


Stopping and Starting Virtual Machines in a Resource Group

Dennes Torres walks us through a script to stop or start all virtual machines in an Azure resource group:

Some tasks on Azure are easier if we automate them. The Azure Portal provides us with Cloud Shell, which we can use for this kind of automation.

I was running some experiments with SQL Server Always On, so I created three VMs inside a resource group. Every time I want to start an experiment, I need to start all three VMs and, at the end, stop all three again.

Read on to see how Dennes is able to accomplish this.


SQL Serverless in Azure Synapse Analytics

James Serra talks to us about SQL serverless (presently known as SQL on-demand but I’m getting ahead of the marketing curve this time):

Querying data in ADLS Gen2 storage using T-SQL is made easy because of the OPENROWSET function with additional capabilities (check out the T-SQL that is supported). The currently supported file types in ADLS Gen2 that SQL-on-demand can use are Parquet, CSV, and JSON. ParquetDirect and CSV 2.0 add performance improvements (see Benchmarking Azure Synapse Analytics – SQL Serverless, using .NET Interactive). You can also query folders and multiple files and use file metadata in queries.

Read on to learn a lot more about its use cases.
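As a taste of the syntax, here's a minimal sketch; the storage account, container, and folder layout are placeholders.

-- Query Parquet files in ADLS Gen2 directly from SQL on-demand.
SELECT TOP (100) *
FROM OPENROWSET(
    BULK 'https://yourstorageaccount.dfs.core.windows.net/data/sales/year=*/*.parquet',
    FORMAT = 'PARQUET'
) AS rows;

-- filepath(n) returns the value matched by the nth wildcard, which is
-- how file metadata (such as partition folders) gets used in queries.
SELECT r.filepath(1) AS sales_year, COUNT_BIG(*) AS row_count
FROM OPENROWSET(
    BULK 'https://yourstorageaccount.dfs.core.windows.net/data/sales/year=*/*.parquet',
    FORMAT = 'PARQUET'
) AS r
GROUP BY r.filepath(1);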
