Press "Enter" to skip to content

Curated SQL Posts

Common VM Configuration Issues

Erik Darling takes us through some of the most common issues when setting up VMs to run SQL Server:

Everyone’s on VMs these days. That’s cool. Nothing against computers.

But here’s how people screw up SQL Server on VMs bigtime:

– Fewer than 4 cores (four is the licensing minimum, so you pay for them either way)
– Multiple single-core sockets
– Not fully reserving memory
– Oversubscribing hosts

All four of those are bad and Erik explains why.
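
If you want to see what a VM is actually presenting to SQL Server, a quick sketch against sys.dm_os_sys_info covers the core and socket layout, assuming SQL Server 2016 SP1 or later (where the socket and memory model columns exist). Host-level memory reservation and oversubscription are hypervisor settings, so they won't show up from inside the guest.

-- Core/socket layout and memory model as seen from inside the VM
SELECT cpu_count,                            -- logical processors visible to SQL Server
       socket_count,                         -- sockets presented by the VM
       cores_per_socket,                     -- watch for a stack of single-core sockets
       physical_memory_kb / 1024 AS physical_memory_mb,
       sql_memory_model_desc                 -- LOCK_PAGES means Lock Pages in Memory is in use
FROM sys.dm_os_sys_info;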


Azure SQL DB Serverless

John Morehouse takes us through the current public preview of Azure SQL Database Serverless:

Microsoft continues to expand the Azure ecosystem rapidly. One of the most recently delivered improvements was the Azure SQL Database Serverless option. The new serverless model, which is currently in public preview, provides a compute tier for a single database that scales dynamically with the workload. This new compute tier is optimal for single databases that have unpredictable usage patterns. Previously, you would provision the compute tier for your Azure SQL Database, which allowed you to have more granular control over scalability. The serverless tier's scaling is effectively controlled by the service itself and will scale when needed.

This looks like it’d be good for a dev server, where the occasional startup cost is fine and there can be significant usage requirement differences based on time—after the devs go home, you’re not likely to need much compute.
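
As a rough sketch (not from John's post, and the database name is a placeholder), moving an existing single database onto the serverless tier is a plain ALTER DATABASE with one of the GP_S_* service objectives; the auto-pause delay is configured through the portal, PowerShell, or the CLI rather than T-SQL.

-- GP_S_Gen5_2 = General Purpose, serverless, Gen5 hardware, max 2 vCores
ALTER DATABASE [DevDb]
MODIFY (EDITION = 'GeneralPurpose', SERVICE_OBJECTIVE = 'GP_S_Gen5_2');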


Secure Strings in PowerShell

Greg Moore shows how to build out secure strings in PowerShell:

One of the more common problems I’ve faced with a particular client is setting up data extractions from outside sources such as SFTP servers. Since this client deals with PII data, it’s essential that this is done in as secure a manner as possible. For example, all connections need to be encrypted using protocols such as SFTP instead of FTP.

Securely connecting is particularly easy to accomplish if one uses a 3rd party module such as Posh-SSH. However, you are still stuck with the problem of logging into the remote SFTP server securely.

Keeping credentials secured is something which PowerShell makes pretty easy. I can’t think of a reason why you’d ever need to keep credentials in plaintext here.


Estimated Execution Plans in Azure Data Studio

Dave Bland walks us through the “Explain” button in Azure Data Studio:

At first I was thinking maybe it is a shortcut to help. Nope, it is actually an easy way to get the estimated execution plan. The key word is estimated execution plan. In SQL Server Management Studio we have been able to easily get both the estimated and actual execution plans.

Just as a reminder, the actual execution plan requires the query to actually run, while the estimated plan uses statistics to generate the plan. Generally, if the actual and estimated plans are captured very close to each other, you are more than likely not going to see a difference between the two.

Now let’s talk about the Explain button.  It will return two things, the XML of the execution plan and a graphical representation of the execution plan.

There are a few differences between Azure Data Studio’s implementation of execution plans and SQL Server Management Studio’s.
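
For comparison, the estimated plan itself isn't tool-specific. The T-SQL route is SET SHOWPLAN_XML, which compiles the statements that follow without executing them; each SET has to sit in its own batch, hence the GO separators in this minimal sketch.

SET SHOWPLAN_XML ON;
GO
SELECT TOP (10) name, object_id
FROM sys.objects;    -- any query you want the estimated plan for
GO
SET SHOWPLAN_XML OFF;
GO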


The State of DevOps for Data Platform Professionals

Kendra Little summarizes the Accelerate: State of DevOps Report 2019 with a focus on what this means for data platform professionals:

While there are a ton of valuable insights in the report, in this post I will focus in on the findings which I believe are most relevant to those of us who work “close to a database.” There are three very interesting aspects of the research which hit close to home:

1. Speed and stability are not tradeoffs
2. Heavy change processes negatively impact speed and stability
3. Communities of practice are a common and successful tool to transform culture

Read on for Kendra’s detailed notes.


DBCC CHECKDB Error on Azure SQL Database

Arun Sirpal explains an error message on Azure SQL Database:

Msg 7928, Level 16, State 1, Line 3
The database snapshot for online checks could not be created. Either the reason is given in a previous error or one of the underlying volumes does not support sparse files or alternate streams. Attempting to get exclusive access to run checks offline.
Msg 8921, Level 16, State 3, Line 3
Check terminated. A failure was detected while collecting facts. Possibly tempdb out of space or a system table is inconsistent. Check previous errors.

Read on to see what this means, as well as what it means for you.
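
As a general aside rather than Arun's resolution: WITH TABLOCK makes DBCC CHECKDB skip the internal snapshot in favor of short-term locks, and PHYSICAL_ONLY cuts down the tempdb usage the second message hints at. Whether either is the right call against an Azure SQL Database is a separate question, and the database name below is a placeholder.

-- TABLOCK avoids the internal database snapshot (at the cost of locks);
-- PHYSICAL_ONLY reduces runtime and tempdb usage during the check.
DBCC CHECKDB (N'YourDb') WITH PHYSICAL_ONLY, TABLOCK, NO_INFOMSGS;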


Comparing Performance: HBase1 vs HBase2

Surbhi Kochhar takes us through performance improvements between HBase version 1 and HBase version 2:

We are loading the YCSB dataset with 1,000,000,000 records, each record 1KB in size, creating a total of 1TB of data. After loading, we wait for all compaction operations to finish before starting the workload test. Each workload tested was run 3 times for 15 minutes each and the throughput* measured. The average from the 3 tests is taken to produce the final number.

The post argues that there’s an improvement, but when a majority of cases end up worse (even if just a little bit), I’m not sure it’s much of an improvement.


The Transaction Log in Delta Tables

Burak Yavuz, et al, explain how the transaction log works with Delta Tables in Apache Spark:

When a user creates a Delta Lake table, that table’s transaction log is automatically created in the _delta_log subdirectory. As he or she makes changes to that table, those changes are recorded as ordered, atomic commits in the transaction log. Each commit is written out as a JSON file, starting with 000000.json. Additional changes to the table generate subsequent JSON files in ascending numerical order so that the next commit is written out as 000001.json, the following as 000002.json, and so on.

It’s interesting that they chose JSON instead of a binary transaction log like relational databases use.


Spark for .NET Developers

Ed Elliott has a long-form post covering spark-dotnet:

The .NET driver is made up of two parts. The first part is a Java JAR file which is loaded by Spark and then runs the .NET application. The second part of the .NET driver runs in the process and acts as a proxy between the .NET code and the Java classes (from the JAR file), which then translate the requests into Java requests in the Java VM which hosts Spark.

The .NET driver is added to a .NET program using NuGet and ships both the .NET library as well as two Java JARs. One JAR is for Spark 2.3 and one for Spark 2.4, and you do need to use the correct one for your installed version of Scala.

As much as I’ve enjoyed his series, getting it in a single-post format is great.


Storytelling with Power BI: Consistency

Mark Lelijveld continues a series on storytelling with Power BI:

In the below report you can easily click on a country on the left side to move to another page. When it comes to interactivity it is all done right! At the top right you can also filter on order date. Let’s say we apply a filter to only see the sales up to the end of 2013. This results in a sales amount of nearly $319K.

Now, Germany gets my attention. I want to see more and decide to navigate to the other page by clicking on Germany. Ending up on the other page, I see that the sales amount has changed back to $2.3M. In other words, my filter is gone!

Much of the difference between adequacy and excellence with visualization is in this kind of polish.
