It is hard for experts to understand how frustrating the above is to a new
Ruser or to a part time
Ruser. It feels like any variation on the original code causes it to fail. None of the rules they have been taught anticipate this, or tell them how to get out of this situation.
This quickly leads to strong feelings of learned helplessness and anxiety.
Our rule for
dplyr::mutate()has been for some time:
Each column name used in a single mutate must appear only on the left-hand-side of a single assignment, or otherwise on the right-hand-side of any number of assignments (but never both sides, even if it is different assignments).
If you do data analysis with R, you’ve probably run into this before. I certainly have, and it’s nice to understand why this is the case.
In SQL Server, scaling out reads (i.e. using Active secondary replicas via AlwaysOn Availability Groups) is a lot easier than scaling out writes. So what are your options when you have a tremendous amount of writes that scaling up will not handle, no matter how big your server is? There are a number of options that allow you to write to many servers (instead of writing to one master server) that I’ll call distributed writes. Here are some ideas:
Peer-to-Peer transactional replication (or Multi-master replication) with SQL Server. See Peer-to-Peer – Transactional Replication
Merge replication in SQL Server. See Merge Replication
Create a messaging and queuing application in SQL Server Service Broker where all writes are placed on the queue and sent to different servers
Read on for more options and some additional thoughts around Cosmos DB. My first inclination would be to put Kafka in front of a distributed write system, but that’s my bias.
With strace up and running let’s turn on the trace flags to enable output for Instant File Initialization and create database that has a 100MB data file and a 100MB log file. Check out this post from Microsoft for more details on the trace flags. This database create code is straight from their post. I changed the model database’s data and log file sizes to 100MB each. Also, it’s important to note Instance File Initialization is only for data files, log files are zeroed out due to requirements for crash recovery. We’re going to see that in action in a bit…
Read the whole thing.
Let’s take a look at the most evil and uncorrectable pressure within SQL Server for the Columnstore Indexes – the Dictionary Pressure. For any final user without a bit of the internal knowledge of the Row Group sizes, Dictionaries & Pressures, it is always a huge surprise to find out that they can’t have those perfectly sized Row Groups with 1.048.576 rows.
Let us set up a table producing a good Dictionary Pressure and load 1.1 Million Rows into it, expecting to get just 2 Row Groups, while perfectly being aware that the result shall be quite different:
Niko’s dictionary pressure example is a good reason not to include textual columns on columnstore indexes.
Standard edition is limited to lesser of 4 sockets or 24 cores with a maximum memory of 128 GB plus a few truly Enterprise level features like Compression, Availability Groups, Partitioning etc are off limits. I would say most places would fall under this threshold for “Standard” but feel inferior to say they run “Standard”! I don’t, especially when money matters.
But, all kidding aside, most shops don’t even realize that they do not use any Enterprise features on 90% of their instances but pay Enterprise price anyway! If you don’t trust me, go check for yourself at your place – we did, on hundred’s of SQL Server instances! I painfully built the infrastructure to do this type of thing using PowerShell in seconds if not a few minutes, for scanning hundreds of servers/instances.
There’s a lot here, so if you’re thinking about downgrading in a post-2016 SP1 world, Jana’s post is a must-read. But even with the new features, there are still quite a few enterprise-level features that make it so I don’t want to live without Enterprise Edition.
In past months, we discovered some edge scenarios related to backup compression for TDE databases causing backups or restores to fail, hence our recommendations have been
Avoid using striped backups with TDE and backup compression.
If your database has virtual log files (VLFs) larger than 4GB then do not use backup compression with TDE for your log backups. If you don’t know what a VLF is, start here.
Avoid using WITH INIT for now when working with TDE and backup compression. Instead, use WITH FORMAT.
Avoid using backup checksum with TDE and backup compression
When you install a new version of SQL Server, you get new features – and sometimes, you’re not told about them. For example, when 2016’s TDE compression came out, nobody told you, “If you back up across multiple files, your backups might suddenly be compressed.” You didn’t know that you had a new thing to test – after all, I don’t know a lot of DBAs who have the time to test that the new version of SQL Server successfully performs restores. They restore their production databases into the new version, test a few things, and declare victory – but testing restores FROM the new version’s backups isn’t usually on that list.
Keep up to date on those patches.
In addition to business matters, Hal’s article highlights key drawbacks which need to be addressed around engineering as well as product supportability for the endeavor to be successful. The article affirms that bringing SQL Server to *nix platform is not the hardest task compare to the additional work that the journey would require. Indeed, in order for the project to be successful the team would have to:
- Bring other SQL Server inbox products such as SSAS along
- Implement platform specific features such as CLR, AGs, and much more
- Guarantee adequate performance and scalability
- Create new ecosystem around product support, engineering systems and more.
So throughout the course of the work, I have continued to question the path we have been on and if we succeed at the end or not. Every time I would go back, reread the article and every time I would come to the same favorable conclusion. But why?
Slava links to this article from December from the data platform team explaining what SQLPAL is. In the end, I think the benefits of this model will be much larger than SQL on Linux (which is itself large).