Press "Enter" to skip to content

Curated SQL Posts

DTUs Or vCores For Azure SQL DB

Denny Cherry looks at a new Azure SQL Database announcement:

Today Microsoft has announced that there is a new way to buy Azure SQL DB. If DTUs aren’t making sense to you, you’ll be happy to know that you can now simply select how many vCores you want for your SQL DB workload. Now this will still require that you have an understanding of your workload to use this new vCore-based way to buy Azure SQL DB, but cores are a concept that is easy for people to talk about and wrap their heads around. Now this new model is only in preview at the moment, but I’m guessing that it’ll be around for a while in preview, then it’ll go GA as this new model makes sense.

I definitely prefer this model, as IT departments already understand the idea, whereas DTUs were nebulous at best.
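
For illustration only (this isn’t from Denny’s post), once the vCore preview is enabled for your logical server, you can move an existing database onto a vCore-based service objective with plain T-SQL. The database name and the 'GP_Gen5_2' objective (General Purpose, Gen5 hardware, 2 vCores) are just examples; check the currently available tier names first.

-- Check the current service objective, then move to a 2-vCore General Purpose tier.
SELECT DATABASEPROPERTYEX('MyDatabase', 'ServiceObjective') AS CurrentObjective;

ALTER DATABASE MyDatabase
MODIFY (EDITION = 'GeneralPurpose', SERVICE_OBJECTIVE = 'GP_Gen5_2');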


Encrypt Those Database Backups

David Fowler shows how easy it is to encrypt database backups in SQL Server:

We all go to great lengths to make sure that our databases are secure (or at least I really hope that we do): we make sure that only authorised users have access, and then only to the databases and levels that they need. We’re careful that all our passwords conform to crazy complexity rules. We’re a pretty security-minded lot, aren’t we?

But what about our backups?  Most of the time they’re just sitting either on a local disk on the server itself or on some share somewhere.  Wherever they are, what happens if someone manages to get hold of our backup files?  They can easily just restore them onto a server where they have permissions and voila, all our super secret data is now theirs.

They could even just open the files with a hex editor and read the data; it’s really not a difficult thing to do.

By default, SQL Server makes absolutely no effort to encrypt or in any way protect our backups.  That’s pretty bad in my mind.

If you’re on Standard Edition, backup encryption became available in SQL Server 2014.  For Enterprise Edition, it was available earlier.  At this point, I don’t think there are good reasons for not encrypting backups on production systems.
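
To make the point concrete, here’s a minimal sketch of native backup encryption; the certificate, database, and path names are made up, and you’d want to back up the certificate and its private key immediately, since restoring the backup elsewhere requires it.

USE master;
-- One-time setup: a database master key plus a certificate to protect backups.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'UseAStrongPasswordHere!1';
CREATE CERTIFICATE BackupCert WITH SUBJECT = 'Backup encryption certificate';

-- The resulting .bak file is unreadable without the certificate.
BACKUP DATABASE SuperSecretDB
TO DISK = 'D:\Backups\SuperSecretDB.bak'
WITH COMPRESSION,
     ENCRYPTION (ALGORITHM = AES_256, SERVER CERTIFICATE = BackupCert);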


Changing The SQL Server Port On Azure Container Services

Andrew Pruski shows how to move SQL Server off the default port when running on Azure Container Services:

So, how do you do it when running SQL Server in Azure Container Services?

Well, there are a couple of options available.

The first one is to change the port that SQL is listening on in the container, open that port on the container, and direct to that port from the service.

The second one is to leave SQL Server listening on the default port and direct a non-default port to port 1433 from the service.

Read on to see Andrew try out both of these methods.
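
Neither option is reproduced here, but once you’ve redirected traffic, a quick way to confirm which port SQL Server actually accepted your connection on is to ask the instance itself (a generic check, not something specific to Andrew’s setup):

-- Shows the address and TCP port the current connection arrived on.
SELECT local_net_address, local_tcp_port
FROM sys.dm_exec_connections
WHERE session_id = @@SPID;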


Azure Analysis Services Parallelism And Scale

Teo Lachev has a quick note on Azure Analysis Services and parallelism:

Anyone who’s done parallel programming knows that it’s not easy. Safe access and locks need to be used to synchronize access and protect shared resources. The AS xVelocity engine (aka Vertipaq) is a multi-threaded application that accesses RAM as a shared resource. As each thread competes for a global lock for memory allocation/deallocation, scalability decreases. Because processor cores reserve memory in chunks, called cache lines, additional synchronization is required when threads access memory locations that are close to each other. Before SQL Server 2016 SP1, this was a serious issue with Tabular. As we’ve learned, starting with SQL Server 2016 SP1, Microsoft switched to using the Intel Threading Building Blocks (TBB) C++ library. Specifically, Tabular now uses the TBB scalable memory allocator. Mind you, TBB is not a set of CPU-specific extensions, so any modern Intel CPU should get these benefits. “Analysis Services SP1 uses an Intel TBB-based scalable allocator that provides separate memory pools for every core. As the number of cores increases, the system can scale almost linearly” and “The Intel TBB-based scalable allocator is also expected to help mitigate performance problems due to heap fragmentation that have been shown to occur with the Windows Heap”. Further, starting with SP1, Tabular is NUMA aware on 4-node NUMA systems, but I don’t know if the AAS VMs are preconfigured for NUMA.

Teo doesn’t have any firm conclusions at this point, but his initial testing looks positive.


Why CHECKDB Repair Invalidates Replication

Paul Randal explains why running DBCC CHECKDB on a published article will cause subscriptions to become invalidated:

Whenever I’m teaching about recovering from corruption, I always stress that if the REPAIR_ALLOW_DATA_LOSS option of DBCC CHECKDB (which I’ll just call ‘repair’ from now on) has to be used, you should do the following:

  • Run another DBCC CHECKDB to make sure that repair fixed everything and no new problems were exposed (by the repair fixing something that was blocking DBCC CHECKDB’s access to a portion of the database)
  • Run DBCC CHECKCONSTRAINTS on any tables or indexes that were repaired and are involved in constraints (stated in Books Online too)
  • Reinitialize any replication subscriptions that are based on tables that were repaired (stated in Books Online too)
  • Reevaluate your HA/DR strategy so you don’t have to use repair again in future

One question I’m often asked is about why replication can be broken by running repair.

Read on for the answer.
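
As a rough sketch of that checklist in T-SQL (database and table names are hypothetical), the repair plus the follow-up checks look something like this; replication subscriptions on any repaired articles would still need to be reinitialized afterwards.

-- Last-resort repair: requires single-user mode and can delete data.
ALTER DATABASE SalesDB SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
DBCC CHECKDB (SalesDB, REPAIR_ALLOW_DATA_LOSS);
ALTER DATABASE SalesDB SET MULTI_USER;

-- Re-run CHECKDB to confirm nothing new was exposed...
DBCC CHECKDB (SalesDB) WITH NO_INFOMSGS;

-- ...and re-validate constraints on the repaired tables.
DBCC CHECKCONSTRAINTS ('dbo.Orders') WITH ALL_CONSTRAINTS;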


Troubleshooting Availability Group Performance Dips

Simon Su walks us through a customer scenario where transactions per second would drop several orders of magnitude for a second, and then jump back up to normal:

The “Transaction Delay” value is an accumulation of the delays of all the current transactions, in milliseconds. You can see that the “Transaction Delay” counter spikes at the same points as the sudden drops in “Transactions Created/Sec”. Its spikes indicate that at those time points the AG transactions are delayed during commits. This gives us a very good starting point: we can focus on the transaction delay in our AG performance troubleshooting.

So who causes the transaction delay? Is it primary replica, secondary replica, or other factors like network traffic?

As a standard step in performance troubleshooting, we captured performance monitor logs to check how the performance behaved on both replicas.  We want to find out whether there is any performance bottleneck on the primary or secondary, for example, whether CPU usage is high when the transaction delay spike happens, whether the disk queue length is long, whether disk latency is large, etc.  We expect to find something that has the same spike trend as “Transactions Created/sec” or “Transaction Delay”. Unfortunately, we do not find anything interesting. CPU usage is as low as 30%, disk speed is quite fast, and there is no disk queue length at all. We then checked AG-related counters, like the log send queue and the recovery queue mentioned in the two links above, but again we do not find anything helpful.

At the end of the post, there’s a reminder that you should keep up to date on patching your systems.
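
If you’d rather sample the availability group delay counters from T-SQL than from Performance Monitor, they’re exposed through sys.dm_os_performance_counters; this is a generic sketch, not something from Simon’s post.

-- Availability group commit latency counters, one row per database replica.
SELECT object_name, counter_name, instance_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Database Replica%'
  AND counter_name IN ('Transaction Delay', 'Mirrored Write Transactions/sec');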


Working With Jupyter Notebooks And Airflow On Hadoop

Mark Litwintschik shows us an interesting demonstration of running Jupyter Notebooks as well as automating tasks with Airflow on Hadoop:

The following will create a ~/airflow folder, set up a SQLite 3 database used to store Airflow’s state and configuration set via the Web UI, upgrade the configuration schema and create a folder for the Python-based jobs code Airflow will run.

$ cd ~
$ airflow initdb
$ airflow upgradedb
$ mkdir -p ~/airflow/dags

By default Presto’s Web UI, Spark’s Web UI and Airflow’s Web UI all use TCP port 8080. If you launch Presto after Spark then Presto will fail to start. If you start Spark after Presto then Presto will launch on 8080 and the Spark Master Server will take 8081 and keep trying higher ports until it finds one that is free. Spark will then pick an even higher port number for the Spark Worker Web UI. This overlap normally isn’t an issue as in a production setting these services would normally live on separate machines.

Read the whole thing.


How DynamoDB Indexing Works

Shubham Agarwal explains how indexing works within DynamoDB:

Global secondary index in DynamoDB – an index with a partition key and a sort key that can be different from those on the base table. A global secondary index is very helpful when you need to query your data without the primary key.

  • The primary key of a global secondary index can be either a partition key or a composite key (partition key and sort key).

  • Global secondary indexes can be created at the same time that you create a table. You can also add a new global secondary index to an existing table, or delete an existing global secondary index.

  • A global secondary index lets you query over the entire table, across all partitions.

  • The index partition key and sort key (if present) can be any base table attributes of type string, number, or binary.

  • With global secondary index queries or scans, you can only request the attributes that are projected into the index. DynamoDB will not fetch any attributes from the table.

  • There are no size restrictions for global secondary indexes.

Click through to learn more about these as well as local secondary indexes.


Auto-Generated Filtered Stats

Dmitry Piliugin shows one way that we can optimize a query whose main problem is poor row estimates:

A model variation is a new concept in the SQL Server 2014 cardinality estimation framework that allows you to easily turn some model assumptions and cardinality estimation algorithms on and off. Model variations are based on a mechanism of pluggable heuristics and may be used in special cases. I think they are left for Microsoft support to be able to address some clients’ CE issues pointwise.

Today we are going to look at an interesting model variation that creates filtered statistics on-the-fly. I should give a disclaimer here.

Warning: All the information below is presented purely for educational and curiosity purposes. This is completely undocumented and unsupported and should not ever be used in production systems unless Microsoft support recommends it. More to the point, the usage of this model variation may affect the overall server performance in a negative way. This should be used for experiments and in a test environment only.

It’s interesting reading, though do heed that warning.  This also isn’t a quick operation (seeing as how the database engine is creating filtered statistics), so it’s not a first-best choice.  But it’s worth keeping in your back pocket.
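
If you want the same effect through documented means, you can create the filtered statistics yourself; the table, column, and predicate below are made up for illustration.

-- A manually created filtered statistic over the skewed slice of the data,
-- giving the optimizer better estimates for queries that hit that predicate.
CREATE STATISTICS st_Orders_Shipped_CustomerID
ON dbo.Orders (CustomerID)
WHERE Status = 'Shipped'
WITH FULLSCAN;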


Plotting From SQL Server Machine Learning Services

Marlon Ribunal has a quick demo showing how to generate a ggplot2 plot using SQL Server Machine Learning Services:

Let’s install the package if it hasn’t been installed yet. The easiest way to do that is to run RGUI.exe that came with your SQL Server 2017 In-Database Machine Learning installation. You can find it here:

C:\Program Files\MSSQL14.MSSQLSERVER\R_SERVICES\bin\x64

Take note that you need to run the executable as Administrator. Also, if you’ve installed the R engine prior to your SQL Server 2017 In-Database Machine Learning with R, you have to explicitly tell the R package installer where you want your package installed.

> install.packages("ggplot2", lib="C:\\Program Files\\Microsoft SQL Server\\MSSQL14.MSSQLSERVER\\R_SERVICES\\library", dep = TRUE)

dep = TRUE tells the installer to install dependencies. ggplot2 depends on a lot of other packages. You can check dependencies using MiniCRAN.

Another option for installation is to bootstrap install via T-SQL:  you can execute external scripts which run install.packages() directly rather than using RGUI, if that makes more sense with your deployment process.
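
A rough sketch of that T-SQL route (assuming external scripts are enabled and the Launchpad service account can write to the library path it installs into):

-- Installs ggplot2 from inside the engine via the R runtime.
EXEC sp_execute_external_script
    @language = N'R',
    @script   = N'install.packages("ggplot2",
                                   repos = "https://cran.r-project.org",
                                   dependencies = TRUE)';

SQL Server 2017 also supports CREATE EXTERNAL LIBRARY as a more deployment-friendly way to add R packages.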
