Month: July 2018

Alleviating tempdb Contention

Pam Lahoud has some advice for those with tempdb-heavy workloads:

TL;DR – Update to the latest CU, create multiple tempdb files; if you’re on SQL 2014 or earlier, enable TF 1117 and 1118; if you’re on SQL 2016, enable TF 3427.

And now it’s time for everyone’s favorite SQL Server topic – tempdb! In this article, I’d like to cover some recent changes that you may not be aware of that can help alleviate some common performance issues for systems that have a very heavy tempdb workload. We’re going to cover three different scenarios here:

  1. Object allocation contention

  2. Metadata contention

  3. Auditing overhead (even if you don’t use auditing)

There’s some good information in here so don’t just say tl;dr.
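For reference, here is a rough T-SQL sketch of the trace flag and file advice from the TL;DR above. File names, sizes, and paths are made up, and on SQL Server 2016 and later the TF 1117/1118 behaviors are already built into tempdb by default:

-- Enable the trace flags globally (SQL Server 2014 and earlier); for them to
-- survive a restart, add -T1117 and -T1118 as startup parameters instead.
DBCC TRACEON (1117, -1);  -- grow all files in a filegroup together
DBCC TRACEON (1118, -1);  -- allocate full extents instead of mixed extents

-- Per the article, TF 3427 addresses the auditing overhead scenario on SQL Server 2016.
DBCC TRACEON (3427, -1);

-- Add another tempdb data file, sized and grown to match the existing ones.
ALTER DATABASE tempdb
ADD FILE (NAME = tempdev2, FILENAME = 'T:\TempDB\tempdev2.ndf',
          SIZE = 8192MB, FILEGROWTH = 512MB);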

Reading The Transaction Log

Nesha Maric shows us a couple of methods and a third-party tool for reading the SQL Server transaction log:

Update operations in SQL Server are not fully logged in the transaction log. Full before-and-after values, unfortunately, don’t exist; only the delta of the change for that record does. For example, SQL Server may show a change from “H” to “M” when the actual change was from “House” to “Mouse”. To piece together the full picture, a process must be devised to manually reconstruct the history of changes, including the state of the record prior to the update. This requires painstakingly reconstructing every record from the original insert to the final update, and everything in between.

BLOBs are another challenge when trying to use fn_dblog to read transaction history. BLOBs, when deleted, are never inserted into the transaction log, so examining the transaction log won’t provide information about their existence unless the original insert can be located. Only by combining these two pieces of data will you be able to recover a deleted BLOB file. This obviously requires that the original insert exists in the active/online portion of the transaction log, the only part accessible to fn_dblog. That may be problematic if the original insert was done weeks, months, or years earlier and the transaction log has since been backed up or truncated.

I’ve tried to avoid messing directly with the transaction log whenever possible, but there are scenarios where it’s the only place that has the information you need.
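As a point of reference, this is the sort of exploratory query you would run against the undocumented fn_dblog function (a sketch only; the function is unsupported and sees just the active portion of the log):

SELECT [Current LSN],
       [Transaction ID],
       Operation,          -- e.g. LOP_INSERT_ROWS, LOP_MODIFY_ROW, LOP_DELETE_ROWS
       Context,
       AllocUnitName       -- which table or index the log record touches
FROM fn_dblog(NULL, NULL)  -- NULL, NULL = the entire active log
WHERE Operation IN ('LOP_INSERT_ROWS', 'LOP_MODIFY_ROW', 'LOP_DELETE_ROWS');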

When Multiple Missing Indexes Exist

Brent Ozar shows what happens when there are multiple missing indexes for a query:

SQL Server Management Studio only shows you the first missing index recommendation in a plan.

Not the best one. Not all of them. Just whichever one happens to show up first.

Using the public Stack Overflow database, I’ll run a simple query:
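The query itself isn’t reproduced in this excerpt. Purely as an illustration (this is not Brent’s actual query), something along these lines against the Stack Overflow Users and Comments tables could generate missing index recommendations on more than one table in a single plan:

SELECT u.DisplayName, COUNT(*) AS CommentCount
FROM dbo.Users AS u
JOIN dbo.Comments AS c
    ON c.UserId = u.Id
WHERE u.Location = 'Seattle, WA'
GROUP BY u.DisplayName;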

But that behavior isn’t the case for all tools; SQL Operations Studio is a bit different.

Problem With Merge Replication And FILESTREAM

Gianluca Sartori walks us through an error when combining merge replication with FILESTREAM:

I published tables with FILESTREAM data before, but it seems like there is a particular planetary alignment that triggers an error during the execution of the snapshot agent.

This unlikely combination consists of a merge article with a FILESTREAM column and two UNIQUE indexes on the ROWGUIDCOL column. Yes, I know that generally it does not make sense to have two indexes on the same column, but this happened to be one of the cases where it did, so we had a CLUSTERED PRIMARY KEY on the uniqueidentifier column decorated with the ROWGUIDCOL attribute and, on top, one more NONCLUSTERED UNIQUE index on the same column, backed by a UNIQUE constraint.

Setting up the publication does not throw any error, but generating the initial snapshot for the publication does:

Cannot create, drop, enable, or disable more than one constraint,
column, index, or trigger named 'ncMSmerge_conflict_TestMergeRep_DataStream'
in this context. Duplicate names are not allowed.

This is a rather specific confluence of events, so it probably won’t affect many people.  Still, it is a bug.
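If you want to reproduce the shape of the problem, a minimal sketch of such a table might look like this (the names are illustrative, not from Gianluca’s post, and the database needs a FILESTREAM filegroup):

CREATE TABLE dbo.TestMergeRep
(
    RowId      UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL
                   CONSTRAINT DF_TestMergeRep_RowId DEFAULT NEWSEQUENTIALID(),
    DataStream VARBINARY(MAX) FILESTREAM NULL,
    -- Two UNIQUE indexes on the same ROWGUIDCOL column:
    CONSTRAINT PK_TestMergeRep PRIMARY KEY CLUSTERED (RowId),
    CONSTRAINT UQ_TestMergeRep_RowId UNIQUE NONCLUSTERED (RowId)
);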

Enabling LDAP Authentication On Cassandra

Kurt Greaves shows off a new LDAP authenticator for Apache Cassandra:

The LDAPAuthenticator is implemented using JNDI, and authentication requests will be made by Cassandra to the LDAP server using the username and password provided by the client. At this time only plain text authentication is supported.

If you configure a service LDAP user in the ldap.properties file, on startup Cassandra will authenticate the service user and create a corresponding role in the system_auth.roles table. This service user will then be used for future authentication requests received from clients. Alternatively (not recommended), if you have anonymous access enabled for your LDAP server, the authenticator allows authentication without a service user configured. The service user will be configured as a superuser role in Cassandra, and you will need to log in as the service user to define permissions for other users once they have authenticated.

The authenticator itself is hosted on GitHub, so you can check out its repo too.

RStudio Integration With Databricks

Brian Dirking, et al, announce support between RStudio and the Databricks platform:

With Databricks RStudio Integration, both popular R packages for interacting with Apache Spark, SparkR and sparklyr, can be used inside the RStudio IDE on Databricks. When multiple users use a cluster, each creates a separate SparkR Context or sparklyr connection, but they are all talking to a single Databricks-managed Spark application, allowing unique opportunities for collaboration between users. Together, RStudio can take advantage of Databricks’ cluster management and Apache Spark to perform tasks such as massive model selection, as noted in the figure in the original post.

I like seeing this level of integration, especially from a language like R, which has historically been limited to operating on a single machine’s memory.

Auto-Scaling Amazon ElasticMapReduce

Brandon Schller gives us some tips on sizing and scaling ElasticMapReduce:

EMR scaling is more complex than simply adding or removing nodes from the cluster. One common misconception is that scaling in Amazon EMR works exactly like Amazon EC2 scaling. With EC2 scaling, you can add/remove nodes almost instantly and without worry, but EMR has more complexity to it, especially when scaling a cluster down. This is because important data or jobs could be running on your nodes.

To prevent data loss, Amazon EMR scaling ensures that your node has no running Apache Hadoop tasks or unique data that could be lost before removing your node. It is worth considering this decommissioning delay when resizing your EMR cluster. By understanding and accounting for how this process works, you can avoid issues that have plagued others, such as slow cluster resizes and inefficient automatic scaling policies.

If you’re using EMR today or think you might use it in the future, you should read this.

Using DAX With Reporting Services

David Stelfox gives us an example of using DAX when connecting SQL Server Reporting Services to a SQL Server Analysis Services Tabular model:

Tools like Power BI have changed reporting, allowing power users to leverage tabular cubes to present information more quickly and without the (perceived) need for developers. However, experience tells us many users still want data in tables with a myriad of formatting and display rules. Power BI is not quite there yet in terms of providing all this functionality in the same way that SSRS does. For me, SSRS’s great value and, at the same time, its curse is the sheer amount of customisation a developer can do. I have found that almost anything a business user demands in terms of formatting and display is possible.

But you have invested your time and money in a tabular SSAS model which plays nicely with Power BI, yet your users want SSRS reports, so how do you get to your data? Using DAX, of course. Using EVALUATE, SUMMARIZECOLUMNS, and SELECTCOLUMNS, you can return data from a tabular model in a tabular format, ready to be read as a dataset in SSRS.
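As a minimal sketch of that idea (the table, column, and measure names here are invented, not taken from David’s post), an SSRS dataset query against a Tabular model might look like:

EVALUATE
SUMMARIZECOLUMNS (
    'Date'[Calendar Year],
    'Product'[Category],
    "Total Sales", SUM ( 'Sales'[Sales Amount] )
)
ORDER BY 'Date'[Calendar Year], 'Product'[Category]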

It’s a good post and a good example.  The only quibble I have is in the motivating paragraph; Power BI and SQL Server Reporting Services have different end goals—Power BI isn’t (and I think never will be) a pixel-perfect report building product; it’s meant to be a dashboarding technology.  That quibble aside, the example is well worth checking out.

Where Columnar Databases Struggle

Teo Lachev makes a good point regarding columnar databases:

A large company uses the SAP HANA ERP system. Users require real-time access to transactional data. To avoid performance degradation, SLT replication (trigger-based change data capture) replicates data to another SAP HANA system that is used solely for reporting. The problem is that the more detailed the report gets and the more columns it has, the slower it gets, and SAP HANA throws out-of-memory exceptions.

SAP HANA is an in-memory columnar database like Tabular. So, it stores data in columns, not rows. Columnar databases are primarily designed for analytical reports which typically have a few columns (sales by customer, product, date), but can potentially aggregate large datasets. As the reporting grain lowers and more columns are added (order number, order line item, customer name, phone number, etc.), a columnar database has to cross-join more and more columns. This is not efficient, and performance quickly degrades regardless of how fast the storage is. SSAS Tabular and Power BI are no different. SAP HANA complicates the issue further by preventing direct access to tables and requiring “analytical” views that join tables and potentially nest other views.

Read the whole thing.  Teo has a great point:  there are trade-offs between different data platform technologies, and choosing the right one is important.

Azure Data Lake Store: Now On Blob Storage

James Serra announces Azure Data Lake Store Gen2:

Big news!  The next generation of Azure Data Lake Store (ADLS) has arrived.  See the official announcement.

In short, ADLS Gen2 is the combination of the current ADLS (now called Gen1) and Blob storage.  Gen2 is built on Blob storage.  By GA, ADLS Gen2 will have all the features of both, which means it will have features such as limitless storage capacity, support for all Blob tiers (Hot, Cool, and Archive), the new lifecycle management feature, Azure Active Directory integration, a hierarchical file system, and read-access geo-redundant storage.

A Gen2 capability is what is called “multi-modal”, which means customers can use either Blob object store APIs or the new Gen2 file system APIs.  The key here is that both blob and file system semantics are now supported over the same data.

One very interesting thing to me is that Gen2 pricing is half of Gen1.
