Press "Enter" to skip to content

Curated SQL Posts

Automating tSQLt Tests

James Anderson shows how to integrate tSQLt tests as part of his ReadyRoll pipeline:

By default, ReadyRoll will ignore tSQLt objects, including our tests. We don’t want ReadyRoll to script out the tSQLt objects, but we do want it to script our tests. To set our filter we need to unload the project in VS and edit the project file. Add the following to the section named ReadyRoll Script Generation Section:

James’s series is really coming together at this point, so if you haven’t been reading, check out the links in his post.

Comments closed

T-SQL Tuesday 92 Roundup

Raul Gonzalez wraps up T-SQL Tuesday #92:

On July 2017’s event the proposed topic was aimed for all you to share those little secrets that made your tummy burn after pressing F5.

Since early in the morning I’ve been reading your posts which makes me very happy and feel the topic was certainly well accepted by the community.

In order of published date these are the posts that took part in this month’s event.

Click through to see the 17 entries this month.

Comments closed

Finding DAX Measures In Use

Matt Allington shows an easy way to enumerate the DAX measures in a Power Pivot workbook in Excel:

I have written articles before about how you can extract measures from a data model using DAX Studio and also using Power Pivot Utilities.  These are both excellent tools in their own right and I encourage you to read up on those previous articles to learn more about these tools. Today however I am going to share another way you can extract a list of measures from an Excel Power Pivot Workbook without needing to install either of these 2 (excellent) software products.  I often get called in to help people with their workbooks and sometimes they don’t have the right software installed for me to extract the list of measures (ie DAX Studio or PPU).  This article explains how to extract the measures quickly without installing anything.

Matt uses a simple SQL statement to pull measure data into an Excel table, making it easy to retain the set of measures.  There are some built-in documentation possibilities with this.

Comments closed

Filesystem Enumeration DMV

Erik Darling points out a new DMV in SQL Server 2017:

SQL Server 2017 RC1 dropping recently reminded me of a couple things I wanted to blog about finding in there. One that I thought was rather interesting is a new iTVF called dm_os_enumerate_filesystem. It looks like a partial replacement for xp_cmdshell in that, well, you can do the equivalent of running a dir command, with some filtering.

The addition of a search filter is particularly nice, since the dircommand isn’t exactly robust in that area. If you’ve ever wanted to filter on a date, well… That’s probably when PowerShell got invented.

If I run a simple call to the new function like so…

Click through to see Erik use the new function.

Comments closed

MERGE With Deletion

Kevin Wilkie shows an example of deleting data as part of a merge operation:

The last time we were together, we learned how to use the MERGE statement when we wanted to insert rows that didn’t exist and update rows that didn’t. This time we’re going to add onto that. We’re adding the seldom used, but delightfully potent – delete rows that no longer exist in the original table.

MERGE is an enticing but dangerous piece of syntax.  It looks so nice until you realize how many bugs and oddities there are in the command.

Comments closed

Chaining Exactly-Once Operations With Kafka

Ben Stopford shows how you can use Kafka to chain together services while maintaining exactly-once guarantees:

Any service-based architecture is itself a distributed system, a field renowned for being difficult, particularly when things go wrong. We have thought experiments like The Two Generals Problemand proofs like FLP which highlight that these systems are difficult to work with.

In practice we make compromises. We rely on timeouts. If one service calls another service and gets an error, or no response at all, it retries that call in the knowledge that it will get there in the end.

The problem is that retries can result in duplicate processing—which can cause very real problems. Taking a payment, twice, from someone’s account will lead to an incorrect balance. Adding duplicate tweets to a user’s feed will lead to a poor user experience.  The list goes on.

I just had a discussion at SQL Saturday Albany about this exact thing, and the pain of rolling your own solutions.

Comments closed

Spark Data Structures

Shubham Agarwal explains the difference between three Spark data structures:

DataFrame(DF) – 

DataFrame is an abstraction which gives a schema view of data. Which means it gives us a view of data as columns with column name and types info, We can think data in data frame like a table in the database.

Like RDD, execution in Dataframe too is lazy triggered.

Read on to learn more about Resilient Distributed Datasets, DataFrames, and DataSets.

Comments closed

Subscription Versus Assignment In Kafka

Paolo Patierno explains why you shouldn’t mix subscribe() and assign() in Kafka:

Another great advantage of consumers grouping is the rebalancing feature. When a consumer joins a group, if there are still enough partitions available (i.e. we haven’t reached the limit of one consumer per partition), a re-balancing starts and the partitions will be reassigned to the current consumers, plus the new one. In the same way, if a consumer leaves a group, the partitions will be reassigned to the remaining consumers.

What I have told so far it’s really true using the subscribe() method provided by the KafkaConsumerAPI. This method forces you to assign the consumer to a consumer group, setting the group.id property, because it’s needed for re-balancing. In any case, it’s not the consumer’s choice to decide the partitions it wants to read for. In general, the first consumer joins the group doing the assignment while other consumers join the group.

Read on to learn more.

Comments closed

Troubleshooting MSDTC

Jeff Mlakar has a troubleshooting guide for the Distributed Transaction Coordinator:

Microsoft Distributed Transaction Coordinator (MSDTC) is a Windows service that coordinates transactions spanning multiple databases. It is a transaction manager that allows applications to include several different sources of data in one transaction. MSDTC coordinates committing the distributed transaction across all servers enlisted in the transaction.

An example here would be a process on one machine calling a stored procedure on another machine which in requires data that changes together in a transaction.

The MSDTC service is running on each of the servers to manage the successful commit (or rollback) of the transaction across all servers enlisted in the transaction.

My MSDTC experience is generally negative, that this is more trouble than it’s worth.  But sometimes you’ve got to use it, and when you do, it’s nice to have the skinny on it.

Comments closed

Auto-Restarting Docker Containers

Andrew Pruski helps out those of us who can’t seem to let our containers go:

One of the problems that I’ve encountered since moving my Dev/QA departments to using SQL Server within containers is that the host machine is now a single point of failure.

Now there’s a whole bunch of posts I could write about this but the one point I want to bring up now is…having to start up all the containers after patching the host.

I know, I know…technically I shouldn’t bother. After patching and bouncing the host, the containers should all be blown away and new ones deployed. But this is the real world and sometimes people want to retain the container(s) that they’re working with.

It’s pretty easy to set up a restart policy, as Andrew shows.

Comments closed