
Curated SQL Posts

Hardware and Finger-Pointing

Glenn Berry gives us two rants for the price of one:

This is rant #1. Even though I have an unusual fascination with computer hardware, I am still somewhat taken aback when I encounter DBAs who have absolutely no idea what type of hardware they are using. I’ll sometimes ask a DBA “What processor does your most important database server have?”, and I often get a “deer in headlights” look in response. Then a mumbled response, “I’m not sure, maybe a Xeon?”.

Read on for this rant, as well as the origin story of Glenn’s outstanding SQL Server Diagnostic Information Queries.


The Joys of Replication, Example 48,106

Garry Bargsley troubleshoots a transactional replication issue:

Use transactional replication to replicate similar data from four databases to one Azure DB. This sounds pretty straightforward for anyone who has done any replication work.

However, once I had everything set up and working, things stopped working, and it was a head-scratcher as to why. I made the proper settings configurations on each article, or so I thought. Let me show you the scenario in more detail.

Read on for the answer.


Power Query’s Equivalent of IN

Gilbert Quevauvilliers is making a list:

In the example I did not want to specify all the country names one row at a time. This not only takes a long time, but if I had to then make updates it could be painful too.

The requirement was for certain countries to have their names and the rest be grouped into “Other Countries”

Read on to see how Gilbert was able to combine the set of “Other Countries” together.
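The underlying idea is the same membership test that SQL's IN expresses: check each value against a keep-list and bucket everything else. In Power Query M that test is usually written with List.Contains; the Python sketch below is not Gilbert's code, only an illustration of the logic. The country names and the `group_country` helper are made up for the example.

```python
# Hypothetical keep-list: countries that retain their own names.
NAMED_COUNTRIES = {"Australia", "Canada", "Germany"}


def group_country(country: str) -> str:
    """Return the country if it is in the keep-list, else 'Other Countries'."""
    return country if country in NAMED_COUNTRIES else "Other Countries"


# Made-up sample rows, standing in for a country column.
rows = ["Australia", "France", "Canada", "Japan"]
grouped = [group_country(c) for c in rows]
print(grouped)
```

Keeping the names in one list means a later change touches a single place rather than one condition per country.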


Indexes: Gotta Add ‘Em All

Andy Yun tells a tale of woe:

I once worked for an Entity Framework SaaS company that was having horrific T-SQL performance issues. We put out a major release one weekend and spent the subsequent week buried in horrible performance hell. Because all T-SQL code was generated from Entity Framework (stored procedures were banned by a development exec prior to me joining the company), there were no easy fixes that could be implemented (and a Production release typically took 12 – 36 hours, I wish I was joking).

The manager of infrastructure had heard about SQL Server’s missing index recommendations:

Yep, it ends about as poorly as you’d expect.


Ordered Columnstore Indexes

Joe Obbish and Erik Darling tag team on this one. First, Joe looks at the details of what the CCI ordering process does:

The sort for inserting into an ordered columnstore is a DML request sort. It appears to use the same internal mechanism as the sort that’s added for inserting into partitioned columnstore tables. The difference is that the data is sorted by the specified columns instead of a calculated partition id. In my testing, the sort appears to be a best effort sort that does not spill to tempdb. This means that if SQL Server thinks there won’t be enough memory then the data will not be fully sorted. Parallel inserts have an additional complication. 

And Erik has a messy work-around:

Anyway, I decided to dig in and see what was going on behind the scenes. Which of course, means query plans, and bothering people who are really good at debuggers.

Most of the problems that you’ll run into in SQL Server will come from sorting data.

Whenever I have to think about Sorts, I head to this post about all the different Sorts you might see in a query plan.

Definitely read both posts.


Mapping Functions in R with purrr

Ronan Harrington executes a function for each row in a dataframe:

In this section, we want to tidy the different types of flight in the data set by increasing the number of rows and decreasing the number of columns. For a given airport on a given day, instead of having multiple columns/variables for arrivals, departures and total number of flights, we want to have one column describing the flight type (e.g. arrival or departure) and one column with the value of that flight type/number of flights. This will give the data set a tidy structure.

Click through for the process and the script. H/T R-Bloggers.
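The reshaping described above (fewer columns, more rows) can be sketched in a few lines. Ronan's post uses R and purrr; the version below is a plain Python stand-in, and the `pivot_longer` helper, the column names, and the sample rows are all assumptions made for illustration.

```python
def pivot_longer(rows, id_cols, value_cols,
                 names_to="flight_type", values_to="flights"):
    """Turn each wide row into one long row per value column."""
    long_rows = []
    for row in rows:
        for col in value_cols:
            # Carry over the identifying columns unchanged.
            new_row = {key: row[key] for key in id_cols}
            # The old column name becomes a value in the names_to column.
            new_row[names_to] = col
            new_row[values_to] = row[col]
            long_rows.append(new_row)
    return long_rows


# Made-up wide data: one row per airport per day.
wide = [
    {"airport": "A", "date": "2020-01-01", "arrival": 10, "departure": 12},
    {"airport": "B", "date": "2020-01-01", "arrival": 3, "departure": 4},
]

long_rows = pivot_longer(wide, ["airport", "date"], ["arrival", "departure"])
for row in long_rows:
    print(row)
```

Each input row yields one output row per flight type, so two wide rows with two value columns become four long rows.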


The Benefits of CNAMEs

Deepthi Goguri wants a CNAME record:

If you are using the Servername and database name in the connection strings in all the applications connecting to your database being migrated, this process gets tough and tedious to update once the database is migrated to a different server. It is always advised to use database DNS in the connection strings instead of the IP address or the Servername.

This also applies to the report server datasources as well.

I know it is hard to make the change but if the change is good, it is worth the effort.

I think this starts to get tricky as the number of servers and instances increases, especially if your DNS records differ considerably from your server names. Still, I completely agree: using CNAMEs makes life a lot easier.
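As a rough sketch of why the alias helps, consider an application that builds its connection string from a single configured host name. Every name below (the alias, the database, the `build_connection_string` helper) is hypothetical, and the key/value layout is a generic ODBC-style string rather than anything taken from Deepthi's post.

```python
def build_connection_string(host: str, database: str) -> str:
    """Build a generic ODBC-style connection string for the given host."""
    return f"Server={host};Database={database};Trusted_Connection=yes;"


# Hypothetical CNAME that points at whichever server currently hosts the data.
DB_ALIAS = "salesdb.example.internal"

conn_str = build_connection_string(DB_ALIAS, "Sales")
print(conn_str)
```

After a migration, only the CNAME record needs to point at the new server; the string the application holds stays the same.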


Sources of Data Structure Truth

Deb Melkin performs database epistemology:

The “source of truth” is my newly made up phrase for whatever you are using to say this is my database schema and initial data needed to start up the application. This can be your script directory; this can be a dacpac or bacpac; this can be your data model; this can be a combination of these things. My go-to “source of truth” right now is my source control repository. I’ve got both the schema and the default data needed in the same location. In the past, I would have probably included the data model as a way to help me make sure whatever database table changes I have in my source control are there, especially for that one database which only had tables and views. (A different rant for a different time.) Whatever you use, it absolutely CANNOT be an actual database. There are two main reasons for this:

Read on for those reasons.


Finding Sample Data Online

Mara Pereira goes searching for data:

Have you ever struggled to find sample data to play with in Power BI?

Did you spend hours (sounds crazy, but it happened to me too!) just looking for a dataset with insurance data? Healthcare data? Housing prices data?

Did you ever wonder “where are people finding the data to create those Netflix and Amazon reports that seem to be everywhere these days? Seriously, tell me your secret!”?

Click through for three good sites. Another one I’ve taken to is the US Bureau of Labor Statistics, the federal agency (part of the Department of Labor) responsible for tracking things like employment data, consumer prices, and compensation data. The plus side to these datasets is that you get the whole range of data cleanup, warehousing, querying, and analytics over data which is both real and fairly interesting. For a broader take, data.gov has open data hosted by the US federal government, though I haven’t spent much time working with it.
