Press "Enter" to skip to content

Curated SQL Posts

LISTAGG In Snowflake DB

Koen Verbeeck continues investigating Snowflake capabilities:

Since SQL Server 2017, you have the STRING_AGG function, which has almost the exact same syntax as its Snowflake counterpart. There are two minor differences:
– Snowflake has an optional DISTINCT
– SQL Server has a default ascending sorting. If you want another sorting, you can specify one in the WITHIN GROUP clause. In Snowflake, there is no guaranteed sorting unless you specify it (again in the WITHIN GROUP clause).

It looks like LISTAGG is the ANSI standard name, though SQL Server followed Postgres’s lead in calling their function STRING_AGG.

Comments closed

When Synchronous AG Secondaries Are Out Of Sync

David Fowler explains that just because an Availability Group is set up as synchronous, doesn’t mean you can never experience data loss on failover:

The primary replica is constantly monitoring the state of it’s secondaries. With the use of a continuous ping, the primary node always knows if the secondaries are up or down.

It’s when SQL detects that one of it’s synchronous replicas goes offline is when interesting things can happen.

So here’s the discussion that came up, if a synchronous replica goes offline for whatever reason, SQL won’t be able to commit any transactions and that means we can be confident that the secondary is up to date, right?

Read on to learn the answer. Which is “no.” But David explains why, so you should read that instead of just having me say it.

Comments closed

Straight Talk On Trace Flags

Pam Lahoud explains the purpose of trace flags and talks about a very important trace flag, 4199:

Some trace flags are used to enable enhanced debugging features such as additional logging, memory dumps etc. and are used only when you are working with Microsoft Support to provide additional data for troubleshooting. These trace flags are not ones you want to leave turned on in a production system as they may have a negative impact on your workload. An example of one of these flags would be TF 2551 which is used to trigger a filtered memory dump whenever there is an exception or assertion in the SQL Server process. These trace flags are only used for a short period of time and typically only at the recommendation of Microsoft Support, so they will likely always be around.

If you are a DBA and are not extremely familiar with trace flags, you really want to read this article.

Comments closed

DBAs Aren’t Going Away, DevOps + Automation Edition

Grant Fritchey argues that the DBA role is here to stay:

One of the reasons I love DevOps so much is because I’ve done it successfully. I’ve worked on teams that built fully automated deployment mechanisms to get code from Dev to Production. Further, we automated the creation of dev & test servers. We automated the creation of production servers too. We automated the heck out of everything.
And then they fired me…
Kidding.
When we started building our DevOps processes, I was supporting two development teams. As we got better at automating our work, I was supporting three teams. By the time we had fully automated all the various processes, I was supporting between five and seven teams at different levels.

To support Grant’s point, I’ve had a draft in my personal blog entitled “The Cloud is not Stealing Our Jobs” from May of 2017 that I never got around to finishing. Back in 2017, that was what was going to kill the DBA role.

The role has certainly changed over the years. I suppose if your definition of a DBA is someone who lays out indexes starting on certain drive sectors to take advantage of rotation speed on that single 5400 RPM spinning disk drive AND NOTHING ELSE, then your job might not be there. But that describes exactly zero people I have ever known in the industry.

Comments closed

Integrating Azure Data Studio With GitHub

Eduardo Pivaral shows how to use Azure Data Studio to push to a Git repository on GitHub:

There are a lot of source control applications and software, everyone has its pros and cons, but personally, I like to use GitHub, since it is free to use and since it was recently acquired by Microsoft, support for other products is easier (SQL Server for this case).

On this post, I will show you how to implement a source control for a database using GitHub and Azure Data Studio (ADS).

Click through for the step-by-step instructions.

Comments closed

Cloudera Data Platform

Alex Woodie reports on the new Cloudera’s business plan:

“Once we’ve delivered that and got past it, we then want to get to a second subsequent version, which you can start to upgrade and migrate to, and that will be the go-forward platform,” he said. “Obviously the key part of CDP is delivering not just the workloads you have today but new and intuitive experiences around key workloads such as data warehousing, data flow, the edge or streaming, AI and machine learning.”
The company also announced that CDH 5.x and 6.x and HDP 3.x will be supported through January 2022, which is in-line with previous guidance the company has given. This company believes that three years is plenty of time for customers to plan their migration paths from older CDH and HDP versions to the unified CDP product. Support for HDP 2.x will end before that time.

Also of interest: the integration of Hortonworks Data Flow into CDH and Cloudera Data Science Workbench into HDP.

Comments closed

Iterative Solutions To The Closest Match Problem

Itzik Ben-Gan has a follow-up article looking at row-by-row solutions to the closest match problem:

Last month, I covered a puzzle involving matching each row from one table with the closest match from another table. I got this puzzle from Karen Ly, a Jr. Fixed Income Analyst at RBC. I covered two main relational solutions that combined the APPLY operator with TOP-based subqueries. Solution 1 always had quadratic scaling. Solution 2 did quite well when provided with good supporting indexes, but without those indexes also had quadric scaling. In this article I cover iterative solutions, which despite being generally frowned upon by SQL pros, do provide much better scaling in our case even without optimal indexing.

Itzik has three separate solutions here, including one using the CLR.

Comments closed

Azure Data Studio, January Release

Alan Yu announces the January release of Azure Data Studio:

In previous versions of Azure Data Studio, when a user ran large queries, no results would appear in the results grid until the query could show all of the results. This was not a great experience for our users, thus we did some investigating to improve this experience. In the latest build of Azure Data Studio, users can now see results streamed in the results grid. This makes it a better experience since users can see the results quicker and interact with their data instead of being in a waiting state.

There are several enhancements this month, including Azure Active Directory support.

Comments closed

Naming Conventions In SQL Server

Phil Factor explains naming requirements for SQL Server and gives suggestions for conventions to follow:

There are no generally accepted standards for naming SQL objects. Although ISO/IEC 11179 has been referred to as a standard for naming, it actually only sets a standard for defining naming conventions. There is a sample standard in the ‘Naming principles’ document (ISO/IEC 11179-5), but this is merely an example of how a standard should be defined. However, it is quite close to a general good-practice in programming.

When naming a table, it is a good idea to use a collective name or ‘object class term’ for the entity if one exists ( such as Employee, Cost, Tree, component, member, audience, staff or faculty) but use the singular rather than the plural form where possible. For the sake of maintenance, use a consistent naming convention that is informative but brief. It helps greatly to start with a dictionary of the correct nouns and verbs associated with the application domain and use that. If it proves inadequate, then the team can build on it. If a data model has been created as part of the design phase, this dictionary should be an end-product of this work.

As Phil notes at the end, consistency is the most important virtue here. It’s hard to work with a database where you have tables named Employees, employee_dates, and tblFiredEmployee.

Comments closed