Press "Enter" to skip to content

Curated SQL Posts

Preconceived Notions on Rules

Richard Swinbank is a rule-breaker:

When I started out writing T-SQL, I learned rules – things I should do, things to avoid – from other, more experienced developers. For example, when writing stored procedures:

– don’t use dynamic SQL

– don’t set variables with SELECT

– never use WITH RECOMPILE

There are often great reasons for these. Dynamic SQL can leave you prone to SQL injection, SELECTing into variables can have unexpected effects, and RECOMPILE can bring an OLTP system to its knees.
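To make the second rule concrete, here is a minimal T-SQL sketch (querying the sys.objects catalog view purely for illustration) of one of those unexpected effects: when a query returns no rows, SELECT assignment silently leaves the variable's previous value in place, while SET assigns NULL.

    DECLARE @name sysname = N'previous value';

    -- SELECT assignment: a query returning zero rows leaves @name untouched
    SELECT @name = name FROM sys.objects WHERE 1 = 0;
    SELECT @name AS AfterSelect;   -- still N'previous value'

    -- SET with a scalar subquery: zero rows yields NULL, which is easier to detect
    SET @name = (SELECT name FROM sys.objects WHERE 1 = 0);
    SELECT @name AS AfterSet;      -- NULL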

I think most rules of this sort tend to fit into something like four categories: convenience, caution, observance in the breach, and shibboleths. Some rules, like formatting rules, are for convenience, whether that be for yourself or for others. Not using dynamic SQL is a cautionary rule, for the reasons Richard mentions. The final two categories are variants on rules which exist to show that you are one of us: rules observed in the breach are ones we say we follow but only actually follow when others are looking, while shibboleths are rules we follow regardless of whether anyone is looking. Formatting might fit into this as well, as might rules prohibiting (or enforcing) use of subqueries in the SELECT clause.

Prohibitions like these are very rarely 100% applicable, though as Richard notes, rules are a very useful shorthand.


Preconceived Notions around Temp Tables

Two overlapping posts on temp tables? Order up! First, Deb Melkin realizes tempdb is like a hotel room:

I was always taught to clean up after myself, so I thought dropping my local temp tables after creating them in a proc did that. I knew that once the session ended, the table was no longer there, so SQL Server would eventually take care of it. But then you had cases where you ran the same stored proc multiple times in the same session, so you wanted to make sure the table was gone at the end of your procedure to prevent issues.
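A minimal sketch of the cleanup pattern Deb describes (DROP TABLE IF EXISTS and CREATE OR ALTER require SQL Server 2016 or later; on older versions, check OBJECT_ID('tempdb..#results') instead):

    CREATE OR ALTER PROCEDURE dbo.DemoProc
    AS
    BEGIN
        DROP TABLE IF EXISTS #results;   -- harmless if a prior run left it behind
        CREATE TABLE #results (Id int NOT NULL);

        -- ... populate and use #results ...

        DROP TABLE IF EXISTS #results;   -- clean up before returning
    END;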

Second, Deepthi Goguri learns the importance of getting temp tables right the first time:

I always thought that changing the schema of a temp table or adding indexes after creating it didn’t affect anything, but it’s actually not a good idea. Yes, you heard that right. Altering a temp table or adding indexes to it after creation within a stored procedure marks it for deletion, so the cached temp table metadata cannot be reused. It is better to create the index as you create the temp table, using the inline index syntax.

Both reference Pam Lahoud’s tempdb talk at EightKB (video), so check that out too.
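As a quick illustration of the inline index syntax Deepthi mentions, available since SQL Server 2014 (the table and column names here are invented):

    -- Avoid CREATE TABLE #orders ... followed by a separate CREATE INDEX or
    -- ALTER TABLE, which prevents the cached temp table metadata from being reused.
    CREATE TABLE #orders
    (
        OrderID int NOT NULL,
        CustomerID int NOT NULL,
        OrderDate date NOT NULL,
        INDEX IX_orders_Customer CLUSTERED (CustomerID, OrderID)
    );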


Preconceived Notions of Business Logic in the Database

Mala Mahadevan discusses business logic:

There are many things on which I’ve changed my opinion in my journey as a data professional. For this blog post, I picked a common one – where business logic should reside. During my years as a production DBA, there was (and probably is) a hard-held belief that databases are meant for CRUD operations only and that business logic belongs in the application/middle tier. This belief has its place: DBAs don’t like debugging application-specific code, or being tasked with explaining why data looks a certain way or what code caused it. Business logic can also be very complex and get deeply embedded in one place (the database). I have believed in this and fought for it at most places I’ve worked as a DBA.

Over time, though, my opinion has changed.

Building on top of Mala’s point, people often conflate “business logic” with “data logic.” Rules governing the shape of data are as much data logic rules as they are business logic. The simplest cases include things like nullability, data sizes and lengths, and how data elements interact. It can also include ensuring the non-existence of invalid states through constraints. There can be practical reasons why we wouldn’t implement some piece of data logic in the database itself—for example, perhaps the particular data logic rule would be prohibitively expensive or becomes too complex to maintain as triggers—but in general, I’d estimate that people and firms under-specify their data logic in the database rather than over-specifying business logic.
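As a sketch of what specifying data logic in the database can look like, consider a hypothetical invoice table (the names and rules are illustrative):

    CREATE TABLE dbo.Invoice
    (
        InvoiceID int NOT NULL PRIMARY KEY,
        CustomerID int NOT NULL,        -- in practice, a foreign key to a customer table
        InvoiceDate date NOT NULL,
        PaidDate date NULL,
        Amount decimal(19, 4) NOT NULL,
        -- data logic as constraints: the invalid state cannot exist at all
        CONSTRAINT CK_Invoice_Amount CHECK (Amount >= 0),
        CONSTRAINT CK_Invoice_PaidDate CHECK (PaidDate IS NULL OR PaidDate >= InvoiceDate)
    );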

One last bit that I’d hit on is that it’s okay to have overlapping logic in both, especially when it comes to data validation. We shouldn’t blithely accept inputs from client-side scripts; we evaluate them server-side and ensure that nobody has fiddled with the form and tried to create an invalid state. Then, when writing to the database, it can still make sense to check again, for the exact same reason.
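Continuing the hypothetical invoice table from above, the write path can repeat a check the application tier already performed, with the constraints remaining as the final backstop:

    CREATE OR ALTER PROCEDURE dbo.AddInvoice
        @InvoiceID int,
        @CustomerID int,
        @InvoiceDate date,
        @Amount decimal(19, 4)
    AS
    BEGIN
        -- re-validate server-side, even though the client and app tier checked already
        IF @Amount < 0
            THROW 50001, N'Invoice amount cannot be negative.', 1;

        INSERT INTO dbo.Invoice (InvoiceID, CustomerID, InvoiceDate, Amount)
        VALUES (@InvoiceID, @CustomerID, @InvoiceDate, @Amount);
    END;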


Streaming Data to Event Hubs via Kafka Connect and Debezium

Niels Berglund starts off a two-part sub-series within a series:

This post is the first of two looking at whether and how we can stream data to Event Hubs from Debezium. Initially, I had planned only one post covering this, but it turned out that the post would be too long, so I split it in two.

It started with the post How to Use Kafka Client with Azure Event Hubs. In that post, I looked at how the Kafka client can publish messages not only to Apache Kafka but also to Azure Event Hubs. In the post, I said something like:

An interesting point here is that it is not only your Kafka applications that can publish to Event Hubs but any application that uses Kafka Client 1.0+, like Kafka Connect connectors!

Click through for the first part of this pairing.


Trying Automated ML in Azure ML

I continue a series on low-code machine learning with Azure ML:

Automated Machine Learning (AutoML) provides two distinct benefits. The first benefit is the one that AutoML providers tend to tout: you don’t need (much) machine learning experience to use them. According to the marketing, AutoML does all of the work and you sit back and enjoy the fruits of its labor.

I am nowhere near sold on this use case for AutoML. Yes, you can get answers in a few clicks, but to get good answers, you need a lot more knowledge of data processing and statistics than they let on. Feeding in garbage data will get you mediocre results.

Click through for the second benefit, which I think applies much better, as well as a step-by-step demonstration of how AutoML works.


Build a Sandbox for Testing PolyBase and Hadoop

Fernando Sibaja Araya has a step-by-step guide to building a Hadoop sandbox for testing PolyBase on SQL Server:

This guide will take you step by step through deploying a Hadoop sandbox into Azure. You will then connect to the sandbox through SSH and tunnel all the required ports to your machine, so you can access all the endpoints needed to execute Hadoop queries from PolyBase.

We will be deploying Hortonworks Data Platform Sandbox 2.6.4. This will be one VM running in Azure, and within this VM, a Docker container will run all the HDP services.

Click through for the full set of instructions. I’m a little overjoyed that my blog snuck into the set of links and resources at the end.
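Once the tunnels are up, the PolyBase side looks roughly like this. This is a minimal sketch in which the endpoint, path, and object names are my own assumptions, not values from Fernando's guide:

    -- assumes PolyBase is installed and 'hadoop connectivity' is already configured
    CREATE EXTERNAL DATA SOURCE HDPSandbox
    WITH
    (
        TYPE = HADOOP,
        LOCATION = 'hdfs://localhost:8020'   -- the tunneled NameNode endpoint
    );

    CREATE EXTERNAL FILE FORMAT CsvFile
    WITH
    (
        FORMAT_TYPE = DELIMITEDTEXT,
        FORMAT_OPTIONS (FIELD_TERMINATOR = ',')
    );

    CREATE EXTERNAL TABLE dbo.SampleData
    (
        Id int NOT NULL,
        Label varchar(50) NULL
    )
    WITH
    (
        LOCATION = '/tmp/sample/',
        DATA_SOURCE = HDPSandbox,
        FILE_FORMAT = CsvFile
    );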


Using Azure DevOps to Deploy Python Functions to Azure Function Apps

Rayis Imayev has a trick question for us:

Can I create a CI/CD pipeline to deploy a Python function to an Azure Function App using a Windows self-hosted Azure DevOps agent?

My short answer to this question is Yes and No. Yes, you can use a Windows self-hosted Azure DevOps agent to deploy a Python function to a Linux-based Azure Function App; and No, you can’t use a Windows self-hosted Azure DevOps agent to build the Python code, since that requires collecting/compiling/building all of the Python dependencies on a Linux OS platform.

Click through for the full answer.


Preconceived Notions: “Databases Are Easy”

Rob Farley takes us back to school:

At university I studied Computer Science, which felt like it was mostly about algorithms and paradigms. It covered how to approach particular kinds of problems, what languages suited what problems and why, and how to model things. The answer to a lot of things was “C”, whether it was a multiple-choice question or the question about which language would be used to solve something.

I skipped the database subject. Everyone said it was overly basic, easy marks, and not particularly interesting. I wasn’t interested in it. Not when there were subjects like Machine Learning where we’d implement genetic algorithms in LISP to find ways to battle other algorithms in solving the prisoner’s dilemma. Or the subject where we’d create creatures (in software) that would follow each other in a flocking motion around a cityscape. Everything I heard about databases was that they were largely of no importance.

In fairness, at least in my experience, university database classes tend to fall into one of two categories: either mathematical forays into set theory or fluffy, school-of-business-friendly courses of the “Today we’re going to learn how to write the word SELECT. Next time, we’ll learn how to write the word FROM” variety.


Secondary and Tertiary Data Mesh Interfaces in Azure

Paul Andrew continues a series on implementing data mesh with Azure:

When thinking about our node edges in part 2, I also made a statement about a primary set of node interfaces. In my initial drawings, I alluded to this, capturing what I’ve called the PaaS Plane and suggesting the Azure resource type used.

Building on this understanding, I want to cover off the remaining edge use cases by exploring the other interface types we will typically need for the nodes of our data mesh architecture.

This has been a rather informative series on a topic I knew very little about coming in.
