2019-03-19 – Curated SQL

In particular, we’ll look at an example scenario that addresses Data Drift – where new information is added mid-stream and when that occurs the new table structure and new column values are created in Snowflake automatically.
To illustrate, let’s take HTTP web server logs generated by Apache web server (for example) as our main source of data. Here’s what a typical log line looks like:
150.47.54.136 - - [14/Jun/2014:10:30:19 -0400] "GET /department/outdoors/category/kids'%20golf%20clubs/product/Polar%20Loop%20Activity%20Tracker HTTP/1.1" 200 1026 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36"

Click through for the demonstration.

Comments closed

Database-First or Kafka-First for Event Streaming

Published 2019-03-19 by Kevin Feasel

Gwen Shapiro takes us through a scenario where database-first writes for event streaming makes the most sense:

Note that the DB does quite a lot for you: it enforces serializability, locks, your logical constraints, etc. If the DB is distributed (Vitesse, Cockroach, Spanner, Yugabyte), it does even more.
If you were to go Kafka-first… well, it isn’t impossible. But all those responsibilities now belong to you as a developer. And if you are thinking there may be multiple webservers handling user requests and passing them to Kafka, you have to solve fairly challenging problems.

Read the whole thing.

Comments closed

Tuning Azure SQL Database

Published 2019-03-19 by Kevin Feasel

Tim Radney walks us through some of the tools we have available to tune Azure SQL Databases:

Many instance-level items that you have been used to configuring on full installations are off limits. Some of these items include:
– Setting min and max server memory
– Enabling optimize for ad hoc workloads
– Changing cost threshold for parallelism
– Changing instance-level max degree of parallelism
– Optimizing tempdb with multiple data files
– Trace flags

Tim does point out workarounds for some of these and gives us the list of things which are possible, so check that out.

Comments closed

Calculating Weighted Averages in SQL

Published 2019-03-19 by Kevin Feasel

Lukas Eder shows how you can calculate weighted averages using SQL:

As can be seen, this schema is slightly denormalised as the number of lines per transaction are precalculated in the transactions.lines column. This will turn out to be quite useful for this calculation, but it isn’t strictly necessary.
Now, in the previously linked Stack Overflow question, a report was desired that would calculate:
– An aggregation of sums as provided by the line items
– An aggregation of averages as provided by the transactions

As Lukas points out, doing this in two queries is easy, but doing it in one is sublime.

Comments closed

Bad Idea Files: Cross-Server Temp Table Access

Published 2019-03-19 by Kevin Feasel

Kenneth Fisher explains how to shoot yourself in the foot:

So a few things here. I’m using a global temp table instead of a local one because it makes it easier to reference. Local temp tables aren’t listed in tempdb under their name while global ones are.
The first part of this message (the bit in black) is a warning basically telling us that if there is a temp table (a # at the front of the name) it’s going to ignore the multi part reference. In other words, you’re going to get this message any time you try to use a multi part name and a temp table. Linked server or not. The second part of the message (the bit in red) just tells us that there isn’t a temp table named ##DBList.

Kenneth finds a way, but I can’t think of a scenario where accessing a temp table on a different instance turned out to be a good idea.

Comments closed

Accessing Azure Event Hubs with Python

Published 2019-03-19 by Kevin Feasel

Neil Gelder shows us how you can write Python code to work with Azure Event Hubs:

I’ve supplied these two python scripts in my github repo at the following link.
First we need to open the install the relevant python libraries so you’ll need to issue the below pip command in whatever command tool you use, bash or cmd Prompt
pip install azure-eventhub

Check it out if you need pub-sub in Azure.

Comments closed

Desired State Configuration: Managed Object Format Files

Published 2019-03-19 by Kevin Feasel

Jess Pomfret explains what Managed Object Format (MOF) files are and why they’re useful for Desired State Configuration:

When I run this script I see the output in the screenshot below, a MOF file has been created in my output folder. Managed Object Format (MOF) files are used to describe Common Information Model (CIM) classes, these are industry standards which gives us flexibility in working with DSC. In DSC this is important as the MOF file is the artefact that will actually be used to configure our nodes. This MOF will be delivered to our target node and enacted by the Local Configuration Manager (LCM).
The LCM will be covered in more detail in a later post, but for now know that it can be configured to be in either ‘Push’ mode or ‘Pull’ mode. Pull mode is more complicated to set up but perhaps more appropriate for managing a large number of servers. For now, we will look at the ‘Push’ mode where we will deliver the MOF manually to the target node for the LCM to enact.

There are a lot of TLAs to watch out for within Desired State Configuration.

Comments closed

Against Hard-Coded Database Names In Queries

Published 2019-03-19 by Kevin Feasel

Kendra Little explains why hard-coding database names in your stored procedures or views is a bad idea:

I’m terrible at naming things. I recently wrote some quick code to reproduce a design problem and demonstrate several options for solutions, and later realized that I’d named my objects dbo.Foo, dbo.FooFoo, and dbo.Wat.
But I feel strongly about a few important principles about referring to objects that are already named in a relational database, specifically SQL Server and Azure SQL Database.
Most of the time, you should use a two part-name for objects in the current database. It’s important to know your database context. When referencing an object in the current database, you should not specify the database name in the reference.

Read the comments as well. I’m not as hard-set against three-part naming for cross-database queries but can understand the sentiment.

Comments closed

M	T	W	T	F	S	S
				1	2	3
4	5	6	7	8	9	10
11	12	13	14	15	16	17
18	19	20	21	22	23	24
25	26	27	28	29	30	31

Day: March 19, 2019

Using the StreamSets Snowflake Destination

Database-First or Kafka-First for Event Streaming

Tuning Azure SQL Database

Calculating Weighted Averages in SQL

Bad Idea Files: Cross-Server Temp Table Access

Accessing Azure Event Hubs with Python

Desired State Configuration: Managed Object Format Files

Against Hard-Coded Database Names In Queries