2019-08-06 – Curated SQL

Ingesting data into the Data Lake occurs in steps 1 and 2 in our architecture. Azure Data Factory (ADF) provides an excellent mechanism for loading data from source applications into a Data Lake stored in Azure Data Lake Store Gen2. In fact, Microsoft offers a template in the ADF Template gallery which provides a metadata driven approach for doing so. The template comes with a control table example in a SQL Server Database, a data source dataset and a data destination dataset. More on this template can be found here in the official documentation.

I appreciate that this is a full walkthrough of the process, not just one step.

Comments closed

Stream Processing with Kafka

Published 2019-08-06 by Kevin Feasel

Satish Sharma wraps up a series on Kafka and Kafka Streams:

Consider a hypothetical fleet management company that needs a dashboard to get the insight of its day to day activities related to vehicles. Each vehicle in this fleet management company is fitted with a GPS based geolocation emitter, which emits location data containing the following information
1. Vehicle Id: A unique id is given to each vehicle on registration with the company.
2. Latitude and Longitude: geolocation information of vehicle.
3. Availability: The value of this field signifies whether the vehicle is available to take a booking or not. Current Status (Online/Offline) denotes whether the vehicle is on duty or not.

Read through the article and then check out Satish’s GitHub repo for more.

Comments closed

The Cost of Abstraction

Published 2019-08-06 by Kevin Feasel

Erik Darling explains that abstraction can be the cause of performance woes in SQL Server:

There’s no “caching” of steps in a query. If you nest a view however-many-levels-deep, each step isn’t magically materialized.
Same goes for CTEs. If you string a bunch together and reference them multiple times, you’ll start to see some very repetitive branches in your query plans.
Now, there are tricks you can play to get what happens inside of one of these steps “fenced off”, but not to get the result set fully materialized.
It’s a logical separation, not a physical one.

In addition, as your query gets more and more complex, the optimizer eventually gives up and gives you what will likely be an ugly version of its implementation because there are too many potential solutions.

Comments closed

Using Show-Command with dbatools

Published 2019-08-06 by Kevin Feasel

Garry Bargsley shows how you can “build” a GUI for any Powershell cmdlet, using one from dbatools as an example:

The Show-Command cmdlet will display in a graphical interface all the parameter options related to a function, cmdlet or workflow.
I always recommend the book “Learn Windows PowerShell in a Month of Lunches” for people that ask me how to get started with PowerShell. It is a very thorough book and I believe helps build the foundation you need to user PowerShell to the fullest. Earlier this month I spent time going through the book for maybe the third time. I must have blown right by the Show-Command cmdlet which shows up in Chapter 4, as I do not ever remembering seeing it before. Then a couple of weeks ago Kenneth Fisher ( b | t ) post a blog about this command. I like how Kenneth related the Show-Command in PowerShell to generating scripts with SSMS. This is a great comparison for the DBA to understand. After reading his blog post I thought, wow, this might be a handy tool to combine with my favorite module out there dbatools.

Click through for a demonstration.

Comments closed

Testing Memory-Optimized tempdb

Published 2019-08-06 by Kevin Feasel

Erin Stellato wants to see how big a benefit using In-Memory OLTP for tempdb metadata objects really is:

I have SQL Server 2019 CTP 3.2 installed on my virtual machine, which has 8GB of memory (max server memory set to 6 GB) and 4 vCPUs. I created four (4) tempdb data files, each sized to 1GB.
I restored a copy of WideWorldImporters and then created three stored procedures (definitions below). Each stored procedure accepts a date input, and pushes all rows from Sales.Order and Sales.OrderLines for that date into the temporary object. In Sales.usp_OrderInfoTV the object is a table variable, in Sales.usp_OrderInfoTT the object is a temporary table defined via SELECT … INTO with a nonclustered added afterwards, and in Sales.usp_OrderInfoTTALT the object is a pre-defined temporary table which is then altered to have an additional column. After the data is added to the temporary object, there is a SELECT statement against the object that joins to the Sales.Customers table.

This isn’t a be-all, end-all performance test, but it does give us an idea of what can work better with memory-optimized tempdb metadata.

Comments closed

Against Feature Restrictions

Published 2019-08-06 by Kevin Feasel

Solomon Rutzky is not a fan of Feature Restrictions in SQL Server 2019 CTP 3.2:

However, nobody’s perfect and occasionally the SQL Server team makes a mistake. A new feature introduced in SQL Server 2019 CTP 3.2, “Feature Restrictions”, is an unfortunate example of such a mistake. It’s a tragically misguided attempt at improving security that not only increases the chances of SQL Injection, but it also prevented useful changes from being made. “Misguided” because it doesn’t even accomplish it’s stated goal, and “tragic” because it a) most likely increases the chances of SQL Injection, and b) used up the time that could have been spent on implementing useful changes.

Read on to see why. Some of it is the travails of blacklisting (for example, the product blacklists WAITFOR but there’s more than one way to force a wait), some is offsetting behavior (as you introduce risk-reducing products, people compensate by taking additional risks up to the point of the prior equilibrium), and some of it is additional information leakage.

Comments closed

Calculating Memory Consumption by Object

Published 2019-08-06 by Kevin Feasel

Max Vernon has a script to help you figure out which objects are consuming the most buffer pool space:

SQL Server caches object data in memory in the buffer pool. Understanding memory consumption by object can be crucial for performance. For instance, you may have a large logging table consuming 90% of the buffer pool. Moving older rows out of the logging table might allow you to reduce memory consumption if you really only care about the last 2 weeks of log records. The script below takes a snapshot of sys.dm_os_buffer_descriptors, then links it to the objects in each database on the instance. The output shows which objects are in memory, along with how much memory is being consumed by each object.

Click through for the script.

Comments closed

Measuring Closeness Centrality in Graphs

Published 2019-08-06 by Kevin Feasel

Niko Neugebauer explains the concept of Closeness Centrality:

The real center of the network or also known as The King of the Network, Closeness Centrality is a measure which represents the relative location of the Vertice to the center of the network, or better to say the average distance to all other Vertices within that network.
This measure results in the high effectiveness of information spreading/flow within the given network, because of the necessary number of Edges to cross to reach to any given connected Vertice.

Read on to see why this is useful and how you can calculate it in SQL Server 2019.

Comments closed

M	T	W	T	F	S	S
			1	2	3	4
5	6	7	8	9	10	11
12	13	14	15	16	17	18
19	20	21	22	23	24	25
26	27	28	29	30	31

Day: August 6, 2019

Dimensional Load with Databricks

Stream Processing with Kafka

The Cost of Abstraction

Using Show-Command with dbatools

Testing Memory-Optimized tempdb

Against Feature Restrictions

Calculating Memory Consumption by Object

Measuring Closeness Centrality in Graphs