Author: Kevin Feasel

The Benefits Of DML Triggers

Published 2018-09-11 by Kevin Feasel

Kendra Little tells us a tale of trigger value:

When I first edged my way into a Junior DBA-ish role, I worked with a complex application with many large databases. Customers loaded and configured data into a (mostly) OLTP style database, and then database was replicated to multiple other systems — some to publish data to and adserving platform, and some to transform the data for reporting.

Triggers were used extensively in these downstream systems to detect changes. It went like this:

Transactional replication publications were set up on the OLTP (ish) database.
Transactional replication subscriptions were set up on each downstream server. A dedicated database was used for replication articles on each instance.
After replication was initialized, DML triggers were created on each article in the subscriber database. For each modification, the trigger would insert a narrow row into a “delta” table related to that article.
The “delta” tables were in their own schema, and contained row identifiers, timestamp columns, and custom indexes for processing. This enabled batches to be efficiently pulled from these tables for processing into the related system.
Cleanup processes periodically pulled processed rows out of the delta tables in the background (and indexes were designed to prevent the cleanup process from fighting with data processing jobs and inserts)

Read the whole thing. There are some things that triggers can do easily which would be difficult to handle otherwise, but they can also be dangerous in the wrong hands.

Comments closed

Power BI Report Layouts

Published 2018-09-11 by Kevin Feasel

Jeanne Combrinck shows us an easy method of creating Power BI layouts:

PowerBI has this great functionality where you can go and download preset layouts which make your PowerBI reports stand out more. There is a nice trick to doing this yourself.

You can create layouts in PowerPoint and then save them as images. Then insert them into your PowerBI report as an image and send the image right to the back.

Click through for an example of this in action.

Comments closed

SQL Server On Ubuntu 18.04 LTS

Published 2018-09-10 by Kevin Feasel

Avanish Panchal shows how to get SQL Server on Linux running on Ubuntu 18.04 LTS:

April 26th, 2018 Ubuntu 18.04 released. A 7th long-term support release of the world’s most popular OSOS available in Desktop, Server, Cloud, and Core versions. Tried quite a bit to install SQL Server 2017 on 18.04, but couldn’t get luck. Investigated quite a bit and came across few reasons. Lot of workaround are mentioned across web, however wondering easiest and simplest way, always.

Potential reason till SQL Server for Linux 2017 CU9 package had dependencies on OPENSSL version(1.0.0) but Ubuntu 18.04 LTS comes with OpenSSL 1.1. Microsoft has update the SQL Server 2017 installation package so it can use libssl1.0.0 pkg. However still found issue while installing today (08-Sep-2018). Realised Microsoft SQL Server on Linux 2017 CU10 was updated but still few changes are required at OS level i.e. Ubuntu 18.04.

Read on for screen shots showing how to fix the problem.

Comments closed

The Value Of Auto-Created Statistics

Published 2018-09-10 by Kevin Feasel

Brent Ozar is here to praise statistics auto-creation:

Let me rephrase: before you even start playing around with statistics, make sure you haven’t taken away SQL Server’s ability to do this for you.

I like to make fun of a lot of SQL Server’s built-in “auto-tuning” capabilities that do a pretty terrible job. Cost Threshold for Parallelism of 5? MAXDOP 0? Missing index hints that include every column in the table? Oooookeydokey.

But there are some things that SQL Server has been taking care of for years, and automatically creating statistics is one of ’em.

There are edge cases where statistics auto-creation isn’t the best thing, but for the great majority of cases, it is a big positive.

Comments closed

Testing AG Read-Only Routing

Published 2018-09-10 by Kevin Feasel

Jess Pomfret shows us how we can use dbatools to test Availability Group read-only routing:

The other part I needed to set up was read-only routing, this enables SQL Server to reroute those read only connections to the appropriate replica. You can also list the read only replicas by priority if you have multiple available or you can group them to enable load-balancing.

Although this seems to be setup correctly so that connections that specify their application intent of read only will be routed to the secondary node I wanted to prove it.

Read on to see how Jess is able to prove it.

Comments closed

Integrating Kafka Into A Data Scientist’s Workflow

Published 2018-09-07 by Kevin Feasel

Liz Bennett from Stitch Fix has a guest post on the Confluent blog:

Our main requirement for this new project was to build infrastructure that would be 100 percent self-service for our Data Scientists. In other words, my teammates and I would never be directly involved in the discovery, creation, configuration and management of the event data. Self-service would fix the primary shortcoming of our legacy event delivery system: manual administration that was performed by my team whenever a new dataset was born. This manual process hindered the productivity and access to event data for our Data Scientists. Meanwhile, fulfilling the requests of the Data Scientists hindered our own ability to improve the infrastructure. This scenario is exactly what the Data Platform Team strives to avoid. Building self-service tooling is the number one tenet of the Data Platform Team at Stitch Fix, so whatever we built to replace the old event infrastructure needed to be self-service for our Data Scientists. You can learn more about our philosophy in Jeff Magnusson’s post Engineers Shouldn’t Write ETL.

This is an architectural overview and a good read.

Comments closed

Databricks UDF Performance Testing

Published 2018-09-07 by Kevin Feasel

Tristan Robinson shares some performance comps for different Azure Databricks scenarios:

I’ve recently been spending quite a bit of time on the Azure Databricks platform, and while learning decided it was worth using it to experiment with some common data warehousing tasks in the form of data cleansing. As Databricks provides us with a platform to run a Spark environment on, it offers options to use cross-platform APIs that allow us to write code in Scala, Python, R, and SQL within the same notebook. As with most things in life, not everything is equal and there are potential differences in performance between them. In this blog, I will explain the tests I produced with the aim of outlining best practice for Databricks implementations for UDFs of this nature.

Scala is the native language for Spark – and without going into too much detail here, it will compile down faster to the JVM for processing. Under the hood, Python on the other hand provides a wrapper around the code but in reality is a Scala program telling the cluster what to do, and being transformed by Scala code. Converting these objects into a form Python can read is called serialisation / deserialisation, and its expensive, especially over time and across a distributed dataset. This most expensive scenario occurs through UDFs (functions) – the runtime process for which can be seen below. The overhead here is in (4) and (5) to read the data and write into JVM memory.

Click through for the results. Looks like Python barely beat out Scala for the #1 position, but Scala was a little faster than Python in-class (e.g., the Scala program with a Scala SQL UDF was a little bit faster than the Python equivalent).

Comments closed

Working With Dates And Times In T-SQL

Published 2018-09-07 by Kevin Feasel

Tomaz Kastrun walks us through various functions to work with dates and times in T-SQL:

Manipulating date and time in T-SQL is a daily and very common task that every DBA, SQL Developer, BI Developer and data scientist will come across. And over the years, I have accumulated many of the simple date or/and time manipulation combinations of different functions, that it is time, to put them together.

Don’t expect to find here anything you haven’t used or seen – especially, if you are a long time T-SQL developer. The point is to have a post, that will have a lot of examples on date and time manipulation on one place. And by no means, this is not the definite list, but should be quite substantial and the code on Github repository will be update.

The list will be updated on my Github, and therefore this blogpost might not include all. In all of the following examples I will be using function GETDATE() to get the current datetime, unless the examples will have stored dates. Therefore, some of the examples or screen-prints will be different from yours.

This mostly focuses on the DATETIME type rather than DATETIME2 or DATE, but there are a few TIME uses. Check out Tomaz’s repo for more.

Comments closed

SSMS 17.9 Released

Published 2018-09-07 by Kevin Feasel

Alan Yu announces a new version of SQL Server Management Studio:

SSMS 17.9 provides support for almost all feature areas on SQL Server 2008 through the latest SQL Server 2017, which is now generally available.

In addition to enhancements and bug fixes, SSMS 17.9 comes with several new features:

ShowPlan improvements

Azure SQL support for vCore SKUs

Bug Fixes

View the Release Notes for more information.

It looks like the big push for this release was bug fixes, and there are quite a few of them.

Comments closed

Quartiles In DAX

Published 2018-09-07 by Kevin Feasel

Dustin Ryan shows us how to calculate quartiles using DAX:

To calculate the quartile, we’re going to use the PERCENTILEX.INC DAX function. The PERCENTILEX.INC function returns the number at the specified percentile. So for example, if I had numbers 0 and 100 in my data set, the 25th percentile value would be 25. The 50th percentile value would be 50 and the 75th percentile value would be 75, and you can figure out what the 100th percentile value would be.

Dustin shares an example with his NFL data set and also walks us through a couple of tricky situations.

Comments closed

M	T	W	T	F	S	S
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30