January 2020 – Page 13

For example, I want to prevent users from executing the same stored procedure at the same time. To demonstrate the scenario, I have created a stored procedure named procInsertEmployees, which inserts data into the tblEmployee table. I want to make sure that no one can access the stored procedure until the stored procedure inserts the data in the tblEmpoyee table. Moreover, I have added a waitfor delay ’00:00:15’ statement to simulate the blocking for the 15 seconds.

Application locks also allow for more complicated scenarios and you can get a bit creative when assigning resources (such as combining a process name and a parent ID).

Comments closed

Execution Plans: Check the First Operator

Published 2020-01-10 by Kevin Feasel

Grant Fritchey reminds us to look at the first operation when viewing an execution plan:

The first time you see a new execution plan that you’re examining to fix a performance problem, something broken, whatever, you should always start by looking at the first operator.
First Operator
The first operator is easily discerned (with an exception). It’s the very first thing you see in a graphical execution plan, at the top, on the left. It says SELECT in this case:

It’s easy to overlook, but Grant gives some good reasons not to do so.

Comments closed

Spark on Docker on YARN on Cloud

Published 2020-01-09 by Kevin Feasel

Adam Antal has included all of the layers:

Bringing your own libraries to run a Spark job on a shared YARN cluster can be a huge pain. In the past, you had to install the dependencies independently on each host or use different Python package management softwares. Nowadays Docker provides a much simpler way of packaging and managing dependencies so users can easily share a cluster without running into each other, or waiting for central IT to install packages on every node. Today, we are excited to announce the preview of Spark on Docker on YARN available on CDP DataCenter 1.0 release.

Joking about stack length aside, this looks really useful.

Comments closed

Optimal Kafka Partitioning

Published 2020-01-09 by Kevin Feasel

Paul Brebner is on a quest:

This blog provides an overview around the two fundamental concepts in Apache Kafka : Topics and Partitions. While developing and scaling our Anomalia Machina application we have discovered that distributed applications using Kafka and Cassandra clusters require careful tuning to achieve close to linear scalability, and critical variables included the number of Kafka topics and partitions. In this blog, we test that theory and answer questions like “What impact does increasing partitions have on throughput?” and “Is there an optimal number of partitions for a cluster to maximize write throughput?” And more!

Read on for some interesting findings.

Comments closed

Upgrading SQL Server Windows Docker Containers

Published 2020-01-09 by Kevin Feasel

Emanuele Meazzo shows how you can upgrade SQL Server if you are using a Windows Docker container instead of Linux:

With the 1st CU for SQL 2019 released just yesterday, and Microsoft updating the docker image right away, the only natural response for me was to update the docker instance that I showed you how to deploy a few months back.
In theory, a docker container can’t be really “updated”, they’re meant to be stateless machines that you spin up and down responding to changes in demand; what we’re technically doing is creating a new container, based on a new image, that has the same configuration and uses the same persistent storage as the old one.

Read on to see how you can perform this upgrade without losing your data.

Comments closed

Undercover Catalogue 0.4

Published 2020-01-09 by Kevin Feasel

David Fowler announces a new release of the Undercover Catalogue:

The first major change that 0.4.0 brings is centralisation. With previous versions of the Catalogue, it’s been a requirement to have the Catalogue schema and procs installed on every server that you want to monitor.
0.4 changes that, there is now no need to have anything installed on any of the target instances. Simply install the Catalogue in one place, on your central configuration server and add any instances that you require monitored to the Catalogue.ConfigInstances table.
This makes it much easier to add in instances to the Catalogue.

There are a few other updates as well, so check them out.

Comments closed

Partitioning on Columnstore Table Loading

Published 2020-01-09 by Kevin Feasel

Aaron Bertrand continues a series around learning about columnstore indexes:

In part 1, I showed how both page and columnstore compression could reduce the size of a 1TB table by 80% or more. While I was impressed I could shrink a table from 1TB to 50GB, I wasn’t very happy with the amount of time it took (anywhere from 2 to 14 hours). With some tips graciously borrowed from folks like Joe Obbish, Lonny Niederstadt, Niko Neugebauer, and others, in this post I will try to make some changes to my original attempt to get better load performance. Since the regular columnstore index didn’t compress better than page compression on this data set, and took 13 hours longer to get there, I’ll focus solely on the more advanced solution using COLUMNSTORE_ARCHIVE compression.

Click through for part 2.

Comments closed

Accessing S3 Data from Apache Spark

Published 2020-01-08 by Kevin Feasel

Divyansh Jain shows how we can connect to AWS’s S3 using Apache Spark:

Now, coming to the actual topic that how to read data from S3 bucket to Spark. Well, it is not very easy to read S3 bucket by just adding Spark-core dependencies to your Spark project and use spark.read to read you data from S3 Bucket.
So, to read data from an S3, below are the steps to be followed:

This isn’t a built-in source, so there is a little bit of work to do, but it’s not that bad.

Comments closed

Creating a Custom Partitioner for Apache Kafka

Published 2020-01-08 by Kevin Feasel

Swapnil Gosavi walks us through the process of creating a custom partitioner in Apache Kafka:

Assume we are collecting data from different departments. All the departments are sending data to a single topic named department. I planned five partitions for the topic. But, I want two partitions dedicated to a specific department, named IT, and the remaining three partitions for the rest of the departments. How would you achieve this?
You can solve this requirement, and any other type of partitioning needs by implementing a custom partitioner.

This is quite useful when you don’t necessarily want topic explosion but you do want more than what the classic partitioner allows.

Comments closed

Delegating Authentication using Managed Service Accounts

Published 2020-01-08 by Kevin Feasel

Jamie Wick helps us solve the classic Kerberos double-hop problem:

If the Report Server service doesn’t have permission to delegate to the SQL Server, it will try to connect anonymously (step 4 in the diagram above). Which results in this login error:
Login failed for user ‘NT AUTHORITY\ANONYMOUS LOGON’. Reason: Could not find a login matching the name provided. [CLIENT: <Client IP Address>]
Historically report server and SQL server services, that needed the ability to delegate authentication to other servers, were configured to run using an Active Directory user account. Enabling delegation on these accounts was simply a matter of setting the Trust level on the Delegation tab of the account’s properties (with Active Directory Users & Computers).

But Jamie is here to show us a better way.

Comments closed

M	T	W	T	F	S	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31

Month: January 2020

Working with App Locks in SQL Server

Execution Plans: Check the First Operator

Spark on Docker on YARN on Cloud

Optimal Kafka Partitioning

Upgrading SQL Server Windows Docker Containers

Undercover Catalogue 0.4

Partitioning on Columnstore Table Loading

Accessing S3 Data from Apache Spark

Creating a Custom Partitioner for Apache Kafka

Delegating Authentication using Managed Service Accounts