January 2019 – Page 3

On Disabling Indexes

Published 2019-01-29 by Kevin Feasel

Kenneth Fisher has a few notes on disabling indexes:

Indexes are probably the number one tool we have to improve performance. That said, there are times when we want to put that index on hold. While indexes dramatically improve read performance they do cause a slight dip in write performance. This isn’t significant most of the time but when doing a large load it can frequently be faster to get rid of the existing indexes and then put them back when you are done.

I don’t think that I’ve ever regularly disabled indexes, even during bulk loading. It’s good to know that the option exists, however.

Comments closed

The Importance Of Index Column Order

Published 2019-01-29 by Kevin Feasel

Erik Darling shows us how arranging columns in an index can make a huge difference in query performance:

A while back I promised I’d write about what allows SQL Server to perform two seeks rather than a seek with a residual predicate.
More recently, a post touched a bit on predicate selectivity in index design, and how missing index requests don’t factor that in when requesting indexes.
This post should tie the two together a bit. Maybe. Hopefully. We’ll see where it goes, eh?

Also apropos: missing index hints return results in alphabetical order, not in selectivity order or what would be best for queries. In other words, just because the green text in SSMS says it’s the index you want doesn’t mean it’s the index you need.

Comments closed

The Value Of Central Management Server

Published 2019-01-29 by Kevin Feasel

Jeff Iannucci explains the benefits of Central Management Servers:

If you have more than a handful of instances, you really owe it to yourself to take a few minutes and set up your own CMS. The hard part is getting a list of all the instances, but you need to do that anyways, right?
Once you have that list pick ONE SQL Server instance to create your CMS. You want this to be something with solid up-time that is accessible to other team members, but not an instance that’s already spending CPU cycles on something critical.

The lack of CMS compatibility is one of my larger pain points with Azure Data Studio and an issue which hopefully gets fixed this year.

Comments closed

Authentication With Azure Managed Instances

Published 2019-01-29 by Kevin Feasel

Hamish Watson explains that Windows authentication is not available with Azure Managed Instances:

Azure SQL Managed Instances do not utilise windows authentication – so your two methods of authenticating applications and users are:
– SQL Authentication:This authentication method uses a username and password.
– Azure Active Directory Authentication:This authentication method uses identities managed by Azure Active Directory and is supported for managed and integrated domains. Use Active Directory authentication (integrated security) whenever possible.

Hamish also elaborates on some of the trickier bits about Azure Active Directory for someone used to on-prem AD solutions.

Comments closed

What To Know Before Integrating With Apache Kafka

Published 2019-01-28 by Kevin Feasel

Adi Polak gives us seven helpful tips to think about before building a Kafka cluster:

2 — You shouldn’t send large messages or payloads through Kafka
According to Apache Kafka, for better throughput, the max message size should be 10KB. If the messages are larger than this, it is better to check the alternatives or find a way to chop the message into smaller parts before writing to Kafka. Best practice to do so is using a message key to make sure all chopped messages will be written to the same partition.

Read the whole thing.

Comments closed

Spark And Splitting DataFrames

Published 2019-01-28 by Kevin Feasel

Giovanni Lanzani explains that one technique to split a data frame doesn’t quite work as expected:

Recently I was delivering a Spark course. One of the exercises asked the students to split a Spark DataFrame in two, non-overlapping, parts.
One of the students came up with a creative way to do so.
He started by adding a monotonically increasing ID column to the DataFrame. Spark has a built-in function for this, monotonically_increasing_id — you can find how to use it in the docs.

Read on to see how this didn’t quite work right, why it didn’t work as expected, and one alternative.

Comments closed

ASCII Block Text In Powershell

Published 2019-01-28 by Kevin Feasel

Roman Gelman shows how to use the Write-Ascii cmdlet to build ASCII art-style text in Powershell:

The Write-Ascii function from my PowerShell MS-Module module can draw custom text in the PowerShell console using ASCII fonts. This article’s cover was drawn by the Write-Ascii function too. The below code does the same, it writes end of line (EOL) character or emulates the Enter key.

Read on for examples and enjoy.

Comments closed

Generating Reports In Word Via Flow

Published 2019-01-28 by Kevin Feasel

Chris Webb has an example of taking a data set and generating a report in a Word document:

The idea is to loop through the rows in the Excel table and use the data on each row to populate the content controls in the template and then create a new Word document. Here’s a Flow that does this:

The steps are a bit convoluted, but they work. Chris mentions at the end why people might want to do this, and I’ll reiterate that: I’ve been in several discussions over the years where people want to embed data inside a document without manual intervention, and using tools like Reporting Services, that has not been pretty.

Comments closed

Powershell Core On Ubuntu Using Docker

Published 2019-01-28 by Kevin Feasel

Max Trinidad has an Ubuntu VM running Powershell Core on Docker:

While learning about Docker Container, I notice that is much easier to installed on a Linux system. In Windows, Hyper-V is a requirement to install Docker, and specially if you want to use the “Windows Subsystem in Linux” WSL feature, there’s more setup to complete. So, I’m not using Hyper-V, I’m using VMware Workstation. To keeping simple, I created an Ubuntu 18.04 VM using VMWare Workstation.
You can find the Docker CE installation instructions in the following link.
If you’re using Ubuntu 18.04. make sure to install Curl, as it isn’t included in the OS.

Click through for instructions on how to set this up and join the three layers club (which is not quite the three commas club but close).

Comments closed

Emulating PERCENTILE_DISC

Published 2019-01-28 by Kevin Feasel

Lukas Eder points out that there are two ways to use PERCENTILE_DISC() and most database platforms support only one:

Oracle has the most sophisticated implementation, which supports both the ordered set aggregate function, and the window function version:
– Aggregate function: PERCENTILE_DISC (0.5) WITHIN GROUP (ORDER BY x)
– Window function: PERCENTILE_DISC (0.5) WITHIN GROUP (ORDER BY x) OVER (PARTITION BY y)

But there are ways to calculate PERCENTILE_DISC() using a couple of window functions, so read the whole thing.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

Month: January 2019