Month: September 2018

Hortonworks Data Analytics Studio

Published 2018-09-14 by Kevin Feasel

Will Xu and Syed Mahmood announce Hortonworks Data Analytics Studio:

DAS leverages open-source technologies such as Apache Hive to share and extend the value of a modern data architecture in heterogeneous environments. It helps infrastructure administrators manage and optimize the performance of their Hive workloads by delivering visibility into query patterns and storage hotspots. DAS improves performance by uncovering inhibitors to query speed as well as providing recommendations to improve its efficiency.

In the past, Hive view did not provide full auto-complete capability during authoring time. We’ve addressed this shortcoming in DAS. This is not a trivial task, especially on large databases; however, through a number of caching optimizations we were able to make it work smoothly even with thousands of tables.

This product feels more like Management Studio or SQL Operations Studio than prior Hive UIs. That’s definitely a good thing.


Clearing The SSAS Cache Using C#

Published 2018-09-14 by Kevin Feasel

Shabnam Watson shows us a small console program to clear the SQL Server Analysis Services cache:

First let me give you a little background of why you would want to clear SSAS cache from C# code when you can do this using an XMLA command from SSMS.

If you have a slow MDX/DAX SSAS query, you have a couple of options for improving the performance (assuming no hardware changes):

Rewrite the query differently if you have control over the query. (You will have two queries that you want to compare against the same database.)

Make changes to the SSAS database to follow a better design. (You will have one query to run against two databases)

Regardless of which route you go, you should compare the performance before and after the changes to see how much you gained from the change.

Click through for more, including the code.


Multiple Mentions Of A Column In An UPDATE Statement

Published 2018-09-14 by Kevin Feasel

Doug Lane walks us through various scenarios with updates including the same column multiple times:

An application developer came to me with this question recently: “Can I use the same column twice in a SQL UPDATE statement?”

Yes and no.

It depends on what you mean by “use”.

Read on to see what Doug means.


Azure Data Factory V2 Or SSIS?

Published 2018-09-14 by Kevin Feasel

Merrill Aldrich explains the differences between Azure Data Factory V2 and SQL Server Integration Services:

The next consideration is a bit more involved if you are new to data integration. Both of these tools excel at transporting data from place to place, but they have important differences in terms of what you can do to modify the data in transit. As a matter of emphasis, ADF has more features geared toward moving the data than performing any complex transformation along the way. SSIS, on the other hand, was built with a large library of transformations that you can chain together to make elaborate data flows including lookups, matching, splitting data, and more.

The tools also overlap quite a lot. In projects this seems to lead to the question of whether you’ll transform the data “in flight” using Extract Transform Load (ETL), or instead move the data to a destination where it’ll be transformed using Extract Load Transform (ELT).

These are not “pretty much the same thing” and Merrill does a good job of explaining what those differences in design mean for the products.


The Story Behind Grant’s Execution Plan Book

Published 2018-09-14 by Kevin Feasel

Grant Fritchey shares a funny (in a Schadenfreude sense) look at what it took to get SQL Server Execution Plans, 3rd Edition out:

Four years ago, after a bunch of dithering and some negotiations with Tony Davis, my editor, I started to update my book, SQL Server Execution Plans. We managed to convince Hugo Kornelis to be the tech editor. I started to do the real writing in early 2015.

I was most of the way through a first draft and no one liked it. Tony was unhappy. Hugo was unhappy. I was unhappy. I was just trying to update the existing book, SQL Server Execution Plans. It wasn’t working.

We all came to the conclusion that the old book was wrong. Not simply in a technical sense, although there was a lot of that, but in a structural sense. So we started rearranging things. SQL Server 2014 came out, but I was ready for it, having been on a preview, so it was no big deal. We started a second draft.

The book Grant mentions is free in PDF format thanks to Red Gate, so go give it a download and enjoy the fruits of Grant’s labor.


Literate Programming And Notebooks

Published 2018-09-13 by Kevin Feasel

David Smith sums up a debate on notebooks versus literate programming:

There’s no video yet available of Joel’s talk, but you can guess the theme of that opening slide, and walking through the slides conveys the message well, I think. Yihui Xie, author and creator of the rmarkdown package, provides a detailed summary and response to Joel’s talk, where he lists Joel’s main critiques of Notebooks:

Hidden state and out-of-order execution
Notebooks are difficult for beginners
Notebooks encourage bad habits
Notebooks discourage modularity and testing
Jupyter’s autocomplete, linting, and way of looking up the help are awkward
Notebooks encourage bad processes
Notebooks hinder reproducible + extensible science
Notebooks make it hard to copy and paste into Slack/Github issues
Errors will always halt execution
Notebooks make it easy to teach poorly
Notebooks make it hard to teach well

Read the whole thing. I agree with some of these points, but disagree with a few on the list.


Mutating Data Frames Without dplyr

Published 2018-09-13 by Kevin Feasel

John Mount points out that there is a built-in function to mutate data frames in R:

The notation we used above is the “explicit argument” variation we recommend for readability. What a lot of dplyr users do not seem to know is: base-R already has this functionality. The function is called transform().

To demonstrate this, let’s first detach dplyr to show that we are not using functions from dplyr.
detach("package:dplyr", unload = TRUE)
Now let’s write the equivalent pipeline using exclusively base-R.

Click through for the way to do this as a pipeline operation.


When Secondary Nodes Affect Primaries

Published 2018-09-13 by Kevin Feasel

Dmitri Korotkevitch shows that a readable secondary in an Availability Group can negatively impact the primary:

A long time ago in a galaxy far, far away, I had to troubleshoot an interesting performance issue in SQL Server. Suddenly, the CPU load on the server started to climb. Nothing changed in terms of workload. The system still processed the same number of requests. The execution plans of the critical queries stayed the same. Nevertheless, the CPU usage grew slowly and steadily by a few percent per hour.

Eventually, we nailed it down. The problem occurred in a very busy OLTP system with very volatile data. We noticed that the system performed much more I/O (logical and physical) than before. It was very strange, because nothing should have changed that day. Finally, we found that we had a large number of deleted rows in the database that had not been cleaned up by the ghost cleanup task.

Read on to learn what caused this mess.


Issues With Bulk Inserting Multi-Byte Characters In Fixed Width Files

Published 2018-09-13 by Kevin Feasel

Randolph West shares an example of an issue with BULK INSERT:

Fellow Canadian Doran Douglas brought this issue to my attention recently, and I wanted to share it with you as well.

Let’s say you have a file in UTF-8 format. What this means is that some of the characters will be single-byte, and some may be more than that.

Where this becomes problematic is that a fixed-width file has fields that are, well, fixed in size. If a Unicode character requires more than one byte, it’s going to cry havoc and let slip the dogs of truncation.

Click through for an example. This seems like a bug to me—I interpret fixed-width as fixed number of characters, not fixed number of bytes. At the very least, it’s liable to cause confusion.
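
To make the character-versus-byte distinction concrete, here is a minimal Python sketch. It is not a reproduction of what BULK INSERT does internally; the two-field layout (a 6-wide name and a 5-wide city) and the helper names are hypothetical, chosen only to show how the same record splits differently depending on whether widths are counted in characters or in UTF-8 bytes.

```python
# Hypothetical fixed-width layout: name is 6 wide, city is 5 wide.
WIDTHS = [6, 5]


def split_by_chars(record, widths):
    """Cut the record at character offsets."""
    fields, pos = [], 0
    for width in widths:
        fields.append(record[pos:pos + width])
        pos += width
    return fields


def split_by_bytes(record, widths):
    """Cut the UTF-8 encoding of the record at byte offsets."""
    raw = record.encode("utf-8")
    fields, pos = [], 0
    for width in widths:
        # errors="replace" keeps the sketch from raising if a cut
        # ever lands in the middle of a multi-byte character.
        fields.append(raw[pos:pos + width].decode("utf-8", errors="replace"))
        pos += width
    return fields


# "ë" is one character but two bytes in UTF-8.
record = "Zoë".ljust(6) + "Paris"

chars = split_by_chars(record, WIDTHS)
by_bytes = split_by_bytes(record, WIDTHS)

print(len(record), len(record.encode("utf-8")))
print(chars)
print(by_bytes)
```

Counting by characters yields the intended fields. Counting by bytes shifts every field after the first multi-byte character, so the city loses its final letter, which is the kind of truncation the post describes.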


Trigger Spirals

Published 2018-09-13 by Kevin Feasel

David Fowler tells a story of woe, one which is totally not his fault:

To do this, a trigger was created which would send all the details via a Service Broker message to another SQL Server. This SQL Server was used to hold details of the AD accounts, and from there, changes were automatically propagated out to AD.

This was working well until one day when it was realised that any changes to account permissions in AD weren’t reflected in the personnel database. To solve this, another trigger was created to send a Service Broker message back to the personnel database with details of the change.

This was where I came in. It was noticed that the system had started to run slower and slower; not only that, but permissions seemed to be constantly changing for no obvious reason. Were the machines finally waking up and taking over?

There’s a reasonable explanation here, for some definition of reasonable.

