Month: April 2019

Adding SSIS Catalog to a Docker Container

Published 2019-04-02 by Kevin Feasel

Andy Leonard takes two shots at adding the SSIS Catalog to a Docker container. First up is the version which doesn’t work:

I have been working on getting an SSIS Catalog running in a container for a couple years.
I share this post not to discourage you.
I share it to let you know one way I failed.
I thought I had succeeded when the PowerShell in this post worked. The PowerShell works, by the way – just not in a container configured thus.
This is but one failure.
I failed more than once, I promise.

Andy perseveres and succeeds in part 4 of the series:

I can hear some of you thinking, “How do we accomplish this, Andy?”
I’m glad you asked.
The answer is “We modify our container.”
Disclaimer: I’m about 100% certain there’s another way to do this and about 99% sure there’s a better way. I’m going to show you what I did. Cool?
Cool.

Read on to see how Andy did it.

Comments closed

Capturing Execution Plans with Extended Events

Published 2019-04-02 by Kevin Feasel

Pedro Lopes shows us how we can get actual execution plans using Extended Events:

Query execution plans, otherwise known as actual execution plans or just Showplan, provide a map of all the required operations to get the query output, which includes runtime performance statistics.
Query Store (QS) is a valuable tool for troubleshooting workload trends via specific queries – this is because QS aggregates performance data on queries. However, sometimes we need to get the specific, singleton query execution plan to analyze and troubleshoot – this is where xEvents come in.
Since SQL Server 2012 we’ve had the query_post_execution_showplan xEvent for this. As the name suggests, it gets you the actual query plan – because it is *after* execution – when we have the runtime statistics available. However this xEvent is based on the standard query execution statistics profile infrastructure(quite a mouthful) – or standard profiling for short (read more about it here). This has a very high overhead (75%+ with a TPC-C like workload) which is why its use needs to be seriously considered, and most likely not used unless in last resort.
In more recent releases we have other alternatives for these requirements to get the singleton actual execution plans, based on the lightweight query execution statistics profile infrastructure – or lightweight profiling for short (read more about it here). These xEvents are listed below, where we’ll see examples on how to use them.

Pedro does a good job of taking us through the available events and what we get from them.

Comments closed

Shrinking Dot Sizes in Power BI

Published 2019-04-02 by Kevin Feasel

David Eldersveld shows how we can reduce the point size of dots in POwer BI as of March 2019:

One of the Power BI improvements in the March 2019 Desktop release was reduced bubble size for the Map visual. I previously wrote about the benefit of the reduction in point/bubble size. I was unaware until recently that this change made it into more than the Map visual.
The ability to reduce the point size also appears in the Format options for the Power BI Scatter chart. Previously, you could change the size option from 0 to 100 under the Shapes area. As with the Map, the Scatter now allows you to reduce the size as low as -30. I did not see this mentioned in the March Desktop blog post. I must have missed it if it was part of a previous month’s release. In any case, if you were not aware that you could set the point size from -30 to 100with the Scatter chart, now you do.

For most scenarios, I think the dot size is probably a little too big. -30 is generally too small, but I’m happy that they offer us options to get it right.

Comments closed

Important Considerations with Indexed Views

Published 2019-04-02 by Kevin Feasel

Jason Brimhall takes us through a few important considerations with indexed views:

The creation of this view has chewed up a bunch of storage. It has jumped right up to the number two spot on the biggest objects list within this database. You can see that differences by comparing the highlighted rows to the previous image. The vPerson view is highlighted in red in this second image to help point it out quickly.
Surely this must be a contrived example and people don’t really do this in the real world, right? The answer to that is simply: NO! It DOES happen. I see situations like this all too often. Far too often, large text fields are added to an indexed view to make retrieval faster. I have mimicked that by adding in two XML columns from the Person.Person table. This is definitely overkill because a simple join back to the table based on the BusinessEntityID would get me those two columns. All I have effectively done is duplicated data being stored and I have achieved that at the low low cost of increased storage of 25% for this small database. If you are curious, the column count between the Person.Person table and this new view is 13 columns each.

Jason takes us through a couple more gotchas and provides some important advice you should follow if you think indexed views might be a fit for you.

Comments closed

Finding Where Extended Events Configuration Data Is Stored

Published 2019-04-02 by Kevin Feasel

Dave Bland shows us where we can find configuration data for Extended Events:

In this DMV, obviously there are a number of columns. This post will focus mostly on the object_type column. This column has a number of potential values, including Event, Target and Action. When setting up a SQL Server Extended Event session these are a few of the key items that need to be filled out.
Let’s just take a few moments to identify what these are.

Click through to see where you can find targets, actions, and event descriptions.

Comments closed

Consuming Apache Kafka Messages in Browsers

Published 2019-04-01 by Kevin Feasel

Joseph Rea takes us through the Apache Kafka message browser:

A classic interview question is: “How do you go about displaying large amounts of data in a performant way?” Most people (at least on the front end), usually come up with pagination first. An implementation for pagination might go something like this:
Out of a list of 100, request 10 items at a time until 100 items are reached. So you would do 9 requests, asking for 1–10, 11–20, etc., until the 100 are reached.
In Kafka’s case, there could be 1 million messages between successive requests, so a user can never see the “latest” message, only the range as requested by the browser. In addition, there is a fundamental problem with pagination as it relates to Kafka. Message ordering across partitions is non-deterministic, so what is displayed in the UI, a linear sequence from 1–100, would not represent the data as it is laid out inside of Kafka.

Very interesting reading.

Comments closed

Big Data Often Isn’t

Published 2019-04-01 by Kevin Feasel

Arnon Rotem-gal-oz argues that “big data” is often a misnomer:

I couldn’t find numbers from Google but others say that by 2017 Google processed over 20PB a day (not to mention answering 40K search queries/second) so Google is definitely in the big data game. The numbers go down fast after that, even for companies who are really big data companies — Facebook presented back in 2017 that they handle 500TB+ of new data daily, the whole of Twitter’s data as of May 2018 was around 300PB, and Uber reported their data warehouse is in the 100+ PB range.
Ok, but what about the rest of us? Let’s take a look at an example.

I often fight with this myself—SQL Server can easily handle multi-billion row data sets, for example. It’s the same problem in Azure with SQL Data Warehouse: the “you must be this tall to ride the rides” marker is set pretty high.

Comments closed

Conditional Replacement in Power Query

Published 2019-04-01 by Kevin Feasel

Soheil Bakhshi shows us how to do conditional replacement based on the values of other columns using Power Query:

Power Query (M) made a lot of data transformation activities much easier and value replacement is one of them. You can easily right click on any desired value in Power Query, either in Excel or Power BI, or other components of Power Platform in general, and simply replace that value with any desired alternative. Replacing values based on certain conditions however, may not seem that easy at first. I’ve seen a lot of Power Query (M) developers adding new columns to accomplish that. But adding a new column is not always a good idea, especially when you can do it in a simple single step in Power Query. In this post I show you a quick and easy way to that can help you handling many different value replacement scenarios.
Imagine you have a table like below and you have a requirement to replace the values column [B] with the values of column [C] if the [A] = [B].

Click through for the solution.

Comments closed

Querying Apache Druid

Published 2019-04-01 by Kevin Feasel

Manish Mishra takes us through the basics of querying from Apache Druid:

I would not mind quoting the Druid documentation for this purpose: “Druid is a data store designed for high-performance slice-and-dice analytics (“OLAP“-style) on large data sets. Druid is most often used as a data store for powering GUI analytical applications, or as a backend for highly-concurrent APIs that need fast aggregations.”
You might be wondering where is “SQL” in that? Actually, the fact is Druid is designed for special kind of SQL workloads which we can relate with powering the GUI analytical applications which require low latency query response. But in this post, we will only look in the “how part” of it using Druid to quickly run queries.

Click through to see how.

Comments closed

Building the Right Architecture for the Job

Published 2019-04-01 by Kevin Feasel

Gogula Aryalingam takes us through an example where the newest and most expensive tools aren’t the best for the job at hand:

When Azure SQL Data Warehouse was chosen to implement a multi-dimensional data warehouse, it may have seemed like the ideal choice. Why? because it was plain to see: keywords: “SQL”, “Warehouse”. However, no, SQL Data Warehouse is ideal only when you have data loads that are quite high, not when it is only several 100GBs. Armed with a few more reasons as to why not (A good reference for choosing Azure SQL Data Warehouse), I had confronted them. But the rebuke then was that they did get good enough performance, and that cost wasn’t a problem. Until of course a few months later when complex queries started hitting the system, and despite being able to afford that cost, the value of paying that amount did not seem worth it.

Having a good architectural understanding of the Azure or AWS platform—even if you aren’t deeply familiar with all of the tools—can help avoid these types of problems.

Comments closed

M	T	W	T	F	S	S
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30