2019-04-18 – Curated SQL

Centralized Kubernetes Management with Rancher

In the previous blog post we explored setting up a K8S Cluster on the AWS Cloud without using any additional software or tools. The Cloud Providers make it easy to create a K8S Cluster in the Cloud; the tough part is securing, fine-tuning, upgrading, managing access, etc. Rancher provides centralized management of different K8S Clusters, which can be in any Cloud (AWS, GCP, Azure) or on-premises. More on what Rancher has to offer on top of K8S here. The good thing about Rancher is that it’s 100% Open Source and there is no Vendor Lock-in: we can simply remove Rancher and interact with the K8S Cluster directly.

I like this kind of tooling because it reduces cloud lock-in. For something like Kubernetes, where the whole point is orchestration of ephemeral containers, there’s a lot of benefit in being able to shift between services as needed.

Contrasting Flink with Kafka Streams

Published 2019-04-18 by Kevin Feasel

Sourabh Verma contrasts Apache Flink with Kafka Streams:

Before comparing the frameworks, I would like you all to focus on a few questions:
1. Is there any comparison or similarity between Flink and Kafka?
2. What does Flink do better than Kafka?
3. Do the problem or the system requirements dictate using one over the other?

I’m generally happy with both technologies as well as Spark Streaming. But as Sourabh points out, there are differences to keep in mind.

Time Travel in Snowflake

Published 2019-04-18 by Kevin Feasel

Koen Verbeeck shows an interesting feature in Snowflake:

Time travel in Snowflake is similar to temporal tables in SQL Server: it allows you to query historical rows of a table. If you delete or update some rows, you can retrieve the state of the table at the point in time before you executed that statement. The biggest difference is that time travel is applied by default on all tables in Snowflake, while in SQL Server you have to enable it for each table specifically. Another difference is that Snowflake keeps history for only 1 day by default, configurable up to 90 days. In SQL Server, history is kept forever unless you specify a retention policy.
How does time travel work? Snowflake is built for the cloud and its storage is designed for working with immutable blobs. You can imagine that for every statement you execute on a table, a copy of the file is made. This means you have multiple copies of your table, for different points in time. Retrieving time travel data is then quite easy: the system only has to search for the specific file that was valid for that point in time. Let’s take a look at how it works.

It looks interesting, though the line “Snowflake doesn’t have backups like you know them in SQL Server” gives me pause.
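
The mental model in the quote (a new immutable copy per statement, then a lookup of the copy valid at a given time) can be sketched in a few lines of Python. This is a conceptual toy under stated assumptions, not Snowflake's storage engine: the hypothetical `VersionedTable` class keeps whole-table snapshots in a list, measures time in days, and uses the 1-day default retention mentioned above.

```python
import bisect


class VersionedTable:
    """Hypothetical toy model of time travel: each statement stores a new immutable snapshot."""

    def __init__(self, retention_days: float = 1.0):
        self.retention_days = retention_days  # assumption: mirrors the 1-day default from the post
        self._times = []       # snapshot timestamps (in days), kept in ascending order
        self._snapshots = []   # frozen copies of the table's rows

    def write(self, at_day, rows):
        # Rather than changing data in place, keep a new frozen copy per statement.
        if self._times and at_day < self._times[-1]:
            raise ValueError("statements must be applied in time order")
        self._times.append(at_day)
        self._snapshots.append(tuple(rows))

    def as_of(self, at_day, now_day):
        # Refuse requests that fall outside the retention window.
        if now_day - at_day > self.retention_days:
            raise ValueError("point in time is outside the retention period")
        # Locate the last snapshot written at or before the requested time.
        idx = bisect.bisect_right(self._times, at_day) - 1
        if idx < 0:
            raise ValueError("table did not exist at that point in time")
        return self._snapshots[idx]


table = VersionedTable()
table.write(0.0, [("a", 1), ("b", 2)])   # initial load
table.write(0.5, [("a", 1)])             # a DELETE removes row "b"
print(table.as_of(0.25, now_day=0.75))   # (('a', 1), ('b', 2))
print(table.as_of(0.75, now_day=0.75))   # (('a', 1),)
```

Because snapshots are appended in time order, finding the version valid at a point in time is a binary search rather than a replay of every change.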

Emailing Data From Power BI Via PowerApps and Flow

Published 2019-04-18 by Kevin Feasel

Erik Svensen investigates how to email specific records of data from Power BI:

One of my clients called me the other day and asked whether it was possible to export the order selected on the current report page, as she wanted to send the information to another user. I explained the export data feature from the visual action menu, but she didn’t want to download a file, locate it, switch to Outlook, click New Mail, type the correct e-mail address, and attach the file. That was not very Power-like, too much clicky clicky, because all the data was actually available once she had filtered the report for that particular record: the e-mail address she wanted to send the data to and, of course, the data she saw on the screen.
Hmm… Let’s see how we can use the Power Platform stack to solve this requirement.

Erik got everything working, so check it out.

When Window Functions are Too Slow

Published 2019-04-18 by Kevin Feasel

Bert Wagner shows a scenario where a window function ends up performing poorly:

If you’ve used FIRST_VALUE before, this query should be easy to interpret: for each badge Name, return the first UserId sorted by Date (earliest date to receive the badge) and UserId (pick the lowest UserId when there are ties on Date).
This query was easy to write and is simple to understand. However, the performance is not great: it takes 46 seconds to finish returning results on my machine.

Bert’s response is to rewrite the query using a correlated subquery. My first shot would be to try APPLY, though needing to aggregate the “parent” could lead to an awful result if the join happened before the aggregation.

The moral of the story here is to know different ways to write a query, as you can nudge the optimizer to better (or worse) behavior.
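
As a language-neutral illustration of that moral (an analogy in Python, not Bert's correlated-subquery rewrite and not a model of SQL Server's optimizer), both functions below answer the question from the quote: the first UserId per badge Name, ordered by Date and then UserId. One mirrors the sort-then-pick shape of FIRST_VALUE; the other keeps a running minimum per group and never sorts the full input. The sample rows and function names are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows shaped like the query in the post: (Name, UserId, Date).
badges = [
    ("Teacher", 42, "2008-08-01"),
    ("Teacher", 7, "2008-08-01"),
    ("Teacher", 3, "2008-09-15"),
    ("Editor", 99, "2008-07-31"),
    ("Editor", 5, "2008-08-02"),
]


def first_value_style(rows):
    # Window-function shape: sort every row by (Name, Date, UserId),
    # then take the first UserId in each Name partition.
    ordered = sorted(rows, key=lambda r: (r[0], r[2], r[1]))
    return {name: next(group)[1] for name, group in groupby(ordered, key=itemgetter(0))}


def aggregate_style(rows):
    # Aggregate shape: a single pass that keeps the smallest (Date, UserId)
    # seen for each Name, with no sort over the whole input.
    best = {}
    for name, user_id, date in rows:
        key = (date, user_id)
        if name not in best or key < best[name]:
            best[name] = key
    return {name: user_id for name, (date, user_id) in best.items()}


print(first_value_style(badges) == aggregate_style(badges))  # True
```

Same answer, different amount of work: which shape wins in SQL Server depends on indexes and the plan the optimizer picks, which is exactly why it pays to know more than one way to write the query.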

Explaining Implicit Conversion

Published 2019-04-18 by Kevin Feasel

Monica Rathbun explains what implicit conversion is and when it can go wrong:

Another quick post on simple changes you can make to your code to create more optimal execution plans. This one is on implicit conversions. An implicit conversion is when SQL Server must automatically convert a value from one data type to another when comparing values, moving data, or combining values with other values. When these values are converted during the query process, it adds overhead and impacts performance.

Read on for more info, including a common scenario where implicit conversion causes performance degradation.
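
A rough Python analogy for why converting the column side hurts: an index is a sorted structure, so a search value of the matching type can be found with a binary search (a seek), but when every stored key must be converted before it can be compared, the sort order no longer helps and every entry is visited (a scan). In SQL Server, data type precedence decides which side is converted; comparing a VARCHAR column with an INT value, for example, converts the column values to INT. The index contents and function names here are hypothetical.

```python
import bisect

# Hypothetical index: account numbers stored as strings (think VARCHAR), kept sorted.
index = sorted(str(n) for n in range(0, 100_000, 7))


def seek(search_value):
    # Types already match: binary search on the sorted keys, O(log n) comparisons.
    pos = bisect.bisect_left(index, search_value)
    return pos < len(index) and index[pos] == search_value


def scan_with_conversion(search_value):
    # The column side is converted instead: each stored key is converted and
    # compared in turn, so the sort order can't be used and all n keys may be touched.
    return any(int(key) == search_value for key in index)


print(seek("700"), scan_with_conversion(700))  # True True
```

Both calls find the same row; the difference is how much work it takes, which is the overhead the quote describes.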

Custom kubectl Plugin: Connect to SQL Server

Published 2019-04-18 by Kevin Feasel

Andrew Pruski shows how to create custom kubectl plugins:

When I deploy SQL Server to Kubernetes I usually create a load balanced service so that I can get an external IP to connect from my local machine to SQL running in the cluster. So how about creating a plugin that will grab that external IP and drop it into mssql-cli?
Let’s have a go at creating that now.

Click through for two demos including the appropriately-named kubectl prusk.
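
kubectl treats any executable on your PATH named `kubectl-<name>` as a plugin that runs as `kubectl <name>`. The Python script below is a hedged sketch of the idea in the quote, not Andrew's code: it asks kubectl for the first external IP of a LoadBalancer service and hands it to mssql-cli. The plugin name `kubectl-sqlconnect`, the default `sa` login, and the service name you pass in are assumptions.

```python
#!/usr/bin/env python3
"""Hypothetical kubectl plugin: save as an executable named kubectl-sqlconnect on PATH."""
import subprocess
import sys

# JSONPath for the first external IP assigned to a LoadBalancer service.
IP_JSONPATH = "jsonpath={.status.loadBalancer.ingress[0].ip}"


def get_ip_command(service):
    # Arguments go straight to kubectl (no shell), so the JSONPath needs no quoting.
    return ["kubectl", "get", "service", service, "-o", IP_JSONPATH]


def mssql_cli_command(ip, user="sa"):
    # -S is the server and -U the login; -P is left out so the password
    # does not end up on the command line.
    return ["mssql-cli", "-S", ip, "-U", user]


def connect(service):
    result = subprocess.run(get_ip_command(service), capture_output=True, text=True, check=True)
    ip = result.stdout.strip()
    if not ip:
        print(f"service {service} has no external IP yet", file=sys.stderr)
        return 1
    # Run mssql-cli in the foreground and hand its exit code back to kubectl.
    return subprocess.call(mssql_cli_command(ip))


if __name__ == "__main__":
    if len(sys.argv) == 2:
        sys.exit(connect(sys.argv[1]))
    print("usage: kubectl sqlconnect <service-name>", file=sys.stderr)
```

After `chmod +x kubectl-sqlconnect` and placing it on PATH, it would run as `kubectl sqlconnect <service-name>`. Some cloud load balancers (AWS ELBs, for example) publish a hostname rather than an IP, in which case the JSONPath would need `.hostname` instead of `.ip`.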

Conflict Tracking in Merge Replication

Published 2019-04-18 by Kevin Feasel

Ranga Babu shows the two different models for conflict detection with merge replication:

Conflict Detection:
Conflict detection depends on the type of tracking we configure for the article.
– Row-level tracking: if data changes are made to any column of the same row at both ends, it is considered a conflict.
– Column-level tracking: if data changes are made to the same column at both ends, the change qualifies as a conflict.

Read on for a detailed demonstration of the two.
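
The difference between the two rules can be shown with a toy Python model: each side's edit is compared with the last synchronized version of the row, and the only thing that changes between the modes is whether overlapping rows or overlapping columns count as a conflict. This is a simplification that assumes whole-row comparisons; merge replication itself relies on its own tracking metadata, and the function names and sample row are hypothetical.

```python
def changed_columns(original, updated):
    # Columns whose value differs from the last synchronized version of the row.
    return {col for col in original if original[col] != updated[col]}


def is_conflict(original, publisher, subscriber, tracking):
    pub_changes = changed_columns(original, publisher)
    sub_changes = changed_columns(original, subscriber)
    if tracking == "row":
        # Row-level: a change to any column on both sides of the same row conflicts.
        return bool(pub_changes) and bool(sub_changes)
    if tracking == "column":
        # Column-level: only a change to the same column on both sides conflicts.
        return bool(pub_changes & sub_changes)
    raise ValueError("tracking must be 'row' or 'column'")


base = {"Name": "Widget", "Price": 10, "Qty": 5}
pub = {**base, "Price": 12}   # publisher edits Price
sub = {**base, "Qty": 7}      # subscriber edits Qty

print(is_conflict(base, pub, sub, "row"))     # True
print(is_conflict(base, pub, sub, "column"))  # False
```

The same pair of edits is a conflict under row-level tracking but merges cleanly under column-level tracking, because the two sides touched different columns.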
