2020-04-01 – Curated SQL

It’s always CRITICAL to pack a Container image with the minimal binaries required as this makes the surface area of attack minimal, upgrading the image and testing also becomes easier as there are less variables to be addressed. Distroless Docker images can be used for the same. In the above diagram Container (A) has only the application and the dependent binaries and nothing more. So, if there are no debugging tools in the Container (A) nor any way to check the status of the process then how do we debug any problem in the application? Once a pod is created, it’s even not possible to add Containers to it for additional debugging tools.
That’s where the Ephemeral Containers come into picture as in the Container (B) in the above picture. These Containers are temporary that can be included in the Pod dynamically with additional debugging tools. Once a Ephemeral Container has been created, we can connect to it as usual using the kubectl attach, kubectl exec and kubectl logs commands.

It’s an interesting approach to the problem.

Comments closed

The Flink-Hive Integration

Published 2020-04-01 by Kevin Feasel

Bowen Li takes us through Apache Flink 1.10’s integration with Apache Hive:

On the other hand, Apache Hive has established itself as a focal point of the data warehousing ecosystem. It serves as not only a SQL engine for big data analytics and ETL, but also a data management platform, where data is discovered and defined. As business evolves, it puts new requirements on data warehouse.
Thus we started integrating Flink and Hive as a beta version in Flink 1.9. Over the past few months, we have been listening to users’ requests and feedback, extensively enhancing our product, and running rigorous benchmarks (which will be published soon separately). I’m glad to announce that the integration between Flink and Hive is at production grade in Flink 1.10 and we can’t wait to walk you through the details.

Click through to see how it works.

Comments closed

Diving into Kubernetes: a Workshop

Published 2020-04-01 by Kevin Feasel

Chris Adkin has been busy:

I have not blogged for a while, it was my hope to produce part 5 in the series of creating a Kubernetes cluster for production grade Big Data Clusters. However, there is a very good reason for this, and that is because I have been working on a one day workshop to be delivered at SQL Bits in September, the material can be found here, enjoy !

I’ve only looked at the module listings, but Chris does a great job putting long-form articles together, so I’ve already added it to my todos.

Comments closed

The Peril of Local Variables

Published 2020-04-01 by Kevin Feasel

Erik Darling dives into the tradeoffs you make when using local variables in stored procedures to avoid parameter sniffing:

In a stored procedure (and even in ad hoc queries or within dynamic SQL, like in the examples linked above), if you declare a variable within that code block and use it as a predicate later, you will get either a fixed guess for cardinality, or a less-confidence-inspiring guess than when the histogram is used.
The local variable effect discussed in the rest of this post produces the same behavior as the OPTIMIZE FOR UNKNOWN hint, or executing queries with sp_prepare. I have that emphasized here because I don’t want to keep qualifying it throughout the post.

This deserves a careful read-through.

Comments closed

Tracking Object Changes and Views

Published 2020-04-01 by Kevin Feasel

Ed Pollack has a solution for tracking when the underlying objects which make up a view change:

When a view’s underlying objects change, the view itself will not change. This can result in a view where the data types of columns, as well as nullability, precision, and scale can be reported inaccurately. When this happens, it is possible for queries against these columns to return errors, truncate data, perform poorly, or otherwise behave in unexpected ways.
This article will delve into views, how they are defined, and how T-SQL can be used to programmatically test the validity of views and ensure they never become stale.

Click through for an interesting article with plenty of code demos.

Comments closed

Slicing Data by a Character in Power BI

Published 2020-04-01 by Kevin Feasel

Reza Rad comes up with an interesting hack for Power BI:

I have a table for all customers, and I am showing them all in a table visual in Power BI. However, there are many customers in the list, let’s say 18K+. If I want to search for all customers who have “q” in the name, then I need to either scan the table myself, Or use a slicer with a search box, and search for character “q”, and then select all the names with “q” one by one! something like below is tedious!

Click through for more details and the opportunity to download a sample file.

Comments closed

M	T	W	T	F	S	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30

Day: April 1, 2020

Using Ephemeral Containers for Debugging Kubernetes-Based Applications

The Flink-Hive Integration

Diving into Kubernetes: a Workshop

The Peril of Local Variables

Tracking Object Changes and Views

Slicing Data by a Character in Power BI