Press "Enter" to skip to content

Curated SQL Posts

Using Ephemeral Containers for Debugging Kubernetes-Based Applications

Praveen Sripati walks us through the notion of Ephemeral Containers:

It’s always CRITICAL to pack a Container image with the minimal binaries required as this makes the surface area of attack minimal, upgrading the image and testing also becomes easier as there are less variables to be addressed. Distroless Docker images can be used for the same. In the above diagram Container (A) has only the application and the dependent binaries and nothing more. So, if there are no debugging tools in the Container (A) nor any way to check the status of the process then how do we debug any problem in the application? Once a pod is created, it’s even not possible to add Containers to it for additional debugging tools.
That’s where the Ephemeral Containers come into picture as in the Container (B) in the above picture. These Containers are temporary that can be included in the Pod dynamically with additional debugging tools. Once a Ephemeral Container has been created, we can connect to it as usual using the kubectl attach, kubectl exec and kubectl logs commands.

It’s an interesting approach to the problem.

Comments closed

The Flink-Hive Integration

Bowen Li takes us through Apache Flink 1.10’s integration with Apache Hive:

On the other hand, Apache Hive has established itself as a focal point of the data warehousing ecosystem. It serves as not only a SQL engine for big data analytics and ETL, but also a data management platform, where data is discovered and defined. As business evolves, it puts new requirements on data warehouse.

Thus we started integrating Flink and Hive as a beta version in Flink 1.9. Over the past few months, we have been listening to users’ requests and feedback, extensively enhancing our product, and running rigorous benchmarks (which will be published soon separately). I’m glad to announce that the integration between Flink and Hive is at production grade in Flink 1.10 and we can’t wait to walk you through the details.

Click through to see how it works.

Comments closed

Diving into Kubernetes: a Workshop

Chris Adkin has been busy:

I have not blogged for a while, it was my hope to produce part 5 in the series of creating a Kubernetes cluster for production grade Big Data Clusters. However, there is a very good reason for this, and that is because I have been working on a one day workshop to be delivered at SQL Bits in September, the material can be found here, enjoy !

I’ve only looked at the module listings, but Chris does a great job putting long-form articles together, so I’ve already added it to my todos.

Comments closed

The Peril of Local Variables

Erik Darling dives into the tradeoffs you make when using local variables in stored procedures to avoid parameter sniffing:

In a stored procedure (and even in ad hoc queries or within dynamic SQL, like in the examples linked above), if you declare a variable within that code block and use it as a predicate later, you will get either a fixed guess for cardinality, or a less-confidence-inspiring guess than when the histogram is used.

The local variable effect discussed in the rest of this post produces the same behavior as the OPTIMIZE FOR UNKNOWN hint, or executing queries with sp_prepare. I have that emphasized here because I don’t want to keep qualifying it throughout the post.

This deserves a careful read-through.

Comments closed

Tracking Object Changes and Views

Ed Pollack has a solution for tracking when the underlying objects which make up a view change:

When a view’s underlying objects change, the view itself will not change. This can result in a view where the data types of columns, as well as nullability, precision, and scale can be reported inaccurately. When this happens, it is possible for queries against these columns to return errors, truncate data, perform poorly, or otherwise behave in unexpected ways.

This article will delve into views, how they are defined, and how T-SQL can be used to programmatically test the validity of views and ensure they never become stale.

Click through for an interesting article with plenty of code demos.

Comments closed

Slicing Data by a Character in Power BI

Reza Rad comes up with an interesting hack for Power BI:

I have a table for all customers, and I am showing them all in a table visual in Power BI. However, there are many customers in the list, let’s say 18K+. If I want to search for all customers who have “q” in the name, then I need to either scan the table myself, Or use a slicer with a search box, and search for character “q”, and then select all the names with “q” one by one! something like below is tedious!

Click through for more details and the opportunity to download a sample file.

Comments closed

Data Exfiltration Protection when Using Azure Databricks

Bhavin Kukadia, et al, explain how to prevent users from taking data from your Databricks cluster without authorization:

Solving for data exfiltration can become an unmanageable problem if the PaaS service requires you to store your data with them or it processes the data in the service provider’s network. But with Azure Databricks, our customers get to keep all data in their Azure subscription and process it in their own managed private virtual network(s), all while preserving the PaaS nature of the fastest growing Data & AI service on Azure. We’ve come up with a secure deployment architecture for the platform while working with some of our most security-conscious customers, and it’s time that we share it out broadly.

Click through for the architectural pattern.

Comments closed

Deconstructing Running Totals with M

Cedric Charlier shows how to turn a running total into a periodic series of events with M:

When dealing with time series, you should always check if your time series are event-based or cumulative. For a time series with a step of one day, an event-based time series will contain the count of events for a given day. A cumulative time series will contain the sum of the event of that day and of all the previous days! In this blog post I’ll explain how to transform a cumulative time series into an event-based.

Click through for the code. You can do this in T-SQL as well by subtracting the value from its LAG()-ged value.

Comments closed

A New Powershell Module for SQL Server Security

Stuart Moore introduces dbaSecurityScan:

How easy it to audit them? If someone asks you the DBA exactly who has access to object A, can you tell them? How do people get access to that object, is it via a role, a schema or an explicit permission?

Is that information in an easy to read or manipulate manner?

How do you ensure that permissions persist between upgrades? I’ve certainly seen 3rd party upgrades that have reset database level permissions. Do you have a mechanism to check every permission and put them back as they were?

We’re all doing the devops these days. Our database schema is source controlled, and we’re deploying it incrementally in pipelines and testing it. But are we doing that with our database security?

So in the classic open source way, I decided to scratch my own itch by writing something. That something is dbaSecurityScan, a PowerShell module that aims to offer a solution for all of the above.

Click through to see what dbaSecurityScan covers today, how to call it, and what you can do to get more info.

Comments closed