A SQL Client For Apache Flink

Alex Woodie points out that Apache Flink now has a SQL client built in:

Apache Flink has contained SQL functionality since Flink version 1.1, which introduced a SQL API based on Apache Calcite and a table API, too. While the combined SQL and Table API today provides valuable ways for developers to apply well-understood relational data and SQL constructs to the world of stream data processing, its usefulness is somewhat limited.

For starters, only Scala and Java experts can avail themselves of API, according to the description of the new SQL client, which is codenamed FLIP-24. What’s more, any table program that was written with the SQL and Table API had to be packaged with Apache Maven, a Java-based project management tool, and submitted to the Flink cluster before running.

With the launch of the SQL CLI Client in Flink version 1.5, the Flink community is taking its support for SQL in a new direction. According to the FLIP-24 project page, providing an interactive shell will not only make Flink accessible to non-programmers, including data scientists, but it will also eliminate the need for a full IDE to program Flink apps. With millions of SQL-loving data analysts out there, the benefits could certainly be vast.

Good stuff.  Feasel’s Law in action.

Grouping By Nothing In SQL

Lukas Eder points out a subtlety of the GROUP BY clause:

SELECT count(*)
FROM film
GROUP BY ()

This will yield:

count |
------|
1000 |

What’s the point, you’re asking? Can’t we just omit the GROUP BY clause? Of course, this will yield the same result:

SELECT count(*)
FROM film

Yet, the two versions of the query are subtly different.

Great post and also shows a case when GROUP BY () isn’t supported.

Using Temporal Tables For Auditing Changes

Nisarg Upadhyay shows how to use temporal tables for basic data auditing:

To audit the data changes for a specific table, we should perform the time-based analysis of temporal tables. To do that, we must use the ‘FOR SYSTEM_TIME’ clause with below temporal-specific sub-clauses to the query data across the current and history tables. Let me explain the output of queries using different sub-clauses. Below is the setup:

  1. I inserted a product named ‘Flat Washer 8’ with List price 0.00 in the temporal table at 09:02:25 AM.

  2. I changed the List Price at 10:13:56 AM. New Price is 500.00.

Temporal tables aren’t going to solve all of your auditing problems but it can be useful.

Does Your Server Have Too Much Memory?

Brent Ozar has a few tips to see if you have too much memory:

Sounds impossible, right? The saying goes that you can never be too rich or too thin or have too much memory.

However, there is one good indication that your SQL Server is probably overprovisioned, and to explain it, I need to cover 3 metrics.

1. Max Server Memory is set at the instance level: right-click on your SQL Server name in SSMS, click Properties, Memory, and it’s “Maximum server memory.” This is how much memory you’re willing to let the engine use. (The rocket surgeons in the audience are desperate for the chance to raise their hands to point out different things that are or aren’t included in max memory – hold that thought. That’s a different blog post.)

If you have too much memory, I’d happily borrow a cup of RAM.

Why Nobody Is Reading Your Report

Stephanie Evergreen really cuts to the chase:

Here’s the hard truth: Your report probably sucks. Mine sure did. The heart of your content is likely fine, maybe even helpful. But, if you are anything like the hundreds of reports I see every year, the entire set of cultural norms we have somehow developed around reporting is just setting us up for failure, writing a destiny where no one is reading the report.

Why? Let me lay out the most common issues I see and propose some strategic solutions.

There’s an emphasis here on academic papers but it also applies to corporate work too.

Trigger Or Constraint?

Andy Levy points out that you shouldn’t use a trigger when a default constraint will do:

We want to spend our SQL Server licensing dollars wisely, so why ask it to do unnecessary work? More CPU time, more IO, and our client is waiting longer for the process to complete (I don’t like making anyone wait).

There’s a second “gotcha” with the AFTER INSERT method that applies to only some use cases. Let’s say you’re loading some historical data into the table, and that data includes the LastModified date. Or maybe your application has a very specific date that should be inserted into the table for this field.

Andy makes good points.

Alternatives To Temp Tables In SSIS

Tim Mitchell gives us a few methods for avoiding temp tables in SQL Server Integration Services:

While temp tables are a good option for in-flight data transformation, there are some unique challenges that arise when using temp tables in SSIS.

SQL Server Integration Services uses tight metadata binding for data flow operations. This means that when you connect to a relational database, flat file, or other structure in an SSIS data flow, the SSIS design-time and runtime tools will check those data connections to validate that they exist and that the metadata has not changed. This tight binding is by design, to avoid potential runtime issues arising from unexpected changes to the source or destination metadata.

Because of this metadata validation process, temp tables present a challenge to the SSIS data flow. Since temp tables exist only for the duration of the session(s) using them, it is likely that one of these tables created in a previous step in an SSIS package may not be present when validation needs to occur. During the design of the package (or even worse, when you execute the deployed package in a scheduled process), you could find yourself staring at an “object not found” error message.

It’s good to have alternatives, though there are times when you really just need a temp table.

Categories

May 2018
MTWTFSS
« Apr Jun »
 123456
78910111213
14151617181920
21222324252627
28293031