Press "Enter" to skip to content

Author: Kevin Feasel

A SQL Client For Apache Flink

Alex Woodie points out that Apache Flink now has a SQL client built in:

Apache Flink has contained SQL functionality since Flink version 1.1, which introduced a SQL API based on Apache Calcite and a table API, too. While the combined SQL and Table API today provides valuable ways for developers to apply well-understood relational data and SQL constructs to the world of stream data processing, its usefulness is somewhat limited.

For starters, only Scala and Java experts can avail themselves of API, according to the description of the new SQL client, which is codenamed FLIP-24. What’s more, any table program that was written with the SQL and Table API had to be packaged with Apache Maven, a Java-based project management tool, and submitted to the Flink cluster before running.

With the launch of the SQL CLI Client in Flink version 1.5, the Flink community is taking its support for SQL in a new direction. According to the FLIP-24 project page, providing an interactive shell will not only make Flink accessible to non-programmers, including data scientists, but it will also eliminate the need for a full IDE to program Flink apps. With millions of SQL-loving data analysts out there, the benefits could certainly be vast.

Good stuff.  Feasel’s Law in action.

Comments closed

Using Temporal Tables For Auditing Changes

Nisarg Upadhyay shows how to use temporal tables for basic data auditing:

To audit the data changes for a specific table, we should perform the time-based analysis of temporal tables. To do that, we must use the ‘FOR SYSTEM_TIME’ clause with below temporal-specific sub-clauses to the query data across the current and history tables. Let me explain the output of queries using different sub-clauses. Below is the setup:

  1. I inserted a product named ‘Flat Washer 8’ with List price 0.00 in the temporal table at 09:02:25 AM.

  2. I changed the List Price at 10:13:56 AM. New Price is 500.00.

Temporal tables aren’t going to solve all of your auditing problems but it can be useful.

Comments closed

Does Your Server Have Too Much Memory?

Brent Ozar has a few tips to see if you have too much memory:

Sounds impossible, right? The saying goes that you can never be too rich or too thin or have too much memory.

However, there is one good indication that your SQL Server is probably overprovisioned, and to explain it, I need to cover 3 metrics.

1. Max Server Memory is set at the instance level: right-click on your SQL Server name in SSMS, click Properties, Memory, and it’s “Maximum server memory.” This is how much memory you’re willing to let the engine use. (The rocket surgeons in the audience are desperate for the chance to raise their hands to point out different things that are or aren’t included in max memory – hold that thought. That’s a different blog post.)

If you have too much memory, I’d happily borrow a cup of RAM.

Comments closed

Why Nobody Is Reading Your Report

Stephanie Evergreen really cuts to the chase:

Here’s the hard truth: Your report probably sucks. Mine sure did. The heart of your content is likely fine, maybe even helpful. But, if you are anything like the hundreds of reports I see every year, the entire set of cultural norms we have somehow developed around reporting is just setting us up for failure, writing a destiny where no one is reading the report.

Why? Let me lay out the most common issues I see and propose some strategic solutions.

There’s an emphasis here on academic papers but it also applies to corporate work too.

Comments closed

Trigger Or Constraint?

Andy Levy points out that you shouldn’t use a trigger when a default constraint will do:

We want to spend our SQL Server licensing dollars wisely, so why ask it to do unnecessary work? More CPU time, more IO, and our client is waiting longer for the process to complete (I don’t like making anyone wait).

There’s a second “gotcha” with the AFTER INSERT method that applies to only some use cases. Let’s say you’re loading some historical data into the table, and that data includes the LastModified date. Or maybe your application has a very specific date that should be inserted into the table for this field.

Andy makes good points.

Comments closed

Alternatives To Temp Tables In SSIS

Tim Mitchell gives us a few methods for avoiding temp tables in SQL Server Integration Services:

While temp tables are a good option for in-flight data transformation, there are some unique challenges that arise when using temp tables in SSIS.

SQL Server Integration Services uses tight metadata binding for data flow operations. This means that when you connect to a relational database, flat file, or other structure in an SSIS data flow, the SSIS design-time and runtime tools will check those data connections to validate that they exist and that the metadata has not changed. This tight binding is by design, to avoid potential runtime issues arising from unexpected changes to the source or destination metadata.

Because of this metadata validation process, temp tables present a challenge to the SSIS data flow. Since temp tables exist only for the duration of the session(s) using them, it is likely that one of these tables created in a previous step in an SSIS package may not be present when validation needs to occur. During the design of the package (or even worse, when you execute the deployed package in a scheduled process), you could find yourself staring at an “object not found” error message.

It’s good to have alternatives, though there are times when you really just need a temp table.

Comments closed

Taking Screenshots With R

Abdul Majed Raja shows us how to take screenshots of webpages using R:

webshot package provides one simple function webshot() that takes a webpage url as its first argument and saves it in the given file name that is its second argument. It is important to note that the filename includes the file extensions like ‘.jpg’, ‘.png’, ‘.pdf’ based on which the output file is rendered. Below is the basic structure of how the function goes:

library(webshot)

#webshot(url, filename.extension)
webshot(“https://www.listendata.com/”, “listendata.png”)

If no folder path is specified along with the filename, the file is downloaded in the current working directory which can be checked with getwd().

Now that we understood the basics of the webshot() function, It is time for us to begin with our cases – starting with downloading/converting a webpage as a PDFcopy.

This isn’t something I’d expect to do every day, but I could see it being useful as part of a notebook to give the user a sanity check, like if a webpage or data set has a last updated timestamp that you want to check.  H/T R-Bloggers

Comments closed

Native Scoring With SQL Server 2017 R Services

Tomaz Kastrun gives us an example using native scoring in SQL Server 2017 Machine Learning Services:

Native scoring in SQL Server 2017 comes with couple of limitations, but also with a lot of benefits. Limitations are:

  • currently supports only SQL server 2017 and Windows platform

  • trained model should not exceed 100 MiB in size

  • Native scoring with PREDICT function supports only following algorithms from RevoScaleR library:

    • rxLinMod (linear model as linear regression)

    • rxLogit (logistic regression)

    • rxBTrees (Parallel external memory algorithm for Stochastic Gradient Boosted Decision Trees)

    • rxDtree (External memory algorithm for Classification and Regression Trees

    • rxDForest (External memory algorithm for Classification and Regression Decision Trees)

Read on for an example.  If you’re using one of these methods, then native scoring is extremely fast and a bit more flexible than I originally anticipated.  The problem is that you have to use one of those methods.

Comments closed

WVPlots 1.0.0

John Mount announces WVPlots 1.0.0:

Nina Zumel and I have been working on packaging our favorite graphing techniques in a more reusable way that emphasizes the analysis task at hand over the steps needed to produce a good visualization. We are excited to announce the WVPlots is now at version 1.0.0 on CRAN!

The idea is: we sacrifice some of the flexibility and composability inherent to ggplot2 in R for a menu of prescribed presentation solutions. This is a package to produce plots while you are in the middle of another task.

I like this idea:  I know the kind of plot I need and just want to throw something together for myself to give me an idea of the underlying data.

Comments closed