
Author: Kevin Feasel

Temporal Tables with Flink

Marta Paes shows off a new feature in Apache Flink:

In the 1.7 release, Flink has introduced the concept of temporal tables into its streaming SQL and Table API: parameterized views on append-only tables — or, any table that only allows records to be inserted, never updated or deleted — that are interpreted as a changelog and keep data closely tied to time context, so that it can be interpreted as valid only within a specific period of time. Transforming a stream into a temporal table requires:

– Defining a primary key and a versioning field that can be used to keep track of the changes that happen over time;
– Exposing the stream as a temporal table function that maps each point in time to a static relation.

It looks pretty good.
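For a sense of what the query side looks like, here is a minimal sketch in Flink SQL, assuming an Orders stream and a Rates temporal table function registered through the Table API over an append-only rates stream (keyed on currency and versioned by its row time); all table, column, and function names here are hypothetical.

-- Join each order against the exchange rate that was valid at the order's row time.
SELECT
  o.order_id,
  o.amount * r.rate AS amount_in_eur
FROM
  Orders AS o,
  LATERAL TABLE (Rates(o.rowtime)) AS r
WHERE
  o.currency = r.currency;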

Comments closed

Converting Existing SSIS Packages to Biml

David Stein shows off a conversion tool built into BimlExpress:

BimlExpress is a free Visual Studio add-in created by the good folks at Varigence. It's a full-featured Biml editor which allows you to dynamically create SSIS packages. It was first released back in 2017, and the latest version is 2019 (of course). The current version supports Visual Studio 2010 through 2019 as well as SQL Server 2005 through 2019.

Prior to its release, Biml was written with BIDS Helper, now known as BI Developer Extensions. While BI Developer Extensions has many nice features, you should no longer use it to work with Biml as it is no longer being updated/supported.

I’m pleasantly surprised by this. It used to be limited to BimlStudio (née Mist) and BimlOnline.

Comments closed

PolyBase — SQL to SQL

I have a post covering PolyBase from SQL Server to SQL Server:

Historically, PolyBase has three separate external entities: external data sources, external file formats, and external tables. External data sources tell SQL Server where the remote data is stored. External file formats tell SQL Server what the shape of that data looks like—in other words, CSV, tab-separated, Parquet, ORC, etc. External tables tell SQL Server the structure of some data of a particular external file format at a particular external data source.

With PolyBase V2—connectivity with SQL Server, Cosmos DB, Oracle, Spark, Hive, and a boatload of other external data sources—we no longer need external file formats because we ingest structured data. Therefore, we only need an external data source and an external table. You will need SQL Server 2019 to play along and I’d recommend keeping up on CTPs—PolyBase is under active development so being a CTP behind may mean hitting bugs which have subsequently been fixed.

I want this to get even better, to the point where external tables are a no-brainer over linked servers in terms of performance.
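To make the V2 shape concrete, here is a hedged T-SQL sketch against SQL Server 2019; the credential, data source, database, and table names are all hypothetical, and you would need a database master key in place before creating the credential.

-- Credential used to authenticate against the remote SQL Server.
CREATE DATABASE SCOPED CREDENTIAL RemoteSqlCredential
WITH IDENTITY = 'remote_login', SECRET = '<password>';

-- External data source: where the remote data lives. No external file format needed.
CREATE EXTERNAL DATA SOURCE RemoteSqlServer
WITH
(
    LOCATION = 'sqlserver://RemoteHost:1433',
    CREDENTIAL = RemoteSqlCredential,
    PUSHDOWN = ON
);

-- External table: the structure of the remote data as SQL Server will see it.
CREATE EXTERNAL TABLE dbo.RemoteOrders
(
    OrderID int NOT NULL,
    OrderDate datetime2(0) NOT NULL,
    Amount decimal(12, 2) NOT NULL
)
WITH
(
    LOCATION = 'RemoteDatabase.dbo.Orders',
    DATA_SOURCE = RemoteSqlServer
);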

Comments closed

SQL Server and Terraform

Andrew Pruski continues a series on using Terraform to deploy to Azure Container Instances:

In a previous post I went through how to deploy SQL Server running in an Azure Container Instance using Terraform.

In that post, I used hardcoded variables in the various .tf files. This isn’t great, to be honest: in order to change those values, we’d need to update each .tf file. It would be better to replace the hardcoded values with variables whose values can be set in one separate file.

So let’s update each value with a variable in the .tf files.

Click through for a demo.

Comments closed

Considerations when Deleting Lots of Data

Ed Elliott takes us through things to think about before deleting a few million rows from a table:

Fragmentation
Fragmentation occurs when we delete from pages and there is still data surrounding our data. If we have 100 rows and delete every odd row, we would have perfect fragmentation, in that we have doubled the size of the data that we need. If we delete rows 1-49, even though we remove the same number of rows, we don’t have any fragmentation because the data is in a continuous block. Knowing how the data is stored on disk and how the data will be deleted (is it the first x records or every xth record?) is vital so that we know whether, after the delete, we should also reorganise the indexes to remove the deleted records.

Ed has quality insights here, so check it out.
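As a rough illustration of the follow-up work Ed describes, here is a minimal T-SQL sketch of deleting in batches and then reorganizing an index afterward; the table, column, and index names are hypothetical and the batch size is arbitrary.

DECLARE @BatchSize int = 10000;

-- Delete in batches to keep transaction log growth and blocking under control.
WHILE 1 = 1
BEGIN
    DELETE TOP (@BatchSize)
    FROM dbo.BigTable
    WHERE CreatedDate < '20150101';

    IF @@ROWCOUNT < @BatchSize
        BREAK;
END;

-- If the deletes left the index heavily fragmented, reorganize (or rebuild) it.
ALTER INDEX IX_BigTable_CreatedDate ON dbo.BigTable REORGANIZE;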

Comments closed

Power BI Connection String Capitalization

Slava Murygin ran into a problem with capitalization on connection strings with Power BI Report Server:

The message actually says:
“Several errors occurred during data refresh. Please try again later or contact your administrator.”
SessionID: 1b80301e-3898-417a-af9c-2e77ec490728
[0] -1055784932: Credentials are required to connect to the SQL source. (Source at SQLServerName;DBA_Pro.). The exception was raised by the IDbCommand interface.
[1] -1055784932: The command has been canceled.. The exception was raised by the IDbCommand interface.
[2] -1055784932: The command has been canceled.. The exception was raised by the IDbCommand interface.

In my case, the cause of the problem was a very silly thing. Power BI Server assigned only one data source connection string to my report, while in my report I had two data sources that differed only in the capitalization of the database name.

This was a weird scenario.

Comments closed

Finding Queries to Cache In-App

Brent Ozar provides guidance on the types of queries you might want to cache in your application:

Question 2: Will out-of-date data really hurt? Some data absolutely, positively has to be up to the millisecond, but you’d be surprised how often I see data frequently queried out of the database when the freshness doesn’t really matter. For example, one client discovered they were fetching live inventory data on every product page load, but the inventory number didn’t even matter at that time. During sales, the only inventory count that mattered was when the customer completed checkout – inventory was checked then, and orders only allowed if products were still in inventory then.

There are some good questions in here which can help figure out what can fit in a cache and what really needs to be fresh.

Comments closed

Decluttering Visuals

Elizabeth Ricks shows how you can remove extraneous noise from a visual and make it easier for a person to understand what you’re trying to show:

This caught my eye not because of the topic but because of how much time it took me to figure out what information it was trying to convey. What should I do with this? There’s a lot competing for my attention in this chart and distracting me from the data.

Spend a moment examining this graph and take note of which specific elements are challenging. Make a list: what might we eliminate or change to reduce cognitive burden?

I came up with eight specific design changes I would make. How does my list compare with yours?

There is a subjective element to this, but the end result does look better.

Comments closed

Defining TF-IDF

Bruno Stecanella explains the concept behind TF-IDF:

TF-IDF was invented for document search and information retrieval. It works by increasing proportionally to the number of times a word appears in a document, but is offset by the number of documents that contain the word. So, words that are common in every document, such as “this,” “what,” and “if,” rank low even though they may appear many times, since they don’t mean much to that document in particular.

However, if the word Bug appears many times in a document, while not appearing many times in others, it probably means that it’s very relevant. For example, if what we’re doing is trying to find out which topics some NPS responses belong to, the word Bug would probably end up being tied to the topic Reliability, since most responses containing that word would be about that topic.

This makes the technique useful for natural language processing, especially in classification problems.
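For reference, one common formulation of the score is:

tfidf(t, d) = tf(t, d) × log(N / df(t))

where tf(t, d) is the number of times term t appears in document d, N is the total number of documents in the collection, and df(t) is the number of documents containing t. Words that appear in most documents have a large df(t), which drives the logarithm (and thus the score) toward zero.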

Comments closed

Puzzling Through Older Problems

Kenneth Fisher shares a couple of interview puzzles:

The year is 2004. You’re taking a tech test as an interview for a SQL development job. They have a page in their application that displays up to 20 rows of information. They need a piece of code that will return the rows from a given page. Oh, and it may not always be 20 rows per page. You need to write a piece of code where they can pass in a page number and page size and get back results. So for example, if the page size is 20 and the page is 3 then you need to return back rows 41 to 60.

The answers aren’t on the page, but then again, that’s the point of a puzzle.
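If you just want to see the required behavior (rather than a 2004-legal answer), here is a minimal sketch using OFFSET/FETCH, which only arrived in SQL Server 2012; the table and column names are hypothetical.

DECLARE @PageNumber int = 3,
        @PageSize   int = 20;

SELECT Id, SomeColumn
FROM dbo.SomeTable
ORDER BY Id
OFFSET (@PageNumber - 1) * @PageSize ROWS   -- skip the first 40 rows
FETCH NEXT @PageSize ROWS ONLY;             -- return rows 41 through 60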

Comments closed