Press "Enter" to skip to content

Curated SQL Posts

Kafka Integration with Knime

Swantika Gupta shows off some of Knime’s ability to integrate with Apache Kafka:

Knime Analytics Platform provides its users a way to consume messages from Apache Kafka and publish the transformed results back to Kafka. This allows users to integrate their Knime workflows easily with a distributed streaming pub-sub mechanism.

With Knime 3.6+, users get a Kafka extension with three new nodes:
1. Kafka Connector
2. Kafka Consumer
3. Kafka Producer

Click through to see how to configure each and how to enrich your data with Knime.
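The Knime nodes themselves are configured through dialogs, but the underlying consume-transform-publish pattern is easy to picture in code. Here is a minimal sketch in Python using the confluent-kafka client (my own illustration, not something Knime generates); the broker address and the raw-events/clean-events topic names are hypothetical.

# Minimal consume-transform-publish loop, conceptually what a Knime
# Kafka Consumer -> transformation -> Kafka Producer workflow does.
# Broker address and topic names are hypothetical.
from confluent_kafka import Consumer, Producer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "knime-style-demo",
    "auto.offset.reset": "earliest",
})
producer = Producer({"bootstrap.servers": "localhost:9092"})

consumer.subscribe(["raw-events"])
try:
    while True:
        msg = consumer.poll(1.0)          # wait up to one second for a message
        if msg is None or msg.error():
            continue
        transformed = msg.value().decode("utf-8").upper()  # stand-in for the real transformation
        producer.produce("clean-events", value=transformed.encode("utf-8"))
        producer.poll(0)                  # serve delivery callbacks
finally:
    consumer.close()
    producer.flush()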


HIVE-6384 Errors with Spark and Parquet

Manoj Pandey troubleshoots an issue:

But I was getting the following error:

warning: there was one feature warning; re-run with -feature for details
java.lang.UnsupportedOperationException: Parquet does not support decimal. See HIVE-6384

As per the above error, it relates to some Hive version conflict, so I tried checking the Hive version by running the below command and found that it is pointing to an old version (0.13.0). This version of the Hive metastore did not support the BINARY data types for Parquet-formatted files.
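The general shape of the problem is a DataFrame with DECIMAL columns being written to Parquet against a metastore that predates decimal support. One common, blunt workaround (not necessarily the fix Manoj lands on) is to cast those columns to DOUBLE or STRING before writing. A hedged PySpark sketch with a hypothetical source table and output path:

# Hypothetical illustration of the HIVE-6384 situation: cast DECIMAL columns
# to DOUBLE before writing Parquet so an old (0.13) Hive metastore can cope.
# Not necessarily the fix described in the linked post.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col
from pyspark.sql.types import DecimalType

spark = SparkSession.builder.getOrCreate()
df = spark.table("sales")  # hypothetical source table

decimal_cols = [f.name for f in df.schema.fields if isinstance(f.dataType, DecimalType)]
for c in decimal_cols:
    df = df.withColumn(c, col(c).cast("double"))

df.write.mode("overwrite").parquet("/mnt/demo/sales_parquet")  # hypothetical path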

Read on to see how Manoj was able to fix the problem in Azure Databricks.


Using Docker Desktop on WSL2

Chris Taylor walks us through updating Docker Desktop for Windows to support Windows Subsystem for Linux 2:

I won't go too much into what this is, as you can read the articles in the links above, but to summarise, this will improve the experience of Docker on Windows:

– Improvements in resource consumption
– Starting up docker daemon is significantly quicker (Docker says 10s as opposed to ~1min previously)
– Avoid having to maintain both Linux and Windows build scripts
– Improvements to file system sharing and boot time
– Allows access to some cool new features for Docker Desktop users.

Some of these are improvements we've been crying out for over the last couple of years, so in my opinion they're a very welcome addition.

In order to get started using WSL2, there are a couple of steps you need to run through, which I'll try to show below with a few screenshots.

Read on for the process.


Restoring SQL Server Backups from Azure Blob Storage

Niko Neugebauer walks us through special considerations when using Azure Blob Storage as your backup location:

If you are using Azure Blob Storage for SQL Server backups, you need to know a couple of important details before you start on any significant project, and as you should know (in my head I keep hearing Grant Fritchey angrily declaring that there is no backup strategy if there is no restore strategy to be found in the plan).

The ACL permissions required by the Restore From URL operation in SQL Server (any SQL Server right now, starting with SQL Server 2012 page blobs and including the SQL Server 2019 blob storage support that started with SQL Server 2014) are … [drumroll] … exclusive WRITE permissions on the underlying file(s).

Niko explains some of the pain around that requirement, as well as a few other bees in your bonnet.
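One practical consequence: if your backup credential is a shared access signature, the SAS itself has to grant write (not just read and list) on the container before a restore will work. Here is a hedged Python sketch of generating such a SAS with the azure-storage-blob v12 SDK; the account, key, and container names are placeholders, and this is simply one way to produce a token with the needed permissions.

# Generate a container-level SAS that includes the write permission
# RESTORE FROM URL needs; account, key, and container are placeholders.
from datetime import datetime, timedelta, timezone
from azure.storage.blob import generate_container_sas, ContainerSasPermissions

sas_token = generate_container_sas(
    account_name="mybackupaccount",
    container_name="sqlbackups",
    account_key="<storage-account-key>",
    permission=ContainerSasPermissions(read=True, write=True, delete=True, list=True),
    expiry=datetime.now(timezone.utc) + timedelta(days=30),
)
print(sas_token)  # goes into the SECRET of a SHARED ACCESS SIGNATURE credential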


Azure Icons in Diagrams.Net

Dave Ruijter walks us through using Azure icons in diagrams.net (née draw.io):

How to use the icon collection in draw.io (diagrams.net)

I’ve made it easy for you: I’ve created a couple of custom libraries with all the icons! I’ve uploaded them to my GitHub, and you can use them in a couple of ways.

The first and simplest option is just to start the diagrams.net app with a URL that has the correct libraries included as URL parameters:

Click through to read the whole thing. I’m a long-time fan of diagrams.net and am glad to see Dave’s offering for us.


API Consumption with Power BI Dataflows

Matthew Roche covers some pointers when working with APIs:

These APIs are often slower than a database, which can increase load/refresh times. Sometimes the load time is so great that a refresh may not fit within the minimum window based on an application’s functional requirements.

These APIs may also be throttled. SaaS application vendors often have a billing model that doesn’t directly support frequent bulk operations, so to avoid customer behaviors that affect their COGS and their bottom line, their APIs may be limited to a certain number of calls for a given period.

The bottom line is that when you’re using APIs as a data source in Power BI, you need to take the APIs’ limitations into consideration, and often dataflows can help deliver a solution that accommodates those limitations while delivering the functionality your application needs.
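To make the throttling point concrete, here is a hedged Python sketch of the back-off loop a client ends up writing when an API answers with HTTP 429; the endpoint and paging parameters are hypothetical, and this illustrates the constraint rather than anything specific to Power BI dataflows.

# Pull pages from a rate-limited API, honoring HTTP 429 and Retry-After.
# The endpoint and page size are hypothetical.
import time
import requests

def fetch_all(base_url="https://api.example.com/v1/orders", page_size=100):
    rows, page = [], 1
    while True:
        resp = requests.get(base_url, params={"page": page, "per_page": page_size})
        if resp.status_code == 429:                    # throttled by the vendor
            wait = int(resp.headers.get("Retry-After", "5"))
            time.sleep(wait)                           # back off, then retry the same page
            continue
        resp.raise_for_status()
        batch = resp.json()
        if not batch:
            break
        rows.extend(batch)
        page += 1
    return rows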

Read on for more details including links to a few pointers.


ksqlDB 0.11.0

Victoria Xia announces ksqlDB 0.11.0:

ksqlDB 0.11.0 contains improvements and fixes spanning stranded transient queries, overly aggressive schema compatibility checks, confusing behavior around casting nulls, bad schema management, and more. Here, we highlight a couple of additional, notable improvements.

Also on my backlog was Andy Coates, talking about key columns in ksqlDB:

ksqlDB 0.10 includes significant changes and improvements to how keys are handled. This is part of a series of enhancements that began with support for non-VARCHAR keys and will ultimately end with ksqlDB supporting multiple key columns and multiple key formats, including Avro, JSON, and Protobuf.

Before looking at the syntax changes in version 0.10, let’s first look at what is meant by keys in ksqlDB, the two types of key columns, and how this may differ from other SQL systems.
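Part of what makes keys confusing coming from a relational background is that a ksqlDB key column maps to the Kafka record key, which is a separate field from the message value. A hedged Python sketch of producing a keyed record with confluent-kafka (the topic name and payload are hypothetical):

# A Kafka record carries a key and a value; ksqlDB key columns are read from the key.
import json
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})
producer.produce(
    "users",                                        # hypothetical topic
    key="42",                                       # surfaces as the key column in ksqlDB
    value=json.dumps({"name": "Alice", "city": "Lisbon"}),
)
producer.flush()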

Read on, as it’s an interesting look at how different data architectures can mean radically different recommendations for key design.


When Date Tables Go Bad

Brent Ozar walks through a scenario in which a calendar table (AKA, date dimension) makes a query perform quite a bit worse:

So why did the date table not perform as well as the old-school way?

SQL Server doesn’t understand the relationship between these two tables. It simply doesn’t know that all of the rows in the Users table will match up with rows in the calendar table. It assumes that the calendar table is doing some kind of filtering, especially given our CAST on the date. It doesn’t expect all of the rows to match.

My reaction was pretty much the same as Koen Verbeeck’s in the comments. Put in clearer terms, calendar tables work best when you’re joining a DATE type to a DATE type. Once you introduce times into the mix, the optimizer has to behave differently, not least because you have to do things like CAST() to coerce data types.
