Press "Enter" to skip to content

Curated SQL Posts

Using Notebooks to Load Data into the Databricks File System

Tomaz Kastrun is putting together an Advent of Azure Databricks:

Yesterday we started working towards data import and how to use drop zone to import data to DBFS. We have also created our first Notebook and this is where I would like to start today. With a light introduction to notebooks.

Read on for a depiction of notebooks, as well as an example which loads data into the Databricks File System (DBFS).

Comments closed

Moving Power BI Dataflows Across Workspaces

Mark Lelijveld has updated a script for us:

Over a year ago, I wrote a blog about moving dataflows across workspaces using a PowerShell script. Especially useful if you want to move dataflow logic from your development to test, acceptance or production workspace.

I received a bunch of feedback on this script and run into some issues myself as well lately. It was about time for an update of the script! Below I share the issues that are addressed in this new version and what new additions are added to the script.

Click through for details on the update.

Comments closed

Scalar Functions Not Allowed in T-SQL

Hugo Kornelis has a list for us:

But the optimizer also has a set of scalar functions available that are not allowed within T-SQL code, but that it can use in properties of some operators. The table below lists all these functions, with a description of their input parameters and their result.

You can’t use these directly, but they are useful nonetheless.

Comments closed

Batch Mode with Temp Tables

Erik Darling continues receiving big paydays from Big Temp Table:

When you have queries that need to process a lot of data, and probably do some aggregations over that lot-of-data, batch mode is usually the thing you want.

Originally introduced to accompany column store indexes, it works by allowing CPUs to apply instructions to up to 900 rows at a time.

It’s a great thing to have in your corner when you’re tuning queries that do a lot of work, especially if you find yourself dealing with pesky parallel exchanges.

Read on to see how you can create a temp table which triggers batch mode processing fairly easily.

Comments closed

Creating an Azure Purview Catalog Instance

Wolfgang Strasser wants to try out Azure Purview:

Basics – Resource group, purview account name (this cannot be changed afterwards) and the location.

As of today (2020-12-06), there are only 5 Azure regions you can choose from to store the Purview metadata. But – in-region scanning from 16 other Azure regions is available in the preview (source)

This is part one of a multi-part series, so stay tuned for more.

Comments closed

BULK INSERT and Advent of Code

Thomas Rushton performs BULK INSERT I imagine the way most of us do, through trial and lots of error:

Step one in solving these problems in SQL Server is loading the data into a database so you can do something with it.

I chose to use BULK INSERT, because, according to the documentation, it:

Imports a data file into a database table or view in a user-specified format in SQL Server

Ideal, right?

Click through for several tips around BULK INSERT.

Comments closed

Joining Data Streams in Flink

Kundan Kumarr crosses the streams:

Apache Flink offers rich sources of API and operators which makes Flink application developers productive in terms of dealing with the multiple data streams. Flink provides many multi streams operations like UnionJoin, and so on. In this blog, we will explore the Window Join operator in Flink with an example. It joins two data streams on a given key and a common window.

Click through for an example of the fluent API approach. It’s not as nice as proper SQL, but it does the job.

Comments closed

Spark Starter Guide: Data Standardization

Ladon Robinson continues the Spark Starter Guide:

Standardization is the practice of analyzing columns of data and identifying synonyms or like names for the same item. Similar to how a cat can also be identified as a kitty, kitty cat, kitten or feline, we might want to standardize all of those entries into simply “cat” so our data is less messy and more organized. This can make future processing of the data more streamlined and less complicated. It can also reduce skew, which we address in Addressing Data Cardinality and Skew.

We will learn how to standardize data in the following exercises.

Check it out. I’m excited to see the Spark Starter Guide get fleshed out and written.

Comments closed

Azure Synapse Analytics Goes GA

Sacha Tomey recaps some announcements:

After much anticipation, today, Microsoft have announced the general availability of Azure Synapse Analytics! Azure Synapse Analytics is a limitless analytics service that brings together data integration, enterprise data warehousing and Big Data analytics all into a single service, accelerating time to insights, enabling organisations to become data-driven. Azure Synapse combines capabilities spanning the needs of data engineering, machine learning, and BI without creating silos in processes and tools.

Read on for more info on this as well as info on Azure Purview.

Comments closed