2020-12-09 – Curated SQL

Connecting to Blob Storage from Azure Databricks

Published 2020-12-09 by Kevin Feasel

Tomaz Kastrun continues opening doors in the advent calendar:

Yesterday we introduced the Databricks CLI and how to upload the file from “anywhere” to Databricks. Today we will look how to use Azure Blob Storage for storing files and accessing the data using Azure Databricks notebooks.

Click through to see how.

Comments closed

Thoughts on R’s New Pipe

Published 2020-12-09 by Kevin Feasel

John Mount has thoughts on the upcoming pipe operator in R:

There is a current active discussion on this prototype and some interesting points come up. Note the current proposal appears to disallow a |> f -> f(a), a currently popular transform.
1. This is a language feature presented as a soon-to-be-user-visible prototype, not an RFC.
2. Some are objecting to the term “pipe.”
3. Some call this sort of pipe function composition.
4. It is noticed that this sort of substitution is generally thought of as a “macro.”
5. There is a claim the proposed pipe seems to violate the beta-reduction rule of the lambda calculus: variables should be substitutable for values.

Read on for John’s take on this. I particularly appreciate his response to point number 2: other functional languages have pipes (in fact, |> is the F# pipe operator). Pipes are not unique to UNIX. John has a lot of interesting comments, so check them out.

Comments closed

Creating an app with Suave and F#

Published 2020-12-09 by Kevin Feasel

Diogo Souza shows off the Suave framework:

F# is the go-to language if you’re seeking functional programming within the .NET world. It is multi-paradigm, flexible, and provides smooth interoperability with C#, which brings even more power to your development stack, but did you know that you can build APIs with F#? Not common, I know, but it’s possible due to the existence of frameworks like Suave.io.
Suave is a lightweight, non-blocking web server. Since it is non-blocking, it means you can create scalable applications that perform way faster than the ordinary APIs. The whole framework was built as a non-blocking organism.

I will shout from the rooftops that data platform developers should learn functional programming. In the .NET space, that’s F#.

Comments closed

Clouds as Single Points of Failure

Published 2020-12-09 by Kevin Feasel

Denny Cherry argues that you should not consider a cloud provider as a single point of failure:

Having a two cloud providers isn’t going to save you from an outage. The public cloud providers (Microsoft Azure, Amazon AWS, Google GCP, etc.) have specifically designed their networks so that an outage at one region doesn’t impact other regions.
The day before US Thanksgiving (November 25, 2020), AWS had a major outage where the east-us facility suffered an outage for several hours. But you’ll notice something very interesting about this outage. No other AWS region was impacted by this outage. This is a very important distinction, as it shows that having multiple regions within AWS would give a solid Disaster Recovery strategy a great fail-over experience.

I’m mostly in agreement with Denny on this, but then I’d also have to point out the Azure AD issue which crippled Azure work across the globe, or the Azure DevOps service going down for a period of time (because everything was hosted in one data center and there was an issue). Depending on just how important uptime is, it can still make sense to be multi-cloud, especially if we use a broad enough definition which includes on-premises as a “local cloud.” In extreme cases—say, you lose millions of dollars per hour of downtime—the cost of a belt + suspenders approach is well below the expected loss from an outage.

Comments closed

Issues Deploying Azure Synapse Analytics via ARM Template

Published 2020-12-09 by Kevin Feasel

Paul Andrew hits on some growing pains:

Just last week we heard the announcement from Microsoft that Azure Synapse Analytics is now generally available (GA)… A full year on, plus a few weeks, since first seeing Synapse at the big USA conferences in November 2019.
Today I’ve been attempting to use the resource with a view to implementing it for several customer projects. Although GA, it would seem that many part of the technology are far from ready.
In this brief blog I’m exposing some of the pain I’ve faced so far in simply trying to deploy a second instance of Azure Synapse Analytics using ARM templates.

Click through for three that Paul found. I’d expect that most of these will be tidied up in the next few months.

Comments closed

Azure Purview: External Connections and the Starter Kit

Published 2020-12-09 by Kevin Feasel

Wolfgang Strasser continues a series on Azure Purview. First up is a look at external connections:

During my tests with Azure Purview I found out, that in my demo accounts (I created multiple to test it) the Management menu was missing some items.

Read on to learn why and how you can rectify this. Then, check out Wolfgang’s take on the Starter Kit:

Very often, when I want to test some new services I miss some infrastructure and environments I can start and play with. I am not talking about creating a new Azure Purview account (see my previous blog post – Creating an Azure Purview account) – I am talking about the data infrastructure to analyze, catalog and gain knowledge out of it.
We could start to create such an infrastructure, BUT: the Purview team create a Starter Kit to quickly create a data estate and configure everything that you can start with Purview within a view minutes.

Read on for one issue (of the self-inflicted variety) Wolfgang ran into during deployment. But it does look like a great way to get started with Purview and build up a relevant demo environment.

Comments closed

Create the Fastest Number Series Generator

Published 2020-12-09 by Kevin Feasel

Itzik Ben-Gan has a challenge for us:

I always used my solution with a base table value constructor cardinality of 2, but Marcos’ comment made me think. This tool is so useful that we as a community should join forces to try and create the fastest version that we possibly can. Testing different base table cardinalities is just one dimension to try. There could be many others. I’ll present the performance tests that I’ve done with my solution. I mainly experimented with different table value constructor cardinalities, with serial versus parallel processing, and with row mode versus batch mode processing. However, it could be that an entirely different solution is even faster than my best version. So, the challenge is on! I’m calling all jedi, padawan, wizard and apprentice alike. What’s the best performing solution that you can conjure? Do you have it within you to beat the fastest solution posted thus far? If so, share yours as a comment to this article, and feel free to improve any solution posted by others.

Give it a try.

Comments closed

First Thoughts on Amazon Babelfish

Published 2020-12-09 by Kevin Feasel

Ryan Booz shares some first thoughts on Amazon’s Babelfish offering:

The imputes for creating the tool is clear for AWS. Provide a way for customers to easily connect a SQL Server app to Aurora Postgres, saving big on licensing fees and reducing total cost of ownership. Assuming the tool is successful at some level, I’m sure it will provide a revenue boost for Amazon and some customers might (initially) feel a win. No harm, no foul on Amazon for leading the effort. Free markets, baby!
No matter how clever Babelfish is, however, I just can’t see how this is ultimately a win for SQL Server or PostgreSQL… or the developers that will ultimately need to support these “hybrid” apps.

I think Ryan makes good points and does hit upon the crux of the problem. I’d also say that there’s a secondary problem which Ryan hints at, but it is that a query may be sufficiently fast in one database variant but perform horridly in another. A classic example of this is a solution built on cursors in Oracle and then bringing that to T-SQL.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

Day: December 9, 2020