Press "Enter" to skip to content

January 31, 2020

Data Lake File Formats and Security

Ashish Kumar and Jorge Villamariona continue a series on data lakes:

People from a traditional RDBMS background are often surprised at the extraordinary amount of control that data lake architects have over how datasets can be stored. Data lake architects, as opposed to relational database administrators, get to determine an array of elements such as file sizes, type of storage (row vs. columnar), degrees of compression, indexing, schemas, and block sizes. These are related to the big data-oriented ecosystem of formats commonly used for storing and accessing information in a data lake.

It is a bit of a different world and it comes with trade-offs. The whole thing is worth reading.


Parameterizing a Data Factory Linked Service to a REST API

Meagan Longoria had to parameterize a linked service connecting to a REST API recently:

In order to pass dynamic values to a linked service, we need to parameterize the linked service, the dataset, and the activity.

I have a pipeline where I log the pipeline start to a database with a stored procedure, look up a username in Key Vault, copy data from a REST API to data lake storage, and log the end of the pipeline with a stored procedure. My username and password are stored in separate secrets in Key Vault, so I had to do a lookup with a web activity to get the username. The password is retrieved using Key Vault inside the linked service. Data Factory doesn’t currently support retrieving the username from Key Vault, so I had to roll my own Key Vault lookup there.

Click through for the instructions.


Tracing a Session with Extended Events

Jason Brimhall shows how you can trace a specific session using Extended Events:

The ability to quickly and easily trace a query is important to database professionals. This script provides one useful alternative for tracing a specific spid, similar to the method of using the context menu to create the trace within SSMS and Profiler.

This is yet another tool in the ever-popular and constantly growing library of Extended Events. Are you still stuck on Profiler? Try one of these articles to help remedy that problem (here and here).

Read on to see how.
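If you haven't built a session-scoped trace before, here is a minimal sketch of the general shape such a script takes. To be clear, this is not Jason's script: the session name, the choice of events, and the hard-coded session_id of 53 are all illustrative.

-- Minimal sketch (not Jason's script): capture completed batches and RPC calls
-- for a single session, hard-coded here as session_id 53.
CREATE EVENT SESSION [TraceSingleSpid] ON SERVER
ADD EVENT sqlserver.rpc_completed
    (ACTION (sqlserver.sql_text, sqlserver.session_id)
     WHERE (sqlserver.session_id = 53)),
ADD EVENT sqlserver.sql_batch_completed
    (ACTION (sqlserver.sql_text, sqlserver.session_id)
     WHERE (sqlserver.session_id = 53))
ADD TARGET package0.event_file (SET filename = N'TraceSingleSpid.xel')
WITH (MAX_DISPATCH_LATENCY = 5 SECONDS, STARTUP_STATE = OFF);
GO

ALTER EVENT SESSION [TraceSingleSpid] ON SERVER STATE = START;

Jason's script handles the mechanics more gracefully than a hard-coded spid, so read the post for the real thing.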


Moving Data Around in Azure Synapse Analytics

Niko Neugebauer looks at some techniques for copying data into a table in an Azure Synapse Analytics SQL Pool:

First of all, let us list some of them (and I am not even attempting to provide all of them, of course):
– INSERT INTO … SELECT FROM … (the most well-known one)
– SELECT INTO … FROM … (the most well-known to perform well, since it will create a HEAP while copying most of the properties from the original table(s))
– CREATE TABLE … AS SELECT … (the old way, which must be something like 10 years old on PDW/APS & Azure SQL DW, but which has never made it into the box product or Azure SQL Database)
– PolyBase (which will use External Tables & externally allocated data to transfer into Azure SQL DW)
– BCP (a good old tested friend that will give you a pain in the neck until you master it)
– OPENROWSET / BULK INSERT (some very good and very old friends with complicated histories (who remembers all the code pages?), settings, and an uncertain future, mostly because of their original restrictions, I guess)
– COPY INTO … (the brand-new command that will allow you, under very neat privileges, to copy data from external storage accounts, much like BULK INSERT)

In this blog post I will simply focus on those features that have not been ported (hopefully just not yet): CTAS & COPY INTO.

Read on to see how these two work. Also, I too have wanted CTAS in on-premises SQL Server for years.
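For anyone who hasn't used either statement, here is a minimal sketch of each against a Synapse SQL pool, with made-up table names, storage account, and SAS token; note that COPY INTO loads into a table that already exists.

-- CTAS: create and load a new hash-distributed table in one statement.
CREATE TABLE dbo.FactSales_Copy
WITH
(
    DISTRIBUTION = HASH(CustomerKey),
    CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT *
FROM dbo.FactSales;

-- COPY INTO: load files from external storage into an existing table.
COPY INTO dbo.FactSales_Staging
FROM 'https://mystorageaccount.blob.core.windows.net/sales/2020/*.csv'
WITH
(
    FILE_TYPE = 'CSV',
    CREDENTIAL = (IDENTITY = 'Shared Access Signature', SECRET = '<sas-token>'),
    FIELDTERMINATOR = ',',
    FIRSTROW = 2
);

Niko's post covers how each of these behaves in practice, which is where the interesting details are.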


Upgrading a SQL Server Big Data Cluster

Mohammad Darab shows how to upgrade an existing Big Data Cluster:

The above scenario covered updating a Big Data Cluster from a supported release. Microsoft officially supports BDCs starting with SQL Server 2019 GDR1. But what if you have a previous version of BDCs, say a CTP or release candidate? In that case, you'll have to back up any data you have, delete your cluster, uninstall azdata, install the updated azdata, and deploy your Big Data Cluster anew. A little cumbersome, but that's how it is. In fact, no one should be running an unsupported release of Big Data Clusters anyway!

Click through for the instructions.
