Press "Enter" to skip to content

Curated SQL Posts

All About Line Graphs

Mike Cisneros takes us through one of the most useful visuals out there:

A typical line graph will have continuous data along both the vertical (y-axis) and horizontal (x-axis) dimensions. The y-axis usually shows the value of whatever variable we are measuring; the x-axis is most often used to show when we measured it, either chronologically or based on some independent variable (e.g., as we rev our old car’s engine, we measure the decibel level at different RPM). 

While some line graphs do not use continuous data on the x-axis (particularly slopegraphs and parallel coordinates diagrams, which are specialized variants of line graphs), what we absolutely can’t use on our x-axis is data that doesn’t have any meaningful relationship among the categories shown. 

Read on for a lot of good information on a workhorse visual.
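To make the engine example concrete, here is a minimal matplotlib sketch (the data points are invented): continuous values on both axes, which is exactly what makes the connecting line meaningful.

    import matplotlib.pyplot as plt

    # Made-up measurements for the engine example: decibel level at each RPM
    rpm = [1000, 2000, 3000, 4000, 5000, 6000]
    decibels = [62, 68, 74, 81, 86, 92]

    # Continuous data on both axes, so a line graph is appropriate
    plt.plot(rpm, decibels, marker="o")
    plt.xlabel("Engine speed (RPM)")
    plt.ylabel("Sound level (dB)")
    plt.title("Engine noise by RPM")
    plt.show()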

Schema Management for Spark Applications

Walaa Eldin Moustafa takes us through some of the things that LinkedIn has learned about schema management with Apache Spark:

At LinkedIn, the Hive Metastore is the source of truth catalog for all Hadoop data. The Hive Metastore is managed by Dali. Dali is a data access and processing platform that is integrated with compute engines and ETL pipelines at LinkedIn to ensure consistency and uniformity in the access and storage of data. Dali utilizes the Hive Metastore to store data formats, data locations, partition information, and table information. Among other features, Dali also manages the definition of SQL views, as well as storing and accessing those definitions from the Hive Metastore.

Read on for a good explanation of the how as well as the why.
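Dali is LinkedIn-internal, but the underlying pattern, resolving schemas and locations from the Hive Metastore instead of hardcoding them, looks like this in a plain PySpark session (the table name here is hypothetical):

    from pyspark.sql import SparkSession

    # enableHiveSupport() makes Spark resolve table metadata (schema, format,
    # location, partitions) from the Hive Metastore
    spark = (SparkSession.builder
             .appName("metastore-example")
             .enableHiveSupport()
             .getOrCreate())

    # No paths or schemas in the code: the metastore is the source of truth
    df = spark.table("tracking.page_views")   # hypothetical database.table
    df.printSchema()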

Optimistic Concurrency in Cosmos DB

Hasan Savran takes us through optimistic concurrency with Cosmos DB:

To handle this problem, developers usually use a column like LastUpdateDt. You bring this column to the frontend and post it back to the database with the updated model. If LastUpdateDt hasn’t changed, the update goes into the database. If LastUpdateDt has changed, that means somebody else updated this model, and your code rejects the changes.

In this post, I will try to answer questions like “How do we do this in Azure Cosmos DB?” and “Do I need to implement all that logic manually using the Cosmos DB SDK?” I will use Cosmos DB’s REST API to demo how Cosmos DB handles Optimistic Concurrency Control automatically. If you have experience developing any REST API, you might be familiar with headers like If-Match or If-None-Match. These HTTP headers control whether an update should go through. You can use them for caching too: since they tell you whether an item has changed, you can keep items cached until they do.

Hasan has a demo for us as well, so check it out.
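The general If-Match flow he describes looks roughly like this, sketched with the requests library (the URL and document are placeholders, and Cosmos DB’s authorization header generation is omitted):

    import requests

    url = "https://myaccount.documents.azure.com/dbs/mydb/colls/orders/docs/42"
    headers = {"Authorization": "<signed auth token>"}  # generation omitted

    # Read the document and capture its ETag, which changes on every update
    resp = requests.get(url, headers=headers)
    etag = resp.headers["ETag"]

    # Replace the document only if the ETag still matches, i.e. nobody else
    # has updated it since we read it
    updated_doc = {"id": "42", "status": "shipped"}
    resp = requests.put(url, json=updated_doc,
                        headers={**headers, "If-Match": etag})
    if resp.status_code == 412:  # Precondition Failed
        print("Conflict: another writer got there first; re-read and retry")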

Managing Jobs in Availability Groups

Goncalo Cruz has a plan to run SQL Agent jobs on the primary node in an availability group:

In SQL Server Availability Groups, the SQL Agent jobs have to be created on all replicas, and you need to add logic at the beginning of each relevant job so that it only does its work when the local replica is the primary for the database.

If you do not add that logic, the jobs will execute successfully on the primary replica but will fail on the secondary replicas.

Read on for a process which keeps jobs from running except on the primary.
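The usual shape of that check, not necessarily Goncalo’s exact implementation, is a call to sys.fn_hadr_is_primary_replica at the top of the job, sketched here via pyodbc (connection details and the database name are placeholders):

    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=localhost;DATABASE=master;Trusted_Connection=yes;")

    # Returns 1 when this instance currently hosts the primary replica
    # of the given database
    row = conn.execute(
        "SELECT sys.fn_hadr_is_primary_replica(?)", "MyAGDatabase").fetchone()

    if row[0] == 1:
        print("Primary replica: run the real job body here")
    else:
        print("Secondary replica: exit quietly so the job still succeeds")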

Transforming JSON to CSV with Azure Data Factory

Rayis Imayev shows how to use the Flatten task with Azure Data Factory wrangling data flows:

Last week I blogged about using Mapping Data Flows to flatten sourcing JSON file into a flat CSV dataset:
Part 1: Transforming JSON to CSV with the help of Flatten task in Azure Data Factory

Today I would like to explore the capabilities of the Wrangling Data Flows in ADF to flatten the very same sourcing JSON dataset.

Click through to see what’s different.
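If you have not used a Flatten-style transformation before, here is the operation in miniature with pandas (the sample document is invented): each element of the nested array becomes its own row in the flat output.

    import pandas as pd

    # A small nested JSON document, similar in shape to a sourcing file
    doc = {"store": "Main", "orders": [{"id": 1, "total": 9.99},
                                       {"id": 2, "total": 24.50}]}

    # Unroll the nested array into one row per element, carrying the
    # parent-level "store" column along
    flat = pd.json_normalize(doc, record_path="orders", meta=["store"])
    flat.to_csv("orders.csv", index=False)
    print(flat)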

An Overview of HADR Concepts with SQL Server

Kevin Hill walks through different topics around high availability and disaster recovery:

Replication—a very Special Snowflake:

SQL SERVER REPLICATION IS NEITHER HA NOR DR.

Not everything in a SQL Server user database CAN be replicated, such as users, or tables with no Primary Key. New objects are not automatically sent from Publisher to Subscriber. System databases are not replicated.

There’s plenty of good information in here, so check it out.
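As a side note, if you want to see which of your tables would be excluded from transactional replication for lacking a primary key, a query along these lines will list them (sketched via pyodbc; the connection string is a placeholder):

    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=localhost;DATABASE=MyUserDB;Trusted_Connection=yes;")

    # Tables with no primary key constraint cannot take part in
    # transactional replication
    sql = """
    SELECT s.name AS schema_name, t.name AS table_name
    FROM sys.tables t
    JOIN sys.schemas s ON s.schema_id = t.schema_id
    WHERE NOT EXISTS (SELECT 1 FROM sys.key_constraints k
                      WHERE k.parent_object_id = t.object_id
                        AND k.type = 'PK');
    """
    for schema_name, table_name in conn.execute(sql):
        print(f"{schema_name}.{table_name} has no primary key")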

Deleting Old Build Definitions in Azure DevOps

Mark Broadbent solves a problem for us:

I have been experiencing a problem for quite a while now in my current environment, in that some of our old builds cannot be deleted. When you attempt to do so it results in the following error:

One or more builds associated with the requested pipeline(s) are retained by a release. The pipeline(s) and builds will not be deleted.

Many of our pipelines have undergone a lot of change over time to the degree it is not even obvious anymore why (or indeed where) these builds are being prevented from being dropped. The only thing that is clear is that until they can be, the old build definitions will remain.

Regardless of the reason why, Mark has the answer for how.
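Without spoiling the post: the fix generally involves clearing the retention flag on the affected builds through the Azure DevOps REST API. A rough sketch (organization, project, and PAT are placeholders, and this may not match Mark’s exact steps):

    import requests

    org, project = "myorg", "myproject"          # placeholders
    base = f"https://dev.azure.com/{org}/{project}/_apis/build"
    auth = ("", "<personal access token>")       # PAT as basic-auth password

    # Find builds that a release is still retaining
    builds = requests.get(f"{base}/builds?api-version=5.1",
                          auth=auth).json()["value"]
    for b in builds:
        if b.get("retainedByRelease"):
            # Clear the flag so the build, and then its definition, can go
            requests.patch(f"{base}/builds/{b['id']}?api-version=5.1",
                           json={"retainedByRelease": False}, auth=auth)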

Using Azure Key Vault with Azure Databricks

Jason Bonello shows how easy it is to integrate Azure Key Vault into Azure Databricks:

In Azure Key Vault we will be adding secrets that we will be calling through Azure Databricks within the notebooks. First and foremost, this is for security purposes. It will ensure usernames and passwords are not hardcoded within the notebook cells and offer some type of control over access in case it needs to be reverted later on (assuming it is controlled by a different administrator). In addition to this, it will offer a better way of maintaining a solution, since if a password ever needs to be changed, it will only be changed in the Azure Key Vault without the need to go through any notebooks or logic.

If you don’t use Key Vault, Databricks does include its own secrets storage, so there’s really no reason to keep credentials in plaintext.
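Either way, reading a secret inside a notebook is a one-liner. A sketch (the scope, key, server, and table names are hypothetical; dbutils and spark are provided by the Databricks runtime):

    # Runs inside a Databricks notebook; dbutils and spark come from the runtime
    user = dbutils.secrets.get(scope="keyvault-scope", key="sql-username")
    pwd = dbutils.secrets.get(scope="keyvault-scope", key="sql-password")

    # Use the secrets without ever printing or hardcoding them
    df = (spark.read.format("jdbc")
          .option("url", "jdbc:sqlserver://myserver.database.windows.net:1433;"
                         "database=mydb")                 # hypothetical server
          .option("dbtable", "dbo.SalesOrders")           # hypothetical table
          .option("user", user)
          .option("password", pwd)
          .load())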

Working with Spark.Net on Azure Synapse Analytics

Paul Andrew takes a look at Spark.NET (or Spark.Net or dotnet-spark or however I’m calling it this time):

The main reason I wanted access to Synapse is to play around with Spark.Net via the Synapse workspace Notebooks. Currently, if deploying Synapse via the public Azure portal, you only get the option to create a SQL compute pool, formerly known as an Azure SQLDW. While this is good, it gives us none of the exciting things that we were shown about Synapse back in November last year during the Microsoft Ignite conference.

To get the good stuff in Azure Synapse Analytics you need access to the full developer UI and Synapse Workspace.

Click through to learn more about the experience.
