Press "Enter" to skip to content

Author: Kevin Feasel

Querying Multiple Data Sources in Azure Synapse Analytics

James Serra walks us through querying Data Lake Storage Gen2, Cosmos DB, and a table created in an Azure Synapse serverless Apache Spark pool:

As I was finishing up a demo script for my presentation at the SQL PASS Virtual Summit on 11/13 (details on my session here), I wanted to blog about part of the demo that shows a feature in the public preview of Synapse that is frankly, very cool. It is the ability to query data as it sits in ADLS Gen2, a Spark table, and Cosmos DB and join the data together with one T-SQL statement using SQL on-demand (also called SQL serverless), hence making it a federated query (also known as data virtualization). The beauty of this is you don’t have to first write ETL to collect all the data into a relational database in order to be able to query it all together, and don’t have to provision a SQL pool, saving costs. Further, you are using T-SQL to query all of those data sources so you are able to use a reporting tool like Power BI to see the results.

Click through to see how.

Comments closed

Custom Formatting in Powershell

Jeffrey Hicks takes us through formatting in Powershell and uses Get-Process as an example:

One of the features I truly enjoy about PowerShell, is the ability to have it present information that I need in a form that I want. Here’s a good example. Running Get-Process is simple enough and the output is pretty complete. But one thing that would make it better for me, is that sometimes I want an easy way to see high-memory use properties. Yes, I can pipe Get-Process to Sort-Object and Where-Object. However, in this particular situation, what I really want is to see high-memory usage processes displayed in red. Maybe those that are getting close to my arbitrary limit I’d like to see in Yellow. This isn’t that difficult to achieve using ANSI escape sequences.

Click through to see how.

Comments closed

Issues Using EF Core Database First to Reverse Engineer SQL Server Databases

Erik Ejlskov Jensen takes us through several things to watch out for when reverse engineering a SQL Server database in Entity Framework Core:

Issue

SQL Server allows blank column names in tables, but this causes the following error when scaffolding: The string argument 'originalIdentifier' cannot be empty.

Workarounds

– Use EF Core Power Tools, which contains a fix for this. (Fix will also be in EF Core 6.0)
– Rename the column 🙂

Click through for several more issues and solutions in this vein.

Comments closed

Spark Infer Schema vs ADF Get Metadata

Paul Andrew compares two techniques for retrieving metadata:

For file types that don’t contain there own metadata (CSV, Text etc) we typically have to go and figure out there structure including; attributes and data types before doing any actual transformation work. Often I’ve used the Data Factory Metadata Activity to do this with its structure option. However, while playing around with Azure Synapse Analytics, specifically creating Notebooks in C# to run against the Apache Spark compute pools I’ve discovered in most case the Data Frame infer schema option basically does a better job here.

Now, I’m sure some Spark people will probably read the above and think, well der, obviously Paul! Spark is better than Data Factory. And sure, I accept for this specific situation it certainly is. I’m simply calling that out as it might not be obvious to everyone

Read on for a comparison of the two techniques.

Comments closed

Azure Data Studio, October 2020 Edition

Alan Yu shows off this month’s changes in Azure Data Studio:

You can now deploy Azure SQL resources from the deployment wizard in Azure Data Studio. These new options sit alongside local options like SQL Server on-premises and on Big Data Clusters and hybrid options, like SQL Managed Instance on Azure Arc. The deployment wizard includes UI-assisted Notebook experiences to deploy Azure SQL virtual machines and links to the Azure portal to create SQL databases, database servers, and elastic pools (SQL managed instances are not yet included).

Click through for more information.

Comments closed

Calculation Groups with Disconnected Tables in Power BI

Gilbert Quevauvilliers shows how to build a calculation group based on a disconnected table in Power BI and Azure Analysis Services:

I know that some of this might be able to be done with other calculation groups. I find I have more flexibility when combining Calculation Groups with a disconnected table.

Below are some of my previous calculation group blog posts that might also be of interest:

Create Currency Formatting Strings using Calculation Groups in Power BI Pro & Premium / Azure Analysis Services / SQL Server Analysis Services 2019

How to create and use Calculation Groups in Power BI Pro & Premium / Azure Analysis Services / SQL Server Analysis Services 2019

Click through for the demo.

Comments closed

Data Masking Improvements in dbatools

Sander Stad walks us through some changes to the data masking algorithm in dbatools:

If you’ve used the data masking command in dbatools you’ve probably noticed that the PowerShell session becomes memory intensive when it has to handle larger tables with one or more unique indexes.

The reason that happens is that during the data masking process the command looks for any unique indexes in the table. If it finds a unique index it will create a unique row for all the columns in the unique index.

Read on to see how Sander handled this.

Comments closed

Transactional Replication Error: Remote Server Does Not Exist

Garland MacNeill takes us through a replication issue:

For the past couple of days, I’ve been working on getting transactional replication set up between a couple of servers in between other projects I’ve been working on. For the last day I kept running into the following error:

“The remote server <“server name”> does not exist, or has not been designated as a valid Publisher, or you may not have permissions to see available Publishers. ” 

Click through for the solution.

Comments closed

MLOps with Azure Databricks and MLflow

Oliver Koernig walks us through some of the basics of MLOps using MLflow and Azure Databricks:

Most organizations today have a defined process to promote code (e.g. Java or Python) from development to QA/Test and production.  Many are using Continuous Integration and/or Continuous Delivery (CI/CD) processes and oftentimes are using tools such as Azure DevOps or Jenkins to help with that process. Databricks has provided many resources to detail how the Databricks Unified Analytics Platform can be integrated with these tools (see Azure DevOps IntegrationJenkins Integration). In addition, there is a Databricks Labs project – CI/CD Templates – as well as a related blog post that provides automated templates for GitHub Actions and Azure DevOps, which makes the integration much easier and faster.

When it comes to machine learning, though, most organizations do not have the same kind of disciplined process in place.

Read on for a demonstration of the process.

Comments closed

Shortest Path Calculations with Dijkstra’s Algorithm

Holger von Jouanne-Diedrich takes us through Dijkstra’s algorithm for shortest path calculations:

This post is partly based on this essay Python Patterns – Implementing Graphs, the example is from the German book “Das Geheimnis des kürzesten Weges” (“The secret of the shortest path”) by my colleague Professor Gritzmann and Dr. Brandenberg. For finding the most elegant way to convert data frames into igraph-objects I got help (once again!) from the wonderful R community over at StackOverflow.

Dijkstra’s algorithm is a recursive algorithm. If you are not familiar with recursion you might want to read my post To understand Recursion you have to understand Recursion… first.

Click through for an implementation in R.

Comments closed