Press "Enter" to skip to content

Day: October 14, 2020

MLOps with Azure Databricks and MLflow

Oliver Koernig walks us through some of the basics of MLOps using MLflow and Azure Databricks:

Most organizations today have a defined process to promote code (e.g. Java or Python) from development to QA/Test and production. Many are using Continuous Integration and/or Continuous Delivery (CI/CD) processes and oftentimes are using tools such as Azure DevOps or Jenkins to help with that process. Databricks has provided many resources to detail how the Databricks Unified Analytics Platform can be integrated with these tools (see Azure DevOps Integration and Jenkins Integration). In addition, there is a Databricks Labs project – CI/CD Templates – as well as a related blog post that provides automated templates for GitHub Actions and Azure DevOps, which makes the integration much easier and faster.

When it comes to machine learning, though, most organizations do not have the same kind of disciplined process in place.

Read on for a demonstration of the process.
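As a rough frame of reference for the MLflow side (and not Oliver's exact workflow), here is a minimal sketch of logging a run and promoting a registered model version through stages; the model name, parameter, and metric values are placeholders:

# Minimal MLflow sketch: log a run, then promote a registered model version.
# "churn-model" and the logged values are placeholders, not from the post.
import mlflow
from mlflow.tracking import MlflowClient

with mlflow.start_run():
    mlflow.log_param("max_depth", 5)        # example hyperparameter
    mlflow.log_metric("accuracy", 0.92)     # example evaluation metric
    # A fitted model would normally be logged and registered here, e.g.:
    # mlflow.sklearn.log_model(model, "model", registered_model_name="churn-model")

client = MlflowClient()
# Move an existing registered version through environments, mirroring the
# dev -> QA/Test -> production promotion described above.
client.transition_model_version_stage(name="churn-model", version=1, stage="Staging")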


Shortest Path Calculations with Dijkstra’s Algorithm

Holger von Jouanne-Diedrich takes us through Dijkstra’s algorithm for shortest path calculations:

This post is partly based on the essay Python Patterns – Implementing Graphs; the example is from the German book “Das Geheimnis des kürzesten Weges” (“The Secret of the Shortest Path”) by my colleague Professor Gritzmann and Dr. Brandenberg. For finding the most elegant way to convert data frames into igraph objects, I got help (once again!) from the wonderful R community over at StackOverflow.

Dijkstra’s algorithm is a recursive algorithm. If you are not familiar with recursion you might want to read my post To understand Recursion you have to understand Recursion… first.

Click through for an implementation in R.
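Holger's implementation is in R; purely for cross-reference, here is a minimal iterative Dijkstra sketch in Python with a priority queue, run against a made-up graph:

import heapq

def dijkstra(graph, source):
    """Shortest-path distances from source; graph maps node -> [(neighbor, weight)]."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry; a shorter path was already found
        for neighbor, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist

graph = {"A": [("B", 4), ("C", 1)], "C": [("B", 2)], "B": []}  # made-up example
print(dijkstra(graph, "A"))  # {'A': 0, 'C': 1, 'B': 3}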


Use T-SQL to Check the SQL Agent Status

Jack Vamvas shows how you can check on the SQL Server Agent status using T-SQL:

I want to check if SQL Server Agent is running using T-SQL. I know I can check through services.msc or other methods such as PowerShell scripts to report on the SQL Server Agent status, but the requirement is to extract this information via T-SQL. Do you have a script using T-SQL to get the SQL Server Agent status?

There is a method and Jack shows us the way.
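One common way to do it (not necessarily the script Jack lands on) is to query the sys.dm_server_services DMV. A sketch follows, wrapped in Python with pyodbc only to keep the examples here in one language; the connection string is a placeholder:

# Sketch: check SQL Server Agent status via the sys.dm_server_services DMV.
# Server name and authentication details are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;"
    "DATABASE=master;Trusted_Connection=yes;"
)
query = """
SELECT servicename, status_desc
FROM sys.dm_server_services
WHERE servicename LIKE 'SQL Server Agent%';
"""
for servicename, status_desc in conn.execute(query):
    print(servicename, status_desc)  # e.g. SQL Server Agent (MSSQLSERVER)  Running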


Looking at BDC in Kubernetes with Lens

Mohammad Darab shows off a tool to monitor the Kubernetes cluster driving a Big Data Cluster:

I don’t recall how I came across this Kubernetes IDE called Lens, but all I know is it’s cool as heck! It connects to a Kubernetes cluster (using the kube config file) and gives you an in-depth view of all the different Kubernetes objects, their associated YAML files, health/metrics, etc. In this blog post I will show you how we can look into a Big Data Cluster’s Kubernetes infrastructure using Lens.

Click through for instructions on installation, as well as how to use the product.
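If you want the same kube-config-based view programmatically rather than through Lens, a minimal sketch with the Kubernetes Python client looks like this; the mssql-cluster namespace is the usual Big Data Cluster default, but treat it as an assumption:

# Minimal sketch: list Big Data Cluster pods by reading the local kube config,
# the same file Lens connects with. The namespace name is an assumption.
from kubernetes import client, config

config.load_kube_config()               # reads ~/.kube/config
v1 = client.CoreV1Api()
for pod in v1.list_namespaced_pod(namespace="mssql-cluster").items:
    print(pod.metadata.name, pod.status.phase)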


Updating Power BI Report Parameters via Powershell

Martin Schoombee gets off the beaten path:

We’ve only used the PowerShell cmdlets for Power BI so far in this series, but things are about to get interesting because there aren’t cmdlets available for everything you might want to do. One such thing is updating parameters, and we’re going to use the Power BI REST API (which the cmdlets use under the covers anyway) to achieve that.

REST APIs are usually a little tricky to deal with, especially the process of authentication. Fortunately there is an Invoke-PowerBIRestMethod cmdlet that makes it possible to use the API in PowerShell without the need to deal with some of the underlying complexities.

Read on to see how the whole process works.
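Martin works in PowerShell with Invoke-PowerBIRestMethod; purely as a sketch of the underlying REST call, the Datasets UpdateParameters endpoint looks roughly like this from Python (workspace ID, dataset ID, parameter name, and token acquisition are all placeholders):

# Rough sketch of the Power BI REST call that updates dataset parameters.
# IDs, the parameter name, and the access token are placeholders.
import requests

access_token = "<token from Azure AD / MSAL>"   # placeholder
group_id = "<workspace-id>"
dataset_id = "<dataset-id>"

url = (
    f"https://api.powerbi.com/v1.0/myorg/groups/{group_id}"
    f"/datasets/{dataset_id}/Default.UpdateParameters"
)
body = {"updateDetails": [{"name": "Environment", "newValue": "QA"}]}

resp = requests.post(url, json=body,
                     headers={"Authorization": f"Bearer {access_token}"})
resp.raise_for_status()   # 200 OK means the parameter values were accepted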


Metadata-Driven ADF Pipelines for Synapse

Hope Foley is back:

So what’s the hard thing I want to help make easier in this post? Metadata-driven pipelines in Azure Data Factory! I had the opportunity a while back to work with a customer who was pulling data out of large SQL Servers to eventually land data into Azure Synapse SQL pools, back when they were still Azure SQL DW. We created a couple of load pattern pipelines that used metadata in Azure SQL DB to load Synapse SQL pool tables from parquet files in Azure Data Lake Storage (ADLS) Gen 2.

Not gonna lie, the pipelines weren’t easy for me to learn to set up initially. Big thanks to Catherine for your blog, which was a life preserver in the hardest parts! So I wanted to see if I could automate it in my old friend PowerShell.

It would also be worth looking at some of the work Paul Andrew has done around ADF.procfwk for another approach to the problem.
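To make the metadata-driven idea concrete (this is a hypothetical illustration, not Hope's control tables or pipelines), the pattern boils down to rows in a control table describing each load, with the pipeline generating the work from them:

# Hypothetical illustration of a metadata-driven load: control rows describe each
# target table and its parquet source, and code generates the Synapse COPY statements.
# Table names, paths, and the storage account are made up.
metadata = [
    {"target_table": "dbo.Sales",     "source_path": "sales/2020/10/*.parquet"},
    {"target_table": "dbo.Customers", "source_path": "customers/*.parquet"},
]

adls_root = "https://mydatalake.dfs.core.windows.net/raw"   # placeholder ADLS Gen2 path

for row in metadata:
    copy_sql = (
        f"COPY INTO {row['target_table']} "
        f"FROM '{adls_root}/{row['source_path']}' "
        f"WITH (FILE_TYPE = 'PARQUET', CREDENTIAL = (IDENTITY = 'Managed Identity'));"
    )
    print(copy_sql)  # in ADF this would drive a ForEach over the metadata rows instead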


Grouping with GroupKind in Power Query

Ed Hansberry illuminates:

If you’ve used Power Query long enough, you’ve needed to group data and provide some sort of summary. For example, grouping by customer and summarizing sales. The normal way the Table.Group function works is to group the entire table by the field(s) you are grouping by and then provide whatever aggregations you need.

However, you can alter that behavior and group data by a column, and have it provide a separate group at each change in the data.

Click through for the rest of the story. There’s some complexity to the problem which is hard to summarize.
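The option Ed covers is GroupKind.Local in Table.Group; as a rough analogue in pandas (not the M code from the post), "a separate group at each change in the data" can be emulated by numbering runs of consecutive values:

# Rough pandas analogue of Table.Group with GroupKind.Local:
# start a new group every time the grouping column changes value.
import pandas as pd

df = pd.DataFrame({
    "customer": ["A", "A", "B", "B", "A", "A"],   # made-up data
    "sales":    [10, 20, 5, 7, 3, 4],
})

# run_id increments whenever "customer" differs from the previous row
run_id = (df["customer"] != df["customer"].shift()).cumsum()

local_groups = df.groupby([run_id, df["customer"]], sort=False)["sales"].sum()
print(local_groups)  # customer A shows up twice because its two runs stay separate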


Copying an Azure SQL Database

Garry Bargsley gives us two methods for copying an Azure SQL Database:

Copying an Azure SQL Database is a vital skill when managing cloud databases. Recently, a request was received from the “business”. They wanted to create a copy of an Azure SQL Database that was in a development environment. The database has been certified and early testing was accepted. They now want an exact copy in QA to start integration testing. The process of making an Azure SQL Database copy is straightforward. There are several different ways to perform this action.

The two methods chosen, the Azure Portal and PowerShell, will demonstrate how to complete this request.

Click through for the demos.
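Garry demonstrates the Azure Portal and PowerShell; for completeness, the same copy can also be started with T-SQL's CREATE DATABASE ... AS COPY OF against the target server's master database. A sketch via pyodbc, with every name and credential a placeholder:

# Sketch: start an Azure SQL Database copy with T-SQL against the target
# logical server's master database. All server, database, and credential values are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=target-server.database.windows.net;DATABASE=master;"
    "UID=sqladmin;PWD=<password>;",
    autocommit=True,   # CREATE DATABASE ... AS COPY OF cannot run inside a transaction
)
conn.execute("CREATE DATABASE [AppDb_QA] AS COPY OF [source-server].[AppDb];")
# The copy runs asynchronously; progress is visible in sys.dm_database_copies on the target.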
