Press "Enter" to skip to content

Curated SQL Posts

Running an mlflow Server on Azure

Paul Hernandez configures mlflow on Azure using platform-as-a-service offerings:

It is indisputable true that mlflow came to make life a lot easier not only for data scientists but also for data engineers, architects among others. There is a very helpful list of tutorials and example in the official mlflow docs. You can just download it, open a console and start using it locally on your computer. This is the fastest way to getting started. However, as soon as you progress and introduce mlflow in your team, or you want to use it extensively for yourself, some components should be deployed outside your laptop.

To exercise a deployment setup and since I own azure experience, I decided to provision a couple of resources in the cloud to deploy the model registry and store the data produced by the tracking server.

I concur on the power of mlflow.

Comments closed

String Modification in T-SQL

Steve Jones answers a question:

Recently I ran across a question posted by a beginner on the Internet and thought this would be a good, basic topic to cover. The question was: how can I replace a value in a comma separated string in a table?

This post covers the basics of this task.

Incidentally, this is where I say hey, that sounds like a failure in normalization. If you need to care about individual values in a collection, your value is not atomic. But that’s a bit of a tangent.

Comments closed

Transforming Arrays in Azure Data Factory

Mark Kromer shows off a few functions in Azure Data Factory to modify data in arrays:

The first transformation function is map() and allows you to apply data flow scalar functions as the 2nd parameter to the map() function. In my case, I use upper() to uppercase every element in my string array: map(columnNames(),upper(#item))

Read on for more iteration and aggregation functions akin to map, reduce, and filter.

Comments closed

When to Use Event Sourcing

Vikas Hazrati takes us through the pros and cons of using event sourcing for a project:

You would always get a ton of literature on Event Sourcing and CQRS. The key question is WHEN do you use it? Under what circumstances? Is your problem really in need of ES?

I would not go into the details of what Event Sourcing and CQRS is. The industry stalwarts have covered that in adequate detail. This post delves into battle-tested scenarios on where we should have used and otherwise ignored ES.

Click through for an analysis of pros and cons, as well as some advice on what it all means.

Comments closed

Configuring a Linked Server to Oracle

Emanuele Meazzo needs to pull data from Oracle into SQL Server:

The most atrocious part of my search for glory was without doubt navigating all the packages to download and install for each component, between broken links and differences between the instructions and the actual content, it’s a mess.

It took a while, based on Emanuele’s tone. With SQL Server 2019, you can avoid some of this pain by using PolyBase. But for prior versions of SQL Server, your options are more limited.

Comments closed

Top N with Others in Power BI

Marco Russo and Alberto Ferrari cover a pain point in Power BI:

The VisibleProducts variable contains a list of products for the selection currently displayed in the visual. In the example, we have the top 3 products for each Product Category included in our report. The ranking that is returned is only up to the value selected in the TopN parameter – for this reason, we can use the result of Ranking by Sales to filter the visual, including only the products ranked in the 1-to-TopN Value range. We use a filter in the Power BI filter pane to accomplish this task.

This is a common enough pattern that I do wish Power BI made it easy.

Comments closed

Basic Theory on Correlation Analysis, Using R

Petr Baranovskiy wants to take us through the key concepts of correlation analysis, starting with basic theory:

When I was learning statistics, I was surprised by how few learning materials I personally found to be clear and accessible. This might be just me, but I suspect I am not the only one who feels this way. Also, everyone’s brain works differently, and different people would prefer different explanations. So I hope that this will be useful for people like myself – social scientists and economists – who may need a simpler and more hands-on approach.

These series are based on my notes and summaries of what I personally consider some the best textbooks and articles on basic stats, combined with the R code to illustrate the concepts and to give practical examples. Likely there are people out there whose cognitive processes are similar to mine, and who will hopefully find this series useful.

This is clear and well-written, so check it out even if you feel like you have a solid understanding of the topic.

Comments closed