Press "Enter" to skip to content

Curated SQL Posts

Installing TensorFlow and Keras for R on SQL Server 2019 ML Services

I have a post on using TensorFlow and Keras in R on SQL Server 2019 Machine Learning Services:

What I’m doing is building a new virtual environment named r-reticulate, which is what the reticulate package in R desires. Inside that virtual environment, I’m installing the latest versions of tensorflow-probability, tensorflow, and keras. I had DLL loading problems with TensorFlow 2.1 on Windows, so if you run into those, the proper solution is to ensure that you have the appropriate Visual C++ redistributables installed on your server.

Then, I switched back to the base virtual environment and installed the same packages. My thinking here is that I’ll probably need them for other stuff as well (and don’t tell anybody, but I’m not very good with Python environments).

Please continue not to tell anybody that I’m not very good with Python environments. I tend to dump things in the base environment, forget which one I’m in, and all kinds of other bad practices. I think I’m secretly undermining myself in Python, but I don’t have enough proof yet.
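
For reference, here's a rough sketch of what that environment setup amounts to on the Python side — the post does this from R via reticulate, so this is an assumed equivalent, not the post's code, and the Windows paths are placeholders:

```python
# A minimal sketch (not the post's R code): creating the r-reticulate
# virtual environment and installing the same packages with plain Python.
import subprocess
import venv

# Create the environment reticulate looks for by default.
venv.create("r-reticulate", with_pip=True)

# Path to the environment's pip; Windows layout shown,
# use r-reticulate/bin/pip on Linux or macOS.
pip = r"r-reticulate\Scripts\pip.exe"

# Install the same packages the post installs from R.
subprocess.run(
    [pip, "install", "tensorflow-probability", "tensorflow", "keras"],
    check=True,
)
```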

E-Mail Alerting in ADF.procfwk

Paul Andrew has an update to the Azure Data Factory Procedural Framework:

The primary goal of this release was to implement email alerting within the existing processing framework, using existing metadata-driven practices to deliver this in an easy-to-control, flexible, and granular way. That said, the following statements have been met in terms of alerting capabilities and design.

Read on for the full change list.

Power BI Best Practice Tips

Lazaros Viastikopoulos continues a series on Power BI tips, switching from performance to best practices:

Tip 2) Organise Measures by Grouping

Tip number two goes hand in hand with the tip explained above, as after we generate explicit measures, what should we do with all those leftover implicit measures? Surely they will confuse the report authors if they are left visible. Furthermore, if we structure our data model as a Star Schema, every fact table will contain some foreign keys to establish a relationship with the primary key in the dimension (lookup) table. Should these columns remain visible for everyone to use?

Read on to learn how, as well as details for the other four tips.

Actual I/O Statistics in Execution Plans

Hugo Kornelis talks about a fairly recent property in execution plans:

There are two operators that read from the SalesOrderDetail table (or from indexes on that table). The top left operator is an Index Seek on one of the nonclustered indexes on SalesOrderDetail, and on the bottom input of the Nested Loops operator is a Clustered Index Scan that scans the clustered index on the same table.

So, now what? Which of the two is in this case the problem? Is each doing exactly 625 logical reads? Is one doing 50 and the other 1200? For the longest time, there was no way to find out. Sometimes you could make an educated guess by looking at the rest of the execution plan. Sometimes you can get an idea by running other queries with similar plans and check their logical reads (like in this case, you could run the subquery by itself and that would work). But none of these methods are really satisfactory.

Read on to see how the SQL Server team has addressed this.
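
Without spoiling Hugo's walkthrough: the gist is that actual execution plans now record I/O counters per operator, and those counters live in the plan XML itself, so you can read them outside of SSMS too. Here's a hedged sketch — the .sqlplan file name is hypothetical, and the attribute names assume a build recent enough to record per-operator I/O:

```python
# Hedged sketch: summing per-operator ActualLogicalReads from an actual
# execution plan file (.sqlplan is just showplan XML).
import xml.etree.ElementTree as ET

SP = "http://schemas.microsoft.com/sqlserver/2004/07/showplan"
NS = {"sp": SP}

tree = ET.parse("actual_plan.sqlplan")  # hypothetical file name

for relop in tree.iter(f"{{{SP}}}RelOp"):
    # RunTimeInformation holds one counter element per thread.
    counters = relop.findall(
        "sp:RunTimeInformation/sp:RunTimeCountersPerThread", NS
    )
    reads = sum(int(c.get("ActualLogicalReads", 0)) for c in counters)
    if counters:
        print(relop.get("NodeId"), relop.get("PhysicalOp"), reads)
```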

Methods for Deploying a dacpac

Erik Ejlskov Jensen shares some advice when deploying dacpac files:

I have previously blogged about using a SQL Server Database Project together with EF Core and also described a NuGet package that enables you to build a .dacpac with .NET Core, even on Linux and macOS.

So the two blog posts above cover development and build. The next step is deployment.

The main deployment mechanism for making changes to your database based on your recently built .dacpac file is the cross-platform sqlpackage command line tool.

You can, depending on your requirements, take advantage of several of the available actions this tool provides.

Read on to see two methods for deployment.
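
For flavour (these are not Erik's exact commands), here's a hedged sketch of calling sqlpackage's Publish action from a deployment script; the source path, server, and database names are placeholders. The Script action is the usual alternative when you'd rather review a generated upgrade script than apply changes directly:

```python
# Hedged sketch: publishing a .dacpac with sqlpackage. Paths and
# target names are placeholders, not values from the post.
import subprocess

subprocess.run(
    [
        "sqlpackage",
        "/Action:Publish",
        "/SourceFile:bin/Release/MyDatabase.dacpac",
        "/TargetServerName:localhost",
        "/TargetDatabaseName:MyDatabase",
    ],
    check=True,
)
```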

Organizing and Optimizing Power BI Dataflows

Marc Lelijveld wants your Power BI dataflows in tip-top shape:

In this blog I describe a few of the challenges you might face when you have a lot of dataflows, and I share a few tips and tricks I apply to sanitize your dataflow approach, organize your dataflows, and make them easy to browse through.

Dataflows are increasingly used as a shared resource or staging layer inside the Power BI platform. With dataflows, you can push logic down and reuse it across different datasets. This lowers the impact on the source by extracting the data from the source into Power BI only once, helps centralize logic, and gives you one version of the truth, among lots of other advantages.

Read on for some tips and practices.

Understanding Monads in Scala

Anna Wykes continues a series on Scala for data engineers:

This is the second of my blogs in the Scala Parlour Series, in which we explore Scala, and why it is great for Data Engineering. If you haven’t already, please check out the first in the series here, in which you can read all about the core concepts of Scala, including who uses it and why.

In this article we will explore monads within the Functional Programming (FP) paradigm, and how they can be used in Scala to aid Data Engineering.  

Anna explains monads quite well here. This is a topic which is notorious for its perceived difficulty, but conceptually it’s not as difficult as people make it out to be…if you understand a few concepts coming in.
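
And if Scala isn't your thing, the core idea survives translation. Here's a toy Maybe/Option-style sketch in Python rather than Scala (my illustration, not Anna's code): a bind operation chains computations that may fail, short-circuiting as soon as one does.

```python
# Toy sketch of the Maybe/Option monad idea in Python: 'bind' chains
# computations that may fail, short-circuiting once a step returns None.
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
U = TypeVar("U")

def bind(value: Optional[T], f: Callable[[T], Optional[U]]) -> Optional[U]:
    return None if value is None else f(value)

def parse_int(s: str) -> Optional[int]:
    return int(s) if s.strip().lstrip("-").isdigit() else None

def reciprocal(n: int) -> Optional[float]:
    return 1 / n if n != 0 else None

# Any failing step yields None instead of raising.
print(bind(bind("42", parse_int), reciprocal))    # 0.023809523809523808
print(bind(bind("zero", parse_int), reciprocal))  # None
```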

Evolutionary Algorithms for Color Palette Discovery

Daniel Oehm combines two interests:

Colour theory is pretty complex stuff, so choosing a good palette isn’t easy, let alone evolving one. So, you’re going to have some hits and some misses. This is definitely more for fun, seeing what you discover, rather than finding the perfect palette. Having said that, you could discover some gold!

There are best practices when choosing a palette for data visualisation, depending on the context and what is to be shown. For example, people tend to respond to certain colours representing high/low, hot/cold, or good/bad, and there are also colourblindness considerations. evoPalette won’t necessarily adhere to these ideals.

I’d like to see a genetic algorithms approach, though you’d have to define some sort of function to score each outcome, so I can see how that’d be tricky. H/T R-Bloggers
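
To make that concrete, here's a toy of what "some sort of function to score each outcome" might look like — purely my illustration, scoring a palette by the smallest pairwise distance in RGB space. A real metric would work in a perceptual colour space, which is exactly where it gets tricky.

```python
# Toy fitness function for a palette: the smallest pairwise distance in
# RGB space (higher = more mutually distinct colours). Illustrative only;
# a serious metric would use a perceptual colour space.
from itertools import combinations
from math import dist

def palette_fitness(palette):
    return min(dist(a, b) for a, b in combinations(palette, 2))

palette = [(230, 57, 70), (69, 123, 157), (168, 218, 220)]
print(round(palette_fitness(palette), 1))
```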

Understanding Conditions Under Which Query Store Captures Queries

Milos Radivojevic dives into testing Query Store:

From SQL Server 2019, you can use the fourth option – CUSTOM – which allows you to define what “relevant query” means. For that purpose, you can use three parameters. A query will be captured by Query Store if it is executed at least EXECUTION_COUNT times, or if SQL Server used at least TOTAL_COMPILE_CPU_TIME_MS of CPU time to compile it, or if the entire query execution used at least TOTAL_EXECUTION_CPU_TIME_MS of CPU time. All three parameters are evaluated within a period defined by the fourth parameter – STALE_CAPTURE_POLICY_THRESHOLD – with a default value of 1 hour.

You cannot define exceptions, i.e. queries that you do not want to see captured by Query Store, such as system queries or queries related to Query Store itself; you can only filter them out when you write queries against Query Store’s catalog views.

As mentioned earlier, in this post I will go into details on the ALL option. This was the default option prior to SQL Server 2019, and the decision to change from ALL to AUTO was a good one. However, the lack of documentation for this option can lead to mistrust and misunderstandings. Defaults are important for a feature’s newcomers; they will use defaults until they learn and test enough to choose better options for their workload.

There’s a lot of depth in this post, and Milos wraps up with a few rules of thumb.
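
For reference, the CUSTOM capture policy Milos describes is set per database. Here's a hedged sketch — the threshold values are illustrative rather than recommendations, the connection string is a placeholder, and it's wrapped in pyodbc only to keep this digest's examples in one language; the ALTER DATABASE statement is the substance:

```python
# Hedged sketch: enabling the CUSTOM capture policy described above.
# Values are illustrative. autocommit is required because
# ALTER DATABASE cannot run inside a transaction.
import pyodbc

SQL = """
ALTER DATABASE CURRENT SET QUERY_STORE = ON
(
    QUERY_CAPTURE_MODE = CUSTOM,
    QUERY_CAPTURE_POLICY =
    (
        STALE_CAPTURE_POLICY_THRESHOLD = 1 HOURS,
        EXECUTION_COUNT = 30,
        TOTAL_COMPILE_CPU_TIME_MS = 1000,
        TOTAL_EXECUTION_CPU_TIME_MS = 100
    )
);
"""

conn = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=localhost;Database=YourDatabase;Trusted_Connection=yes;",
    autocommit=True,
)
conn.execute(SQL)  # a query is captured once any one threshold is met in the window
```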
