2022-05-05 – Curated SQL

Discovering Data Drift with DVC

Published 2022-05-05 by Kevin Feasel

Milecia McGregor looks at a version control system for ML projects (and data):

What happens when the machine learning model you’ve worked so hard to get to production becomes stale? Machine learning engineers and data scientists face this problem all the time. You usually have to figure out where the data drift started so you can determine what input data has changed. Then you need to retrain the model with this new dataset.
Retraining could involve a number of experiments across multiple datasets, and it would be helpful to be able to keep track of all of them. In this tutorial, we’ll walk through how using DVC, an open source version control system for machine learning projects, can help you keep track of those experiments and how this will speed up the time it takes to get new models out to production, preventing stale ones from lingering too long.

My team is working on integrating DVC. It’s a really good project for analytics teams, as it extends the notion of version control to datasets and helps you tie in code (source control), models (tools like MLflow), and data.

Comments closed

Flink 1.15 Released

Published 2022-05-05 by Kevin Feasel

Joe Moser and Yun Gao announce Apache Flink 1.15:

Thanks to our well-organized and open community, Apache Flink continues to grow as a technology and remain one of the most active projects in the Apache community. With the release of Flink 1.15, we are proud to announce a number of exciting changes.
One of the main concepts that makes Apache Flink stand out is the unification of batch (aka bounded) and stream (aka unbounded) data processing, which helps reduce the complexity of development. A lot of effort went into this unification in the previous releases, and you can expect more efforts in this direction.
Apache Flink is not only growing when it comes to contributions and users, but also out of the original use cases. We are seeing a trend towards more business/analytics use cases implemented in low-/no-code. Flink SQL is the feature in the Flink ecosystem that enables such uses cases and this is why its popularity continues to grow.

Flink SQL is Feasel’s Law in action.

Comments closed

Defining Index Seeks Down

Published 2022-05-05 by Kevin Feasel

Brent Ozar explains that index seeks aren’t always as selective as they sound:

When you see “index seek” on an execution plan, that doesn’t mean SQL Server is jumping to exactly the row you’re looking for. It only means that SQL Server is seeking on the first column of the index.
This is especially misleading on indexes where the first column isn’t very selective.

Definitely worth the read.

Comments closed

Fun with Natural Full Join

Published 2022-05-05 by Kevin Feasel

Lukas Eder shows off natural joins:

At first I though of the UNION CORRESPONDING syntax, which doesn’t really exist in most SQL dialects, even if it’s a standard feature. But then, I remembered that this is again a perfect use case for NATURAL FULL JOIN, this time slightly differently from the above example where two tables are compared for contents. This time, we want to make sure the two joined tables never have matching rows, in order to get the UNION like behaviour.

I wasn’t aware of the notion of natural joins because they’re not available in SQL Server. They are available in Oracle, Postgres, and MySQL. Fun as Lukas’s blog post is, I could see natural joins going wrong in so many ways.

Comments closed

Finding Azure SQL DB Backup History

Published 2022-05-05 by Kevin Feasel

Taiob Ali takes us through a new DMV:

There is a new DMV currently in preview which returns information about backups of Azure SQL databases except for the Hyperscale tier. Microsoft official documentation is here.
If you run the example query as-is from the above documentation some of the columns do not make sense.

Taiob includes a better query which provides the type of information you’re used to in on-premises SQL Server.

Comments closed

Forced Parameterization and Local Variables

Published 2022-05-05 by Kevin Feasel

Erik Darling tries something out:

I think it was sometime in the last century that I mentioned I often recommend folks turn on Forced Parameterization in order to deal with poorly formed application queries that send literal rather than parameterized values to SQL Server.
And then just like a magickal that, I recommended it to someone who also has a lot of problems with Local Variables in their stored procedures.
They were curious about if Forced Parameterization would fix that, and the answer is no.

Shot down by the third paragraph of the intro. That’s rough. Still, click through for the demo.

Comments closed

Creating an Info Button in Power BI

Published 2022-05-05 by Kevin Feasel

Kristyna Hughes shows how to create an info tooltip in Power BI:

The steps below will walk through how to add an information icon to the report, making a tooltip page containing your additional information, and enabling the tooltip to allow users to hover over the icon and see the information.

This can be quite useful, especially as it gets context information out of the way of users after they don’t need it anymore. That’s important for dashboards you expect people to look at regularly.

Comments closed

M	T	W	T	F	S	S
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28	29
30	31

Day: May 5, 2022

Discovering Data Drift with DVC

Flink 1.15 Released

Defining Index Seeks Down

Fun with Natural Full Join

Finding Azure SQL DB Backup History

Forced Parameterization and Local Variables

Creating an Info Button in Power BI