Press "Enter" to skip to content

Curated SQL Posts

Retrieving Counts of Cosmos DB Collections

Manoj Pandey shows how you can retrieve counts of records in Cosmos DB using the .NET client:

Here in this post we will use C# .NET code (for beginners like me) to see how to:
1. Connect to a Cosmos DB instance
2. Get a list of all Databases in a Cosmos DB instance
3. Iterate through all the Databases and get the list of all Collections (or Tables)
4. Get the COUNT of all documents/items (or records) in these Collections

Click through to see how.
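Step 4 is typically the only part that needs a query; in the Core (SQL) API it boils down to a single aggregate run against each collection, roughly like the sketch below (the connection handling and the database/collection iteration happen in the .NET client, and the linked post shows the author's actual C# code).

    -- Cosmos DB SQL: count every document in the current collection.
    -- VALUE returns the bare number rather than a wrapped JSON object.
    SELECT VALUE COUNT(1)
    FROM c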


Building a Better Random

David Fowler wants to generate a random number:

You see the problem? Well, in the post that I mentioned, I addressed this issue and offered a solution using CHECKSUM() and NEWID().

That’s OK, but it can be a bit cumbersome. I recently stumbled across a function that I hadn’t realised was there before (and of course this is where you all tell me you’ve known about it all along): CRYPT_GEN_RANDOM().

My expectation was that CRYPT_GEN_RANDOM() would generate values according to a uniform distribution. A simple experiment with about 4 million entries, generating random numbers 0-255 (one byte), confirms that it does approximate a uniform distribution.
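As a rough illustration (not David's exact code): CRYPT_GEN_RANDOM() takes a length in bytes and returns that many random bytes as a varbinary, so a cast plus a little modulo arithmetic turns it into an integer in whatever range you need, and unlike seedless RAND() it is evaluated per row.

    -- One random byte, 0-255 (the range used in the uniformity check above)
    SELECT CAST(CRYPT_GEN_RANDOM(1) AS int) AS RandomByte;

    -- A random integer from 1 to 100: two random bytes give 0-65535, and the
    -- modulo maps that into the target range (with a small modulo bias that
    -- rarely matters in practice)
    SELECT CAST(CRYPT_GEN_RANDOM(2) AS int) % 100 + 1 AS RandomInt;

    -- Evaluated per row, so each row gets its own value
    SELECT o.name,
           CAST(CRYPT_GEN_RANDOM(2) AS int) % 100 + 1 AS RandomInt
    FROM sys.objects AS o;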


When You Can’t Win for Trying: SQL Agent Failures

Garry Bargsley troubleshoots a strange issue:

What are your troubleshooting steps when a job failure is reported?

1. Open SQL Agent Job History for the failed job
2. Look at the SQL Server Log
3. This was an Ola job, so look at the CommandLog table
4. Look at the text log file stored on the file system
5. Open the job step, get the code being executed, and run it in a new query window

Now that you and I have performed the same steps and found no smoking gun, where do you go next?

But what happens when all of your indicators look fine, yet the job is still failing? Read on for one possible answer.
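For reference, steps 1 and 3 of that checklist can also be done from a query window; a rough sketch is below. The job name is hypothetical, and dbo.CommandLog assumes a default Ola Hallengren installation (column names can vary by version).

    -- Step 1: recent history for the failing job (run_status 0 = failed)
    SELECT j.name, h.step_id, h.step_name, h.run_status,
           h.run_date, h.run_time, h.message
    FROM msdb.dbo.sysjobs AS j
    JOIN msdb.dbo.sysjobhistory AS h
        ON h.job_id = j.job_id
    WHERE j.name = N'YourFailingJob'
    ORDER BY h.instance_id DESC;

    -- Step 3: the most recent commands logged by Ola's maintenance solution
    SELECT TOP (50) StartTime, EndTime, DatabaseName, CommandType,
           ErrorNumber, ErrorMessage, Command
    FROM dbo.CommandLog
    ORDER BY StartTime DESC;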


Using Azure Functions to Tag Resources

Jess Pomfret shows off an interesting way of using Azure Functions to apply tags to resources:

In part one I discussed how useful Azure tags can be, and specifically how adding a ‘dateCreated’ tag can help you keep track of your resources, and how to find resources with certain tags using PowerShell. Parts 2 and 3 are based around the fact that adding the ‘dateCreated’ tag is a great idea, but relying on a human to remember to add it is less than ideal. In part 2 we looked at using Azure Policy to automatically add the tag. Today’s post covers another option: using Azure Functions.

Azure Functions gives us a way of running serverless code, written in a number of different languages, triggered by specific events or timings. Looking through the documentation, there are many use cases, from processing files to analysing IoT workstreams. Our use case is to run a PowerShell script that tags any resources that are missing the ‘dateCreated’ tag.

Click through to see how.


Window Functions in Row and Batch Modes

Erik Darling kicks off a new series:

To start things off, we’re going to talk about query plan patterns related to windowing functions.

There are several things to consider with windowing function query plans:

– Row vs Batch mode
– With and Without Partition By
– Index Support for Partition and Order By
– Column SELECTion
– Rows vs Range/Global aggregates

We’ll get to them in separate posts, because there are particulars about them that would make covering them all in a single post unwieldy.

Anyway, the first one is pretty simple, and starting simple is about my speed.

Read on for this quick coverage of row mode versus batch mode processing with respect to window functions.
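For context, a typical windowed running total looks like the sketch below (dbo.Orders and its columns are hypothetical). Against a plain rowstore table this sort of query tends to run in row mode with a Window Spool; with a columnstore index present, or with batch mode on rowstore in SQL Server 2019+, the optimizer can use the batch mode Window Aggregate operator instead, which is the kind of difference Erik examines.

    SELECT
        o.CustomerID,
        o.OrderDate,
        SUM(o.TotalDue) OVER
        (
            PARTITION BY o.CustomerID
            ORDER BY o.OrderDate
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS RunningTotal
    FROM dbo.Orders AS o;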


Caching versus Persisting in Spark

The Hadoop in Real World team explains a subtle difference:

The cache() and persist() functions are used to cache intermediate results of an RDD, DataFrame, or Dataset. You can mark an RDD, DataFrame, or Dataset to be persisted by calling persist() or cache() on it. The first time it is computed in an action, the objects behind the RDD, DataFrame, or Dataset on which cache() or persist() was called will be kept in memory, or at the configured storage level, on the nodes.

That’s the similarity, but click through for the difference.
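The same distinction shows up in Spark SQL, which is one way to see it without leaving a query: a bare CACHE TABLE uses the default storage level (roughly what cache() does), while the storageLevel option plays the role of the argument you would pass to persist(). The table names below are hypothetical, and note that the post itself is about the DataFrame/RDD methods rather than these statements.

    -- Default storage level, analogous to calling cache()
    CACHE TABLE sales_raw;

    -- Explicit storage level, analogous to persist(StorageLevel.DISK_ONLY)
    CACHE TABLE sales_by_region OPTIONS ('storageLevel' 'DISK_ONLY')
        AS SELECT region, SUM(amount) AS total_amount
           FROM sales_raw
           GROUP BY region;

    -- CACHE TABLE is eager unless written as CACHE LAZY TABLE, whereas
    -- cache()/persist() on a DataFrame only take effect once an action runs.

    -- Release the cached data when finished
    UNCACHE TABLE sales_by_region;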


Working with Prediction Intervals

Bryan Shalloway explains how generating prediction intervals is different from making point predictions:

Before using the model for predictive inference, one should have reviewed overall performance on a holdout dataset to ensure the model is sufficiently accurate for the business context. For example, for our problem, is an average error of ~12% and a 90% prediction interval of +/- ~25% of Sale_Price useful? If the answer is “no,” that suggests the need for more effort in improving the accuracy of the model (e.g. trying other transformations, features, or model types). For our examples we are assuming the answer is “yes,” our model is accurate enough (so it is appropriate to move on and focus on prediction intervals).

Click through for the article.
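As a quick reminder of what is being promised there: a 90% prediction interval is a pair of bounds [l(x), u(x)] for an individual new observation, chosen so that roughly

    P( l(X) <= Y <= u(X) ) ≈ 0.90

which is a statement about a single future Sale_Price rather than about the average, and so it is necessarily wider than a confidence interval around the predicted mean.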


Spark Performance in Azure Synapse Analytics

Euan Garden shares some numbers around Apache Spark performance in Azure Synapse Analytics:

To compare the performance, we derived queries from TPC-DS at 1TB scale and ran them on an 8-node Azure E8V3 cluster (15 executors, 28 GB memory and 4 cores each). Even though our version running inside Azure Synapse today is a derivative of Apache Spark™ 2.4.4, we compared it with the latest open-source release of Apache Spark™ 3.0.1 and saw that Azure Synapse was 2x faster in total runtime for the Test-DS comparison.

Click through for several techniques the Azure Synapse Analytics team has implemented to make some significant performance improvements. It’s still slower than Databricks, but considerably faster than the open-source Apache Spark baseline.
