Press "Enter" to skip to content

Month: September 2021

Avoiding WHILE 1=1 Loops

Aaron Bertrand does not believe in the power of the infinite loop:

A short time ago a colleague had an issue with a Microsoft SQL Server stored procedure. They were using our recommended approach for batching updates, but there was a small problem with their code that led to the procedure “running forever.” I think we’ve all made a mistake like this at one point or another; here’s how I try to avoid the situation altogether.

The argument isn’t “don’t use WHILE loops” or “don’t use batching logic,” but instead to ensure that you have a break condition somewhere. It’s reasonable to ask for an end state before you begin processing something, after all.

Comments closed

.NET 5 C# Language Extension for SQL Server

Nikita Takru makes an announcement:

SQL Server 2019 supports the R, Python, and Java Language Extensions. These language extensions provide many benefits to developers. They provide data security, rapid speed for deployment, and ease of integration.

Previously, we announced the release of JavaR, and Python extensions. Today we are thrilled to share that we are open sourcing the .NET 5 C# Language Extension for SQL Server on GitHub.

My response to this is roughly the same as what Panagiotis Kanavos put in a GitHub issue. F#, being a functional programming language intended for things like data analysis, would be a no-brainer. And the choice of Windows-only support probably helped get it out the door faster, but can be limiting over time.

I guess the short version is, we’ll see what the community does with this. I’d also like to see how it compares to CLR on actions. My guess is that it’ll be slower based on the way the R and Python engines work, but that’s just a guess. Overall, I’m happy this exists and want it to be more useful.

Comments closed

Wait Stats on Sort Spills

Erik Darling starts a new series on wait stats, starting with one particular topic:

The point is not that spills are the sole things that cause these waits, it’s just to give you some things to potentially watch out for if you see these waits piling up and can’t pin down where they’re coming from.

In all the queries, I’m going to be using the MAX_GRANT_PERCENT hint to set the memory grant ridiculously low to make the waits I care about stick out.

Click through for the first of several demonstrations.

Comments closed

Attributing Redshift Costs to Users

Jason Pedreza, et al, show how you can break down query utilization by user in an Amazon Redshift database:

At its simplest form, cost attribution can be determined using the amount of the storage assigned to the individual objects using the ownership of the objects to the groups. But the downside of this approach is it doesn’t provide a true translation of the resource usage. For example, let’s say Team 1 has total object size of 1 TB, whereas Team 2 has 100 GB in total size. Team 1 member runs 10 queries daily, and Team 2 runs 1,000 queries per day. Of course, Team 2 uses more resources than Team 1.

The Amazon Redshift RA3 architecture allows you to pay for the compute and data warehouse storage capacity separately, therefore storage doesn’t reflect the resources used by the teams for the cost attribution.

Click through to see how.

Comments closed

LATERAL VIEW in Hive

The Hadoop in Real World team provides a quick example of a powerful feature in Apache Hive:

Lateral view is used in conjunction with user-defined table generating functions such as explode(). A UDTF generates zero or more output rows for each input row. 

Click here if you like to know the difference between UDF, UDAF and UDTF

A lateral view first applies the UDTF to each row of the base table and then joins resulting output rows to the input rows to form a virtual table having the supplied table alias.

In other words, LATERAL joins are the SQL standard for Microsoft’s CROSS APPLY operator. I normally dislike having different names for the same thing due to the risk of confusion, but in fairness to Microsoft on this one, my recollection is that the common name came after SQL Server 2005, which already had CROSS APPLY and OUTER APPLY.

Comments closed

Learn, Test, Forget

Erik Darling shows us the bright side of a memory equipped to forget:

I often have to talk clients down from odd ledges. At this point, there’s about 20 years of advice about SQL Server out there, and for various reasons received wisdom often gets migrated en masse, whether it’s still applicable or not.

Outdated trace flags and settings are pretty common, and don’t get me started on coding practices.

But I get it — that stuff came along with advice that still makes sense — like parallelism and memory settings, separating off tempdb, etc.

Unfortunately, it often leads people to the “it used to be fast so it should always be fast” trap based on what used to work, rather than what makes sense currently. Or even what the current problem is.

Read the whole thing. Also, in a rare double-quotation, I want to cite this line from Erik:

“I don’t know, xxVanWilderFan420xx said they were bad and we should avoid them.”

Never fully trust a person going by the pseudonym xxVanWilderFan420xx. However, if it were xxSolomonKaneFan420xx, 100% believe.

Comments closed

LAG() vs All

Kathi Kellenberger shows the power of the LAG() function in T-SQL:

Microsoft introduced the first window (aka, windowing or windowed) functions with SQL Server 2005. These functions were ROW_NUMBERRANKDENSE_RANKNTILE, and the window aggregates. Many folks, including myself, used these functions without realizing they were part of a special group. In 2012, Microsoft added several more: LAG and LEADFIRST_VALUE and LAST_VALUEPERCENT_RANK and CUME_DISTPERCENTILE_CONT, and PERCENTILE_DISC. They also added the ability to do running totals and moving calculations.

These functions were promoted as improving performance over older techniques, but that isn’t always the case. There were still performance problems with the aggregate functions introduced in 2005 and the four of the functions introduced in 2012. In 2019, Microsoft introduced Batch Mode on Row Store, available on Enterprise and Developer Editions, that can improve the performance of window aggregates and the four statistical functions from 2012.

I started writing this article to compare some window function solutions to traditional solutions. I found that there were so many ways to write a query that includes a column from another row that this article is dedicated to the window functions LAG and LEAD.

In these sorts of circumstances, LAG() is extremely efficient at its job. Click through to see just how efficient.

Comments closed

Modifying Base Tables from Table Expressions

Itzik Ben-Gan continues a series on table expressions:

This article is the eleventh part in a series about table expressions. So far, I’ve covered derived tables and CTEs, and recently started the coverage of views. In Part 9 I compared views to derived tables and CTEs, and in Part 10 I discussed DDL changes and the implications of using SELECT * in the view’s inner query. In this article, I focus on modification considerations.

As you probably know, you’re allowed to modify data in base tables indirectly through named table expressions like views. You can control modification permissions against views. In fact, you can grant users permissions to modify data through views without granting them permissions to modify the underlying tables directly.

You do need to be aware of certain complexities and restrictions that apply to modifications through views. Interestingly, some of the supported modifications can end up with surprising outcomes, especially if the user modifying the data isn’t aware they’re interacting with a view. You can impose further restrictions to modifications through views by using an option called CHECK OPTION, which I’ll cover in this article. As part of the coverage, I’ll describe a curious inconsistency between how the CHECK OPTION in a view and a CHECK constraint in a table handle modifications—specifically ones involving NULLs.

As always, it’s well worth the read.

Comments closed

Making Temporal Tables Yours

Lee Markum continues a series on temporal tables:

In the initial post, we talked about some very basic items to get you up and running with temporal tables. In part 2, I’ll cover a few more things to think about and implement when working with temporal tables.

In the first post, we looked at altering an existing table to be a temporal table. Now we will look at what is involved in making a table be a temporal table from the very beginning.

Click through for Lee’s thoughts on naming, indexing, and data retention.

Comments closed

Dynamic Transparency Changes in Power BI

Sandeep Pawar explains how to allow users to control transparency in Power BI visuals:

As someone who uses Python/R heavily for exploratory data analysis and Power BI for publishing the final data analytics reports, I have always missed the ability to adjust the color transperancy in Power BI. In Power BI you can change the color dynamically and conditionally but there is no native functionality to change the transperancy.

I was working on a project where I wanted to highlight certain clusters in the data to the business user. Sure, I could change the color but it’s very challenging when the data points are concentrated in a small area and they overlap each other. In Python and R you can easily adjust the alpha value in most plots to see the dense area clearly.

Click through for one Power BI solution.

Comments closed