Press "Enter" to skip to content

Curated SQL Posts

Column References in DAX

Teo Lachev makes a reference:

Suppose you use a DAX table variable, such as to group by certain columns and add an extension column as a calculation. Then, you want to count the rows in the table by filtering on one of the columns. At your first attempt, you might try using CALCULATE.

That doesn’t work and Teo explains why, as well as what you do need to use.
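
To make the pattern concrete, here is a minimal sketch of the kind of measure in question, using a hypothetical Sales table (the failing CALCULATE attempt is left as a comment):

Filtered Count :=
VAR _t =
    ADDCOLUMNS (
        SUMMARIZE ( Sales, Sales[Color] ),
        "@Qty", CALCULATE ( SUM ( Sales[Quantity] ) )
    )
RETURN
    -- Fails: a CALCULATE filter argument must reference a model column,
    -- and "@Qty" exists only inside the variable:
    -- CALCULATE ( COUNTROWS ( _t ), [@Qty] > 10 )
    -- Works: FILTER iterates the table variable directly:
    COUNTROWS ( FILTER ( _t, [@Qty] > 10 ) )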

Queries and Batch Mode

Erik Darling takes us on a batch mode joyride:

Prior to SQL Server 2019, you needed to have a columnstore index present somewhere for batch mode to kick in for a query.

Somewhere is, of course, pretty loose. Just having one on a table used in a query is often enough, even if a different index from the table is ultimately used.

Batch mode is pretty great and Erik explains why.
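
The classic pre-2019 workaround follows from that: create an empty filtered columnstore index whose only job is to make batch mode available. A sketch against a hypothetical dbo.Votes table:

-- The contradictory filter keeps the index permanently empty and cheap,
-- but its presence still unlocks batch mode for queries on the table.
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_batch_mode
ON dbo.Votes (Id)
WHERE Id = -1 AND Id = -2;

On SQL Server 2019 and later, batch mode on rowstore removes the need for this trick.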

Snowflake Purchases Streamlit

Alex Woodie reports on a purchase:

Cloud data warehousing giant Snowflake showed it’s serious about Python and data science this week when it announced that it plans to spend $800 million to buy Streamlit, a provider of Python-based tools for rapidly developing interactive data applications on the Web.

Co-founded in San Francisco in 2018 by Adrien Treuille, Amanda Kelly, and Thiago Teixeira, Streamlit develops an open source framework of the same name that allows data scientists and machine learning engineers to create and deploy data applications. The software is compatible with other Python-based frameworks, such as NumPy, Pandas, Matplotlib, and Scikit-learn, and uses React to render screens on the front-end.
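
For anyone who hasn't tried it, Streamlit's pitch is that a data app is just a Python script. A minimal sketch (hypothetical example):

# app.py -- run with: streamlit run app.py
import numpy as np
import pandas as pd
import streamlit as st

st.title("Random walk demo")
steps = st.slider("Number of steps", min_value=100, max_value=5000, value=1000)
walk = pd.DataFrame({"position": np.random.randn(steps).cumsum()})
st.line_chart(walk)  # the script re-runs and redraws on every slider change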

Streamlit is nice. $800 million nice? That’s a good question.

Azure Data Studio Execution Plans

Hugo Kornelis is happy (for now):

But I am not writing this post to moan about past issues. I am writing this post because Microsoft has made huge improvements to execution plan support in ADS. These are officially still in preview, but they are already available. However, you will need to take a few steps to see these improvements in action.

Read on to see what you need to do and to get Hugo’s initial thoughts.

Power BI Decomposition Trees

Gauri Mahajan shows off decomposition trees in Power BI:

A large volume and variety of data generally needs data profiling to understand the nature of the data. One of the aspects of data is hierarchy and the inter-relationships among different attributes. Hierarchical data is often nested at multiple levels. To analyze the relationships between different attributes in hierarchical data, drill-down and drill-through are two of the most common techniques employed for data exploration, as well as for use-cases like root cause analysis.

While these techniques are standard and have been in the industry for quite a long time, figuring out these relationships and navigating hierarchical data can be a challenging task. Data Analysts or Business Analysts typically perform this analysis on the data before presenting it to the end-users. In certain cases, some domain or business users may be required to perform such analysis on the report itself. In that case, the task becomes even more challenging, considering the limited data analysis capabilities a reporting tool offers compared to a database and query languages like SQL.

To help power users perform such analysis in a reporting tool, visualizations like decomposition trees can be used to decompose hierarchical data that is presented in an aggregated manner. The decomposition tree can support both drill-down and drill-through use-cases when the user has the flexibility to choose the hierarchy or dimensions on demand. In the Microsoft technology stack, Power BI is the key reporting tool for authoring reports and supports a wide variety of data sources. Power BI offers a category of visuals known as AI visuals; one such visual is the Decomposition Tree.

Read on to see how you can create a decomposition tree, what kind of information it shows, and how you can interact with it to learn more about correlations and causes.

Query Performance Insights on the Serverless SQL Pool

Jovan Popovic shows how you can use the QPI library on an Azure Synapse Analytics serverless SQL pool:

You can find more of the best practices here. These best practices are very important because some issues might cause performance degradation. You might be surprised how applying some of these best practices might improve the performance of your workload.

The last item that is related to schema optimization is sometimes hard to check. You would need to look at your schema, inspect all columns and find what to optimize. If you have a large schema, this might not be an easy task. But you can make your life easier if you use the QPI helper library that can detect schema issues for you.
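
To give a flavor of the schema issues in question (this is a standing serverless best practice rather than QPI's own syntax, and the storage path is hypothetical): letting OPENROWSET infer string columns as VARCHAR(8000) is a common offender, and an explicit WITH clause fixes it.

SELECT *
FROM OPENROWSET(
    BULK 'https://myaccount.dfs.core.windows.net/data/taxi/*.parquet',
    FORMAT = 'PARQUET'
) WITH (
    vendor_id    VARCHAR(4),   -- instead of the inferred VARCHAR(8000)
    total_amount FLOAT,
    pickup_time  DATETIME2
) AS r;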

Read on to see what it can find.

SQLBits Keynote Notes

Brent Ozar shares some thoughts from the first day’s keynote from SQLBits:

Pedro Lopes took the stage to talk about parameter-sensitive plan optimization, aka PSP Optimization. He demoed it with SQL Server 2022 CTP 1.3. I’ve written about this feature before, and there wasn’t anything new here in the demos. My opinion on this feature remains the same: I think it sounds like a phenomenal down payment. It won’t fix parameter sniffing, but I don’t think it’s going to backfire.
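
For reference, in the public SQL Server 2022 builds the feature hangs off compatibility level 160 and a database-scoped configuration, on by default at that level (the CTP syntax may differ):

ALTER DATABASE [YourDatabase] SET COMPATIBILITY_LEVEL = 160;
ALTER DATABASE SCOPED CONFIGURATION
    SET PARAMETER_SENSITIVE_PLAN_OPTIMIZATION = ON;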

Read on for Brent’s thoughts around what Microsoft is doing for SQL Server 2022.

Trials and Tribulations of Maintaining CRAN Packages

John Mount explains the downside cost of CRAN being so useful:

If this automated email from a bulk sender bounces, goes to SPAM, or isn’t responded to quickly: your package will be archived or removed from CRAN. We’ve received these emails, and always acted on them quickly, out of fear.

The referred-to check results are often not reproducible. For example, our most recent scare (that hasn’t yet triggered the email, and we have submitted a work-around before complaining here) was just “SUMMARY: processing the following file failed”, without details beyond the name of the failing file.

This is a tricky problem. On the one hand, as an end user of packages, I want packages playing nicely with each other. This is a lot better than Pip’s “oh, sorry, you need version X, but installing version X breaks package Y, which needs < X” nightmare.

On the other hand, for a package maintainer, there’s a lot of added effort on a tight timeline for what is usually a volunteer effort.

I don’t maintain any CRAN packages myself, so I tend to be on the beneficiary side of things. But it’s important to keep those package maintainers in mind, and one of the easiest ways to do that is to file explicit, reproducible bug reports. It may not make the deadlines any more lax, but it at least makes maintainers’ lives easier.

Apply Functions in R

Selina Cheng explains how the various apply() functions work:

Today I’m going to talk about a useful family of functions that allows you to repetitively perform a specified function (e.g., sum(), mean()) across a vector, list, matrix, or data frame. For those of you familiar with ‘for’ loops, the apply() family often allows you to avoid constructing those and instead wrap the loop into one simple function.

I’m going to discuss the functions apply(), lapply(), sapply(), and tapply() in this blog post (as well as using the dplyr library for similar tasks). These functions all end in apply() because you apply the function you want across all the specified elements.
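
A quick sketch of how the family differs, in base R with toy data:

m <- matrix(1:6, nrow = 2)
apply(m, 1, sum)      # by row: 9 12
apply(m, 2, sum)      # by column: 3 7 11

lst <- list(a = 1:3, b = 4:6)
lapply(lst, mean)     # returns a list
sapply(lst, mean)     # simplifies to a named vector: a = 2, b = 5

tapply(mtcars$mpg, mtcars$cyl, mean)   # mean mpg grouped by cylinder count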

Read on to see how these functions work. H/T R-Bloggers.
