
Author: Kevin Feasel

Adaptive Query Execution in Databricks

MaryAnn Xue and Allison Wang explain how Adaptive Query Execution works with Databricks:

One of the most important cost-based decisions made in the Spark optimizer is the selection of join strategies, which is based on the size estimation of the join relations. But since this estimation can go wrong in both directions, it can either result in a less efficient join strategy because of overestimation, or even worse, out-of-memory errors because of underestimation.

AQE offers a trouble-free solution here by switching to the faster broadcast hash join during execution time.

This is pretty similar to Adaptive Query Processing in SQL Server.
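To make the quoted idea concrete, here is a toy Python model of the decision AQE makes. It is not Spark code: `JoinSide`, `choose_strategy`, and `plan_join` are hypothetical names, and the only detail borrowed from Spark is the 10 MB default of `spark.sql.autoBroadcastJoinThreshold`. It models just the overestimation case the quote describes. Real Spark applies more conditions, such as join type, hints, and which side may be broadcast.

```python
from dataclasses import dataclass

# Toy model only: the 10 MB figure mirrors the default of Spark's
# spark.sql.autoBroadcastJoinThreshold setting.
BROADCAST_THRESHOLD_BYTES = 10 * 1024 * 1024


@dataclass
class JoinSide:
    """Hypothetical stand-in for one input of a join."""
    name: str
    estimated_bytes: int  # size the optimizer guesses before running anything
    actual_bytes: int     # size observed once the upstream stage has finished


def choose_strategy(size_bytes: int, threshold: int = BROADCAST_THRESHOLD_BYTES) -> str:
    """Broadcast the smaller side if it fits under the threshold."""
    return "broadcast_hash_join" if size_bytes <= threshold else "sort_merge_join"


def plan_join(left: JoinSide, right: JoinSide, adaptive: bool = True) -> str:
    # Static planning: decide up front from the (possibly wrong) estimates.
    static_choice = choose_strategy(min(left.estimated_bytes, right.estimated_bytes))
    if not adaptive or static_choice == "broadcast_hash_join":
        return static_choice
    # Adaptive re-planning: the estimate was too large, but the runtime size
    # shows one side is small enough to broadcast after all.
    return choose_strategy(min(left.actual_bytes, right.actual_bytes))


# A filter the optimizer estimates badly: it expects ~500 MB of customers,
# but only ~2 MB survive at runtime.
orders = JoinSide("orders", estimated_bytes=50 * 1024**3, actual_bytes=50 * 1024**3)
customers = JoinSide("customers", estimated_bytes=500 * 1024**2, actual_bytes=2 * 1024**2)

print(plan_join(orders, customers, adaptive=False))  # sort_merge_join
print(plan_join(orders, customers))                  # broadcast_hash_join
```

The static plan sticks with a sort-merge join because it trusts the 500 MB estimate. The adaptive path looks at the 2 MB that actually arrived and switches to a broadcast hash join.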


The Sequence Project Operator

Hugo Kornelis continues on a quest:

The Sequence Project operator computes values for the “ranking functions”: functions where the results depend on other rows in the result set, such as ROW_NUMBER, RANK, DENSE_RANK, and NTILE.

A Sequence Project can be considered somewhat similar in function to Compute Scalar: both operators add new columns to the data based on expressions. But Compute Scalar works on expressions that take other columns from the same row and constant values as input. Sequence Project computes expressions that are based on preceding rows in the data stream as their input.

Read on to learn more about what this operator does and how it works.
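The idea that Sequence Project works from preceding rows can be illustrated with a small Python sketch. It assumes the input already arrives sorted by the partition key and then the ordering key. Under that assumption, ROW_NUMBER, RANK, and DENSE_RANK each need only a little state carried over from the previous row. The function name and row layout are hypothetical. In actual SQL Server plans a Segment operator flags group boundaries, whereas this sketch compares keys itself. NTILE is left out because it also needs the row count of each partition.

```python
from typing import Any, Callable, Iterable, Iterator, Tuple


def sequence_project(
    rows: Iterable[Any],
    partition_key: Callable[[Any], Any],
    order_key: Callable[[Any], Any],
) -> Iterator[Tuple[Any, int, int, int]]:
    """Yield (row, row_number, rank, dense_rank) for rows that arrive
    already sorted by partition_key, then order_key."""
    first = True
    prev_part = prev_order = None
    row_number = rank = dense_rank = 0
    for row in rows:
        part, order = partition_key(row), order_key(row)
        new_partition = first or part != prev_part
        if new_partition:
            # Start of a new partition: every counter restarts.
            row_number = rank = dense_rank = 0
        row_number += 1
        if new_partition or order != prev_order:
            # A new ordering value: RANK jumps to the current row number
            # (leaving gaps after ties), DENSE_RANK advances by exactly one.
            rank = row_number
            dense_rank += 1
        yield row, row_number, rank, dense_rank
        # Only the previous row's keys are remembered; no rows are buffered.
        first, prev_part, prev_order = False, part, order


# Hypothetical (customer, order_total) pairs, sorted by customer, then total.
rows = [("A", 10), ("A", 20), ("A", 20), ("A", 30), ("B", 5), ("B", 5)]
for row, rn, rk, drk in sequence_project(rows, lambda r: r[0], lambda r: r[1]):
    print(row, rn, rk, drk)
```

The tie on 20 for customer A shows the difference between the functions. ROW_NUMBER keeps counting, RANK repeats 2 and then skips to 4, and DENSE_RANK repeats 2 and then moves to 3.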


SQL Server Replication Requires Actual Server Names

Steve Stedman walks us through a pain point when using replication:

SQL Server replication requires the actual server name to make a connection to the server. Specify the actual server name. (Replication.Utilities).

You might be thinking to yourself that you had a typo in the server name, but no, after checking the server name it matches what you can connect with.

When I’ve seen this error, often it will even tell me the server name it’s expecting, which then makes me ask why I have to type it in if it knows already.


SolarWinds Acquires SentryOne

Greg Gonzalez announces the news:

SolarWinds recently announced the intention to acquire SentryOne, an event that promises to make life even better for the IT and data professionals who use our products. Both companies have historically focused—in their own ways—on solving real-life problems for customers by providing high-quality solutions.

I do wonder what will happen given that SolarWinds already has a database monitoring tool; will they keep the two separate, combine them together, or do something else?


Memory-Optimized Table Types to Avoid tempdb Contention

Michael J. Swart uses In-Memory OLTP:

At D2L, we’re the perfect candidate customer for In Memory OLTP features, but we’ve held off adopting those features for years. Our servers handle tons of super quick but super frequent queries and so we find ourselves trying to address the same scaling challenges we read about in Microsoft’s customer case studies.

But there’s only one In Memory feature in particular that I care about. It’s the Memory Optimized Table Types. Specifically, I’ve always wanted to use that feature to avoid tempdb object allocation contention. Recently I finally got my chance with a lot of success. So even though I could say I’m happy with In Memory features, I think it’s more accurate to say that I feel relieved at having finally squashed my tempdb issues.

We’ve used memory-optimized table types for a couple of years to solve exactly this problem and the plan was pretty much the same as what Michael put into action.


Thoughts on Trendlines

Alex Velez shares some thoughts on trendlines:

A trendline is a line drawn on a chart highlighting an underlying pattern of individual values. The line itself can take on many forms depending on the shape of the data: straight, curved, etc. This is common practice when using statistical techniques to understand and forecast data (e.g. regression analysis). Determining the best fit and forecasting is beyond this article’s scope, so if you’re interested in learning more, I recommend Anna Foard’s Stats Ninja website. Instead, I’ll focus on various considerations related to visualizing trendlines when communicating data.

My main thought on trendlines is that they are less important than the data points. We make up the trendlines out of thin air; the data points actually exist and actually matter. Trendlines can be useful, but they don’t replace the data.
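The quote mentions regression analysis as one way to get a trendline. As a minimal sketch of the simplest case, here is an ordinary least-squares straight line in Python. This is only one of the shapes the quote mentions, and not necessarily what any given charting tool uses. The function name and the monthly numbers are made up for illustration. Printing fitted values next to observed values makes the point above visible: the line is computed from the points, and no point has to sit on it.

```python
from typing import Sequence, Tuple


def linear_trendline(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Return (slope, intercept) of the ordinary least-squares line through the points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical monthly values, invented for this example.
months = [1, 2, 3, 4, 5, 6]
sales = [12, 15, 11, 18, 16, 21]
slope, intercept = linear_trendline(months, sales)
for x, y in zip(months, sales):
    # The fitted value is derived from the data; the observed value is the data.
    print(f"month {x}: observed {y}, trendline {slope * x + intercept:.2f}")
```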


Getting Started with Azure Synapse Analytics

John Macintyre shares some Azure Synapse Analytics samples:

To further accelerate time to insight in Microsoft Azure Synapse Analytics, we are introducing the Knowledge center to simplify access to pre-loaded sample data and to streamline the getting started process for data professionals. You can now create or use existing Spark and SQL pools, connect to and query Azure Open Datasets, load sample scripts and notebooks, access pipeline templates, and tour the Azure Synapse Studio—all from one place.

Click through for details on the samples.


A New Financial Sample Dataset with Power BI

David Eldersveld takes a look at a new sample data set in Power BI Desktop:

With the October 2020 release of Power BI Desktop, Microsoft has incorporated a sample dataset directly into the product. While there are a few commonly used training and demo datasets already, there are benefits to having a basic starter Financials dataset included.

This data can accelerate the learning experience for new users, particularly those who may only need to focus on authoring reports on existing datasets at their organization and don’t need to immediately know much about data modeling.

Click through for David’s mixed review.
