Kevin Feasel – Page 652

Database Code Reviews: a Process

Published 2022-03-11 by Kevin Feasel

I’ll be honest, ever since I did a SQL Homework about doing code reviews I’ve wanted to do a blog post about them. Recently Emily Krager (TikTok | Twitter) did a TikTok about code review suggestions which seemed like a good excuse for me to do this. If you don’t follow her I recommend it, she does a great job of combining humor and technology and is just a lot of fun to listen to. Here is her list as best I was able to transcribe it.

Click through for Kenneth’s thoughts on the topic.

Comments closed

Queries and Batch Mode

Published 2022-03-11 by Kevin Feasel

Erik Darling takes us on a batch mode joyride:

Prior to SQL Server 2019, you needed to have a columnstore index present somewhere for batch mode to kick in for a query.
Somewhere is, of course, pretty loose. Just having one on a table used in a query is often enough, even if a different index from the table is ultimately used.

Batch mode is pretty great and Erik explains why.

Comments closed

Column References in DAX

Published 2022-03-11 by Kevin Feasel

Teo Lachev makes a reference:

Suppose you use a DAX table variable, such as to group by certain columns and add an extension column as a calculation. Then, you want to count the rows in the table by filtering on one of the columns. At your first attempt, you might try using CALCULATE.

That doesn’t work and Teo explains why, as well as what you do need to use.

Comments closed

Snowflake Purchases Streamlit

Published 2022-03-10 by Kevin Feasel

Alex Woodie reports on a purchase:

Cloud data warehousing giant Snowflake showed it’s serious about Python and data science this week when it announced that it plans to spend $800 million to buy Streamlit, a provider of Python-based tools for rapidly developing interactive data applications on the Web.
Co-founded in San Francisco in 2018 by Adrien Treuille, Amanda Kelly, and Thiago Teixeira, Streamlit develops an open source framework of the same name that allows data scientists and machine learning engineers to create and deploy data applications. The software is compatible with other Python-based frameworks, such as NumPy, Pandas, Matplotlib, and Scikit-learn, and uses React to render screens on the front-end.

Streamlit is nice. $800 million nice? That’s a good question.

Comments closed

Map and FlatMap in Spark

Published 2022-03-10 by Kevin Feasel

The Hadoop in Real World team maps some knowledge:

Let’s look at map() first. map() transforms and RDD with N elements to RDD with N elements. Important thing to note is each element is transformed into another element there by the resultant RDD will have the same elements as before.

Click through to see how map() and flatMap() differ.

Comments closed

Power BI Decomposition Trees

Published 2022-03-10 by Kevin Feasel

Gauri Mahajan shows off decomposition trees in Power BI:

A large volume and variety of data generally need data profiling to understand the nature of data. One of the aspects of data is hierarchy and inter-relationships within different attributes in data. Hierarchical data is often nested at multiple levels. To analyze the relationship between different attributes in a data that is hierarchical, drill-down and drill-through are two of the most common techniques that are employed for data exploration as well as use-cases like root cause analysis. While these techniques are standard and have been in the industry for quite a long time, figuring out these relationships and navigating hierarchical data can be a challenging task. Data Analysts or Business Analysts typically perform this analysis on the data before presenting it to the end-users. In certain cases, some domain or business users may be required to perform such analysis on the report itself. In that case, the task becomes even more challenging considering the limited data analysis capabilities offered by a reporting tool compared to a database and query languages like SQL. To help power users perform such analysis on a reporting tool, visualizations like decomposition trees can be used to decompose hierarchical data that is presented in an aggregated manner. The Decomposition tree can support both drill-down as well as drill-through use-cases when the user is provided the flexibility to choose the hierarchy or dimensions on-demand. In the Microsoft technology stack, Power BI is the key reporting tool for authoring reports and supports a wide variety of data sources. Power BI offers a category of visuals which are known as AI visuals. One such visual in this category is the Decomposition Tree.

Read on to see how you can create a decomposition tree, what kind of information it shows, and how you can interact with it to learn more about correlations and causes.

Comments closed

Azure Data Studio Execution Plans

Published 2022-03-10 by Kevin Feasel

Hugo Kornelis is happy (for now):

But I am not writing this post to moan about past issues. I am writing this post because Microsoft has made huge improvements to execution plan support in ADS. These are officially still in preview, but they are already available. However, you will need to take a few steps to see these improvements in action.

Read on to see what you need to do and to get Hugo’s initial thoughts.

Comments closed

SQLBits Keynote Notes

Published 2022-03-10 by Kevin Feasel

Brent Ozar shares some thoughts from the first day’s keynote from SQLBits:

Pedro Lopes took the stage to talk about parameter-sensitive plan optimization, aka PSP Optimization. He demoed it with SQL Server 2022 CTP 1.3. I’ve written about this feature before, and there wasn’t anything new here in the demos. My opinion on this feature remains the same: I think it sounds like a phenomenal down payment. It won’t fix parameter sniffing, but I don’t think it’s going to backfire.

Read on for Brent’s thoughts around what Microsoft is doing for SQL Server 2022.

Comments closed

Query Performance Insights on the Serverless SQL Pool

Published 2022-03-10 by Kevin Feasel

Jovan Popovic shows how you can use the QPI library on an Azure Synapse Analytics serverless SQL pool:

You can find more of the best practices here. These best practices are very important because some issues might cause performance degradation. You might be surprised how applying some of these best practices might improve the performance of your workload.
The last item that is related to schema optimization is sometimes hard to check. You would need to look at your schema, inspect all columns and find what to optimize. If you have a large schema, this might not be an easy task. But you can make your life easier if you use the QPI helper library that can detect schema issues for you.

Read on to see what it can find.

Comments closed

Apply Functions in R

Published 2022-03-09 by Kevin Feasel

Selina Cheng explains how the various apply() functions work:

Today I’m going to talk about a useful family of functions that allows you to repetitively perform a specified function (e.g., sum(), mean()) across a vector, list, matrix, or data frame. For those of you familiar with ‘for’ loops, the apply() family often allows you to avoid constructing those and instead wrap the loop into one simple function.
I’m going to discuss the functions apply(), lapply(), sapply(), and tapply() in this blog post (as well as using the dplyr library for similar tasks). These functions all end in apply() because you apply the function you want across all the specified elements.

Read on to see how these functions work. H/T R-Bloggers.

Comments closed

M	T	W	T	F	S	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30

Author: Kevin Feasel