Press "Enter" to skip to content

Curated SQL Posts

Thinking About Font Sizes

Stephanie Evergreen shares some good information on font sizes:

Did you know that you regularly read type set in size 8, or even smaller? In printed materials, captions and less important information (think: photograph credits, newsletter headline subtext, magazine staff listings) are usually reduced to something between 7.5 and 9 points. We generally read that size type without much issue, even without glasses. The reason we can comfortably read those small sizes is that the designers chose an effective font that keeps its clarity and legibility when shrunk.

Designers don’t make the font that tiny to give you a headache. They do it to establish a font hierarchy. Our brains interpret the biggest size as the most important and the littlest size as the least important. So we can create a hierarchy of font sizes to structure our work and communicate even more clearly.

The font hierarchy is important, but so is picking a font which is clear at the sizes you want to use.

New DMF: dm_db_page_info

Pam Lahoud shows off a new Dynamic Management Function in SQL Server 2019:

The primary use case we had in mind when developing this function was troubleshooting page-related waits. Some of the key performance scenarios that require page details to diagnose are tempdb contention, last page insert contention (also see this blog article for code samples) and page-level blocking. All of these scenarios present as either PAGELATCH or lock wait types in sys.dm_exec_requests with a page identifier (db_id:file_id:page_id) in the wait_resource column. In order to gain any insight into the problem, you have to know more details about what that page is. Is it a data page, an index page or something else? If it’s a data or index page, which object does it belong to? Prior to SQL Server 2019 CTP 2.0, the only way to gain any of this insight was to use the lightly documented command DBCC PAGE. Unfortunately, if you’re using some sort of monitoring tool or script to gather data about waits and blocking, there’s no good way to include this DBCC PAGE call in those scripts. Not to mention it requires sysadmin privileges. Enter sys.dm_db_page_info.

The sys.dm_db_page_info function takes 4 parameters: database ID, file ID, page ID and mode (‘LIMITED’ or ‘DETAILED’). It returns a table with a single row that contains the header information from the page, including things like object_id, index_id and partition_id. As a standalone function, this provides a documented and fully supported way to retrieve this information without the need for a DBCC command. The real benefit however is the ability to JOIN this function with other DMVs such as sys.dm_exec_requests or sys.sysprocesses to gather this information automatically when there is any sort of page-related contention. If you’re familiar with these views you know that the wait_resource column can contain all sorts of different resources, so it’s not in the right format to facilitate a direct join to sys.dm_db_page_info. To support this use case, we had to add another column to these views called page_resource. Whenever the wait_resource column contains a page resource, the page_resource column will be populated with an 8-byte hexadecimal value. The page ID is the first four bytes, followed by 2 bytes for the file ID and then 2 bytes for the database ID.
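As a quick sketch of how those pieces fit together (this is illustrative, not code from the post): crack page_resource with the companion function sys.fn_PageResCracker and feed the result to sys.dm_db_page_info.

    -- Illustrative sketch: decode the page behind a page-related wait.
    SELECT r.session_id,
           r.wait_type,
           r.wait_resource,
           pgi.object_id,
           pgi.index_id,
           pgi.page_type_desc
    FROM sys.dm_exec_requests AS r
    CROSS APPLY sys.fn_PageResCracker(r.page_resource) AS prc
    CROSS APPLY sys.dm_db_page_info(prc.db_id, prc.file_id, prc.page_id, 'DETAILED') AS pgi
    WHERE r.page_resource IS NOT NULL;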

There are also a couple of supporting functions, so check those out.

When Query Store Skips An Interval Period

Rich Brenner explains why you might see gaps in your Query Store runtime stats intervals:

When messing about (developing) things that are based on the query store I came across an unusual phenomenon. We were missing query store intervals. It’s not that data had been deleted but rather that it was never created in the first place. Take a look at this:

I’d set the query store to 5 minute intervals for testing purposes and left it. Notice that there is no data between 15:20 and 15:25. It’s not that the data has been deleted; we know that because the runtime_stats_interval_id is still incrementing correctly.
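If you want to spot gaps like that yourself, one quick sketch (assuming Query Store is enabled in the current database) is to compare each interval's start time with the end of the interval before it:

    -- Sketch: list intervals whose start time does not line up with the
    -- end of the previous interval, i.e. periods that were never created.
    WITH intervals AS
    (
        SELECT runtime_stats_interval_id,
               start_time,
               end_time,
               LAG(end_time) OVER (ORDER BY start_time) AS prev_end_time
        FROM sys.query_store_runtime_stats_interval
    )
    SELECT runtime_stats_interval_id,
           prev_end_time,
           start_time
    FROM intervals
    WHERE start_time > prev_end_time;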

Click through for the answer.

Death Marches Roundup

Jeff Mlakar takes us on a tour of bad project planning:

This month we had 15 post submissions about this daunting topic. Two of the posts were from people who had never posted before. To them I say welcome and I hope you enjoyed the experience.

I think, in general, you were all brave to write about this sensitive topic. I know a lot of you are consultants (which is a great way to expose yourself to a death march project) and must be careful about telling stories that could be misconstrued by clients. Nonetheless, with enough obfuscation and redaction you have brought forth some truly horrifying posts! I am going to group the submissions by the most terrible themes that many posts shared.

It is best to read this post in Vincent Price’s voice.

Removing A Database From An AG

Erin Stellato walks us through the process of removing a database from an Availability Group and then re-adding it later:

I recently had a scenario in a two-node Availability Group where multiple large-batch modification queries were executed and created a large redo queue on the replica. The storage on the replica is slower than that on the primary (not a desired scenario, but it is what it is) and the secondary has fallen behind before, but this time it was to the point where it made more sense to remove the database from the replica and re-initialize, rather than wait several hours for it to catch up. What I’m about to detail is not an ideal solution. In fact, your solution should be architected to avoid this scenario entirely (storage of equal capability for all involved nodes is essential). But, stuff happens (e.g., a secondary database unexpectedly pausing), and the goal was to get the replica synchronized again with no downtime.
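Erin walks through the full, careful process; as a rough sketch only (the database, backup share, and AG names below are placeholders, not her script), the moving parts look like this:

    -- Rough sketch only; see the post for the full process and safety checks.
    -- On the secondary replica: take the lagging database out of the AG.
    ALTER DATABASE [YourDatabase] SET HADR OFF;

    -- Re-initialize the secondary copy from backups, leaving it restoring.
    RESTORE DATABASE [YourDatabase]
        FROM DISK = N'\\backupshare\YourDatabase.bak'
        WITH NORECOVERY, REPLACE;
    RESTORE LOG [YourDatabase]
        FROM DISK = N'\\backupshare\YourDatabase_log.trn'
        WITH NORECOVERY;

    -- Join the database back into the Availability Group on the secondary.
    ALTER DATABASE [YourDatabase] SET HADR AVAILABILITY GROUP = [YourAG];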

Click through for the demo.

Capturing Implicit Conversions With Extended Events

Grant Fritchey shows how easy it is to build an extended event which captures implicit conversions:

Built right into Extended Events is an event that captures conversions that would affect execution plans, plan_affecting_convert. This event will show both CONVERT and CONVERT_IMPLICIT warnings that you would normally only see within an execution plan. You can capture this event with others. Capturing events together along with causality tracking makes it very easy to track queries that have the issue. Here’s one example of how you might capture implicit conversions along with the query:
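(The session definition itself is in Grant's post; the block below is just a minimal stand-in showing the event plus causality tracking, with a made-up session name.)

    -- Minimal stand-in, not Grant's exact session definition.
    CREATE EVENT SESSION [ImplicitConversions] ON SERVER
    ADD EVENT sqlserver.plan_affecting_convert
        (ACTION (sqlserver.sql_text, sqlserver.database_name)),
    ADD EVENT sqlserver.sql_batch_completed
        (ACTION (sqlserver.sql_text))
    ADD TARGET package0.event_file (SET filename = N'ImplicitConversions')
    WITH (TRACK_CAUSALITY = ON);
    GO

    ALTER EVENT SESSION [ImplicitConversions] ON SERVER STATE = START;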

This kind of event can make system-level performance tuning much easier.

Using R To Hit Azure ML From Power BI

Leila Etaati shows how you can use R to hit an Azure ML endpoint to populate a data set in Power BI:

You need to create a model in Azure ML Studio and create a web service for it.

The traditional example: predict whether a passenger on the Titanic is going to survive or not.

We have a dataset about passengers (their age, gender, and passenger class), and we are going to predict whether they are going to survive or not.

Open Azure ML Studio and follow the steps to create a model for predicting this.

Then download the Titanic dataset from here

Click through for the step-by-step instructions.

Power BI Data Profiling

Matt Allington takes a look at a new feature in Power BI:

The data profiling tools look at the first 1,000 rows in the preview data loaded and show you the big picture of what the data “looks” like.

Currently the profiling tool only works on the top 1000 rows of data.  It also takes some time to prepare the profile of the columns (as could be expected), however the benefits of getting this stuff right before moving on far outweigh the slower load times (IMO).  I would love to see an option to profile the entire set of data for one or more columns.  I am sure this will come.

Teo Lachev shares some thoughts on what it would take to make this a killer feature:

That’s all data profiling you get for now. Here is what it will take to make Power BI data profiling a killer feature:

  1. Allow data profiling over all the values (understandably there will be performance impact).

  2. Add more aggregates, such as Min/Max/Std/Median.

  3. The ability to dynamically filter the preview data for the selected bar in the profile.

As it is, there’s enough here to see the potential of where it could go.
