2019-06-17 – Curated SQL

When Not to Use Spark

Published 2019-06-17 by Kevin Feasel

Ramandeep Kaur gives us several cases when it makes sense not to use Apache Spark:

There can be use cases where Spark would be the inevitable choice. Spark considered being an excellent tool for use cases like ETL of a large amount of a dataset, analyzing a large set of data files, Machine learning, and data science to a large dataset, connecting BI/Visualization tools, etc.
But its no panacea, right?
Let’s consider the cases where using Spark would be no less than a nightmare.

No tool is perfect at everything. Click through for a few use cases where the Spark experience degrades quickly.

Comments closed

Linear Regression Assumptions

Published 2019-06-17 by Kevin Feasel

Stephanie Glen has a chart which explains the four key assumptions behind when Ordinary Least Squares is the Best Linear Unbiased Estimator:

If any of the main assumptions of linear regression are violated, any results or forecasts that you glean from your data will be extremely biased, inefficient or misleading. Navigating all of the different assumptions and recommendations to identify the assumption can be overwhelming (for example, normality has more than half a dozen options for testing).

Violating one of the assumptions isn’t the end of the world, though it can make understanding the model and generating accurate predictions harder.

Comments closed

T-SQL Bugs with Joins

Published 2019-06-17 by Kevin Feasel

Itzik Ben-Gan takes us through four bugs or oddities around joins:

The order counts are now correct, but the total freight values are not. Can you spot the new bug?
The new bug is more elusive because it manifests itself only when the same customer has at least one case where multiple orders happen to have the exact same freight values. In such a case, you are now taking the freight into account only once per customer, and not once per order as you should.

Click through to avoid accidentally introducing bugs in your T-SQL code.

Comments closed

Visualizing with Heatmaps in R

Published 2019-06-17 by Kevin Feasel

Anisa Dhana shows how you can create a quick heatmap plot in R:

To give your own colors use the scale_fill_gradientn function.
ggplot(dat, aes(Age, Race)) +
geom_raster(aes(fill = BMI)) +
scale_fill_gradientn(colours=c("white", "red"))

This is a quick example using ggplot2 but there are other heatmap libraries available too.

Comments closed

Usability Issues with Query Store

Published 2019-06-17 by Kevin Feasel

Lonny Niederstadt has a new series on usability “soft spots” with Query Store. Part one looks at plan identifiers:

Yeah. That’s a lotta plans in the right-hand legend. 22 of them. In a not very helpful order. In fact… though I’ve tried to figure it out, I don’t know what type of order is used for that right-hand legend. It’s not chronological. It’s not based on duration which is the metric displayed by the graph. I dunno.

Part two looks at “morally equivalent plans”:

Let’s refresh the “Tracked Queries” activity.

Ohhhh. I forced plan_id 2 (in the purple box below) but what showed up was plan_id 3220 (in the yellow box below).

Lonny promises more, so keep on the lookout.

Comments closed

Using Biml to Retrieve Query Metadata

Published 2019-06-17 by Kevin Feasel

David Stein shows how you can use the GetQuerySchema() function in Biml to retrieve metadata for a query:

This post is part of a series, Short Cuts to Biml Mastery which shows you how to learn BimlScript through a series of short cuts and remain productive while doing so.
The GetQuerySchema method provides an excellent way to import metadata from a query, in a similar way as GetDatabaseSchema imports schemas, tables, and views. GetQuerySchema was introduced last year with the 2018 editions of BimlStudio and BimlExpress.

This kind of higher-level programming takes some getting used to, but once you understand it, you can find common data access patterns and build one solution to work through many versions of the pattern.

Comments closed

What Compatibility Level 150 Gets You

Published 2019-06-17 by Kevin Feasel

Erik Darling explains the upsides and downsides of moving to SQL Server 2019 and compatibility level 150:

In those versions, flipping compatibility level uses the new Cardinality Estimator (CE). That new Cardinality Estimator is real hit or miss.
The worst part is that there’s practically no gain to be realized for using higher compatibility levels — that changes with SQL Server 2019.

Read on to see what those new features are. As far as the compatibility level switch goes, there comes a time when you just need to bite the bullet and use the new cardinality estimator. Erik has a few tips to help with that too.

Comments closed

Create Reports in Power BI Desktop Instead of Service

Published 2019-06-17 by Kevin Feasel

Melissa Coates explains why you should create reports in Power BI Desktop rather than directly through the Power BI Service:

I always recommend to Power BI authors that report creation & editing should happen in Power BI Desktop and to just ignore the edit capability in the Power BI Service. Usually my reasons are concerned with (a) which version is the latest version and (b) having versioning history, and (c) reducing the risk of someone overwriting someone else’s work because two different reporting editing/publishing options are in use. These are very valid concerns, but I’ve discovered one additional problem that could easily confuse some people: (d) the side effects experienced if you download then re-upload a web-created report. Let me explain…

Read on for that explanation.

Comments closed

M	T	W	T	F	S	S
					1	2
3	4	5	6	7	8	9
10	11	12	13	14	15	16
17	18	19	20	21	22	23
24	25	26	27	28	29	30

Day: June 17, 2019