
Curated SQL Posts

Building Dynamic Row Headers With ML Services

Dave Mason tries to get around his RESULT SETS limitation when using SQL Server Machine Learning Services:

The columns in the data frame clearly have names, but SQL Server isn’t using them. The data frame columns have types in R too (more on this in a moment). Now that makes me wonder about the data types for the data returned by SQL. How is that determined? If SQL isn’t using the column names, can I assume it isn’t making use of the R column types either?

For a point of reference, let’s run some more R code to show the column names and types. As before, the rvest package is used to scrape a web page, with each HTML <table> found becoming a data frame in the “tables” list (line 3). A data frame of table metadata is created by calling data.frame(). The first parameter is a vector of column names (line 4), the second parameter is a vector of column classes (line 5), and the third parameter causes the row “names” to be incrementing digits (line 6).

This is a work in progress as Dave continues his series.
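
As a rough illustration of the behavior Dave describes (my own minimal sketch with made-up data, not his rvest example), the R data frame below has named, typed columns, but SQL Server ignores them; the resultset only gets column names because WITH RESULT SETS supplies its own:

-- Hypothetical stand-in for a table-metadata data frame: the R column names
-- and types are not carried through to SQL Server on their own.
EXEC sp_execute_external_script
    @language = N'R',
    @script = N'
        # hypothetical data, for illustration only
        OutputDataSet <- data.frame(
            TableName   = c("Orders", "Customers"),
            ColumnCount = c(12L, 8L),
            stringsAsFactors = FALSE)'
WITH RESULT SETS ((TableName nvarchar(128), ColumnCount int));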

Comments closed

Getting Started With Zeppelin

Sangeeta Gulia shows us how to get started building notebooks with Apache Zeppelin on top of Spark:

There are 3 interpreter modes available in Zeppelin.

1) Shared Mode

In Shared mode, a single SparkContext and a single Scala REPL are shared among all interpreters in the group, so every Note shares that one SparkContext and REPL. In this mode, if NoteA defines a variable ‘a’, then NoteB is able not only to read variable ‘a’ but also to override it.

2) Scoped Mode

In Scoped mode, each Note has its own Scala REPL, so a variable defined in one Note cannot be read or overridden in another Note. However, a single SparkContext still serves all the interpreter groups: all jobs are submitted to that SparkContext, and the fair scheduler schedules them. This can be useful when a user does not want to share a Scala session but does want to keep a single Spark application and leverage its fair scheduler.

3) Isolated Mode

In Isolated mode, each Note has its own SparkContext and Scala REPL.

The default mode of the %spark interpreter is ‘Globally Shared’.

This is mostly a step-by-step on installing Zeppelin, but does go into some detail on how Zeppelin works.

Comments closed

Everyone’s Data Is Dirty

Chirag Shivalker hits the highlights on dirty data:

It might sound a bit abrupt, but clean data is a myth. If your data is dirty, so is everyone else’s. Enterprises are more than dependent on data these days, and that will remain the case in the coming years. They need to collect data in order to analyze it, and that data will not necessarily be 100% clean, pristine, or perfect.

Nearly all companies face the challenge of dirty data in the form of duplicates, incorrect fields, and missing values. This happens due to an omnichannel influx of data, followed by hundreds, if not thousands, of employees wrestling and torturing that data to derive professional outcomes and insights. Don’t forget that even the best data has a tendency to decay within a few weeks.

The saying goes that any analytics project is about 80% data cleansing and feature extraction.  I’d say that number’s probably closer to 90-95%, and dirty data is a big part of that.

Comments closed

Query Store Plan Forcing: You Can’t Always Get What You Want

Kendra Little shows an example where trying to force a Query Store plan results in an oddity:

This is not considered a “failure”

When I check the Query Store DMVs, force_failure_count is 0. The last_force_failure_reason_desc is NONE.

Query Store didn’t fail to apply the narrow plan. Instead, it’s just deciding not to give it to me, now that I’ve forced that plan.

Seems kinda like an adolescent, doesn’t it?

The answer remains a bit of a mystery, but read on to see how Kendra troubleshoots this.
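
If you want to check the forcing state yourself, the pieces are a forced plan plus the Query Store plan catalog view (the query_id and plan_id values below are placeholders; pull your own from Query Store):

-- Placeholder IDs; substitute values from sys.query_store_query / sys.query_store_plan.
EXEC sys.sp_query_store_force_plan @query_id = 42, @plan_id = 73;

-- In Kendra's scenario this still reports no failure: force_failure_count stays
-- at 0 and last_force_failure_reason_desc is NONE, even though the forced plan
-- is not the one being handed out.
SELECT plan_id,
       is_forced_plan,
       force_failure_count,
       last_force_failure_reason_desc
FROM sys.query_store_plan
WHERE query_id = 42;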

Comments closed

What To Do With A Database In Source Control

Ed Elliott with Database Source Control 102:

This post is for a specific type of person. If you:

  • Are new to source control
  • Are getting started on your path to continuous delivery nirvana
  • Have been able to get your database into some sort of source control system
  • Have more than one person checking in code to the database source
  • Are unsure what to do next

Then this post is for you!

This is a nice post with some next-steps for when you have a database in source control but aren’t quite sure what to do next.

Comments closed

Go Download SQL Operations Studio

The SQL Server Team has announced SQL Operations Studio:

We are excited to announce that SQL Operations Studio is now available in preview. SQL Operations Studio is a free, light-weight tool for modern database development and operations for SQL Server on Windows, Linux and Docker, Azure SQL Database and Azure SQL Data Warehouse on Windows, Mac or Linux machines.

Download SQL Operations Studio to get started.

It’s not SSMS, but it is cross-platform.  And I think that over time, it will end up being better than SSMS.

Comments closed

Defining Result Sets With ML Services

Dave Mason covers a pain point in SQL Server Machine Learning Services:

The example above is so simple, defining the RESULT SETS poses no problems. But what if the format of the output isn’t known at design time? R (or Python) might take the input data set and add, remove, or change columns conditionally. Further, the input data set might not even be known at design time. How would you define the RESULT SETS at run time?

WITH RESULT SETS needs a MAKE_A_GUESS or FIGURE_IT_OUT option. If there’s some other type of “easy button” for this, I haven’t found it.

It would be nice if the service had the ability to read the data frame columns and use those by default.
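
To make the pain point concrete, here is a minimal sketch (using R's built-in iris data rather than Dave's example): the column list has to be spelled out at design time, and if the script conditionally changes the shape of its output, the definition no longer matches.

EXEC sp_execute_external_script
    @language = N'R',
    @script = N'OutputDataSet <- data.frame(
                    SepalLength = iris$Sepal.Length,
                    Species     = as.character(iris$Species),
                    stringsAsFactors = FALSE)'
WITH RESULT SETS ((SepalLength float, Species nvarchar(30)));
-- If the R script later adds, drops, or retypes a column, this WITH RESULT SETS
-- definition is wrong and the call fails; there is no "figure it out" option.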

Comments closed

Using The GROUPING SETS Operator

Alfonso Hernandez goes into detail with what you can do with GROUPING SETS:

In T-SQL, you summarize data by using the GROUP BY clause within an aggregate query. This clause creates groupings which are defined by a set of expressions. One row per unique combination of the expressions in the GROUP BY clause is returned, and aggregate functions such as COUNT or SUM may be used on any columns in the query. However, if you want to group the data by multiple combinations of group by expressions, you may take one of two approaches. The first approach is to create one grouped query per combination of expressions and merge the results using the UNION ALL operator. The other approach is to use the GROUPING SETS operator along with the GROUP BY clause and define each grouping set within a single query.

In this article I’ll demonstrate how to achieve the same results using each method.

Mastering GROUPING SETS makes reporting queries in T-SQL so much more effective.
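
As a quick sketch of the two approaches the article walks through (the table and column names here are hypothetical):

-- Hypothetical dbo.Sales (Region, Product, Amount) table.

-- Approach 1: one grouped query per combination, merged with UNION ALL.
SELECT Region, NULL AS Product, SUM(Amount) AS TotalAmount
FROM dbo.Sales
GROUP BY Region
UNION ALL
SELECT NULL AS Region, Product, SUM(Amount) AS TotalAmount
FROM dbo.Sales
GROUP BY Product;

-- Approach 2: the same result from a single query with GROUPING SETS.
SELECT Region, Product, SUM(Amount) AS TotalAmount
FROM dbo.Sales
GROUP BY GROUPING SETS ((Region), (Product));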

Comments closed

Thoughts On Dynamic Data Masking

Kellyn Pot’vin-Gorman talks about Dynamic Data Masking:

Anyone with the unmask privilege or DB_OWNER will be able to view the data.  Many development and testing environments grant higher privileges to users, and in SQL Server it’s not rare for a developer to be the DB_OWNER (I used to come across this all the time when recoveries were performed by the wrong OS user), so this data is still quite vulnerable.  I do like that if you were to take a backup and recover it with masking, the obfuscated data is what is recovered physically.  I’m more concerned about those odd environments where compliance hasn’t been put in place for database owners, who would still view the originally masked data, but unmasked.

Performance isn’t impacted (i.e., no referential integrity concerns or execution plan changes), as the optimizer performs all steps against the real data, which leads me to wonder what happens with some of the newer monitoring tools that state they can display SQL and bind variable data without accessing the database directly.  Would they “sniff” the masked data or the unmasked data?  Would it matter who the OS user is, or what roles they hold in the database?

The important thing here is that DDM isn’t really a security product.  It’s a something-or-another product that might be useful to stop shoulder surfing but pretty much nothing else.
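
For reference, the setup under discussion looks something like this (table, column, and user names are hypothetical); db_owner members and anyone granted UNMASK still see the real values, which is exactly the gap Kellyn calls out:

-- Hypothetical table and column: mask an email column with the built-in
-- email() masking function.
ALTER TABLE dbo.Customers
    ALTER COLUMN Email ADD MASKED WITH (FUNCTION = 'email()');

-- Anyone granted UNMASK (and any db_owner member) still sees the real data.
GRANT UNMASK TO SomeAnalyst;
REVOKE UNMASK FROM SomeAnalyst;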

1 Comment

Interpreting P-Value Histograms

David Robinson visualizes and interprets different p-value histograms:

So you’re a scientist or data analyst, and you have a little experience interpreting p-values from statistical tests. But then you come across a case where you have hundreds, thousands, or even millions of p-values. Perhaps you ran a statistical test on each gene in an organism, or on demographics within each of hundreds of counties. You might have heard about the dangers of multiple hypothesis testing before. What’s the first thing you do?

Make a histogram of your p-values. Do this before you perform multiple hypothesis test correction, false discovery rate control, or any other means of interpreting your many p-values. Unfortunately, for some reason, this basic and simple task rarely gets recommended (for instance, the Wikipedia page on the multiple comparisons problem never once mentions this approach). This graph lets you get an immediate sense of how your test behaved across all your hypotheses, and immediately diagnose some potential problems. Here, I’ll walk you through a basic example of interpreting a p-value histogram.

It’s a fun read and informative as well.
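
David’s examples are in R, but if your p-values happen to live in a table, you can get the same quick look with a bit of T-SQL bucketing (the table and column names here are made up):

-- Hypothetical dbo.TestResults(PValue) table, one row per hypothesis test.
-- Bin the p-values into 20 buckets of width 0.05 and count each bucket;
-- roughly uniform counts with a spike in the lowest bucket is the healthy
-- "some real effects" shape David describes.
SELECT FLOOR(PValue * 20) / 20.0 AS BucketStart,
       COUNT(*)                  AS NumTests
FROM dbo.TestResults
GROUP BY FLOOR(PValue * 20) / 20.0
ORDER BY BucketStart;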

Comments closed