Press "Enter" to skip to content

July 1, 2020

Data Lakes for Smaller Projects

Thomas Spicer explains that your data lake doesn’t need to be enormous to be useful:

We recently wrote an article debunking common myths about data lake architectures, data lake definitions, and data lake analytics. It is called “What is a Data Lake? Get A Leg Up Avoiding The Biggest Myths.” In that article, we framed the current conversation about data lakes and how they fit within enterprise data strategies. This topic has historically been confusing and opaque for those wanting to get value from a data lake due to conflicting advice from consultants and vendors.

One area that can be particularly confusing is the perception that lakes are only for “big data.” If you spend any time reading materials on lakes, you would think there is only one type and it would look like the Caspian Sea (it’s a lake despite “sea” in the name). People describe data lakes as massive, all-encompassing entities, designed to hold all knowledge. The good news is that lakes are not just for “big data” and you have more opportunities than ever to have them be part of your data stack.

Click through for Thomas’s argument.


The Basics of Autoregressive Models

Holger von Jouanne-Diedrich explains some of the principles of autoregressive models through a demonstration:

Well, this seems to be good news for the sales team: rising sales! Yet, how does this model arrive at those numbers? To understand what is going on we will now rebuild the model. Basically, everything is in the name already: auto-regressive, i.e. a (linear) regression on (a delayed copy of) itself (auto from Ancient Greek self)!

So, what we are going to do is create a delayed copy of the time series and run a linear regression on it. We will use the lm() function from base R for that (see also Learning Data Science: Modelling Basics).

Read on for some additional understanding.
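
If you want a rough feel for the mechanics before clicking through, here is a minimal sketch of that idea in R: an AR(1) model is essentially lm() run against a one-step-delayed copy of the series. The sales figures below are made up for illustration and are not the numbers from Holger's post.

# Illustrative sales series (made-up numbers, not from the original post)
sales <- c(110, 115, 123, 119, 130, 138, 135, 148, 152, 160)

y     <- sales[-1]              # the series, shifted forward by one step
y_lag <- sales[-length(sales)]  # the delayed copy of itself

# A linear regression of the series on its own past: the "auto" in auto-regressive
model <- lm(y ~ y_lag)
summary(model)

# One-step-ahead forecast from the last observed value
predict(model, newdata = data.frame(y_lag = tail(sales, 1)))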


Tips for Debugging in Visual Studio

Patrick Smacchia has 12 tips for debugging in Visual Studio:

4) Data breakpoint: Break when value changes

If you set a breakpoint on a non-static property setter, it will be hit when the property value changes for any object. The same behavior can be obtained for a single object by right-clicking in the Locals (or Watch) window and choosing the Break When Value Changes menu item.

This facility is illustrated with the animation above. The hit occurs only when obj2.Prop is changed, not when obj1.Prop is changed.
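
Since the animation doesn't reproduce here, a tiny C# setup makes the distinction concrete. This is my own illustrative example rather than Patrick's sample; Widget is a placeholder type, while obj1, obj2, and Prop mirror the names in the quote.

// Illustrative example, not Patrick's sample code.
var obj1 = new Widget();
var obj2 = new Widget();

// A regular breakpoint on the Prop setter breaks on both of these assignments.
obj1.Prop = 1;

// "Break When Value Changes", set on obj2 via the Locals window, breaks only here.
obj2.Prop = 2;

class Widget
{
    public int Prop { get; set; }
}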

These go a step beyond the basics, so check them out.


Configuring Power BI Incremental Refresh

Gilbert Quevauvilliers has a follow-up to a previous post:

Following on from my successful blog post How you can incrementally refresh any Power BI data source (This example is a CSV File), I found a way where I can just use dates created in Power Query to get data refreshing incrementally.

Full credit goes to Rafael Mendonça, who actually figured this out. All that I have done is to translate what Rafael Mendonça did in his PBIX and put it into steps that you can follow along with.

https://www.rafaelmendonca.com/2020/06/incremental-powerbi-csv-api-excel-odbc.html

In this blog post I am going to demonstrate how to get this working with steps that I hope are very easy to follow.

Read on for the process.
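
For context, the piece every Power BI incremental refresh setup shares is a filter on the RangeStart and RangeEnd datetime parameters; Gilbert's post (via Rafael's technique) layers the date-generation trick on top of that so it works for sources like CSV files. A generic sketch of that filter step in Power Query M, with a placeholder file path and column name of my own choosing, looks something like this:

// Generic sketch of the RangeStart/RangeEnd filter; the path and [OrderDate]
// column are placeholders, not taken from Gilbert's example file.
let
    Source = Csv.Document(File.Contents("C:\Data\Sales.csv"), [Delimiter = ",", Encoding = 65001]),
    Promoted = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
    Typed = Table.TransformColumnTypes(Promoted, {{"OrderDate", type datetime}}),
    Filtered = Table.SelectRows(Typed, each [OrderDate] >= RangeStart and [OrderDate] < RangeEnd)
in
    Filtered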


Truncating All Tables in a Database with Powershell

Jess Pomfret nukes the database from orbit, as it’s the only way we can be sure:

The most popular post on my blog so far was called ‘Disable all Triggers on a Database’ and this one is a good follow up from that post.

The scenario here is you need to remove all the data from the tables in your database. This could be as part of a refresh process, or perhaps to clear out test data that has been entered through an application.  Either way, you want to truncate all the tables in your database.

Click through for the code.
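
If you just want a flavor of the shape such a script can take, here is a minimal sketch using dbatools and SMO's TruncateData() method; the instance and database names are placeholders, and this is not necessarily the exact approach from Jess's post.

# Minimal sketch, not necessarily Jess's exact script; instance and database
# names are placeholders. Requires the dbatools module.
Import-Module dbatools

# Get-DbaDbTable returns SMO table objects, which expose a TruncateData() method.
Get-DbaDbTable -SqlInstance 'localhost' -Database 'TestDb' |
    ForEach-Object { $_.TruncateData() }

# Note: TRUNCATE TABLE fails for tables referenced by foreign keys, which is one
# of the wrinkles a fuller script has to handle.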


Keep Parameter Sniffing On

Brent Ozar explains why you should keep parameter sniffing on:

What they THINK is going to happen is that SQL Server will do an OPTION(RECOMPILE) on every incoming query, building fresh plans each time. That ain’t how this works at all, and instead, I wish this “feature”’s name was “Parameter Blindfolding.” Here’s what it really does.

Read on for the explanation. In reality, parameter sniffing is almost always a good thing. It’s when you have major skews in data that you even have to think about parameter sniffing being a problem.
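
To make the skew point concrete, here is an illustrative T-SQL setup of my own (not from Brent's post), assuming a dbo.Users table with a heavily skewed Location column:

-- Illustrative example: assumes a dbo.Users table with a skewed Location column.
CREATE OR ALTER PROCEDURE dbo.GetUsersByLocation
    @Location NVARCHAR(100)
AS
BEGIN
    SELECT DisplayName
    FROM dbo.Users
    WHERE Location = @Location;  -- the plan is compiled for the first sniffed value
END;
GO

-- If one location matches millions of rows and another only a handful, whichever
-- call comes first determines the plan that both of them reuse.
EXEC dbo.GetUsersByLocation @Location = N'India';
EXEC dbo.GetUsersByLocation @Location = N'Reykjavik';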


Tips for Improving Power BI Dashboards

Tino Zishiri has a set of tips to design better-looking dashboards:

There are several reasons why you should design great looking dashboards. Here are a few:

– They make information more accessible – end users benefit from an intuitive design that makes insight easy to obtain so they can make informed decisions.
– They help convey your message – you’re in a better position to tell a coherent story. Applying design principles can also help accentuate your message. My colleague Kalina Ivanova has written an excellent series of blogs on Data Storytelling with Power BI.
– They encourage user adoption – if a report is useful to users and has a great look and feel then you’re winning.

In this blog, I’ll briefly cover the building blocks that make up a good Power BI dashboard. I then explore the stepping stones that will level up your dashboard and take it from good to great.

One area where I do have some disagreement: the Z and F layouts are fine for text-heavy formats, but "text-heavy" and "dashboard" generally don't go together very well. My preference is the notion of focal points (go about 3/4 of the way down, to the section entitled "Where We Look"), which does a much better job of describing eye behavior for image-heavy layouts. That aside, I like this post a lot.
