Press "Enter" to skip to content

Author: Kevin Feasel

Using Bookmarks for Power BI Filters

Marc Lelijveld continues a series on storytelling with Power BI:

As said, being dynamic is a broad concept. Let’s use the example shown above. As a report author, we can define that the end user should be looking at a top 10 ranking of countries (right side of the report). Since the difference between numbers 9 and 10 in the ranking is so small, you might want to know what the difference is to number 11. Right now, we can’t see that. We need to change the filter context to see the rest of the ranking.

Click through for a step-by-step example of what to do.


Managed Instance Challenges

Joey D’Antoni has a few real-world challenges with migrating to Azure SQL Managed Instances:

While DMS is pretty interesting tooling, I had mostly ignored it until recently. Functionally, the tool works pretty well. The problem is that it requires a lot of privileges: you have to have someone who can create a service principal, and you need to have the following ports open between your source machine and your managed instance:

– 443
– 53
– 9354
– 445
– 12000

While the scope of those firewall rules is limited, in a larger enterprise, explaining why you need port 445 open to anything is going to be challenging.

The technology is intriguing, though it does seem like there are still some kinks to work out.
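
If you want to verify ahead of time that those ports are reachable from your source machine, a quick TCP probe is enough. Here is a minimal bash sketch of my own, not from Joey's post (the host name is a placeholder for your managed instance endpoint, nc is assumed to be installed, and port 53 is normally UDP for DNS, so treat that result as approximate):

#!/bin/bash
# Hypothetical endpoint name; substitute your own managed instance host.
MI_HOST="yourmi.public.abc123.database.windows.net"

for PORT in 443 53 9354 445 12000; do
  if nc -z -w 5 "$MI_HOST" "$PORT" 2>/dev/null; then
    echo "Port $PORT: reachable"
  else
    echo "Port $PORT: blocked or filtered"
  fi
done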


Confluent Platform 5.3

Gaetan Castelein announces Confluent Platform 5.3:

Introducing Confluent Operator for Kubernetes
Kubernetes has become the open source standard for orchestrating containerized applications, but running stateful applications such as Kafka can be difficult and requires a specialized skill set. Thus, we decided to automate the process for you.

For the past few months, we have been working closely with a set of customers and partners as part of a preview program to gather their early feedback. We are now ready to release Confluent Operator, our enterprise-ready implementation of the Kubernetes Operator API to automate deployment and key lifecycle operations of Confluent Platform on Kubernetes.

Looks like they’ve been busy lately.


Stream Processing with Kafka

Satish Sharma has a four-part series covering stream processing with Apache Kafka. Part 1 gives us an overview of Kafka:

Apache Kafka is an open-source distributed stream processing platform originally developed by LinkedIn and later donated to Apache in 2011.

We can describe Kafka as a collection of files, filled with messages that are distributed across multiple machines. Most of Kafka’s analogies revolve around tying these various individual logs together, routing messages from producers to consumers reliably, replicating for fault tolerance, and handling failure gracefully. Its architecture inherits more from storage systems like HDFS, HBase, or Cassandra than it does from traditional messaging systems that implement JMS or AMQP. The underlying abstraction is a partitioned log, essentially a set of append-only files spread over several machines. This encourages sequential access patterns. A Kafka cluster is a distributed system that spreads data over many machines both for fault tolerance and for linear scale-out.
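
The partitioned-log idea is easy to see from the command line. As a quick sketch of my own (assuming a recent Kafka distribution whose kafka-topics.sh accepts --bootstrap-server, and a broker listening on localhost:9092), this creates a topic whose log is split across three partitions and then shows where each partition lives:

# Create a topic backed by three append-only partition logs.
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 \
  --replication-factor 1 --partitions 3 --topic tutorial-topic

# Show which broker leads each partition.
bin/kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic tutorial-topic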

Part 2 covers terminology and concepts:

Kafka Streams API
Kafka Streams API is a Java library that allows you to build real-time applications. These applications can be packaged, deployed, and monitored like any other Java application — there is no need to install separate processing clusters or similar special-purpose and expensive infrastructures!

The Streams API is scalable, lightweight, and fault-tolerant; it supports stateless processing and allows for stateful processing as well.

Part 3 has you install and configure Kafka:

For quick testing, let’s start a handy console consumer, which reads messages from a specified topic and displays them back on the console. We will use this same consumer to read all of our messages from this point forward. Use the following command:

Linux -> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic tutorial-topic --from-beginning

Windows -> bin\windows\kafka-console-consumer.bat --bootstrap-server localhost:9092 --topic tutorial-topic --from-beginning
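
The excerpt doesn't show it, but to give that consumer something to display, you can start the matching console producer in a second terminal and type a few messages (Linux path shown; the Windows equivalent lives under bin\windows as above):

# Each line you type becomes a message on tutorial-topic.
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic tutorial-topic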

Part 4 is forthcoming.


PolyBase and Azul Zulu OpenJDK

I have a post looking at one of the more interesting changes in SQL Server 2019 CTP 3.2:

One of the more interesting parts of SQL Server 2019 CTP 3.2’s release notes is the relationship between Microsoft and Azul Systems. Travis Wright covers it in some detail, as well as what it means for customers.

Prior to SQL Server 2019 CTP 3.2, installing PolyBase required an installation of Oracle’s Java Runtime Environment 7 Update 51 or higher, either directly from Oracle or through OpenJDK.

Java is still required if you want to read from or write to Hadoop or Azure Blob Storage. Oracle’s flavor of Java is no longer required, however.


Keeping Bash Scripts Reusable

Kellyn Pot’vin-Gorman explains some of the concepts behind scripting for longevity:

I’m going to admit that the reason I didn’t embrace PowerShell at first was that most of the examples I found were full of hardcoded values. I found it incredibly obtuse, but I started to realize that it came from many sources who might not have the scripting history of those coming from other shells (this was just my theory, not a lot of evidence to prove this one, so keep that in mind). As PowerShell scripts have matured, I’ve noticed how many are starting to build them with more dynamic values and advanced scripting options, and with this, I’ve become more comfortable with PowerShell.

I think the best way to learn is to see real examples, so let’s demonstrate.

Read on for those examples.
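
To make the hardcoded-versus-parameterized point concrete, here is a small sketch of my own (not one of Kellyn's examples; the backup command, paths, and names are purely illustrative):

#!/bin/bash
set -euo pipefail

# Hardcoded version (brittle): only ever works for one host, one user, one database.
#   pg_dump -h prod-db01 -U app_user sales_db > /backups/sales_db.sql

# Parameterized version: values come from arguments and the environment, with defaults.
DB_NAME="${1:?usage: backup.sh dbname [host] [user]}"   # first argument, required
DB_HOST="${2:-localhost}"                               # second argument, defaults to localhost
DB_USER="${3:-app_user}"                                # third argument, defaults to app_user
BACKUP_DIR="${BACKUP_DIR:-/backups}"                    # overridable via environment variable

pg_dump -h "$DB_HOST" -U "$DB_USER" "$DB_NAME" \
  > "${BACKUP_DIR}/${DB_NAME}_$(date +%Y%m%d).sql"

The same script now covers every database and host without edits, which is the kind of longevity Kellyn is describing.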


Tearing Down and Rebuilding Replication

Pamela Mooney takes us through tearing down replication, restoring a database, and rebuilding transactional replication with scripts:

If you use replication, you have doubtless been paged to restore a replicated database. You have experienced the ineffable joy of tearing down replication-dependent indexed views (if you have them), blowing away replication, doing the restore, putting replication and indexing back together again, and finally redeploying your indexed views.  I know I have.

In fact, I’ve done it enough times that I didn’t want to do it anymore. So, you may ask, did I go to a different modality of replicating my data?  Did I go to Availability Groups or mirroring instead?  No.  I actually like replication.  It’s invaluable when you need to write code around real-time data (especially from a third party database), but you aren’t able to index the original copy.  It’s been around for a long time and is well vetted, and pretty forgiving, once you understand how it works.  So, no need to reinvent the wheel. I decided to automate replication instead.

This is specific to transactional replication. There’s a whole ‘nother kettle of fish for merge replication.


Getting PowerShell 7 Preview on Ubuntu

Max Trinidad shows us how to get the PowerShell 7 preview without using apt:

First, look under the release documentation and search for the deb package. In my case, I’m installing the amd64 version.

Then, right-click on the “powershell-preview_7.0.0-preview.2-1.ubuntu.18.04_amd64.deb”, and select “Copy link address”.

This is also something you could script out. It’s quite useful in scenarios where your apt repositories don’t have the latest version of something but you still need or want it.
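
A minimal sketch of that scripted approach (of my own, not from Max's post): the URL variable is a placeholder for whatever link address you copied, and the pwsh-preview command name assumes the preview package registers it that way.

#!/bin/bash
set -euo pipefail

# Paste the link address you copied from the GitHub releases page.
DEB_URL="<copied link address ending in powershell-preview_7.0.0-preview.2-1.ubuntu.18.04_amd64.deb>"

wget -O /tmp/powershell-preview.deb "$DEB_URL"   # download the package
sudo dpkg -i /tmp/powershell-preview.deb         # install it
sudo apt-get install -f -y                       # resolve any missing dependencies

pwsh-preview -Version                            # confirm the preview installed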


Exporting Large Data Sets from Power BI

Imke Feldmann shows how you can export data in an amount which exceeds Power BI’s native limitations:

Although the method is fairly simple, there are simpler methods if you just need the raw data from your data model (and not the specific aggregations or measures that the visual contains):

– Use DAX Studio to download all tables from your data model at once
– Use DAX Studio to download specific tables from your data model (one by one)
– Or use R or Python to download specific tables if you’re comfortable with these languages. This method also allows scheduled refreshes in the service.

In short, this is probably a fifth-best solution, but it does work.
