
Month: October 2018

Querying Web API From Power BI

Paul Turley shows us how to hit secured Web API endpoints with Power BI:

Having recently worked through numerous issues with API data feeds and deployed report configurations, I’ve learned a few important best practices and caveats – at least for some common use cases. In one example, we have a client who exposes their software-as-a-service (SaaS) customer data through several web API endpoints. Each SaaS customer has a unique security key which they can use with Power BI, Power Query or Excel and other tools to create reporting solutions. If we need a list of available products, it is a simple matter to create a long URL string consisting of the web address for the endpoint, security key and other parameters, and then just pass this to Power Query as a web data source. However, it’s not quite that easy for non-trivial reporting scenarios.

Thanks to Jamie Mikami from CSG Pro for helping me with the Azure function code for demonstrating this with demo data.  Thanks also to Chris Webb who has meticulously covered several facets of API data sources in great detail on his blog, making this process much easier.

Click through for the instructions.


Understanding ANY And ALL In SQL

Doug Kline explains the ANY and ALL operators in SQL:

-- note that this creates a single column of values
-- which could be used in something like IN
-- for example
SELECT   1
WHERE    12 IN    (  SELECT   tempField
                     FROM     (VALUES(11),(12),(7)) tempTable(tempField))

-- I could rephrase this as:
SELECT   1
WHERE    12 = ANY (  SELECT   tempField
                     FROM     (VALUES(11),(12),(7)) tempTable(tempField))

I rarely see these operators in the wild and might have used them in production code a couple of times, if that.
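
Doug's code covers ANY; purely for completeness, here is a quick sketch (not from Doug's post) of ALL against the same derived table. With ALL, the comparison has to hold for every value in the set.

-- 12 = ALL (...) returns no row, because 11 and 7 are also in the set
SELECT   1
WHERE    12 = ALL (  SELECT   tempField
                     FROM     (VALUES(11),(12),(7)) tempTable(tempField))

-- 20 > ALL (...) returns a row, because 20 is greater than 11, 12, and 7
SELECT   1
WHERE    20 > ALL (  SELECT   tempField
                     FROM     (VALUES(11),(12),(7)) tempTable(tempField))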


Big Data Clusters In SQL Server 2019

James Serra lays out some of the architecture behind SQL Server 2019 Big Data Clusters:

While extract, transform, load (ETL) has its use cases, an alternative to ETL is data virtualization, which integrates data from disparate sources, locations, and formats, without replicating or moving the data, to create a single “virtual” data layer. The virtual data layer allows users to query data from many sources through a single, unified interface. Access to sensitive data sets can be controlled from a single location. The delays inherent to ETL need not apply; data can always be up to date. Storage costs and data governance complexity are minimized. See the pros and cons of data virtualization via Data Virtualization vs Data Warehouse and Data Virtualization vs. Data Movement.

SQL Server 2019 big data clusters with enhancements to PolyBase act as a virtual data layer to integrate structured and unstructured data from across the entire data estate (SQL Server, Azure SQL Database, Azure SQL Data Warehouse, Azure Cosmos DB, MySQL, PostgreSQL, MongoDB, Oracle, Teradata, HDFS, Blob Storage, Azure Data Lake Store) using familiar programming frameworks and data analysis tools.

James covers some of the reasoning behind this and the shift from using PolyBase to integrate data with Hadoop and Azure Blob Storage to using SQL Server itself as a data virtualization engine.
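
The mechanism behind that virtual layer is PolyBase external tables. As a rough, hedged sketch of the SQL Server 2019 pattern over a remote SQL Server source (the server name, credential, and table definition below are all invented), it looks something like this:

-- Sketch only: names and connection details are hypothetical, and a database
-- master key must already exist before creating the scoped credential.
CREATE DATABASE SCOPED CREDENTIAL RemoteSqlCredential
WITH IDENTITY = 'remote_login', SECRET = '<strong password>';

CREATE EXTERNAL DATA SOURCE RemoteSqlServer
WITH (
    LOCATION = 'sqlserver://remotesql.contoso.com:1433',
    CREDENTIAL = RemoteSqlCredential
);

-- External table over a table in the remote database; no data is copied.
CREATE EXTERNAL TABLE dbo.RemoteOrders
(
    OrderID    INT,
    CustomerID INT,
    OrderDate  DATETIME2(0),
    TotalDue   MONEY
)
WITH (
    LOCATION = 'SalesDb.dbo.Orders',
    DATA_SOURCE = RemoteSqlServer
);

-- Queries and joins against the external table run as ordinary T-SQL.
SELECT TOP (100) o.OrderID, o.TotalDue
FROM dbo.RemoteOrders AS o
ORDER BY o.TotalDue DESC;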


Finding Databases With Multiple Data Or Log Files

Lori Brown has a couple of quick scripts to help find databases made up of several data or log files:

This might be kind of basic but since I am working on a comprehensive script to discover things that a DBA really needs to know about, I made a couple of queries that will produce a list of the databases that have multiple files along with the locations of the physical files.  One query finds multiple database files (mdf’s) and the other looks for multiple transaction log files (ldf’s).  This will also find the Filestream file locations.  Since I often have to take on instances without ever having seen them, it is good to know about little things like this.

These scripts might be helpful in finding minor performance gains by looking for places to add data files or remove log files.
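
Lori's full scripts are in the post. As a rough approximation of the idea, a query along these lines against sys.master_files lists the databases with more than one data file, along with where each file lives:

-- Rough sketch (not Lori's script): databases with more than one data file,
-- plus the logical name and physical location of each file.
SELECT d.name AS database_name,
       mf.type_desc,           -- ROWS here; swap in LOG or FILESTREAM as needed
       mf.name AS logical_name,
       mf.physical_name
FROM sys.master_files AS mf
JOIN sys.databases AS d
     ON d.database_id = mf.database_id
WHERE mf.type_desc = 'ROWS'
  AND mf.database_id IN (SELECT database_id
                         FROM sys.master_files
                         WHERE type_desc = 'ROWS'
                         GROUP BY database_id
                         HAVING COUNT(*) > 1)
ORDER BY d.name, mf.file_id;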


112 Million Cab Rides In Azure SQL Data Warehouse

Derik Hammer wants a real test of Azure SQL Data Warehouse:

The method that I liked the most and finally settled on was to use a public dataset. I wanted data which was skewed in real ways and did not require a lot of work to massage. Microsoft has a great listing of public datasets here.

I decided to go with the NYC Taxi and Limousine Commission (TLC) Trip Record Data. Data is available for most taxi and limousine fares with pickup/drop-off and distance information between January 2009 and June 2018. This includes data for Yellow cab, Green cab, and for-hire vehicles. Just the Yellow cab data from 01/2016 – 06/2018 is over 112,000,000 records (24 GBs), and they download into easy-to-import comma-separated values (CSV) files.

Read on to see how you can set it up yourself. As Derik points out at the end, though, this is still one big table, but there are a few columns which could become dimensions: rate code, location, and payment type.
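
For a sense of what the target might look like, here is a hedged sketch of a typical Azure SQL Data Warehouse fact table for a load like this: hash-distributed and stored as a clustered columnstore. The column list is illustrative rather than Derik's actual schema.

-- Illustrative only; Derik's real table definition is in his post.
CREATE TABLE dbo.YellowTrip
(
    VendorID        INT,
    PickupDateTime  DATETIME2(0),
    DropoffDateTime DATETIME2(0),
    PassengerCount  SMALLINT,
    TripDistance    DECIMAL(9, 2),
    RateCodeID      INT,
    PULocationID    INT,
    DOLocationID    INT,
    PaymentTypeID   INT,
    TotalAmount     DECIMAL(10, 2)
)
WITH
(
    DISTRIBUTION = HASH(PULocationID),  -- spread the rows across distributions
    CLUSTERED COLUMNSTORE INDEX         -- the usual default for large fact tables
);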


Learning Why A Plan Was Removed From Cache

Grant Fritchey shows us that there is some limited information to tell us why an execution plan was removed from cache:

You’ll note that the second statement in the sequence is “CREATE OR AL…” in the batch_text. That’s me modifying the procedure. The very next event is sp_cache_remove. It shows the remove_method as “Compplan Remove”. This is the plan being removed in an automated way from cache. The next three events are all for query_cache_removal_statistics.

What are they?

These are the statement level statistical information being removed from the DMVs. That’s right, we can observe that information getting removed from the system along with the plan from cache.

Unless I’m missing something, it seems like this is more helpful for pedagogical reasons than for auditing. I’d be concerned that, on a busy production system, we’d see too many messages to correlate things all that well.
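
If you want to watch for these removals yourself, a minimal Extended Events session along the following lines should capture them. The event names are the ones Grant calls out; verify them against sys.dm_xe_objects on your build before relying on this.

-- Sketch based on the events named above; adjust the target and filters to taste.
CREATE EVENT SESSION [PlanCacheRemoval] ON SERVER
ADD EVENT sqlserver.sp_cache_remove,
ADD EVENT sqlserver.query_cache_removal_statistics
ADD TARGET package0.event_file (SET filename = N'PlanCacheRemoval.xel')
WITH (MAX_DISPATCH_LATENCY = 5 SECONDS);

ALTER EVENT SESSION [PlanCacheRemoval] ON SERVER STATE = START;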


Using cdata To Create Faceted Plots

Nina Zumel shows how to use the cdata package to create faceted ggplot2 plots:

First, load the packages and data:

library("ggplot2")
library("cdata")

iris <- data.frame(iris)

Now define the data-shaping transform, or control table. The control table is basically a picture that sketches out the final data shape that I want. I want to specify the x and y columns of the plot (call these the value columns of the data frame) and the column that I am faceting by (call this the key column of the data frame). And I also need to specify how the key and value columns relate to the existing columns of the original data frame.

Read on to see how you can use cdata to tie together different faceted plots.


Generating E-Mail Alerts From Perfmon

Dave Bermingham shows how to send e-mail alerts based on Perfmon counter values:

The first thing that you need to do is write a PowerShell script that, when run, can send an email. While researching this I discovered many ways to accomplish this task, so what I’m about to show you is just one way, but feel free to experiment and use what is right for your environment.

In my lab I do not run my own SMTP server, so I had to write a script that could leverage my Gmail account. You will see in my PowerShell script that the password to the email account that authenticates to the SMTP server is in plain text. If you are concerned that someone may have access to your script and discover your password, then you will want to encrypt your credentials. Gmail requires an SSL connection, so your password should be safe on the wire, just like any other email client.

It’s an interesting use of built-in Windows functionality to perform alerting.
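
Dave's solution is Perfmon plus PowerShell. Purely as a hedged alternative sketch for the SQL Server-specific counters, the same idea can be driven from inside the engine with sys.dm_os_performance_counters and Database Mail; the profile name, recipient, and threshold below are invented.

-- Not Dave's approach: a SQL-side alternative for engine counters only.
DECLARE @ple BIGINT;

SELECT @ple = cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name = N'Page life expectancy'
  AND object_name LIKE N'%Buffer Manager%';

IF @ple < 300   -- arbitrary threshold for the sketch
BEGIN
    EXEC msdb.dbo.sp_send_dbmail
        @profile_name = N'DBA Mail Profile',   -- hypothetical profile
        @recipients   = N'dba@example.com',    -- hypothetical recipient
        @subject      = N'Low page life expectancy',
        @body         = N'PLE has dropped below the configured threshold.';
END;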


Baselining Modern Versions Of SQL Server

Erin Stellato goes back over an older baselining article and gives us some updates in what we should consider for more recent versions:

Last week I got an email from a community member who had read this older article of mine on baselining, and asked if there were any updates related to SQL Server 2016, SQL Server 2017, or vNext (SQL Server 2019). It was a really good question. I haven’t visited that article in a while and so I took the time to re-read it. I’m rather proud to say that what I said then still holds up today.

The fundamentals of baselining are the same as they were back in 2012 when that article was first published. What is different about today? First, there are a lot more metrics in the current release of SQL Server that you can baseline (e.g. more events in Extended Events, new DMVs, new PerfMon counters,  sp_server_diagnostics_component_results). Second, options for capturing baselines have changed. In the article I mostly talked about rolling your own scripts for baselining. If you’re looking to establish baselines for your servers you still have the option to develop your own scripts, but you also can use a third-party tool, and if you’re running SQL Server 2016+ or Azure SQL Database, you can use Query Store.

Read on for more details.
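
As one concrete example of the Query Store option Erin mentions, here is a hedged sketch that pulls aggregate runtime numbers you could capture on a schedule as a baseline, using the standard Query Store catalog views:

-- Rough sketch: top queries by total duration across all captured intervals,
-- the sort of numbers you might snapshot regularly as a baseline.
-- avg_duration is stored in microseconds, so /1000 converts to milliseconds.
SELECT TOP (20)
       q.query_id,
       qt.query_sql_text,
       agg.total_executions,
       agg.total_duration_ms
FROM (SELECT p.query_id,
             SUM(rs.count_executions) AS total_executions,
             SUM(rs.avg_duration * rs.count_executions) / 1000.0 AS total_duration_ms
      FROM sys.query_store_plan AS p
      JOIN sys.query_store_runtime_stats AS rs
           ON rs.plan_id = p.plan_id
      GROUP BY p.query_id) AS agg
JOIN sys.query_store_query AS q
     ON q.query_id = agg.query_id
JOIN sys.query_store_query_text AS qt
     ON qt.query_text_id = q.query_text_id
ORDER BY agg.total_duration_ms DESC;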


Calling Power BI REST API From Microsoft Flow

Chris Webb has started a series on calling Power BI’s REST API from Microsoft Flow.  In Part 1, he creates a custom connector:

Playing around with Microsoft Flow recently, I was reminded of the following blog post from a few months ago by Konstantinos Ioannou about using Flow to call the Power BI REST API to refresh a dataset:

https://medium.com/@Konstantinos_Ioannou/refresh-powerbi-dataset-with-microsoft-flow-73836c727c33

I was impressed by this post when I read it, but don’t think I understood quite how many exciting possibilities this technique opens up for Power BI users until I started to use it myself. The Power BI dev team are making a big investment in the API yet most Power BI users, myself included, are not developers and can’t easily write code (or PowerShell scripts) to call the API. With Flow, however, you can use the API without writing any code at all and solve a whole series of  common problems easily. In this series of blog posts I’m going to show a few examples of this.

In Part 2, Chris shows us how to automate data refreshes when source data changes:

For a while now I’ve had an idea stuck in my head: wouldn’t it be cool to build a Power BI solution where a user could enter data into an Excel workbook and then, as soon as they had done so, they could see their new data in a Power BI report? It would be really useful for planning/budgeting applications and what-if analysis. I had hoped that a DirectQuery model using the CData Excel custom connector (mentioned here) might work but the performance wasn’t good enough; using Flow with the Power BI REST API (see Part 1 of this series for details on how to get this set up) gets me closer to my goal, even if there’s still one major problem with the approach. Here’s how…

Read on for the approach as well as the major problem.
