Press "Enter" to skip to content

Month: June 2021

Integrating Power BI Deployment Pipelines with Azure DevOps

Marc Lelijveld shows how you can combine Power BI deployment pipelines with Azure DevOps:

Looking at the Power BI release plan, dataflow support for Deployment Pipelines is coming up shortly! Currently it is scheduled to reach public preview in June 2021. Versioning and DevOps integration go hand-in-hand, in our opinion. With Azure DevOps Git integration, we can overcome the versioning challenge while integrating with Azure DevOps at the same time, as described in our previous blog post from 2019. Today, we release a new version of the DevOps implementation which uses native Power BI functionality. Stay tuned!

As we really like the metadata deployment and the ease of setting up a pipeline in the Power BI Service, Ton and I decided to build an Azure DevOps extension based on the recently released Power BI REST APIs for Deployment Pipelines. Although Microsoft has promised to come up with a native DevOps extension over time, we decided to go for it. Time to bridge the gap!

Read on for more details.
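
As a rough illustration of what those REST APIs look like, here is a minimal Python sketch that triggers a deployment from the development stage to the next stage of a pipeline. This is not the extension itself, just a hedged example: the pipeline ID and access token are placeholders, and the options shown are only a subset of what the Deploy All endpoint accepts.

    import requests

    # Placeholder values: supply your own pipeline ID and an Azure AD access token
    # that carries the appropriate Power BI scopes.
    PIPELINE_ID = "00000000-0000-0000-0000-000000000000"
    ACCESS_TOKEN = "<azure-ad-access-token>"

    # Deploy everything from the source stage (0 = Development) to the next stage.
    url = f"https://api.powerbi.com/v1.0/myorg/pipelines/{PIPELINE_ID}/deployAll"
    body = {
        "sourceStageOrder": 0,
        "options": {
            "allowCreateArtifact": True,
            "allowOverwriteArtifact": True,
        },
    }

    response = requests.post(
        url,
        json=body,
        headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    )
    response.raise_for_status()

    # The deployment runs asynchronously; inspect the response for the operation to poll.
    print(response.status_code, response.text)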

Building a PowerShell Script Template

Eitan Blumin has a template for us:

If at any point an automated script fails for some reason, or does not behave as expected, it would be invaluable to have it produce and retain historical logs that could later be investigated for clues as to what it did, and where and why it failed.

PowerShell has a few useful cmdlets for this, capable of writing output to any sort of destination, such as a log file. Out-File is one example.

However, in my personal experience, nothing beats the level of verbosity offered by a special cmdlet called Start-Transcript.

Read on for an explanation for each part of the template, and then the template itself.
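
To make that concrete, here is a minimal sketch of the transcript pattern in PowerShell. It is not Eitan's actual template, just an illustration with a placeholder log path: start the transcript at the top, do the work inside try, and stop the transcript in finally so the log file is closed even when the script blows up.

    # Placeholder path; point this at wherever you keep your automation logs.
    $logPath = "C:\Logs\MyScript_$(Get-Date -Format 'yyyyMMdd_HHmmss').log"

    Start-Transcript -Path $logPath -Append

    try {
        Write-Output "Starting work..."
        # ...the actual work of the script goes here...
        Write-Output "Finished without errors."
    }
    catch {
        # The error record lands in the transcript as well.
        Write-Error $_
        throw
    }
    finally {
        Stop-Transcript
    }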

Handling Unused Indexes

Welcome to the Chad Callihan Home for Unloved Indexes:

What do you do if you see that an index is being updated but is low on seeks and scans? Should you drop it because the table is never queried or a better index is being chosen? If an index doesn’t have updates, seeks, or scans then is the next step to drop?

The answer to these questions is a resounding “not yet!” Getting rid of an index that isn’t getting much use sounds simple, but there are careful considerations to make.

Click through for those considerations.
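
For reference, the usage numbers in question come from sys.dm_db_index_usage_stats. The sketch below (Python with pyodbc; the connection string is a placeholder) lists nonclustered indexes in the current database that have been updated since the last instance restart but never used for seeks, scans, or lookups, which are exactly the candidates that deserve scrutiny rather than an immediate drop.

    import pyodbc

    # Placeholder connection string; point it at your own instance and database.
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
        "DATABASE=YourDatabase;Trusted_Connection=yes;"
    )

    # Indexes that are being maintained (updates) but never read from.
    # Note: these counters reset when the instance restarts.
    query = """
    SELECT OBJECT_NAME(s.object_id) AS table_name,
           i.name                   AS index_name,
           s.user_updates, s.user_seeks, s.user_scans, s.user_lookups
    FROM sys.dm_db_index_usage_stats AS s
    JOIN sys.indexes AS i
        ON i.object_id = s.object_id
       AND i.index_id  = s.index_id
    WHERE s.database_id = DB_ID()
      AND i.type_desc = 'NONCLUSTERED'
      AND s.user_seeks + s.user_scans + s.user_lookups = 0
      AND s.user_updates > 0
    ORDER BY s.user_updates DESC;
    """

    for row in conn.cursor().execute(query):
        print(row.table_name, row.index_name, row.user_updates)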

Speeding Up Power Query with Evaluation Container Memory

Chris Webb notes a new toggle in Power Query:

However, if you have just read the docs, you may be wondering what these two new registry key settings actually do. In this post I’m only going to talk about one, MaxEvaluationWorkingSetInMB; I’ll leave ForegroundEvaluationContainerCount for a future post.

At various times in the past I have blogged about how, when you run a Power Query query, the query itself is executed inside a separate process called an evaluation (or mashup) container, and how this process has a limit on the amount of memory it can use. Some transformations, such as sorting a table, doing a group by, pivoting, and unpivoting, require an entire table of data to be held in memory, and if these operations need more memory than the evaluation container is able to use, then it starts paging and query performance gets a lot worse.

Read on to see where raising the maximum evaluation working set memory can help, as well as the caveats that Chris lays out.

Hosting a Python API with Flask

Mrinal Walia shows how you can build a Python API, such as one for generating machine learning predictions, using Flask:

Deployment is a crucial step in the ML workflow. It is the point where we want to put our ML model into use, so that we can apply the model in real life.

But how can we serve the model for consumption? We can develop an Application Programming Interface (API). With that, we can reach the model from anywhere, whether from a mobile application or a web application. In Python, there’s a library that can assist us in building an API. It’s named Flask.

This article will explain how to construct a REST API for our machine learning model using Flask. Without further ado, let’s begin!

Flask is the first step, but then I’d want to put it behind gunicorn as the application server and Nginx as a reverse proxy.
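
If you have not built one of these before, the core of such an API is quite small. Here is a hedged sketch, assuming a scikit-learn-style model object and a made-up model file name and feature layout: load a pickled model once at startup, accept JSON in a POST request, and return the prediction as JSON.

    import pickle

    from flask import Flask, jsonify, request

    app = Flask(__name__)

    # Hypothetical model file; in practice this is whatever you trained and pickled.
    with open("model.pkl", "rb") as f:
        model = pickle.load(f)

    @app.route("/predict", methods=["POST"])
    def predict():
        # Expecting a payload like {"features": [5.1, 3.5, 1.4, 0.2]}.
        payload = request.get_json(force=True)
        prediction = model.predict([payload["features"]])
        return jsonify({"prediction": prediction.tolist()})

    if __name__ == "__main__":
        # Fine for local testing; use a WSGI server such as gunicorn for real traffic.
        app.run(host="0.0.0.0", port=5000)

Once that works locally, something like gunicorn serving the app behind Nginx handles the production side.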

Standalone Shiny Apps with systemd

Peter Solymos takes us through configuring a Shiny application which runs via systemd:

Lots of resources describe how you can host Shiny apps with Docker, Shiny Server, or via other means. But we also know Shiny apps can be launched locally. What makes your local setup different from these other options is that your local machine does not usually have a static internet protocol (IPv4) address. Without a static IPv4, it is really hard to share the app with other people because the address keeps changing unpredictably, and you might sometimes power off your machine.

Shiny uses the httpuv R package under the hood, which is an HTTP and WebSocket server library. Could you just run Shiny directly on a remote server? This post explores that topic using systemd, the system and service manager for most modern Linux distributions. All I am trying to do in this post is make the point that the shiny R package is really self-sufficient and, in the simplest case, does not need any other layer for sharing an app.

Click through for the process. H/T R-Bloggers.
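
For a flavor of what that setup looks like, here is a hedged sketch of a unit file; the paths, user, and port are placeholders rather than Peter's actual configuration. The service just runs the app with Rscript and lets systemd handle restarts and boot-time startup.

    [Unit]
    Description=Standalone Shiny app
    After=network.target

    [Service]
    User=shiny
    ExecStart=/usr/bin/Rscript -e "shiny::runApp('/srv/shiny-app', host = '0.0.0.0', port = 8080)"
    Restart=on-failure

    [Install]
    WantedBy=multi-user.target

Drop it in /etc/systemd/system/shiny-app.service, run systemctl daemon-reload, and then systemctl enable --now shiny-app.service starts the app and brings it back after a reboot.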

TempDB Configuration Tips

Jeff Iannucci adds on to tempdb configuration advice:

Step 1: Provision a separate drive. Put your TempDB away from your precious user data files, as well as the other system databases. This probably isn’t your job, so ask your friends on your infrastructure team to provision and mount a T: drive or whatever of a size that you gauge appropriately. What’s appropriate? Hey, I don’t know your system – that’s why they pay you the big bucks, friend.

Read on for the full set of tips.

Memory Grants for CHAR vs VARCHAR

Erik Darling has a head-slapping moment:

While working with a client recently, we found that someone, at some point in time, probably during the original migration from Access, had chosen CHAR columns rather than VARCHAR columns.

Okay, fine. How bad could it be?

Bad enough that… A whole bunch of columns that only had a single character in them were stored in CHAR(1000) columns.

I like CHAR…well, to be specific, NCHAR. But only when you’ll need exactly that many characters.

Improving Parquet External Table Performance in Dedicated SQL Pools

Jovan Popovic shows us two ways of accessing data in Parquet files in Azure Synapse Analytics dedicated SQL pools:

Azure Synapse Analytics enables you to read Parquet files stored in Azure Data Lake Storage using the T-SQL language and high-performance Parquet readers. The key characteristic of these high-performance Parquet readers is that they use native (C++) code for reading Parquet files, unlike the existing PolyBase Parquet reader technology, which uses Java code. These native readers were introduced in the serverless SQL pools in Azure Synapse Analytics workspaces.

In many experiments, this native technology used in the serverless SQL pools demonstrated better performance than the existing PolyBase external tables in the dedicated SQL pools.

This native technology for reading Parquet files is now also available in the dedicated SQL pools. In the dedicated SQL pools in Azure Synapse Analytics, you can create external tables that use native code to read Parquet files and improve the performance of queries that access external Parquet files.

Click through for the process, as well as what kind of performance differences you can see. Some of the queries ended up being worse for native tables versus PolyBase tables, but the majority were a good bit better.

Creating a CSV File from a Table via BCP

Kenneth Fisher shows how to use the bcp command to create a file from a table:

This is a pretty handy little tool in your arsenal. I’ve talked about using bcp to transfer data from one instance to another before, and this is another really great use for it. If you haven’t used it before, bcp stands for Bulk Copy Program and is a command-line tool for transferring data in and out of SQL Server. In this case you can use this command to generate a CSV file from DBName.SchemaName.TableName:

Click through to see the command, as well as some helpful hints.
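
As a hedged sketch of the shape of that command (not necessarily Kenneth's exact version; the server, path, and table names are placeholders), a character-mode export with a comma field terminator looks like this:

    bcp DBName.SchemaName.TableName out C:\Temp\TableName.csv -S YourServer -T -c -t,

Here out is the direction (table to file), -S names the instance, -T uses Windows authentication, -c exports character data, and -t, makes the comma the field terminator; add -r if you need a specific row terminator.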
