Press "Enter" to skip to content

Month: October 2020

Copying an Azure SQL Database

Garry Bargsley gives us two methods for copying an Azure SQL Database:

Copying an Azure SQL Database is a vital skill when managing cloud databases. Recently, a request came in from the “business”: they wanted a copy of an Azure SQL Database that lives in a development environment. The database had been certified and early testing accepted, so they now want an exact copy in QA to start integration testing. The process of making an Azure SQL Database copy is straightforward, and there are several different ways to perform this action.

The two methods chosen to demonstrate completing this request use the Azure Portal and PowerShell.

Click through for the demos.
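
Of the two, the PowerShell method condenses nicely. Here is a minimal sketch using the Az.Sql module's New-AzSqlDatabaseCopy cmdlet; every resource group, server, and database name is a placeholder:

# Sketch only: copy a dev database to a QA server.
# All names below are placeholders -- substitute your own.
Connect-AzAccount

New-AzSqlDatabaseCopy `
    -ResourceGroupName "rg-dev" `
    -ServerName "sql-dev-server" `
    -DatabaseName "AppDb" `
    -CopyResourceGroupName "rg-qa" `
    -CopyServerName "sql-qa-server" `
    -CopyDatabaseName "AppDb_QA"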


Sliding Windows in R

Bryan Shalloway shows off some new functionality in the rsample package:

For some problems you may want to take a traditional regression- or classification-based approach while still accounting for the date/time-sensitive components of your data. In this post I will use the tidymodels suite of packages to:

– build lag-based and non-lag-based features
– set up appropriate time series cross-validation windows
– evaluate performance of linear regression and random forest models on a regression problem

For my example I will use data from Wake County food inspections. I will try to predict the SCORE for upcoming restaurant food inspections.

Click through to see it in action.
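
To give a flavor of the rsample piece, here is a minimal sketch of sliding_period(); the data frame and window sizes are invented for illustration and are not Bryan's:

library(rsample)

# Invented stand-in for the Wake County inspections data.
inspections <- tibble::tibble(
  date  = seq(as.Date("2019-01-01"), as.Date("2020-12-31"), by = "day"),
  score = runif(731, min = 80, max = 100)
)

# Monthly sliding windows: train on the prior twelve months,
# assess on the month that follows.
resamples <- sliding_period(
  inspections,
  index       = date,
  period      = "month",
  lookback    = 11,
  assess_stop = 1
)

# Each split exposes an analysis (training) and assessment (testing) window.
analysis(resamples$splits[[1]])
assessment(resamples$splits[[1]])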


Gussying Up R Tables in GitHub

Laura Ellis solves a problem:

One thing I love about performing analysis in .Rmd (R Markdown) files is how easy it is to share your results publicly on GitHub. Create your analysis in the .Rmd file, set your output variant as below, knit to .md format and then add your files to GitHub!

There is only one problem with the .md output: PRETTY TABLES! Most of the pretty tables packages that I like to use either don’t display all of the formatting or don’t display at all in .md format.

Click through to see how to solve this, including demonstration videos.
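
For reference, the “output variant” mentioned above is presumably rmarkdown’s github_document, which knits straight to .md; the YAML header looks something like this (a guess at the general shape, not Laura’s exact header):

---
title: "My Analysis"
output:
  github_document:
    toc: true
---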


Set Functions in DAX

Marco Russo and Alberto Ferrari walk us through three important set functions in DAX:

In this article we refer to “set functions” as functions that operate on sets. The three set functions available in DAX are UNION, INTERSECT, and EXCEPT. Their behavior is very intuitive:

UNION performs the union of two or more tables.
INTERSECT performs the set intersection between two tables.
EXCEPT removes the rows of the second argument from the first one.

These functions take two or more tables as parameters and return a table. They prove useful not only when writing DAX queries; a developer can also use them to prepare complex filters when implementing measures.

Read on to see how these work in DAX.
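
As a quick illustration of the semantics, here is a throwaway query built with DATATABLE (my example, not one from the article):

EVALUATE
VAR TableA = DATATABLE ( "Color", STRING, { { "Red" }, { "Blue" } } )
VAR TableB = DATATABLE ( "Color", STRING, { { "Blue" }, { "Green" } } )
RETURN
    UNION ( TableA, TableB )
-- Returns Red, Blue, Blue, Green: UNION keeps duplicates.
-- INTERSECT ( TableA, TableB ) would return Blue,
-- and EXCEPT ( TableA, TableB ) would return Red.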


Diagnosing and Solving tempdb Bottlenecks

Ameena Lalani shares some good info on the tempdb database:

Tempdb is a SQL Server temporary system database. Every time SQL Server restarts, a new tempdb database is created, and whatever data was stored in tempdb is lost. When tempdb gets recreated, its data and log files reset to their last configured size. SQL Server records only enough information in the tempdb transaction log to roll back a transaction, but not to redo transactions during database recovery; this increases the performance of INSERT statements in tempdb. The tempdb database is always in the simple recovery model, and if you try to change it to the full recovery model, you will receive an error message.

Click through for more info on how the database is special, types of issues you can run into as concurrency grows, and ways to resolve those issues.
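
As a taste of the diagnosis side, allocation contention in tempdb classically shows up as PAGELATCH waits on pages in database 2. This generic check is mine, not one from the linked post:

-- Generic sketch: sessions currently waiting on tempdb page latches.
SELECT r.session_id,
       r.wait_type,
       r.wait_resource,
       r.wait_time
FROM sys.dm_exec_requests AS r
WHERE r.wait_type LIKE N'PAGELATCH%'
  AND r.wait_resource LIKE N'2:%';  -- database_id 2 = tempdb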


Finding the Most Costly Statement in a Stored Procedure

Grant Fritchey takes us through one method of figuring out which statement you’re waiting on when running a stored procedure:

A lot of stored procedures have multiple statements and determining the most costly statement in a given proc is a very common task. After all, you want to focus your time and efforts on fixing the things that cause you the most pain. You simply don’t have the time to tune every single statement in every single procedure. So, identifying the most costly statement is vital.

Happily, Extended Events are here to help.

Click through to see how you can use Extended Events to figure this out.
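
A bare-bones version of such a session might capture sp_statement_completed filtered to the procedure of interest; the session name and procedure below are placeholders, and Grant's setup may differ:

-- Sketch: capture per-statement metrics for a single stored procedure.
CREATE EVENT SESSION [ProcStatementCost] ON SERVER
ADD EVENT sqlserver.sp_statement_completed
(
    ACTION ( sqlserver.sql_text )
    WHERE ( object_name = N'MyProc' )
)
ADD TARGET package0.event_file ( SET filename = N'ProcStatementCost' );
GO

ALTER EVENT SESSION [ProcStatementCost] ON SERVER STATE = START;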


Downloading Power BI Reports from a Workspace

Shabnam Watson has a helpful script for us:

You can use PowerShell to download all of your PBI reports in a workspace at once, without having to go through the PBI service UI one report at a time. As an added bonus, you may notice that downloading a report with PowerShell is faster than downloading it through the PBI Service UI.

This script is useful for admins to take backups of reports deployed to PBI Service. It can be easily extended to loop through all/several workspaces. It is also useful for developers to take a backup of their report before publishing a new version.

Click through for the script.
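
The general shape of such a script, using the MicrosoftPowerBIMgmt module (the workspace name and output path are placeholders, and Shabnam's version may differ):

# Sketch: export every report in one workspace as a .pbix file.
Connect-PowerBIServiceAccount

$workspace = Get-PowerBIWorkspace -Name 'Sales Reports'

foreach ($report in Get-PowerBIReport -WorkspaceId $workspace.Id) {
    Export-PowerBIReport -WorkspaceId $workspace.Id -Id $report.Id `
        -OutFile (Join-Path 'C:\PBIBackups' "$($report.Name).pbix")
}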


Using sp_PressureDetector to Find Resource Constraints

Erik Darling has started a new series:

If you go to open a new tab in SSMS and you hear a creaking sound, it might be time to revisit some of your hardware choices.

But that’s a problem for future you. Your problem is right now.

– How to check it: sp_PressureDetector
– What to look for: stuck queries, resource usage, wait stats

By default, it’ll look at both CPU and memory counters. If you don’t know what’s going on, just hit F5.

Read on to see what it looks like for a server hitting memory limits and for a server hitting CPU limits.
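
Running it is about as simple as stored procedures get; the @what_to_check parameter shown here is per my reading of the proc's documentation, so verify against the current version:

-- Default call checks both CPU and memory.
EXEC dbo.sp_PressureDetector;

-- Narrow the check to a single resource.
EXEC dbo.sp_PressureDetector @what_to_check = 'cpu';
EXEC dbo.sp_PressureDetector @what_to_check = 'memory';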


Visualizing Parallelism in Power Query Diagnostics Data

Chris Webb wants to track query concurrency when loading data into Power BI:

For most of the time I’ve spent looking at Power Query Query Diagnostics data, I’ve been looking at data for a single query. Over the past few days, though, I’ve spent some time investigating what this data shows for all the queries that are executed for a single dataset refresh. To help me do this I wrote the following M function:

Click through for the function, as well as ways of visualizing the results.
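
This is not Chris's function, but the general approach can be sketched in M: given a table of executions with start and end times, expand each one into per-second ticks and count how many overlap at each tick. The StartTime and EndTime column names are assumptions:

// Assumed input: a table with datetime columns StartTime and EndTime.
let
    CountConcurrent = (diagnostics as table) as table =>
        let
            // One list element per second of each execution's lifetime.
            AddTicks = Table.AddColumn(
                diagnostics,
                "Tick",
                each List.DateTimes(
                    [StartTime],
                    Number.RoundUp(Duration.TotalSeconds([EndTime] - [StartTime])) + 1,
                    #duration(0, 0, 0, 1)
                )
            ),
            Expanded = Table.ExpandListColumn(AddTicks, "Tick"),
            // Rows sharing a tick were running concurrently.
            Counted = Table.Group(
                Expanded,
                {"Tick"},
                {{"RunningQueries", Table.RowCount, Int64.Type}}
            )
        in
            Counted
in
    CountConcurrent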


Measuring Advertising Effectiveness

Layla Yang and Hector Leano walk us through measuring how effective an advertising campaign was:

At a high level we are connecting a time series of regional sales to regional offline and online ad impressions over the trailing thirty days. By using ML to compare the different kinds of measurements (TV impressions or GRPs versus digital banner clicks versus social likes) across all regions, we then correlate the type of engagement to incremental regional sales in order to build attribution and forecasting models. The challenge comes in merging advertising KPIs such as impressions, clicks, and page views from different data sources with different schemas (e.g., one source might use day parts to measure impressions while another uses exact time and date; location might be by zip code in one source and by metropolitan area in another).

As an example, we are using a rich SafeGraph dataset of foot traffic to restaurants from the same chain. While we are using mocked offline store visits for this example, you can just as easily plug in offline and online sales data, provided you have region and date included in your sales data. We will read in different locations’ in-store visit data, explore the data in PySpark and Spark SQL, and make the data clean, reliable, and analytics-ready for the ML task. For this example, the marketing team wants to find out which of the online media channels is the most effective channel to drive in-store visits.

Click through for the article as well as notebooks.
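
The core join they describe boils down to something like this PySpark sketch; every table and column name here is invented for illustration:

# Sketch: join per-channel ad KPIs to store visits by region and date.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("ad-attribution-sketch").getOrCreate()

impressions = spark.table("ad_impressions")  # region, date, channel, impressions
visits = spark.table("store_visits")         # region, date, visit_count

# One column per media channel, summed to daily regional totals.
daily_kpis = (
    impressions
    .groupBy("region", "date")
    .pivot("channel")
    .agg(F.sum("impressions"))
)

# Modeling-ready table: channel KPIs alongside the visit outcome.
training_data = daily_kpis.join(visits, on=["region", "date"], how="inner")
training_data.show(5)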
