Month: August 2022

Calculating the Hurst Exponent in R

Sang-Heon Lee does some analysis:

The pairs trading literature uses the Hurst exponent frequently, since it gives a simple and intuitive indicator of the behavior of stock returns. Using S&P 500 returns, let's learn how to estimate it manually with R code and then conveniently with an R package.

Click through for those two examples, as well as a more detailed explanation of the math driving this. H/T R-Bloggers.
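If you want a feel for the underlying math before clicking through, here's a minimal rescaled-range (R/S) sketch in Python. The post itself works in R; the window sizes and simulated return series below are illustrative stand-ins, not the author's data.

```python
# A minimal rescaled-range (R/S) estimate of the Hurst exponent.
# The return series here is simulated noise purely for illustration.
import numpy as np

def hurst_rs(returns, window_sizes=(8, 16, 32, 64, 128)):
    """Estimate the Hurst exponent by regressing log(R/S) on log(n)."""
    log_n, log_rs = [], []
    for n in window_sizes:
        rs_values = []
        # Split the series into non-overlapping windows of length n
        for start in range(0, len(returns) - n + 1, n):
            window = returns[start:start + n]
            z = np.cumsum(window - window.mean())  # mean-adjusted cumulative sum
            r = z.max() - z.min()                  # range of the cumulative sum
            s = window.std(ddof=1)                 # standard deviation of the window
            if s > 0:
                rs_values.append(r / s)
        log_n.append(np.log(n))
        log_rs.append(np.log(np.mean(rs_values)))
    # The slope of the log-log fit is the Hurst exponent
    hurst, _ = np.polyfit(log_n, log_rs, 1)
    return hurst

rng = np.random.default_rng(42)
fake_returns = rng.normal(0, 0.01, 2_000)  # white noise => H should be near 0.5
print(f"Estimated H: {hurst_rs(fake_returns):.3f}")
```

For white noise the estimate should land near 0.5; values below 0.5 suggest mean reversion (the property pairs traders look for), while values above 0.5 suggest trending behavior.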

Interpreting Kernel SHAP

Michael Mayer digs into Kernel SHAP:

In their 2017 paper on SHAP, Scott Lundberg and Su-In Lee presented Kernel SHAP, an algorithm to calculate SHAP values for any model with numeric predictions. Compared to Monte-Carlo sampling (e.g., as implemented in the R package "fastshap"), Kernel SHAP is much more efficient.

I had one problem with Kernel SHAP: I never really understood how it works!

Needless to say, Michael knows Kernel SHAP a lot better now, considering there’s now a kernelshap package for us.
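For a rough sense of the mechanics: Kernel SHAP solves a weighted linear regression over sampled feature coalitions, with weights given by the Shapley kernel. Here's a minimal sketch of that kernel in Python (the feature count m = 8 is an arbitrary choice for illustration):

```python
# A sketch of the Shapley kernel that gives Kernel SHAP its name.
# Coalitions that are nearly empty or nearly full get the largest weights,
# because they are the most informative about individual feature effects.
from math import comb

def shapley_kernel_weight(m: int, s: int) -> float:
    """Weight for a coalition of size s drawn from m features (0 < s < m)."""
    return (m - 1) / (comb(m, s) * s * (m - s))

m = 8  # arbitrary feature count for illustration
for s in range(1, m):
    print(f"|S| = {s}: weight = {shapley_kernel_weight(m, s):.5f}")
```

How the coalitions get enumerated (or sampled) and how the weighted regression is solved are exactly the details Michael's post and the kernelshap package work through.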

Migrating Databases between SQL Managed Instances

Etienne Lopes performs a migration:

In this post I'm going to show a very simple way to migrate a database between two SQL Server Managed Instances in Azure. I'm not a big fan of bacpac files (although I work with them when necessary), so I'll use a different approach here. Besides, when creating a bacpac file using SSMS, there are some schema validations at the beginning that will abort the bacpac generation, for example if the database holds three-part names inside stored procedures. While those aren't supported in Azure SQL Database, they are supported in SQL Managed Instances (as are cross-database queries), and it can be quite frustrating to hit this show stopper when using bacpacs to migrate or copy databases between Managed Instances.

Click through for the demo. And yeah, I’ve run into limiting factors with bacpacs, such as having certificates for encrypting data (even if you back those up separately).
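One common bacpac alternative between Managed Instances (not necessarily the approach in Etienne's post, so do click through) is a copy-only native backup to Azure Blob Storage, restored on the target instance. A hedged sketch driving that from Python; server names, credentials, the storage URL, and the database name are all placeholders, and both instances need a credential (e.g., SAS-based CREATE CREDENTIAL) for the storage container beforehand:

```python
# Sketch: copy-only native backup to URL on the source MI, restore on the target.
import pyodbc

BACKUP_URL = "https://mystorage.blob.core.windows.net/backups/MyDb.bak"
CONN = "DRIVER={ODBC Driver 18 for SQL Server};UID=admin;PWD=<password>;SERVER="

def run(server: str, sql: str) -> None:
    # autocommit is required: BACKUP/RESTORE cannot run inside a transaction
    with pyodbc.connect(CONN + server, autocommit=True) as conn:
        cur = conn.cursor()
        cur.execute(sql)
        while cur.nextset():  # drain any remaining result sets until done
            pass

# SQL MI only allows COPY_ONLY for user-initiated backups to URL
run("source-mi.abc123.database.windows.net",
    f"BACKUP DATABASE MyDb TO URL = '{BACKUP_URL}' WITH COPY_ONLY, COMPRESSION")
run("target-mi.def456.database.windows.net",
    f"RESTORE DATABASE MyDb FROM URL = '{BACKUP_URL}'")
```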

T-SQL Tuesday 153 Roundup

Kevin Kline musters the troops:

I received a great collection of blog posts in response to T-SQL Tuesday 153, which I kicked off on Tuesday, August 2nd – asking you to write about a conference or event that had a significant impact on your life. As one of the small handful of people who attended every PASS Summit from its founding through the pandemic lockdown, I've witnessed so many transformational experiences firsthand.

Human beings are social creatures, and though we as IT pros like to focus on hard technology skills first and foremost, I think we can all admit that having a great social experience at a conference like the PASS Summit in North America, SQLBits in the UK, or Data Platform Summit in India is at least as important as the technical learning.

Read on for a summary of several posts.

Backup Jobs and Dropped Databases

Chad Callihan reasons through a use case:

I’m a big fan of Ola Hallengren’s SQL Server maintenance scripts and would recommend that anyone working with SQL Server check them out. They have served me well over the years. As it relates to today’s blog post, maybe too well…

I recently ran into a strange situation with the DatabaseBackup stored procedure that had me scratching my head: a backup job completing successfully for a database that didn’t exist.

Confused? So was I. Let’s take a look at how it happened.

Click through for the scenario.

Adding an Existing Data Factory to GitHub

Andy Leonard has a three-parter for us. Part 1 shows you how to create a GitHub account and repo:

The unabridged topic of source control with GitHub is beyond the scope of this post. There are a number of ways to accomplish the tasks described in this post and series. I welcome your suggestions in the comments.

This post is written to help Azure Data Factory developers get started using GitHub.

Part 2 connects a Data Factory to the repository:

For the purposes of this demo, accept the defaults for “Publish branch” and “Root folder.” Check the “Import existing resources to repository” checkbox under the “Import existing resource” property, select the main branch in the “Import resource into this branch” property, and then click the “Apply” button:

Part 3 handles changes:

Applying what we’ve configured and learned thus far, let’s put this to work in a code-management workflow.

When it’s time to make a change, first create a new branch. I can hear some of you thinking, “Why, Andy? Why create a new branch?” That’s an excellent question. I am so glad you asked! Think of the new branch as a temporary copy of the current state of my Azure Data Factory. 

This series works from the assumption that you don’t have any real experience with Git (or GitHub) for source control, and maybe not much source control experience at all.

KQL StartOf Functions

Robert Cain continues a series on KQL:

In the previous post, Fun With KQL – DateTime Arithmetic, we had hard coded a date for the start of the year, in order to find out how much time had elapsed between it and datetime columns. I had mentioned there are ways to dynamically calculate such values.

In this post we'll look at one way, using the StartOf... functions. These include startofyear, startofday, startofmonth, and startofweek.

Read on to see how they all work.
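To make the semantics concrete, here's a rough Python translation of what each function computes. Note that KQL's startofweek snaps to Sunday, which the sketch mirrors; the sample timestamp is arbitrary.

```python
# Rough Python equivalents of KQL's startofyear/startofmonth/startofweek/startofday.
from datetime import datetime, timedelta

now = datetime(2022, 8, 17, 14, 35, 12)  # arbitrary sample timestamp

start_of_day   = now.replace(hour=0, minute=0, second=0, microsecond=0)
start_of_month = start_of_day.replace(day=1)
start_of_year  = start_of_month.replace(month=1)
# weekday(): Monday=0 .. Sunday=6; shift back to the most recent Sunday,
# mirroring KQL's Sunday-based weeks
start_of_week  = start_of_day - timedelta(days=(now.weekday() + 1) % 7)

print(start_of_year, start_of_month, start_of_week, start_of_day, sep="\n")
```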

From Kafka to Azure Data Explorer with Protobuf Data

Anshul Sharma and Ramachandran G do a bit of converting:

Kafka is increasingly becoming a popular choice for scalable message queueing in large data processing workloads. This makes it very popular in IoT-based ecosystems, where there is a large ingress of data before data processing or data storage. Azure Data Explorer is a very powerful time series and analytics database that suits IoT-scale data ingestion and data querying.

Kafka supports ingestion of data in multiple formats, including JSON, Avro, Protobuf, and String. ADX supports ingestion of data from Kafka in all of these formats. Due to its excellent schema support, extensibility to various platforms, and compression, protobuf (https://developers.google.com/protocol-buffers) is increasingly becoming a data exchange choice in IoT-based systems. The ADX Kafka sink connector leverages the Kafka Connect framework and provides an adapter to ingest data from Kafka in all these formats.

The following section aims to provide configuration to support ingestion of protobuf data from Kafka to ADX. 

Click through for the high-level architecture and a deeper dive into the process.
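As a hedged sketch of what that configuration might look like: the ADX sink connector can be registered through the standard Kafka Connect REST API, with a protobuf converter handling deserialization. The property names below follow the connector and Confluent documentation as I understand them and may vary by connector version; every URL, name, and credential is a placeholder.

```python
# Sketch: register the ADX (Kusto) sink connector with a protobuf value
# converter via the Kafka Connect REST API. Verify property names against
# your connector version before using.
import json
import requests

connector = {
    "name": "adx-protobuf-sink",
    "config": {
        "connector.class": "com.microsoft.azure.kusto.kafka.connect.sink.KustoSinkConnector",
        "topics": "iot-telemetry",
        # Map each Kafka topic to an ADX database/table and ingestion mapping
        "kusto.tables.topics.mapping": json.dumps([{
            "topic": "iot-telemetry",
            "db": "IotDb",
            "table": "Telemetry",
            "format": "json",          # format handed to ADX after conversion
            "mapping": "TelemetryMapping",
        }]),
        "kusto.ingestion.url": "https://ingest-mycluster.westeurope.kusto.windows.net",
        "aad.auth.appid": "<app-id>",
        "aad.auth.appkey": "<app-secret>",
        "aad.auth.authority": "<tenant-id>",
        # Protobuf deserialization via the Confluent schema registry
        "value.converter": "io.confluent.connect.protobuf.ProtobufConverter",
        "value.converter.schema.registry.url": "http://schema-registry:8081",
        "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    },
}

resp = requests.post("http://localhost:8083/connectors", json=connector)
resp.raise_for_status()
print(resp.json())
```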

Improvements to Parameter Sensitive Plan Optimization

Erik Darling is not good at being on vacation:

Several weeks back, I blogged about a missed opportunity with the new parameter sensitive plan feature. At the time, I thought that there was indeed sufficient skewness available to trigger the additional plan variants, and apparently some nice folks at Microsoft agreed.

If we step back through the old demo, we’ll get different results.

Click through for those results.

Embedding Power BI into PowerPoint

Matt Allington integrates a Power BI report into PowerPoint:

I first blogged about this back in October 2021, when Microsoft announced live Power BI embedding was coming to PowerPoint. Believe it or not, the ability to embed Power Pivot reports into PowerPoint was one of the first features delivered by Microsoft way back in 2014 or 2015. It used to be possible with the first release of Power Pivot for Excel and SharePoint Enterprise Edition. Sometime after releasing this feature, Microsoft refocused its efforts away from Power Pivot/SharePoint Enterprise and started to rebuild from scratch as a new standalone BI app, known to us today as Power BI. It then took another 7 years for this PowerPoint feature to return. It's here now, so let me cover what it is and how you can use it.

I'd joke about how much of an atrocity this is, but it really isn't. Thinking about how many meetings get derailed by the presenter trying to leave PowerPoint, struggling to open another application, having things fall apart, and then going back to the slide deck (inevitably from the beginning rather than the current slide), this is a good idea.
