Press "Enter" to skip to content

Day: March 22, 2022

“Production” in Data Analytics

Joey Jablonski brings up an important point:

Data-driven environments have a fundamentally different set of needs around testing, deployment, and visibility then traditional business applications. Data driven environments need access to fresh data on a high level of update frequency to ensure that data engineers and data scientists are able to effect outputs and recommendations on a timeline that has a positive impact on business decisions and customer experiences.

My day job involves running a predictive analytics team. We train models on production data—there’s very little value in training models on artificial dev data (outside of understanding the parameters of the modeling process), so even our development data generally comes from production. I don’t know that I’m sold on data mesh as a solution to this but it’s worth investigation.

Comments closed

Coding Style in R

Maelle Salmon and Chrisophe Dervieux share some guidance on coding style:

Do you indent your code with one tab, two spaces, or eight spaces? Do you feel strongly about the location of the curly brace closing a function definition? Do you have naming preferences? You probably have picked up some habits along the way. In any case, having some sort of consistency in coding style will help those who read the code to understand, fix or enhance it. In this post, we shall share some resources about coding style, useful tools, and some remarks on etiquette.

It is pretty funny how picky we can be about coding style at the margins but ultimately, the primary goal of a coding style should be to give future maintainers as easy a time as possible in troubleshooting the code you write. This makes consistency the most important consideration. After that, there’s a lot of good advice in the post.

Comments closed

Integrating the Intercom API with Power BI

Meagan Longoria brings in data from Intercom:

I needed to pull some user data for an app that uses Intercom. While I will probably import the data using Data Factory or a function in the long term, I needed to pull some quick data in a refreshable manner to combine with other data already available in Power BI.

I faced two challenges in getting this code to work:

1. Intercom’s API uses cursor-based pagination when retrieving contacts

2. I needed this query to be refreshable in PowerBI.com so I could schedule a daily refresh.

Read on to see how Meagan solved those problems.

Comments closed

Finding SQL Agent Jobs with Invalid Job Owners

Chad Callihan is trying to clean up this town:

Do you know which account is the owner for your SQL Server Agent jobs? Some jobs might be owned by user accounts which can become a problem. What happens if that job owner isn’t around forever and goes away? Will that job just keep chugging along?

Without an existing owner, a SQL Server Agent job will not run. Once a user gets disabled or removed from Active Directory, a job is still going to try running under that user but will begin failing.

Click through to see what kind of errors you might find.

Comments closed

Creating Local Server Groups in SSMS

Andrea Allred creates a local server group:

In the past, I have talked about CMS (Central Management Servers), but now I don’t have CMS configured and still want to query multiple instances at once. Local Server Groups are my friend.

My preference is CMS, especially as the number of data platform professionals increases. Keeping track of all of those new instances can be a pain otherwise. But if you’re in an environment in which that’s not an option, local server groups provide a reasonable alternative.

Comments closed

An Overview of Azure Redis Cache

Arun Sirpal lays out the use case of Azure Redis Cache:

Redis Cache is a well know caching technology and you can run it in Azure as a fully managed service. A common requirement (the most basic one) is doing a workflow like:

1. When an application needs to retrieve data, it will first search to see if it exists in Azure Cache for Redis.

2. If the data is found in Azure Cache for Redis (cache hit) use it

3. If the data is not found in Azure Cache for Redis (cache miss), then the application will need to retrieve the data from Azure SQL (or whatever cloud db back end you use)

4. For cache miss scenarios, the requesting application should add the data retrieved from the Azure Database to Azure Cache for Redis.

This is also known as the cache-aside pattern. If you’re feeling really cheeky, you can combine cache-aside with the decorator pattern to “hide” the cache in your code.

Comments closed

Running SQL Scripts on Snowflake from Azure Data Factory

Koen Verbeeck shows off the Script activity in Azure Data Factory:

Azure Data Factory has a new activity introduced this week (around the 10th of March 2022 for you future readers): the Script activity! This is not to be confused with the script task/component of SSIS, which allows you to execute .NET script (C# for most people, or VB if you’re Ben Weissman). No, this task executes SQL, so it’s more akin to the Execute SQL Task of SSIS.

Click through to see how it works while I lament the fact that SSIS never supported the best .NET language.

Comments closed

An Overview of Simple Parameterization

Paul White begins a new series:

This is the first part of a series about simple parameterization and trivial plans. These two compilation features are closely connected and have similar goals. Both target performance and efficiency for workloads frequently submitting simple statements.

Despite the “simple” and “trivial” names, both have subtle behaviours and implementation details that can make how they work difficult to understand. This series doesn’t dwell too long on the basics but concentrates on less well-known aspects likely to trip up even the most experienced database professionals.

In this first part, after a quick introduction, I look at the effects of simple parameterization on the plan cache.

Read on for more detail.

Comments closed