Press "Enter" to skip to content

Day: January 22, 2021


Hong Ooi takes us through an R library for working with Cosmos DB:

Among other features, Azure Cosmos DB is notable in that it supports multiple data models and APIs. When you create a new Cosmos DB account, you specify which API you want to use: SQL/core API, which lets you use a dialect of T-SQL to query and manage tables and documents; MongoDB; Azure table storage; Cassandra; or Gremlin (graph). AzureCosmosR provides a comprehensive interface to the SQL API, as well as bridges to the MongoDB and table storage APIs. On the Resource Manager side, AzureCosmosR extends the AzureRMR class framework to allow creating and managing Cosmos DB accounts.

AzureCosmosR is now available on CRAN. You can also install the development version from GitHub, with devtools::install_github("Azure/AzureCosmosR").

Hong provides examples for us using three of the Cosmos DB APIs, so check it out.

Comments closed

Logical Separation in Azure Data Factory

Rayis Imayev is at a crossroads:

I was raised listening and reading fairy tales where the main character would reach a crossroad with a large stone that had some directions written on it – turn right and you will lose your horse, turn left and you will lose your life, walk straight and you will find your happiness. 

Also, growing up in a small Ukrainian industrial city, closely situated to a railroad hub, I was always fascinated to see many colorful rail traffic lights, trying to imagine where a myriad of rail tracks would lead trains on them.

Similarly, Azure Data Factory (ADF) provides several ways, to control/direct/filter your pipeline workflows; it’s all conditioned and constrained to the boundaries of my “crossroad stone” writings.

As one of my intellectual heroes is purported to have said, if you see a fork in the road, take it.

Comments closed


Chad Callihan walks us through using AT TIME ZONE in SQL Server:

Dealing with time zones in general can be a headache. Thankfully, SQL Server 2016 included a new clause to make working with time zones more manageable. Let’s look at a couple examples:

I’m in Eastern Standard Time so we’ll start here. Keep in mind that when no offset information is included, SQL Server is going to assume that the date value is in the target time zone.

This works best when your dates are stored in UTC, but Chad does show how to convert between two other time zones. This does not perform all that well when you need to convert a lot of rows, but if you’re doing one or two conversions, it’ll do just fine.

Comments closed

Handling Fuzzy Match Lookup Failures in Power BI

Imke Feldmann continues pulling on a thread:

In a previous article I have described how to use Fuzzy merge to match misspelled data to valid categories. With the sample data at hand, all values could be matched to existing categories. But what can you do if there are some entries that cannot be matched to existing categories?

I would create an “unmatched” value to the categories table before loading to the data model and assign the “unmatched”-value to all unmatched entries. Then I would create a measure that counts the entries within the unmatched category. I would then create a data driven alert, based on this measure. That would trigger an e-mail to the person responsible for maintaining the list. Or, if responsibilities are a bit more complex, trigger a flow in Power Automate that could handle certain conditions to be followed as well.

Click through to see it in action.

Comments closed

Defining Deferred Complation

Gail Shaw explains the concept of deferred compilation:

When I talked about row estimations for table variables, I mentioned ‘deferred compile’, but didn’t give a whole lot of details. What, then, is a deferred compilation? Let’s start with how batches work normally.

T-SQL is an interpreted language. While we talk about compiles, they’re not compilations in the sense of what happens to C++. There’s no conversion of the script to a machine language or intermediate language which is used from that point onwards. Every time a batch executes, it has to be parsed, bound and have an execution plan generated or fetched from cache.

Click through to learn what it does mean.

Comments closed

Deciding to Use Dataflows or Power Query

Paul Turley explains a choice:

When should you use dataflows vs regular Power Query? I didn’t jump on the dataflows bandwagon and struggled to adopt them at first. Frankly, Power Query is easier to use. The browser-based dataflows designer is quite impressive but it is not as responsive and convenient as the desktop app, so this is a bit of a trade-off. The power and value of dataflows becomes apparently when the business reaches a certain stage of data culture maturity.

Before we can address the question of whether to use Power BI Dataflows, conventional Power BI queries, or any other approach to source and transform data; we need to briefly review different options for orchestrating a Business Intelligence solution in the Microsoft cloud ecosystem.

Read on for a bit of architecture and the explanation.

Comments closed