
Month: August 2020

Managing Lakehouse Data

Harsha Gummadavelli gives us an introduction to the Data Lakehouse concept:

“Data Lakehouse” is a new architecture paradigm in the data management space that combines the best characteristics of data warehouses and data lakes. Once you load the data into a data lake, there is no need to load it into a warehouse for additional analysis or business intelligence. You can directly query the data residing in cheaper but highly reliable storage, often termed “object stores”, thus reducing the operational overhead on data pipelines.

I will say that I’m not particularly sold on the data lakehouse concept at this point. It’s interesting, in that it reduces the number of systems to maintain by one, but I do wonder about performance issues when trying to replace an existing warehouse. The post turns into a marketing pitch for Informatica, but the first half does give a fair introduction to the concept.

Comments closed

Spark 3.0’s Structured Streaming UI

Genmao Yu et al. show off the Structured Streaming user interface built into Apache Spark 3.0:

When a developer submits a streaming SQL query, it will be listed in the Structured Streaming tab, which includes both active streaming queries and completed streaming queries. Some basic information for streaming queries will be listed in the result table, including query name, status, ID, run ID, submitted time, query duration, last batch ID as well as the aggregate information, like average input rate and average process rate. There are three types of streaming query status, i.e., RUNNING, FINISHED, and FAILED. All FINISHED and FAILED queries are listed in the completed streaming query table. The Error column shows the exception details of a failed query.
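To make the split between the two tables concrete, here is a small Python model of it. This is not Spark code: the `QueryRow` class and `split_tables` function are hypothetical, and the average rates are a simple mean of per-batch values (loosely modeled on the `inputRowsPerSecond` and `processedRowsPerSecond` fields of a query's progress reports), which may not match exactly how Spark computes the figures it displays.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List, Optional

# The three statuses the Structured Streaming tab reports.
ACTIVE_STATUSES = {"RUNNING"}
COMPLETED_STATUSES = {"FINISHED", "FAILED"}

@dataclass
class QueryRow:
    """Hypothetical model of one row in the streaming query tables."""
    name: str
    status: str
    # Per-batch rates, loosely modeled on the inputRowsPerSecond and
    # processedRowsPerSecond fields of a query's progress reports.
    input_rates: List[float] = field(default_factory=list)
    process_rates: List[float] = field(default_factory=list)
    error: Optional[str] = None  # what the Error column shows for FAILED queries

    def avg_input_rate(self) -> float:
        # Simple mean of per-batch rates; Spark's own figure may differ.
        return mean(self.input_rates) if self.input_rates else 0.0

    def avg_process_rate(self) -> float:
        return mean(self.process_rates) if self.process_rates else 0.0

def split_tables(rows: List[QueryRow]):
    """Partition queries into the active table and the completed table."""
    active = [r for r in rows if r.status in ACTIVE_STATUSES]
    completed = [r for r in rows if r.status in COMPLETED_STATUSES]
    return active, completed

# Made-up sample queries for illustration.
rows = [
    QueryRow("clicks", "RUNNING", [100.0, 200.0], [150.0, 250.0]),
    QueryRow("orders", "FAILED", [50.0], [40.0], error="hypothetical exception text"),
    QueryRow("audit", "FINISHED"),
]
active, completed = split_tables(rows)
```

Only RUNNING queries land in the active table; FINISHED and FAILED both go to the completed one, which is where a failed query's error text would surface.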

Read on to learn more.

Filtering out Blanks in MEDIANX with DAX Studio

Matt Allington continues a series on blanking out:

This article is a follow-on from last week. I recommend you go back and read the article first if you missed it, but in summary, I want to write a measure (not a calculated column) that will return the median sales of products while excluding the products with blanks (no sales). As I showed last week, this is relatively easy with a calculated column. Here it is again. Remember, writing calculated columns first is a great way to visualise the problem you want to solve. It is not a great way to solve most problems (some yes, most no).
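Outside of DAX, the calculation Matt is after is easy to state. The Python sketch below is not DAX and not Matt's measure; it assumes a blank is represented as `None`, uses made-up sales figures, and contrasts excluding blanks with treating them as zero sales, which drags the median down.

```python
from statistics import median
from typing import Dict, Optional

# Hypothetical sales per product; None stands in for a blank (no sales).
sales_by_product: Dict[str, Optional[float]] = {
    "A": 10.0,
    "B": None,
    "C": 30.0,
    "D": None,
    "E": 20.0,
}

def median_excluding_blanks(values):
    """Median over only the products that actually have sales."""
    non_blank = [v for v in values if v is not None]
    return median(non_blank) if non_blank else None

def median_blanks_as_zero(values):
    """For contrast: treat each blank as a zero sale."""
    return median([0.0 if v is None else v for v in values])

excluding = median_excluding_blanks(sales_by_product.values())  # median of 10, 30, 20
as_zero = median_blanks_as_zero(sales_by_product.values())      # median of 10, 0, 30, 0, 20
```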

Read on to see how you can solve the problem using DAX Studio.

Creating a Database Project with Azure Data Studio

Wolfgang Strasser takes the database project extension for a spin:

There is currently one requirement to start your database project development in ADS: you need the Insider build of ADS (which you can download here). After the installation, you’ll need to install the extension. Search for it in the list of extensions and install it in your ADS instance.

Tom Norman and I talked about it in detail on last night’s episode of Shop Talk (to be posted later today). It’s a good start, but there are still some rough edges and missing functionality. I’d expect that to improve over time, though.

Aggregate Splitting in SQL Server 2019

Paul White takes us through a new trick the optimizer has learned:

The extended event query_optimizer_batch_mode_agg_split is provided to track when this new optimization is considered. The description of this event is:

Occurs when the query optimizer detects batch mode aggregation is likely to spill and tries to split it into multiple smaller aggregations.

Other than that, this new feature hasn’t been documented yet. This article is intended to help fill that gap.
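Paul's article covers what SQL Server actually does. Purely as a generic illustration of the phrase "split it into multiple smaller aggregations", and not a description of the optimizer's real plan shapes, here is a Python sketch: rows are routed by a hash of the grouping key, so every group lands in exactly one smaller aggregation, and the partial results can simply be combined. The function names are hypothetical.

```python
from collections import defaultdict

def grouped_sum(rows):
    """One aggregation: the equivalent of SUM(value) ... GROUP BY key."""
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return dict(totals)

def split_grouped_sum(rows, parts):
    """Run the same aggregation as `parts` smaller ones.

    Rows are routed by a hash of the grouping key, so each group lands in
    exactly one partition and each partial result is final for its keys.
    All buckets are held in memory here for clarity; the point of splitting
    in a real engine is that each piece needs less memory at a time.
    """
    buckets = [[] for _ in range(parts)]
    for key, value in rows:
        buckets[hash(key) % parts].append((key, value))
    result = {}
    for bucket in buckets:
        # Each smaller aggregation only tracks the groups in its bucket.
        result.update(grouped_sum(bucket))
    return result

sample = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
```

Because the grouping keys never overlap between buckets, combining the partial results is a plain union rather than a second aggregation.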

Read on as Paul fills that gap.

Custom Formatting of Visuals using Calculation Groups

Gilbert Quevauvilliers shares some exciting news:

The Power BI team has been doing a lot of incredible work. The most recent update I got wind of is that custom formatting of measures is now supported for visuals.

This has already been deployed to the Power BI Service, and if you download the latest version of Power BI Desktop (Version 2.83.5894.961 as at 03 Aug 2020) it has the new features. This means you can use this TODAY!

Previously this was only supported for tables and matrixes.

Click through to see how it looks in Power BI. It’s easy, and that’s a good thing.

Connecting to Cosmos DB via Linked Server

Frank Solomon takes us through communicating with Cosmos DB from SQL Server:

Every source table column becomes an expression in the SELECT clause. If needed, JSONLint, for example, can validate the output JSON format. In this query, the FOR XML PATH clause places each row into a formatted JSON row, with key/value pairs that match the column/value pairs of the original rows. To get the data ready, the empty ('') value in the FOR XML PATH() clause at line 10 separates each XML row with a default comma. At line 11, the STUFF function arguments format the result set as a string and remove the leading comma in the original data. Save the finished XML-format result set as a JSON file. This file will become the data we’ll import.

A Cosmos DB database has zero or more collections, which correspond to SQL Server tables. A collection has zero or more documents, which correspond to SQL Server table rows. In the Cosmos DB
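Frank builds the JSON in T-SQL with FOR XML PATH and STUFF. For comparison only, this Python sketch produces the same row-to-document shape with the standard `json` module: each source row becomes one JSON object whose keys are the column names, matching the table-row-to-document analogy. The column names, sample rows, and `rows_to_json` helper are made up for illustration.

```python
import json

# Hypothetical source table: column names plus row tuples.
columns = ["ID", "Name", "City"]
rows = [
    (1, "Alice", "Oslo"),
    (2, "Bob", "Lima"),
]

def rows_to_json(columns, rows):
    """Turn each row into a JSON object keyed by column name."""
    documents = [dict(zip(columns, row)) for row in rows]
    # json.dumps handles quoting, escaping, and the commas between objects,
    # so there is no leading separator to strip off afterwards.
    return json.dumps(documents, indent=2)

output = rows_to_json(columns, rows)
```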

With SQL Server 2019, PolyBase also allows connections to Cosmos DB if (and only if) you are using the MongoDB API for Cosmos. But if that’s how your collection is set up, querying it becomes pretty easy.

Understanding DAX’s LOOKUPVALUE Function

Alberto Ferrari explains how the LOOKUPVALUE function works:

LOOKUPVALUE requires a column to retrieve, a set of column/value pairs to provide the search conditions, and an optional default value in case there are either no matching rows or too many matching rows. The following formula retrieves the exchange rate from the Daily Exchange Rate table, where Currency[Currency Code] matches EUR and 'Daily Exchange Rate'[Date] matches Sales[Order Date]. In case there are no matches, it returns zero:
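To make the matching rules concrete, here is a minimal Python model of them (not DAX, and the `lookup_value` name is hypothetical). It follows the documented behavior as I understand it: one distinct matching value is returned even if several rows match; otherwise the default is returned if supplied; with no default, no match gives a blank (`None` here) and several differing values raise an error. The exchange-rate rows are made-up values.

```python
_NO_ALTERNATE = object()  # sentinel: no default value supplied

def lookup_value(table, result_column, conditions, alternate=_NO_ALTERNATE):
    """Model of LOOKUPVALUE's matching rules.

    table: list of dicts (rows); conditions: dict of column -> required value.
    """
    matches = {
        row[result_column]
        for row in table
        if all(row[col] == val for col, val in conditions.items())
    }
    if len(matches) == 1:
        # Exactly one distinct value, even if it came from several rows.
        return matches.pop()
    if alternate is not _NO_ALTERNATE:
        return alternate  # covers both "no match" and "too many matches"
    if not matches:
        return None  # stands in for BLANK
    raise ValueError("multiple distinct values match the search conditions")

# Hypothetical Daily Exchange Rate rows.
rates = [
    {"Currency Code": "EUR", "Date": "2020-08-01", "Rate": 0.85},
    {"Currency Code": "GBP", "Date": "2020-08-01", "Rate": 0.76},
]
eur = lookup_value(rates, "Rate", {"Currency Code": "EUR", "Date": "2020-08-01"}, alternate=0)
```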

Alberto also provides a primer on the function in case you are unfamiliar with it, as this post starts with the assumption that you know what it does.

Choroplethr 3.6.4 on CRAN

Ari Lamstein announces that Choroplethr version 3.6.4 is now on CRAN:

Choroplethr v3.6.4 is now on CRAN. This is the first update to the package in two years, and was necessary because of a recent change to the tigris package, which choroplethr uses to make Census Tract maps. I also took this opportunity to add new example demographic data for Census Tracts.

Read on for a listing of the updates, examples, and a request from Ari to help keep the project up to date by finding a suitable sponsor. H/T R-Bloggers

Credential and Secrets Management in R

Bernardo Lares walks us through some good practices around managing credentials and secrets in R:

I have several functions that live in my public lares library that use get_creds() to fetch my secrets. Some of them are used as credentials to query databases, send emails with API services such as Mailgun, ping notifications using Slack’s webhook, interact with Google Sheets programmatically, fetch Facebook and Twitter’s API stuff, Typeform, GitHub, Hubspot… I even have a portfolio performance report for my personal investments. If you check the code underneath, you won’t find credentials written anywhere, but the code will actually work (for me and for anyone that uses the library). So, how can we accomplish this?
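Bernardo's get_creds() is an R function in lares; the Python sketch below only illustrates the general pattern and is not a port of his code. It assumes a hypothetical environment variable, `MY_CREDS_FILE`, pointing at a JSON file kept outside the repository, with one entry per service. The source code then contains only the lookup, never the secrets themselves.

```python
import json
import os
from pathlib import Path

# Hypothetical environment variable naming the credentials file; the file
# itself lives outside version control.
CREDS_ENV_VAR = "MY_CREDS_FILE"

def get_creds(service):
    """Return the credentials stored for `service`.

    Assumes a JSON file shaped like {"service": {"key": "value", ...}}.
    """
    path = os.environ.get(CREDS_ENV_VAR)
    if not path:
        raise RuntimeError(f"set {CREDS_ENV_VAR} to the path of your credentials file")
    creds = json.loads(Path(path).read_text(encoding="utf-8"))
    if service not in creds:
        raise KeyError(f"no credentials stored for {service!r}")
    return creds[service]
```

Each user points the variable at their own file, so the same code works for anyone who has a matching credentials file, which is the property the quote describes.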

Read on to learn how.
