Press "Enter" to skip to content

Author: Kevin Feasel

get_json_object and json_tuple in Hive

The Hadoop in Real World team looks at a pair of Hive functions:

Both get_json_object and json_tuple functions in Hive are meant to work with JSON data in Hive. 

Let’s create a table with some JSON data to work with. We have created a table named hirw_courses and loaded the below JSON text into the table.

Click through for that script as well as seeing how you use each of get_json_object() and json_tuple() and which might be better-suited for your purposes.

Comments closed

Automating Power BI Data Model Metadata Extraction

Gerhard Brueckl avoids manual processes:

In the past I have been working on a lot of different Power BI projects and it has always been (and still is) a pain when it comes to the deployment of changes across multiple tiers(e.g. Dev/Test/Prod). The main problem here being that a file generated in Power BI desktop (.pbix) is basically a binary file and the metadata of the actual data model (BIM) cannot be easily extracted or derived. This causes a lot of problems upstream when you want to automate the deployment using CI/CD pipelines. Here are some common approaches to tackle these issues:

Click through to see several bad to palatable options and then check out Gerhard’s solution, which is significantly better. CI/CD is a huge pain point for Power BI developers but people like Gerhard are doing what they can to help.

Comments closed

Workers and Requests in Azure SQL Database

Kendra Little has a documentation change of note:

We now explicitly define ‘requests’ and ‘workers’ in the Azure SQL Database documentation, and we’ve cleaned up multiple places where we used to equate the two terms. In this post, I share the history of the two terms when it comes to Azure SQL Database, why the two were ever equated, and why things like this are tricky to change.

There’s a bit of behind-the-scenes around documentation work as well and the types of challenges you might run into when developing software for the public.

Comments closed

Data Mesh in Azure: Self-Service Infrastructure

Paul Andrew continues a series on applying data mesh principles in Azure:

This principal is very broad, so I want to break down the theory vs practice as before. The idea of self-service is always a goal in any data platform and the normal thing for analytics is to focus on this within the context of our data consumption. Whereby a semantic layer technology can be used in a friendly business orientated, drag-drop type environment to create dashboards or whatever.

However, my interpretation of ‘self-serve’ for a data mesh architecture goes further than just the dashboard creation use case. This should not just apply at the data consumption layer, but all layers within the solution and for clarify, not just related to the data itself. Hence the term in this principal ‘data infrastructure as a platform’. This then unlocks the deeper implication of this serving for a data product, all abstracts of the platform can be consumed in a self-service manner from a series of predefined assets. Let’s think about this serving more like an internal marketplace or catalogue of assets for delivering everything the data product needs to enable a new node within the wider data mesh.

Read on for some deep thoughts on the topic.

Comments closed

Tracking Table and Column Usage for Power BI Premium/PPU

Gilbert Quevauvilliers wants to see who’s using what tables:

I was reading through the blog post Announcing on-demand loading capabilities for large models in Power BI and I got a thought would it not be great to better understand which columns and tables are being used in my Power BI Premium/Premium per user datasets?

To do this, using the new DMV I could now look at the temperature of the tables-column.

The higher the temperature the more the table-column is being used in my reports!

Click through to see how Gilbert put this together but also pay attention to the caveats.

Comments closed

Markdown Tools for VS Code

I highlight a pair of useful extensions for Visual Studio Code:

The first tool of choice is a big one, Yu Zhang’s Markdown All in One. This extension provides several great features. One of my favorites is its support for creating a table of contents. After opening the Command Palette (Ctrl + Shift + P), select Markdown All In One: Create Table of Contents and it creates a ToC for you based on the heading markers you already have.

Read on for several more things I like about this tool, as well as a discussion of a second useful extension.

Comments closed

The Transaction Log Architecture

Paul Randal continues a series on the transaction log:

In the first part of this series, I introduced basic terminology around logging, so I recommend you read that before continuing with this post. Everything else I’ll cover in the series requires knowing some of the architecture of the transaction log, so that’s what I’m going to discuss this time. Even if you’re not going to follow the series, some of the concepts I’m going to explain below are worth knowing for everyday tasks DBAs handle in production.

Read on to learn more about some key transaction log terminology.

Comments closed