Press "Enter" to skip to content

Author: Kevin Feasel

Using the Power BI Color Picker

David Eldersveld walks us through the Power BI color picker:

The new color picker allows colors in RGB format in addition to the hex color format that Power BI has used exclusively until now.

The new one also easily allows users to choose from a wider selection of shades and tones. This builds upon the simpler selection of hues and tints in the original.

In case you don’t know what David means, there is an excellent explanation of each term.

Comments closed

A Diagnostic Book for SQL Server

Emanuele Meazzo has a new years’s gift for us:

Welcome to 2020! I wanted to start this year by giving to all my fellow consultants another way to troubleshoot our beloved SQL Servers; I’ve already talked about diagnostic notebooks in the past, and now, since Azure Data Studio has implemented the feature, I wanted to group them into a Diagnostic Book.

As the name implies, a jupyter book is no other than a collection of notebooks (and markdown files) that groups everything in a coherent space, with an index and navigation options alike.

I think this sort of collection of notebooks (a, uh, note-book), if put together well, makes it easier to learn a new environment and understand key problems than a big Scripts.txt file or a folder full of scripts.

Comments closed

Repartitioning and Coalescing in Spark

Divyansh Jain contrasts repartitioning and coalescing in Spark:

What is Coalesce?

The coalesce method reduces the number of partitions in a DataFrame. Coalesce avoids full shuffle, instead of creating new partitions, it shuffles the data using Hash Partitioner (Default), and adjusts into existing partitions, this means it can only decrease the number of partitions.

What is Repartitioning?

The repartition method can be used to either increase or decrease the number of partitions in a DataFrame. Repartition is a full Shuffle operation, whole data is taken out from existing partitions and equally distributed into newly formed partitions.

Read on to learn good reasons to use both.

Comments closed

Performance Tuning Load of Partitioned Hive Tables on S3 with Spark

Dmitry Tolpeko walks us through a performance problem in Spark:

I have a Spark job that transforms incoming data from compressed text files into Parquet format and loads them into a daily partition of a Hive table. This is a typical job in a data lake, it is quite simple but in my case it was very slow.

Initially it took about 4 hours to convert ~2,100 input .gz files (~1.9 TB of data) into Parquet, while the actual Spark job took just 38 minutes to run and the remaining time was spent on loading data into a Hive partition.

Let’s see what is the reason of such behavior and how we can improve the performance.

Read on to see how.

Comments closed

Things to Stop Doing

Tom LaRock has a list of life-altering recommendations:

RESPONDING IMMEDIATELY

Stop doing that. Trust me on this, your response to that email/text/slack can wait. Don’t believe me? Try an experiment. For one month do not reply immediately to your emails. At the end of the month add up the number of emails you received, and the number of emails that required an immediate response. I’m willing to bet that the number is quite low, much lower than you realize. And once you realize just how few require an immediate reply, you’ll never look at email in the same way again.

It’s a fun list, and I definitely agree in most circumstances on at least 3/4 of them.

Comments closed

Columnstore Indexes in Azure SQL Database

Niko Neugebauer takes us through the columnstore offerings available in Azure SQL Database:

Almost 2 years ago (22nd of March 2018) in Columnstore Indexes – part 121 (“Columnstore Indexes on Standard Tier of Azure SQL DB”) I have already mentioned that Columnstore Indexes were available in Azure SQL Database in Standard 3 (S3) edition and higher, while people I meet keep on mentioning and believing that in order to get Columnstore Indexes one needs to use Premium editions.

Since that blog post a lot of time has passed and in the mean time we have got new tiers with new generations of provisioned General Purpose tiers (Generation 4, Generation 5, FSv2 Series & M Series) appearing, plus the Serverless Tier and not to forget the very promising Hyperscale tier … besides the Azure SQL Database Managed Instance of course, which has already been generally available for some time and the good old Elastic Pools which were never mentioned in original article.

It sounds like, on the whole, columnstore is a normal part of Azure SQL Database across the board—it’s not a special add-on feature.

Comments closed

Azure Data Factory Notifications

Rayis Imayev walks us through three different techniques for sending notifications in Azure Data Factory:

While working on data integration projects and using Azure Data Factory as your main orchestration tool will help you to develop strategic forward thinking about your development tasks: to ponder on what your data sources are, point of destinations to land this information into a new data model and transformation steps to shape data from the source to its destination. Just like when you play chess and have to plan ahead several of your next moves.

Along with this structural thinking to develop and execute your data flows, timely notifications of when something goes left or right would give you additional peace of mind.

Something I appreciate in this post is that Rayis contrasts the Azure Data Factory techniques with SSIS methods, giving you a solid base for comparison.

Comments closed

Power BI Year in Review

Kasper de Jonge walks us through the biggest changes to Power BI in 2019:

As the end of the year closes I was reminiscing on what a huge year it has been for Power BI. I work mostly with large organisations so my view will be slightly skewed towards that.

For me 2019 has been the year where Power BI got massive adoption as the standard BI platform in an organisation, it went from self serve to also contain corporate BI. 

The Power BI development model does have its downsides, but without a doubt, they get stuff done.

Comments closed

Short Substrings and Computed Columns

Erik Darling gives us a story about computed columns that turns out not to be about computed columns at all (having thereby subverted our expectations):

The problem is that when I tried to index it:

CREATE INDEX dummy
    ON dbo.Users(DisplayNameComputed);

I got this error:

Msg 537, Level 16, State 3, Line 21
Invalid length parameter passed to the LEFT or SUBSTRING function.

And when I tried to select data from the table, the same error.

Click through to find the real query killer.

Comments closed