February 2023

Thoughts on the New Power BI Accessible Themes

Meagan Longoria is moderately pleased:

Everyone’s vision is a little different. It is rare (impossible?) that a color theme is accessible for everyone. For instance, while many people with color vision deficiency have trouble distinguishing red and green hues, others have trouble distinguishing blue hues. So when we optimize to accommodate one condition, we may make things more difficult for another condition. This happened with the change in accent color in Power BI Desktop from yellow to teal. Changing to teal increased color contrast, which was great for people with low vision, but it caused new issues for some people with color vision deficiency.

While I am very happy to see these new color themes, I hope everyone understands that they aren’t just generally accessible for all uses. As mentioned in the blog post, they specifically have better color contrast to achieve a contrast of at least 3:1, which is the contrast recommended by WCAG for non-text content.

Read the whole thing. There’s a delicate balancing act between having a complete color scheme and satisfying a variety of needs. It sounds like this theme doesn’t quite cut it, though hopefully there will be some improvements in the future.
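The 3:1 figure isn’t arbitrary, either: WCAG defines contrast as a ratio of the relative luminances of the two colors. Here’s a minimal Python sketch of that calculation; the hex values below are illustrative stand-ins, not the actual Power BI theme palette.

```python
# WCAG 2.x contrast ratio: compute the relative luminance of each color,
# then (lighter + 0.05) / (darker + 0.05), giving a value from 1:1 to 21:1.

def relative_luminance(hex_color: str) -> float:
    """Relative luminance per WCAG 2.x, from an RRGGBB hex string."""
    def linearize(c: int) -> float:
        c = c / 255.0
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (int(hex_color.lstrip("#")[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b)

def contrast_ratio(color1: str, color2: str) -> float:
    lighter, darker = sorted((relative_luminance(color1),
                              relative_luminance(color2)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)

# A teal clears the 3:1 bar against white; a yellow does not, which is
# the trade-off Meagan describes with the accent-color change.
print(f"{contrast_ratio('#0F6962', '#FFFFFF'):.2f}")  # ~6.5: passes 3:1
print(f"{contrast_ratio('#FFB900', '#FFFFFF'):.2f}")  # ~1.7: fails 3:1
```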

Performing a Detailed Code Review of a Stored Procedure

Aaron Bertrand has 99 problems and this stored procedure is 40 of them:

I’ve been at this for a while now, and have a very particular set of rules and coding conventions that I follow when writing and, more importantly, reviewing T-SQL code. I often perform code reviews and thought it would be fun to frame this exercise: a completely fictitious stored procedure hits my desk, I’ll reject it of course, and enumerate the reasons why. The procedure might pass review in a lot of shops, but not in mine.

Click through and give it a try. Aaron has outdone himself with this, and I got angry with him about two-thirds of the way through the procedure. That’s how you know it’s a good example.

Calculating Log Likelihood Ratios with jeva

Peter M.B. Cahusac takes us through a jamovi package:

Ever wanted to try doing an evidential analysis? You may have found it difficult to find a statistical platform to do it. Now there is the jamovi module jeva which can provide log likelihood ratios for a range of common statistical tests.

Imagine for a moment that we wish to carry out a statistical test on our sample of data. We do not want to know whether the procedure we routinely use gives us the correct answer with a specified error rate (such as the Type I error) – the frequentist approach. Nor do we want to concern ourselves with possible a priori probabilities of hypotheses being true – the Bayesian approach. We need to know whether a statistic from this particular set of data is consistent with one or more hypothetical values. Also, let’s say that we weren’t happy with how much data we had collected (a familiar problem?), and just added more when convenient. Welcome to the likelihood (or evidential) approach!

Read on for an explanation and how to try jeva out.
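To get a feel for the quantity jeva reports, here’s a small Python sketch (not the module itself; jeva runs inside jamovi) of a log likelihood ratio for a binomial sample under two hypothesized proportions:

```python
# Support S = log likelihood ratio for p1 over p2, given k successes in
# n trials. The binomial coefficient is identical under both hypotheses
# and cancels out of the ratio.
from math import log

def binomial_support(k: int, n: int, p1: float, p2: float) -> float:
    return k * (log(p1) - log(p2)) + (n - k) * (log(1 - p1) - log(1 - p2))

# 60 successes in 100 trials: evidence for p = 0.6 over p = 0.5.
print(f"S = {binomial_support(60, 100, 0.6, 0.5):.2f}")  # ~2.01
```

Because the support depends only on the likelihood of the data actually observed, adding more data when convenient and recomputing doesn’t break the analysis, which is exactly the scenario Cahusac mentions.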

Calibrating and Plotting a Time Series with healthyR.ts

Steven Sanderson builds a plot:

In time series analysis, it is common to split the data into training and testing sets to evaluate the accuracy of a model. However, it is important to ensure that the model is calibrated on the training set before evaluating its performance on the testing set. The {healthyR.ts} library provides a function called calibrate_and_plot() that simplifies this process.

Click through for the function’s input parameters and an example of how to use it.
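Since {healthyR.ts} is an R package, here’s only a rough Python analogue of the workflow calibrate_and_plot() wraps: hold out the tail of the series, fit on the head, and score the forecast against the holdout. A deliberately naive last-value forecast stands in for a real model.

```python
# Chronological train/test split and holdout scoring for a time series.
# Time series must be split in order, never shuffled.

def train_test_split(series: list[float], test_size: int):
    return series[:-test_size], series[-test_size:]

def naive_forecast(train: list[float], horizon: int) -> list[float]:
    # Repeat the last observed value; a stand-in for a real model fit.
    return [train[-1]] * horizon

def mean_absolute_error(actual, predicted) -> float:
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

series = [112.0, 118.0, 132.0, 129.0, 121.0, 135.0, 148.0, 148.0, 136.0, 119.0]
train, test = train_test_split(series, test_size=3)
forecast = naive_forecast(train, horizon=len(test))
print(f"MAE on the holdout: {mean_absolute_error(test, forecast):.2f}")
```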

Tips for Kafka Streams Developers

Ludovic Dehon shares some advice:

We built Kestra, an open-source data orchestration and scheduling platform, and we decided to use Kafka as the central datastore to build a scalable architecture. We rely heavily on Kafka Streams for most of our services (the executor and the scheduler) and have made some assumptions on how it handles the workload.

However, Kafka has some restrictions since it is not a database, so we need to deal with the constraints and adapt the code to make it work with Kafka. We will cover topics such as using the same Kafka topic for source and destination, and creating a custom joiner for Kafka Streams, to ensure high throughput and low latency while adapting to the constraints of Kafka and making it work with Kestra.

Click through for several tips.
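One of those tips, reusing a topic as both source and destination, is easy to get wrong. Kafka Streams itself is JVM-only, so here’s a hedged stand-in using the plain confluent_kafka Python client; the topic name and state field are hypothetical, and the important detail is the terminal-state guard that keeps the service from looping forever on its own output.

```python
import json
from confluent_kafka import Consumer, Producer

TOPIC = "executions"  # hypothetical topic used as both source and destination

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "executor",
    "auto.offset.reset": "earliest",
})
producer = Producer({"bootstrap.servers": "localhost:9092"})
consumer.subscribe([TOPIC])

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    execution = json.loads(msg.value())
    if execution["state"] == "DONE":
        # Terminal state: do not re-emit, or we would keep consuming our
        # own output and cycle on the topic forever.
        continue
    execution["state"] = "DONE"  # stand-in for the actual task processing
    producer.produce(TOPIC, key=msg.key(), value=json.dumps(execution))
    producer.flush()  # fine for a sketch; batch flushes in real code
```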

Aggregations and Distinct Counts in Power BI

Phil Seamark doesn’t have time to wait for Power BI to count:

This article aims to show how you can speed up distinct count calculations in Power BI using the built-in user-defined aggregations feature. The user-defined aggregation feature in Power BI is designed to work with direct query models and usually gets used for calculations such as SUM, MIN, MAX etc. However, it can also work well for distinct count calculations using the pattern shown in this article.

It’s an interesting partial aggregation approach, one which works really well when the distinct count is a small percentage of the total number of rows.
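To see the shape of the trick outside Power BI, here’s a hedged pandas sketch: materialize the fact table once at (group, distinct value) grain, and the distinct count becomes a plain row count over a much smaller table. The column names are hypothetical, not Phil’s model.

```python
import pandas as pd

fact = pd.DataFrame({
    "month":    ["Jan", "Jan", "Jan", "Feb", "Feb", "Feb", "Feb"],
    "customer": ["A",   "A",   "B",   "A",   "C",   "C",   "B"],
    "amount":   [10,    5,     7,     3,     9,     1,     4],
})

# The aggregation table: one row per (month, customer) pair.
agg = fact[["month", "customer"]].drop_duplicates()

# Distinct customers per month is now a cheap row count over agg...
fast = agg.groupby("month").size()
# ...and it matches the expensive distinct count over the full fact table.
slow = fact.groupby("month")["customer"].nunique()
assert (fast == slow).all()
print(fast)
```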

Synapse and Azure ML Pipelines

Santosh Thomas integrates two Azure products:

As more customers standardize on the Synapse data platform, enabling machine learning workflows through Azure Machine Learning (Azure ML) becomes particularly interesting. This is especially true as more customers look to bring their data engineering and data science practices together and mature capabilities on both sides.

The goal of this blog post is to highlight how Synapse and Azure ML can work well together to deliver key insights. This is motivated by a scenario where a customer modernized their data platform on Azure Synapse but was looking to improve their data science practices through Azure ML. The focus of this blog is to expose existing functionality, and it is not a “hardened” solution with security or other cloud best practice implementations. The workflow steps also assume some level of comfort with Python and working with the Azure Python SDKs.

There was a time in which Microsoft wanted us to remain in Synapse for machine learning tasks, but that time is gone: the emphasis is definitely on doing machine learning tasks in Azure ML, regardless of where the data lives…unless there’s a Spark job involved, in which case things get all weird again.
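For a rough picture of the Azure ML side of that hand-off (my sketch, not the post’s exact solution), the Python SDK v2 (azure-ai-ml) can submit a pipeline job that reads the data Synapse lands in the lake. Every name below, from workspace to environment to the abfss path, is a placeholder.

```python
from azure.ai.ml import MLClient, Input, command
from azure.ai.ml.dsl import pipeline
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

# One training step that reads a folder Synapse wrote to the data lake.
train_step = command(
    code="./src",  # assumed to contain train.py
    command="python train.py --data ${{inputs.training_data}}",
    inputs={"training_data": Input(type="uri_folder")},
    environment="azureml:my-training-env@latest",  # placeholder environment
    compute="cpu-cluster",                         # placeholder compute
)

@pipeline(description="Train on data curated by Synapse")
def synapse_to_aml(training_data):
    train_step(training_data=training_data)

job = synapse_to_aml(
    training_data=Input(
        type="uri_folder",
        path="abfss://curated@mydatalake.dfs.core.windows.net/features/",
    )
)
ml_client.jobs.create_or_update(job, experiment_name="synapse-aml-demo")
```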

Kafka Control and Data Planes

Sanjay Garde explains how the architecture of Apache Kafka solutions has expanded over time:

With the advent of service mesh and containerized applications, the idea of the control and data plane has become popular. A part of your application infrastructure, such as a proxy or sidecar, is dedicated to aspects such as controlling traffic, access, governance, security, and monitoring, and is referred to as the control plane. Another part of your application infrastructure that is used purely for processing your business transactions is referred to as the data plane.

Read on to see how the concept works at an architectural level.

Principles of Synapse Security

Liliam Leme provides an overview of security options in Azure Synapse Analytics:

This blog post will provide an overview of the Synapse security environment focused on Dedicated SQL Pool, Serverless SQL Pool, and Spark.

Security has many layers and frequently it will determine how you build your process. I start this post by reviewing several important security considerations which you can later apply to your Synapse environment. 

This is a fairly lengthy post and it still only covers a moderate amount of what you’d want to do for Azure Synapse Analytics. This is the downside to having a complex interplay of several products: there’s a lot to secure and a lot to think about along the way.

Building an Event Calendar from T-SQL and CSS

Aaron Bertrand builds some HTML:

In my last tip, I showed how to use T-SQL to generate HTML for fancy calendar visuals overlaid with event data from another table. As an extension of that tip, let’s now look at simplifying parts of that query by caching the date information in a calendar table to streamline the outer queries and avoid complications caused by different DATEFIRST settings.

This is the follow-on from a prior post (linked in the lede here), so it would make sense to read that one first if you haven’t already.
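Aaron’s implementation is T-SQL plus CSS; purely as a cross-language illustration of the concept, Python’s standard library can emit the same sort of HTML month grid, and its firstweekday argument plays the role that DATEFIRST plays in the T-SQL version.

```python
import calendar

# 0 = Monday ... 6 = Sunday; pinning this in code sidesteps the
# "which day starts the week?" drift that DATEFIRST causes in T-SQL.
cal = calendar.HTMLCalendar(firstweekday=6)

# Emits an HTML <table> for February 2023; each cell carries a
# day-of-week CSS class (mon, tue, ...), which is where the CSS comes in.
print(cal.formatmonth(2023, 2))
```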
