Press "Enter" to skip to content

Author: Kevin Feasel

Apache Doris and Data Colocation

Frank Z takes us through a use case for Apache Doris:

In data analytics, fast query performance is more of a result than a guarantee. What’s more important than the result itself is the architectural design and mechanism that enables quick performance. This is exactly what this post is about. I will put you into context with a typical use case of Apache Doris, an open-source MPP-based analytic database.

The user, in this case, is an all-category Q&A website. As a billion-dollar listed company, they have their own data management platform. What Doris does is support the data filtering, packaging, analyzing, and monitoring workloads of that platform. Based on their huge data size, the user demands quick data loading and quick response to queries. 

This sounds a lot like sharding of the data, where you segregate data for a particular customer/entity into its own database (and possibly instance), with the exception that queries are expected to go over a number of shards rather than focus on a single one.

Comments closed

Running a Background Job in Powershell

Patrick Gruenauer does two things at once:

In this blog post, I’d like to give you a few examples related to PowerShell Background Jobs to build upon. Let’s jump in.

Let’s say I want to ping a few computers. This consumes time. So I want that this task runs in the background as a PowerShell background job.

Outside of practicing for a certification, I don’t remember the last time I willingly chose to run something as a background job, either in Powershell or bash. The concept is still useful (especially if I’m on an SSH connection or have direct terminal access), though in a UI-driven world, I’d just open a new terminal tab.

Comments closed

Balancing Governance and Collaboration with Fabric

Marc Lelijveld makes it sound like I can’t just say “No!” to everything as a Microsfot Fabric administrator:

Frequently, I am approached by curious individuals who inquire about my job and how I contribute to the success of our customers, especially since I am not directly involved in building solutions for each and every one of them. These questions have made me realize that it might be interesting to share insights into my role as a Fabric Administrator, or as some may refer to it, a Power BI Administrator.

In this blog post, I aim to shed light on the essence of daily activities of a Fabric Administrator, the meaningful conversations people in this role engage in, and the additional value they bring to the table.

Read on to see what people like Marc do all day.

Comments closed

Query Hints: Ad Hoc vs Query Store

Grant Fritchey sets up a showdown:

I recently presented a session on the Query Store at Data Saturday Rhineland and the question came up: If there’s already a query hint on a query, what happens when you try to force a similar query hint?

Yeah, OK, that is a weird one. I don’t know the answer, but I’m about to find out.

Click through for a very interesting demo. To be honest, I expected the opposite result, so this was surprising.

Comments closed

Dynamic Data Masking and Formatted Text

Ben Johnston attacks masked strings in different formats:

Clearly this is NOT a suggestion for how you might break the text but is far more of an exercise to show you how a bad actor may attempt to look at your data in ways that would generally not cause red flags.

It is especially important to reinforce the sentiment that Dynamic Data Masking less of a security tool to prevent attacks, but more to hide data from general viewing, and as a tool for building applications where the data still is accessible in some scenarios and not others.

Click through for several examples. As I like to say (over and over), dynamic data masking only works until users get access to write arbitrary queries against a system. If they’re accessing data through an app or via stored procedure calls only, then it be a reasonable part of a broader security posture.

Comments closed

Decluttering a Dual-Axis Chart

Amy Esselman only needs one Y axis:

You may be confused and overwhelmed at first. Dual-axis graphs like this are inherently challenging. Whether you call them dual-axis graphs, combo charts, or secondary y-axis graphs, they always demand extra effort from a reader to figure out which data series to read against which vertical axis. 

Click through for a variety of ways to improve a busy dual-axis chart.

Comments closed

Preliminary Thoughts on Microsoft Fabric in Preview

Reitse Eskens shares some initial thoughts:

So, these preliminary opinions I’m offering now are based on the preview I’ve worked with and will keep on working with.

That’s the first observation, I’ll keep on working with this. Why? To be honest I think it’s a step forward from the Data Factory, Synapse, PowerBI experience. Everything together in one product makes life easier. Even though I’m having a really hard time adopting to the interface. I keep selecting the wrong buttons to get stuff done. Then again, only being able to do this after working hours and during the weekend may have something to do with that. But making the interface a little more intuitive would really help me.

Read on for what Reitse has to share so far.

Comments closed

Generating Random Data in Snowflake

Kevin Wilkie generates some random data:

One of the many things that the business team asks me to do is to create random-ish data. Thankfully, in Snowflake, there are many ways to make this happen. Today, I want to go thru just a few of them.

Perhaps the one that most people are familiar with is making Snowflake create a random number.

Click through for initial coverage of the RANDOM() function, as well as how you can generate data across a uniform distribution over a given range.

Comments closed

When Statistics Updates Happen

Matthew McGiffen gives us the numbers:

SQL Server has had the ability to automatically update statistics since version 7.0. Nonetheless for a long part of my career working with SQL Server, whenever a performance issue raised its head everyone’s knee-jerk response would be “Update Statistics!” In most cases though the people shouting that didn’t really understand what the “Statistics” were, or what mechanisms might already be in place for keeping them up to date.

Of course SQL Server isn’t perfect and sometimes it is helpful for human intelligence to intervene. But to provide intelligent intervention one has to understand how things work.

Read on to learn what triggers automatic stats updates in various versions of SQL Server.

Comments closed