Press "Enter" to skip to content

Day: July 25, 2023

Prev and Next in KQL

Robert Cain lags and leads:

In this post we’ll continue our series on Kusto’s Windowing Functions by covering prev and next. If you’ve not read the introductory post, Fun With KQL Windowing Functions – Serialize and Row_Number, you should do so now as it introduced several important concepts needed to understand how these functions are used.

So what do prev and next do? They allow you to retrieve a value in a column from a previous row, or the next (or upcoming) row. This can be very useful in many situations. For instance, calculating the time between two rows based on a datetime column, or the change in a value from one row to the next.

As always, read on for a series of examples from Robert.

Comments closed

Percentage by Group in R

Steven Sanderson performs a breakdown:

Calculating percentages by group is a common task in data analysis. It allows you to understand the distribution of data within different categories. In this blog post, we’ll walk you through the process of calculating percentages by group using three popular R packages: Base R, dplyr, and data.table. To keep things simple, we will use the well-known Iris dataset.

The Iris dataset contains information about different species of iris flowers and their measurements, including sepal length, sepal width, petal length, and petal width. We will focus on the ‘Species’ column and calculate the percentage of each species in the dataset.

Read on for the three approaches. I think the Tidyverse approach is the easiest to understand in this case, though all three get you to the answer.

Comments closed

Power BI Dataset Data Source Credential Options

Angela Henry puts together a list:

If we are using enterprise-wide datasets, we don’t want Power BI datasets owned by individuals; we want them to be owned by a Service Principle so they aren’t relying on specific individuals when things go sideways (and because we all want to go on vacation at some point). However, it’s not always clear on what credentials will actually be used for our data sources in our datasets when using a Service Principle. In a previous post, I talked about how to set up a service principle to take over a dataset when using data gateways, but one of the pre-requisites I listed was that your data sources needed to be configured with appropriate credentials. That’s where this post comes in.

You essentially have three options for data source credentials, depending on your data source type.

Click through for that list of three options, as well as the trade-offs inherent in each.

Comments closed

Orchestrating Azure Data Explorer Queries via Apache Airflow

Michael Spector does some automation:

Apache Airflow is a widely used task orchestration framework, which gained its popularity due to Python-based programmatic interface – the language of first choice by Data engineers and Data ops. The framework allows defining complex pipelines that move data around different parts, potentially implemented using different technologies.

The following article shows how to setup managed instance of Apache Airflow and define a very simple DAG (direct acyclic graph) of tasks that does the following:

  • Uses Azure registered application to authenticate with the ADX cluster.
  • Schedules daily execution of a simple KQL query that calculates HTTP errors statistics based on Web log records for the last day.

Click through for the process.

Comments closed

New Script: sp_HealthParser

Erik Darling has a new script for us:

I recently found myself in the midst of a client issue that lead me to need a bunch of corroborating data from the system health extended event session.

There’s hardly any good documentation on it, and even fewer well-written queries or resources for parsing data out of it.

So now I’m making it easy for you, because I care about you more that Microsoft does.

Click through for a link to the script and to get an idea of what it provides.

Comments closed