Press "Enter" to skip to content

Category: KQL

Creating a Log Analytics Workspace

Gilbert Quevauvilliers begins a new series:

As with most of my blog posts it involves a client from a customer where I am consulting, which I think will help others.

The requirement was to analyse the Power BI Query usage patterns of the users. The initial requirement was to find out how many users were using Excel to gain access to the Power BI Dataset.

I knew that I could get this using Azure Log Analytics. Not only could I find out how many users are using Excel, but I could also find out what queries they are running, how long they took.

Read on for the first part in this series, which details setting up Azure Log Analytics.

Comments closed

Dense Rank in KQL

Robert Cain continues a series on KQL:

The Kusto Windowing Function row_rank_dense is an interesting function. It lets you get a unique count of a specific column in a dataset. Unlike other methods of getting counts, row_rank_dense allows you to see each individual row of data.

First though, if you’ve not read the introductory post on Windowing Functions, Fun With KQL Windowing Functions – Serialize and Row_Number, you should do so now as it introduced several important concepts needed to understand how these Windowing Functions work.

Read on to see how this window function works.

Comments closed

From Join to Lookup in KQL on Power BI

Dany Hoter gives us a workaround:

Many users who try  ADX in direct query mode encounter errors right away.

The errors  complain about lack of memory.

 If the tables are small enough, it may work but still performance will not be as advertised on TV.

The reason in most cases is the behavior of joins in ADX as they are created by PBI.

In this article I’ll show different approaches to joining tables as used by PBI for related tables or as can be expressed in KQL in general.

I created a special table in the help cluster with 31 million rows that is big enough to demonstrate the performance differences between the variations.

Read the whole thing. This one’s a little surprising to me.

Comments closed

Row Rank Min in KQL

Robert Cain continues a series on KQL:

In this post we’ll cover the next in our series on KQL Windowing Functions, row_rank_min. The row_rank_min function will assign rank numbers to an incoming dataset, with the rank of 1 being assigned to the minimal value in the dataset and increasing as the values increase.

First though, if you’ve not read the introductory post on Windowing Functions, Fun With KQL Windowing Functions – Serialize and Row_Number, you should do so now as it introduced several important concepts needed to understand how these Windowing Functions work.

Read on for Robert’s explanation.

Comments closed

Cumulative Sums in KQL

Robert Cain continues a series on KQL:

Seeing cumulative totals on a row by row basis is a common need. Think of your bank account, when you look at your ledger it shows each individual transaction. Beside each individual transaction is the current amount in your account, as offset by the amount of the current transaction.

This technique is known as cumulative sums. The Kusto Query Language provides the ability to create cumulative sums using the Windowing Function, row_cumsum. In this post we’ll see how to implement it.

Read on to see how it works and what the results look like.

Comments closed

Tracking Power BI Import Throughput Variance

Chris Webb continues a series on using Log Analytics with Power BI:

In the second post in this series I discussed a KQL query that can be used to analyse Power BI refresh throughput at the partition level. However, if you remember back to the first post in this series, it’s actually possible to get much more detailed information on throughput by looking at the ProgressReportCurrent event, which fires once for every 10000 rows read during partition refresh.

Here’s yet another mammoth KQL query that you can use to analyse the ProgressReportCurrent event data:

Click through for the KQL query, an explanation of how it works, and some practical examples.

Comments closed

Near-Real Time Reports with Power BI and KQL

Dany Hoter can’t wait:

Real time and near real time are subjective terms.

For some businesses, real time is up to 1 ms latency and for other cases 10 minutes latency is considered close to real time.

Lately I encountered a few cases in which Page refresh in PBI was used on a dataset using Direct Query against Kusto/ADX/RTA.

In this article I’ll cover a few best practices to ensure that such an implementation will be successful and conserve on resources.

I’m a bit of a stickler for the term “real-time” so I appreciate Dany’s preface here. The way I learned real-time versus online (versus batch) is, online is when you expect a result quickly but real-time is something you’d put in a fighter plane. And it turns out that, when you explain what the bill will look like, very few companies have the need for true real-time results.

Comments closed

Prev and Next in KQL

Robert Cain lags and leads:

In this post we’ll continue our series on Kusto’s Windowing Functions by covering prev and next. If you’ve not read the introductory post, Fun With KQL Windowing Functions – Serialize and Row_Number, you should do so now as it introduced several important concepts needed to understand how these functions are used.

So what do prev and next do? They allow you to retrieve a value in a column from a previous row, or the next (or upcoming) row. This can be very useful in many situations. For instance, calculating the time between two rows based on a datetime column, or the change in a value from one row to the next.

As always, read on for a series of examples from Robert.

Comments closed

Orchestrating Azure Data Explorer Queries via Apache Airflow

Michael Spector does some automation:

Apache Airflow is a widely used task orchestration framework, which gained its popularity due to Python-based programmatic interface – the language of first choice by Data engineers and Data ops. The framework allows defining complex pipelines that move data around different parts, potentially implemented using different technologies.

The following article shows how to setup managed instance of Apache Airflow and define a very simple DAG (direct acyclic graph) of tasks that does the following:

  • Uses Azure registered application to authenticate with the ADX cluster.
  • Schedules daily execution of a simple KQL query that calculates HTTP errors statistics based on Web log records for the last day.

Click through for the process.

Comments closed

Tracking High-Level Power BI Import Throughput Stats

Chris Webb collects some measurements:

In the first post in this series I described the events in Log Analytics that can be used to understand throughput – the speed that Power BI can read from your dataset when importing data from it – during refresh. While the individual events are easy to understand when you look at a simple example they don’t make it easy to analyse the data in the real world, so here’s a KQL query that takes all the data from all these events and gives you one row per partition per refresh:

Click through for the KQL script, as well as what it all means and what you can get out of it.

Comments closed