Press "Enter" to skip to content

Category: KQL

From Join to Lookup in KQL on Power BI

Dany Hoter gives us a workaround:

Many users who try  ADX in direct query mode encounter errors right away.

The errors  complain about lack of memory.

 If the tables are small enough, it may work but still performance will not be as advertised on TV.

The reason in most cases is the behavior of joins in ADX as they are created by PBI.

In this article I’ll show different approaches to joining tables as used by PBI for related tables or as can be expressed in KQL in general.

I created a special table in the help cluster with 31 million rows that is big enough to demonstrate the performance differences between the variations.

Read the whole thing. This one’s a little surprising to me.

Comments closed

Row Rank Min in KQL

Robert Cain continues a series on KQL:

In this post we’ll cover the next in our series on KQL Windowing Functions, row_rank_min. The row_rank_min function will assign rank numbers to an incoming dataset, with the rank of 1 being assigned to the minimal value in the dataset and increasing as the values increase.

First though, if you’ve not read the introductory post on Windowing Functions, Fun With KQL Windowing Functions – Serialize and Row_Number, you should do so now as it introduced several important concepts needed to understand how these Windowing Functions work.

Read on for Robert’s explanation.

Comments closed

Cumulative Sums in KQL

Robert Cain continues a series on KQL:

Seeing cumulative totals on a row by row basis is a common need. Think of your bank account, when you look at your ledger it shows each individual transaction. Beside each individual transaction is the current amount in your account, as offset by the amount of the current transaction.

This technique is known as cumulative sums. The Kusto Query Language provides the ability to create cumulative sums using the Windowing Function, row_cumsum. In this post we’ll see how to implement it.

Read on to see how it works and what the results look like.

Comments closed

Tracking Power BI Import Throughput Variance

Chris Webb continues a series on using Log Analytics with Power BI:

In the second post in this series I discussed a KQL query that can be used to analyse Power BI refresh throughput at the partition level. However, if you remember back to the first post in this series, it’s actually possible to get much more detailed information on throughput by looking at the ProgressReportCurrent event, which fires once for every 10000 rows read during partition refresh.

Here’s yet another mammoth KQL query that you can use to analyse the ProgressReportCurrent event data:

Click through for the KQL query, an explanation of how it works, and some practical examples.

Comments closed

Near-Real Time Reports with Power BI and KQL

Dany Hoter can’t wait:

Real time and near real time are subjective terms.

For some businesses, real time is up to 1 ms latency and for other cases 10 minutes latency is considered close to real time.

Lately I encountered a few cases in which Page refresh in PBI was used on a dataset using Direct Query against Kusto/ADX/RTA.

In this article I’ll cover a few best practices to ensure that such an implementation will be successful and conserve on resources.

I’m a bit of a stickler for the term “real-time” so I appreciate Dany’s preface here. The way I learned real-time versus online (versus batch) is, online is when you expect a result quickly but real-time is something you’d put in a fighter plane. And it turns out that, when you explain what the bill will look like, very few companies have the need for true real-time results.

Comments closed

Prev and Next in KQL

Robert Cain lags and leads:

In this post we’ll continue our series on Kusto’s Windowing Functions by covering prev and next. If you’ve not read the introductory post, Fun With KQL Windowing Functions – Serialize and Row_Number, you should do so now as it introduced several important concepts needed to understand how these functions are used.

So what do prev and next do? They allow you to retrieve a value in a column from a previous row, or the next (or upcoming) row. This can be very useful in many situations. For instance, calculating the time between two rows based on a datetime column, or the change in a value from one row to the next.

As always, read on for a series of examples from Robert.

Comments closed

Orchestrating Azure Data Explorer Queries via Apache Airflow

Michael Spector does some automation:

Apache Airflow is a widely used task orchestration framework, which gained its popularity due to Python-based programmatic interface – the language of first choice by Data engineers and Data ops. The framework allows defining complex pipelines that move data around different parts, potentially implemented using different technologies.

The following article shows how to setup managed instance of Apache Airflow and define a very simple DAG (direct acyclic graph) of tasks that does the following:

  • Uses Azure registered application to authenticate with the ADX cluster.
  • Schedules daily execution of a simple KQL query that calculates HTTP errors statistics based on Web log records for the last day.

Click through for the process.

Comments closed

Tracking High-Level Power BI Import Throughput Stats

Chris Webb collects some measurements:

In the first post in this series I described the events in Log Analytics that can be used to understand throughput – the speed that Power BI can read from your dataset when importing data from it – during refresh. While the individual events are easy to understand when you look at a simple example they don’t make it easy to analyse the data in the real world, so here’s a KQL query that takes all the data from all these events and gives you one row per partition per refresh:

Click through for the KQL script, as well as what it all means and what you can get out of it.

Comments closed

Window Functions and Serialization in KQL

Robert Cain tries out some window functions:

The Kusto Query Language includes a set of functions collectively known as Window Functions. These special functions allow you to take a row and put it in context of the entire dataset. For example, creating row numbers, getting a value from the previous row, or maybe the next row.

In order for Window Functions to work, the dataset must be serialized. In this post we’ll cover what serialization is and how to create serialized datasets. This is a foundational post, as we’ll be referring back to it in future posts that will cover some of the KQL Windowing Functions.

Read on to see how to serialize data, what the risks of serialization are, and then how to generate a row number in KQL.

Comments closed

Analyzing Power BI Refresh Performance with Log Analytics

Chris Webb starts a new series:

If you’re tuning refresh for a Power BI Import mode dataset one of the areas you’ll be most interested in is throughput, that is to say how quickly Power BI can read data from the data source. It can be affected by a number of different factors: how quickly the data source can return data; network latency; the efficiency of the connector you’re using; any transformations in Power Query; the number of columns in the data and their data types; the amount of other objects in the same Power BI dataset being refreshed in parallel; and so on. How do you know if any or all of these factors is a problem for you? It’s a subject that has always interested me and now that Log Analytics for Power BI datasets is GA we have a powerful tool to analyse the data, so I thought I’d do some testing and write up my findings in a series of blog posts.

In the first post, Chris gives us an overview of information available and provides one way to query it.

Comments closed