Press "Enter" to skip to content

Day: September 15, 2020

The Count Window in Flink

Kundan Kumarr takes us through an example of the count window type in Apache Flink:

In the blog, we learned about Tumbling and Sliding windows which is based on time. In this blog, we are going to learn to define Flink’s windows on other properties i.e Count window. As the name suggests, count window is evaluated when the number of records received, hits the threshold.

Count window set the window size based on how many entities exist within that window. For example, if we fixed the count as 4, every window will have exactly 4 entities. It doesn’t matter whats the size of the window in terms of time. Window size will be different but the number of entities in that window will always be the same. Count windows can have overlapping windows or non-overlapping, both are possible. The count window in Flink is applied to keyed streams means there is already a logical grouping of the stream based on all values associated with a certain key. So the entity count will apply on a per-key basis.

I’m curious if there’s a combination of count + time, triggering when you hit X elements or Y seconds, whichever comes first.

Comments closed

Trying Out Redis

Paul Brebner walks us through some of the basics of Redis:

It took me sometime to work out what Redis really isn’t, and is!

The Redis documentation says what it is not:

“Redis is not a plain key-value store…”

And what it is:

“It is actually a data structures server, supporting different kinds of values.”

So, it (really) is a fast in-memory key-value store, where keys are always strings, but the value can actually be a number of different data types, with different operations supported on each data type. It’s also distributed (using the cluster mode, and supports replication). And it’s got two types of disk persistence (which makes it more like a database), and a caching mode. See the FAQ for more details.

Redis can be extremely valuable as a cache, though persistent Redis can introduce weird problems at scale.

Comments closed

Using an Azure VM’s D Drive for tempdb

William Assaf shows how you can use the temporary D drive on an Azure VM to host tempdb in SQL Server:

Moving your SQL Server instance’s TempDB files to the D: volume is recommended for performance, as long as the TempDB files fit it the D: that has been allocated, based on your VM size. 
When the D: is lost due to deallocation, as expected, the subfolder you created for the TempDB files (if applicable) and the NTFS permissions granting SQL Server permission to the folder are no longer present. SQL Server will be unable to create the TempDB files in the subfolder and will not start. Even if you put the TempDB data and log files in the root of D:, after deallocation, that’s still not a solution, as the NTFS permissions to the root of D: won’t exist. In either case, SQL Server will be unable to create the TempDB files and will not start.

Read on for a few options and William’s thoughts on the relative merits of each.

Comments closed

MAX Versus LASTDATE in DAX

Alberto Ferrari explains why you might want to use MAX() instead of LASTDATE() to find the latest date in a table with DAX:

Many DAX newbies use LASTDATE to search for the last date in a time period. Or they use NEXTDAY to retrieve the day after a given date. Although these functions do what they promise, they are not intended to be used in simple expressions. Instead, they are table functions designed to be used in time intelligence calculations. Using them the wrong way leads to inefficient code. Moreover, using these functions in ways they were not designed for is a telltale sign that the developer still does not grasp certain details of DAX.

In this article, we elaborate on the topic in order to understand what these time intelligence functions do; we also want to understand the reason why it is so easy to confuse them with simple math over dates. We want to elaborate on this topic through examples. Therefore, instead of starting with boring theory, we start by looking at a calculation that – although it works perfectly fine – is inherently wrong.

Read on for that explanation.

Comments closed

Finding Indexes Not in Use

Dennes Torres takes us through a few iterations of a query to find indexes not in use:

It doesn’t matter if you are trying to remove indexes for good reasons or just to work around a bad environment, let’s see in more details how to find which indexes doesn’t have enough usage to justify their existance.

First, some basic definitions, without going into many details:

Index Seek: That’s the best and desirable use of the index. It means the index tree is being used to go directly to the records we need.

Index Scan: Not so good as an index seek, so it could be better. However, sometimes even an index scan is good, a non clustered index scan means the pages of that index are smaller an better for a scan than the pages of the clustered index. There are many variations that makes an index scan good, but most times you don’t need to reach this level of analysis, you may reach your objective only analysing index seeks.

Update: When the fields are updated (update/insert/delete) all indexes which contain those fields need to be updated as well. Indexes are a balance: We increase performance on reading and suffer a bit more when writting. The problem is when the writting happens more than the reading.

Read on to see Dennes’s query evolve and bring important information to the table. For example, it’s not just how often a particular index gets used; it’s also how important the queries are which use this index. An index may only run once a month, but if it turns the most important report the CEO cares about from running in 4 hours to running in 4 seconds, you bet that index is staying.

Comments closed

Choosing a Power BI Report Type

Paul Turley compares Power BI paginated and analytic reports:

This brings me to the subject of this post: Paginated and Analytic reports.

Before we had Power BI, we had Reporting Services. When the focus of my career and consulting practice became Business Intelligence – fifteen to about five years ago – most of the “BI reports” I created were in SQL Server Reporting Services (SSRS). We could create a variety of charts and reports that allowed users to drill-down from summary to details. We could create reports that allowed a user to drill-through, from one record or report item to another report that presented more details. With a little creativity, we could create reports that allowed users to click items that would filter other items on the same report. It took some work and time to create these interactive “drill-through-to-self” reports. Today, after transforming data into the right format for a tabular data model, this type of interactive functionality just magically happens in Power BI with very little report design effort. But, Power BI is primarily a tool for analyst users to create their own reports and to get answers to business questions rather than for IT developers to create reports that print or export perfectly to a PDF or Excel.

For now, Paul is asking for thoughts and questions with the promise that there will be an update to the post. So stop on by and ask a question or two.

Comments closed

Power BI (Lack of) Performance with Sharepoint

Matt Allington is not impressed:

On the face of it, it seems like a great idea to leverage SharePoint as a storage location for CSV and Excel files.

– Everyone has easy access to the files for editing and storage
– SharePoint manages version control, check in, check out etc
– SharePoint can facilitate shared editing of files
– You can build a Power BI report that will refresh online without the need to install a gateway.

Unfortunately, despite the benefits, the experience is not great.  Power BI performance with SharePoint as a data source is simply terrible.  Ultimately, the problems come down to performance in 2 areas.

Read on to learn more about these two issues and what you can do instead.

Comments closed

Eager Spools in SQL Server

Erik Darling describes the concept of eager spooling:

Not only does SQL Server create an index for you, it doesn’t really tell you about it. There’s no loud warning here.

It also throws that index away when the query is done executing. It only exists in tempdb while the query executes, and it’s only available to the query that builds it.

And boy, they sure can take a long time to build.

I enjoy and frequently use Erik’s depiction that an eager spool is SQL Server’s passive-aggressive way of telling you that you need an index.

Comments closed