Day: October 19, 2021

Session Windows in Spark Structured Streaming

Jungtaek Lim, et al., announce support for session windows in Spark Structured Streaming:

Tumbling windows are a series of fixed-sized, non-overlapping and contiguous time intervals. An input can only be bound to a single window.

Sliding windows are similar to the tumbling windows from the point of being “fixed-sized”, but windows can overlap if the duration of the slide is smaller than the duration of the window, and in this case, an input can be bound to the multiple windows.

Session windows have a different characteristic compared to the previous two types. Session window has a dynamic size of the window length, depending on the inputs. A session window starts with an input and expands itself if the following input has been received within the gap duration. A session window closes when there’s no input received within the gap duration after receiving the latest input. This enables you to group events until there are no new events for a specified time duration (inactivity).

Click through for more details. When querying data at rest, you can build session windows yourself with a gaps and islands approach: order the events, compute the lagged difference between each event and the one before it, and increment the island count whenever that difference exceeds the cutoff point. For streaming scenarios, though, it’s very handy to have this as a native window type.
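As a rough illustration of the gaps and islands idea on data at rest, here is a minimal Python sketch. The function name `sessionize`, the sample timestamps, and the five-minute gap are all made up for this example, and the sketch treats a difference exactly equal to the gap as still belonging to the same island, which may not match how Spark handles that boundary.

```python
from datetime import datetime, timedelta


def sessionize(timestamps, gap):
    """Return (timestamp, island) pairs using a gaps and islands pass.

    Hypothetical helper for illustration only; not Spark's implementation.
    """
    sessions = []
    island = 0
    previous = None
    for ts in sorted(timestamps):
        # Start a new island when the lagged difference exceeds the gap.
        if previous is not None and ts - previous > gap:
            island += 1
        sessions.append((ts, island))
        previous = ts
    return sessions


# Illustrative event times (assumed data, not from the linked post).
events = [
    datetime(2021, 10, 19, 9, 0),
    datetime(2021, 10, 19, 9, 3),
    datetime(2021, 10, 19, 9, 4),
    datetime(2021, 10, 19, 9, 20),
    datetime(2021, 10, 19, 9, 22),
    datetime(2021, 10, 19, 10, 0),
]

result = sessionize(events, timedelta(minutes=5))
for ts, island in result:
    print(island, ts.isoformat())
```

Each island here plays the role of one session window: it starts with an event and keeps growing as long as the next event arrives within the gap.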

De-Cluttering a Line Chart

Mike Cisneros trims the fat, and there’s a lot of it:

My colleague Alex was the first person I heard refer to the underlying structure of a graph as its “skeleton.” This includes all the features that give form and shape to your graph, but that ideally—like bones—we never even really notice or see. The skeleton frames and structures our graph, but doesn’t include the data itself. 

If we don’t take the time to organize the skeleton of a chart thoughtfully and intentionally, our data won’t be appropriately supported. To an audience, something will always seem a little bit amiss.

Click through for a Halloween-themed visual update.

Upcoming Power BI Improvements

Matt Allington looks at what’s soon-to-be-new in Power BI:

You may already be aware (but then again, maybe not) that Microsoft shares its plans for upcoming new features for the Power Platform every 6 months, and Power BI forms a subset of that plan. The next plan is called Power Platform Release Plan 2021 Wave 2. It takes a lot of planning to continuously improve software, keeping the current state working while adding new features. Part of this planning process is that Microsoft shares the big picture of what is coming.

I was reviewing the next release plan this week and wanted to share a couple of things that are coming that look exciting to me (as a user/developer of Power BI Reports). The 5 new features I love, and want to call out are listed below (all are pro features except the last one):

Click through for the list.

Getting Power BI Dataset Information without Takeover

Marc Lelijveld just wants to peek at some Power BI Dataset details:

We have all been in a situation where you want to get more insights in the specific settings for a published dataset. But when this dataset is owned and published by someone else, you have to take-over the dataset first to get all the details available. In this blog, I will elaborate on what is available to you in the service and share some code snippets that help you to explore other dataset settings like refresh schedules and configured parameter values.

Read on to see what kinds of details Marc means, as well as a way to get them without taking over the dataset.

Deferred Name Resolution and Its Discontents

Kendra Little gives us a detailed rundown of deferred name resolution in SQL Server:

Have you ever tried to create an object in SQL Server, but it failed due to a missing table, column, or other dependency? If so, you’ve hit a case where SQL Server doesn’t offer ‘deferred name resolution’.

While these errors can be helpful when you’ve made a typo or accidentally used the wrong database, this can sometimes be a big hassle when you are…

– Deploying database code to set up a partial environment

– Deploying database code from version control to an empty database to ensure the code is valid

In this post, I walk through common scenarios and test whether deferred name resolution works or not.

Check it out for what is probably the most detailed look at the topic I’ve ever seen.

Explaining SSISDB

Andy Mallon explains why SSISDB exists:

Back in my day, we stored SQL Server Integration Services (SSIS) packages in msdb, or the file system. Well, I’m neither dead nor retired, so that’s not an accurate statement–today is still “my time.” But when I was learning about SQL Server, those were the only two options. If you or your environment predate SQL Server 2012, then you might only be familiar with those two options.

In SQL Server 2012, Microsoft introduced the SSISDB. Since SQL Server Integration Services (SSIS) is its own service, it deserves its own database. In a service-oriented world, it is discouraged to put data for one service (the database engine) in the same database as another service (integration services).

Click through to see where it comes from and when you might need it.