Kevin Feasel – Page 795

The email included a screen shot from the lineage view of a Power BI workspace, some context about working to troubleshoot a problem, and the question “We want to refresh this dataflow, and not have it refresh the downstream dataflows. Is this possible?”
I almost said no, and then I remembered this post and realized the answer was “yes, sort of.”

Click through to see how it all fits together. And I’m in favor of buying Matthew a sword—can’t have too many of those.


The Session Window in Flink

Published 2020-09-22 by Kevin Feasel

Kundan Kumarr continues a series on windows in Apache Flink:

In the real world, all the work that we do online (visiting a website, clicking around the website, making online transactions, and so on) happens in sessions. We might go to an e-commerce website like Amazon, look for products, click around for a bit, and then stop. All of this is done within a session. There is a use case where these websites may want to track the pages that we visited in a single session. For that, the site needs to group all of the clicks that are streaming in, based on a session. These streaming use cases can be implemented easily with the Flink session window.
The session windows assigner groups elements by sessions of activity. Session windows do not overlap and do not have a fixed start and end time. The number of elements within a session window is not fixed either, because it is typically the user who defines how long the session will be. A session window closes when it does not receive elements for a certain period of time, i.e., when a gap of inactivity occurs. For example, once we have been idle on the Amazon website for, let's say, 1 minute, the previous session ends, and if we go back to the site 1 second later, a new session starts. The session is determined by the pause between one click and the next.

Click through for a depiction and an example.
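
As a rough illustration of the gap rule described in the excerpt, here is a minimal sketch in plain Python. It is not the Flink API; the function name `session_windows`, the sample click times, and the 60-second gap are all made up for the example. It only shows the idea that a new session starts once the pause between two events reaches the inactivity gap.

```python
from typing import Iterable, List


def session_windows(timestamps: Iterable[float], gap: float) -> List[List[float]]:
    """Group event timestamps into sessions split by `gap` seconds of inactivity.

    Hypothetical helper for illustration only; not part of Flink.
    """
    sessions: List[List[float]] = []
    for ts in sorted(timestamps):
        # Open a new session if none exists yet, or if the pause since the
        # last event in the current session is at least the inactivity gap.
        if not sessions or ts - sessions[-1][-1] >= gap:
            sessions.append([ts])
        else:
            sessions[-1].append(ts)
    return sessions


# Assumed sample data: click times in seconds, with a 60-second inactivity gap.
clicks = [0, 10, 25, 100, 110, 400]
sessions = session_windows(clicks, gap=60)
print(sessions)
```

With this sample, the clicks fall into three sessions, and none of them has a fixed size or a fixed start and end time, which is the property the excerpt calls out.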


The Problem with VM Backups of SQL Server

Published 2020-09-22 by Kevin Feasel

Sean Gallardy turns a problem on its head:

Now let’s get to the main point, which is how long the VM stays paused or stunned – remember, this is a “small” or “short” amount of time, one might even say “trivial”. When it is kept short enough to be “trivial”, as in less than a second, then all is good and you most likely won’t notice it except in very high workloads… but we should be running with VSS integration and not VM level so it’s still incorrect, but hey. When this time is not short or trivial then GOOD things start to happen, most notably that high availability kicks in.

I appreciate the framing of this post, as the failover wasn’t a problem; it merely exposes the actual problem.


Space Savings from Separate Date and Time Columns in Power BI

Published 2020-09-22 by Kevin Feasel

Shabnam Watson runs an experiment:

As you may have already heard, one of the easiest ways to reduce a Power BI model (dataset) size is by splitting DateTime columns into separate Date and Time columns, but the question is how much space reduction you can achieve by doing so. As I show in this blog post, the reduction can be significant, up to 80% or 90% depending on the number and cardinality of the datetime columns.

That’s a lot of savings.
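
To get a feel for why the split helps, here is a small Python sketch that counts distinct values before and after splitting. It does not measure Power BI model size. It assumes that the number of distinct values in a column is a reasonable proxy for how well a column store can compress it, and the data (one event every 37 seconds) and the function name `cardinalities` are invented for the example.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple


def cardinalities(events: Iterable[datetime]) -> Tuple[int, int, int]:
    """Return distinct counts for (datetime, date part, time part).

    Hypothetical helper for illustration only.
    """
    events = list(events)
    distinct_datetimes = len(set(events))
    distinct_dates = len({e.date() for e in events})
    distinct_times = len({e.time() for e in events})
    return distinct_datetimes, distinct_dates, distinct_times


# Assumed sample data: 500,000 events, one every 37 seconds from 2020-01-01.
start = datetime(2020, 1, 1)
events = [start + timedelta(seconds=37 * i) for i in range(500_000)]

n_datetimes, n_dates, n_times = cardinalities(events)
print(f"distinct datetimes: {n_datetimes}")
print(f"distinct dates:     {n_dates}")
print(f"distinct times:     {n_times}")
```

The combined column has one distinct value per row, while the time column can never exceed 86,400 distinct values at one-second resolution and the date column grows by only one value per day. How much of that turns into real savings depends on the model, which is what the linked experiment measures.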


A Postgres Version of WhoIsActive

Published 2020-09-22 by Kevin Feasel

Josh Simar is on a mission:

A while ago, I asked in the twitterverse if there was a Postgres equivalent for the great sp_WhoIsActive script. While I didn’t get a flat-out no (which I wasn’t expecting), I didn’t get anyone pointing me in the direction of something pre-done and did get some advice that I should take it on.
Well it took quite a while and it’s still nowhere near as robust as the MSSQL version but as a first stab I have created the pg_WhoIsActive function.
While doing it I basically said that I want a 1 to 1 equivalent as much as possible but to get it out quick I had one major rule for the POC.

Check it out.


T-SQL Tuesday 130 Roundup: Automate Your Stress Away

Published 2020-09-22 by Kevin Feasel

Elizabeth Noble recaps T-SQL Tuesday #130:

I’m really grateful for all the bloggers that took part this month. Especially since automation has been a topic discussed before. However, it’s hard for me to get too much of my day to day work automated. And I was really looking forward to these topics so that I could learn new tasks I could automate myself. With that said, let’s see all the wonderful ideas people contributed this month. And if you’re like me, you’re going to want to put some of this automation in place as soon as possible.

Click through for 24 entries.


From Kafka Into Azure Data Explorer

Published 2020-09-21 by Kevin Feasel

Anagha Khanolkar walks us through a data movement scenario:

Here is an end-to-end, hands-on lab showcasing the connector in action. You can see an overview of the lab below. In our lab example, we’re going to stream the Chicago crimes public dataset to Kafka on Confluent Cloud on Azure using Spark on Azure Databricks. Then, we will use the Kusto connector to stream the data from Kafka to Azure Data Explorer.

There’s also a lab to try this out, though the estimated spend is a bit high.


Fun with Scala

Published 2020-09-21 by Kevin Feasel

Muskan Gupta has started a series on common mistakes in Scala. Part 1 has to do with matching:

In the example, we are matching against a single case at a time. But what if we want to match against multiple cases at a time? How will we do that?
The solution to that is using the “|” (pipe) operator.
Let’s consider a scenario where you want to check if the user input is a String or an Int. If it is a String or an Int then it’s a perfect match, else not. Now, I’ll give you the options for how we can do this and you should pick the correct option.

Part 2 involves unpacking a tuple:

Now, what if we don’t want to access the variables in the way shown in the above picture?
The other way could be doing tuple unpacking. In this, we bind every value in the tuple with a variable/reference. This is done using Pattern Matching internally.
Now, we will look at a situation and I’ll give you some options so that we can figure out the correct way of doing tuple unpacking.

These are short, multiple-choice questions, each with an explanation.


A Critique of “Advanced” as a Description of Learning Content

Published 2020-09-21 by Kevin Feasel

Don Jones lays out the argument for why the term “advanced” doesn’t mean much for learning:

Let me share a little secret of the training industry with you: an “advanced” topic is any topic that you don’t already know.

Don’s argument is that the proper axis is around commonality of usage: most commonly performed to least commonly performed. It’s an interesting argument for sure.

I’m of two minds with the idea, however. I appreciate Don’s example and like the concept of commonality for differentiation. But there are things which are legitimately advanced topics, in that they would be difficult to understand even if they were common. In Don’s query tuning example, an example of something legitimately difficult to understand is the set of rules the query optimizer chose to test for a particular query. Yes, it is very uncommon to need to know this, but it is also difficult to understand if you do need to know, and explaining how and why the query optimizer chose the path and rules that it did requires a fairly deep base of expertise.

In short, I think there’s an endogeneity problem: things can be perceived as difficult because they are uncommon (which is my reading of Don’s point), but also things can be uncommon because they are difficult to understand given some baseline of knowledge.


VirtualBox Network Configuration for Kubernetes

Published 2020-09-21 by Kevin Feasel

Praveen Sripati looks at some VirtualBox network settings:

From the feature matrix and the required features, the only options left around the VirtualBox networking are NAT Network and Bridged Networking. The problem with Bridged Networking is that, as mentioned above, it always requires a connection to the network, and switching to a different network changes the IP of the K8S master and breaks down the entire setup. The certificates during the K8S setup are tied to a specific IP and need to be generated again each time the IP address of the master changes (1). This is not impossible, but is tedious every time we change the network and the IP address of the master changes. So, the only optimal option left is to use the NAT Network.

Read on for more advice.



Author: Kevin Feasel
