
Month: September 2020

From Kafka Into Azure Data Explorer

Anagha Khanolkar walks us through a data movement scenario:

Here is an end-to-end, hands-on lab showcasing the connector in action. You can see an overview of the lab below. In our lab example, we’re going to stream the Chicago crimes public dataset to Kafka on Confluent Cloud on Azure using Spark on Azure Databricks. Then, we will use the Kusto connector to stream the data from Kafka to Azure Data Explorer.

There’s also a lab to try this out, though the estimated spend is a bit high.


Fun with Scala

Muskan Gupta has started a series on common mistakes in Scala. Part 1 has to do with matching:

In the example, we are matching against the single case at a time. But, what if we want to match against multiple cases at a time? How will we do that?

The solution to that is using the “|” (pipe) operator.

Let’s consider a scenario where you want to check if the user input is a String or an Int. If it is a String or an Int, then it’s a perfect match; otherwise, it is not. Now, I’ll give you the options for how we can do this and you should pick the correct option.
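
As a rough sketch of the pipe-based alternative pattern described above, here is one way the String-or-Int check might look in Scala. The object name, method name, and return strings are hypothetical and are not taken from Muskan's post:

```scala
object MultiCaseMatch {
  // Classifies the input: a String or an Int counts as a match,
  // anything else does not.
  def classify(input: Any): String = input match {
    // Each alternative needs its own type pattern; the alternatives
    // are combined with the pipe operator inside a single case.
    case _: String | _: Int => "perfect match"
    case _                  => "no match"
  }

  def main(args: Array[String]): Unit = {
    println(classify("hello")) // prints "perfect match"
    println(classify(42))      // prints "perfect match"
    println(classify(3.14))    // prints "no match"
  }
}
```

Note that Scala does not allow binding different variables inside the alternatives (for example `case s: String | i: Int`). If the matched value is needed, binding the whole alternative with `case x @ (_: String | _: Int)` is one option.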

Part 2 involves unpacking a tuple:

Now, what if we don’t want to access the variables in the way shown in the above picture?

The other way could be doing tuple unpacking. In this, we bind every value in the tuple with a variable/reference. This is done using Pattern Matching internally.

Now, we will look at a situation and I’ll give you some options so that we can figure out the correct way of doing tuple unpacking.
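
To make the idea of tuple unpacking concrete, here is a small sketch contrasting positional access with unpacking. The tuple contents and the names `TupleUnpacking` and `describe` are assumptions made up for illustration, not the example from the post:

```scala
object TupleUnpacking {
  // Builds a short description from a (name, age, city) tuple.
  def describe(person: (String, Int, String)): String = {
    // Unpacking binds each element of the tuple to its own name
    // in one statement; this is a pattern match under the hood.
    val (name, age, city) = person
    s"$name ($age) from $city"
  }

  def main(args: Array[String]): Unit = {
    val person = ("Ada", 36, "London") // hypothetical sample data

    // Positional access works, but _1, _2, _3 say little about meaning.
    println(person._1) // prints "Ada"

    println(describe(person)) // prints "Ada (36) from London"

    // An underscore skips elements that are not needed.
    val (_, onlyAge, _) = person
    println(onlyAge) // prints 36
  }
}
```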

These are short, multiple-choice questions along with the explanation.


VirtualBox Network Configuration for Kubernetes

Praveen Sripati looks at some VirtualBox network settings:

From the feature matrix and the required features, the only options left around the VirtualBox networking are NAT Network and Bridged Networking. The problem with Bridged Networking is that, as mentioned above, it always requires a connection to the network, and switching to a different network changes the IP of the K8S master and breaks down the entire setup. The certificates during the K8S setup are tied to a specific IP and need to be generated again each time the IP address of the master changes (1). This is not impossible, but it is tedious every time we change the network and the IP address of the master changes. So, the only optimal option left is to use the NAT Network.

Read on for more advice.


A Critique of “Advanced” as a Description of Learning Content

Don Jones lays out the argument for why the term “advanced” doesn’t mean much for learning:

Let me share a little secret of the training industry with you: an “advanced” topic is any topic that you don’t already know.

Don’s argument is that the proper axis is around commonality of usage: most commonly performed to least commonly performed. It’s an interesting argument for sure.

I’m of two minds with the idea, however. I appreciate Don’s example and like the concept of commonality for differentiation. But there are things which are legitimately advanced topics, in that they would be difficult to understand even if they were common. In Don’s query tuning example, an example of something legitimately difficult to understand is the set of rules the query optimizer chose to test for a particular query. Yes, it is very uncommon to need to know this, but it is also difficult to understand if you do need to know, and explaining how and why the query optimizer chose the path and rules that it did requires a fairly deep base of expertise.

In short, I think there’s an endogeneity problem: things can be perceived as difficult because they are uncommon (which is my reading of Don’s point), but also things can be uncommon because they are difficult to understand given some baseline of knowledge.


Avoid sys.sp_columns

Aaron Bertrand went down a rabbit hole:

I spent 20 minutes the other night thinking about possible ways to remove an ntext column from a 28TB table, and I did all of this thinking for nothing.

I was investigating the table for other reasons. And because I’m lazy, I scanned the output of the system procedure sys.sp_columns (instead of writing a proper query against more modern metadata). I spotted the legacy type in the output, and dropped everything. 

I’m pretty sure I’ve never seen sp_columns before. For everyone in the same boat, continue dutifully avoiding it.


DAX Financial Functions

Bill Pearson begins a series on financial functions inside DAX:

As a part of this introduction, you’ll have an opportunity to examine how each function can be employed to support business requirements of the sort that your hypothetical colleagues encounter routinely, and, for the most part, accomplish with Microsoft Excel, in meeting regular business requirements. You’ll learn the purpose of each function, and then undertake a practice example with each that demonstrates how it interacts with a small loan data set, via a calculation that you construct. Moreover, you will:

– Examine the syntax involved in exploiting the function.
– Undertake an illustrative example of the use of the function in a practice exercise.
– Review a brief discussion of the results you obtain in the steps of the practice example.

This is a thorough opening article.


Power BI Report Performance: Number of Visuals on a Page Edition

Chris Webb looks at some extreme scenarios:

You may have read the title of this post and guessed that I’m going to talk about reducing the number of visuals that display data from your dataset as a way of improving performance, but that’s not the case. In this blog post I want to show how visuals that do not display any data from your dataset can have a significant impact on report performance. Before we carry on I suggest you read the series of posts I wrote late last year on measuring the performance of reports in the browser using Chrome/Edge DevTools (part 1 is here, part 2 is here, part 3 is here) because I’ll be using techniques described in these posts in my testing.

Click through for an interesting demo.


Baselining SQL Server with Grafana

Ajay Dwivedi has an interesting project:

Are you tired of not knowing what the usual workload of your server is? Do you want to know what the speed of your server is?

If you are a developer or DBA who manages Microsoft SQL Servers, it becomes important to understand the current load vs usual load when SQL Server is slow.

Recently, for this reason, I have been working on my own automation to capture performance counters that would help me get an idea of CPU/Memory/IO/Network issues. Baselining also lets me know if the Server load has increased gradually over time.

I have created a GitHub repository, ‘SqlServer-Baselining-Grafana’, that contains all the scripts that will help anyone to set up the baseline on individual SQL Server instances, and then visualize the collected data using Grafana through one Inventory server with Linked Server for individual SQL Server instances.

Baselining is one of those concepts we often talk about but just as often have trouble implementing.


Correlation and Predictive Power Score in Python

Abhinav Choudhary looks at two methods for understanding the relationship between variables:

… dataframes while working in Python, which is supported by the pandas library. Pandas comes with a function, corr(), which can be used to find the relationship among the various columns of the data frame.
Syntax: DataFrame.corr()
Returns: a DataFrame with values between -1 and 1
For details about the function and its parameters, check out the link.
Let’s try this in action.

Read on to see how it works, how to visualize results, and where Predictive Power Score can be a better option.
