
Month: June 2020

Higher-Order Functions in Scala

Rahul Agarwal explains how higher-order functions make your life easier:

As a part of the functional programming paradigm, whatever logic we need to write is to be implemented in terms of pure and immutable functions. Here, functions take arguments from other functions as input and return values/functions which are used by other functions for further processing. Here, pure means that the function does not produce any side-effects like printing to the console and immutable means that the function takes in and produces immutable data (val) only.

Higher-order functions comply with the above idea. As compared to for loops, we can iterate a data structure using higher-order functions with much less code.

The term “higher-order function” can sound a bit overwhelming if you’re completely unfamiliar, but it’s a pretty simple concept: a function which takes another function as (at least) one of its inputs. As Rahul points out, this is quite the useful concept.
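As a rough illustration of the definition above, here is a minimal sketch in Python rather than Scala. The names `apply_to_all` and `make_multiplier` are hypothetical helpers invented for this example, not anything from Rahul's post. One takes a function as an input, the other returns a function, and both are compared against an ordinary for loop.

```python
from typing import Callable, Iterable, List


def apply_to_all(fn: Callable[[int], int], values: Iterable[int]) -> List[int]:
    """Higher-order function: takes another function as one of its inputs."""
    return [fn(v) for v in values]


def make_multiplier(factor: int) -> Callable[[int], int]:
    """Higher-order function: returns a new function as its result."""
    def multiply(x: int) -> int:
        return x * factor
    return multiply


numbers = (1, 2, 3, 4)  # a tuple, so the input data is immutable

# Loop version: builds the result by mutating a list step by step.
doubled_loop = []
for n in numbers:
    doubled_loop.append(n * 2)

# Higher-order versions: the iteration logic lives inside the helper.
doubled_hof = apply_to_all(make_multiplier(2), numbers)
doubled_map = list(map(lambda n: n * 2, numbers))  # built-in map is also higher-order

print(doubled_loop, doubled_hof, doubled_map)
```

All three produce the same list; the higher-order versions simply move the "how to iterate" part out of the calling code.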

Comments closed

Creating Data-Driven Power BI Report Subscriptions

John White shows how to create a data-driven subscription for a Power BI report:

One of the features that has never made the leap from SQL Server Reporting Services (SSRS) on-premises to the cloud is data-driven subscriptions. Users can subscribe to reports, but a data-driven subscription allows individual subscriptions to be stored in a central location and parameterized, while delivering the reports to multiple locations. This article will describe a pattern for accomplishing this using SharePoint lists as the subscription store, and Power Automate as the automation tool, for a no-code solution to this requirement.

The other alternative would be to use Power BI Report Server, but if you’re not using that, this is an interesting approach and solution.


Optimizing Derived Table Expressions

Itzik Ben-Gan continues a series on table expressions:

As mentioned, next month I’ll get to the details of unnesting of derived tables. For now, suffice to say that SQL Server normally does apply an unnesting/inlining process to derived tables, where it substitutes the nested queries with a query against the underlying base tables. Well, I’m oversimplifying a bit. It’s not like SQL Server literally converts the original T-SQL query string with the derived tables to a new query string without those; rather SQL Server applies transformations to an internal logical tree of operators, and the outcome is that effectively the derived tables typically get unnested. When you look at an execution plan for a query involving derived tables, you don’t see any mention of those because for most optimization purposes they don’t exist. You see access to the physical structures that hold the data for the underlying base tables (heap, B-tree rowstore indexes and columnstore indexes for disk-based tables and tree and hash indexes for memory optimized tables).

This article deserves a careful reading.
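To make the idea of unnesting concrete, here is a small sketch using Python's built-in sqlite3 module. It is not SQL Server, and it says nothing about how either engine's optimizer works internally. It only shows that a query written against a derived table and a hand-unnested query against the base table are logically equivalent. The `orders` table and its data are made up for the example.

```python
import sqlite3

# In-memory database with a hypothetical orders table (illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (orderid INTEGER PRIMARY KEY, custid INTEGER, orderyear INTEGER);
    INSERT INTO orders VALUES (1, 10, 2019), (2, 10, 2020), (3, 20, 2020), (4, 30, 2020);
""")

# Query written against a derived table named d.
nested_sql = """
    SELECT orderyear, COUNT(DISTINCT custid) AS numcusts
    FROM (SELECT orderyear, custid FROM orders WHERE orderyear = 2020) AS d
    GROUP BY orderyear
"""

# The same logic written directly against the base table (manually unnested).
unnested_sql = """
    SELECT orderyear, COUNT(DISTINCT custid) AS numcusts
    FROM orders
    WHERE orderyear = 2020
    GROUP BY orderyear
"""

nested_rows = conn.execute(nested_sql).fetchall()
unnested_rows = conn.execute(unnested_sql).fetchall()
print(nested_rows, unnested_rows)
conn.close()
```

Both queries return the same rows, which is the property that lets an optimizer treat the derived table as if it were not there.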


The Function of Service Broker Queues

Chris Johnson continues a series on Service Broker:

A queue is a full database object, like a table or a stored procedure. As such, it is part of a schema, and appears in the sys.objects view. A queue holds messages that have been sent to it, in the same way that a table does, and these messages can even be queried in the same way that you would query a table.

You can’t change the columns that are available, and there are quite a few of them. To see what there is, just run SELECT * against any queue, but a few of the key ones are service_name, service_contract_name, message_type_name, message_body, message_enqueue_time, conversation_handle.

Read on to see how to create a new queue.


A Power BI FAQ

James Serra answers questions about Power BI:

Should we have dev, test, and prod workspaces?

Yes! You should use change management to move reports through the dev/test/prod workspace tiers via the new deployment pipelines in Power BI. Use the workspaces to collaborate on Power BI content with your colleagues, while distributing the report to a larger audience by publishing an app. You should also promote and certify your datasets. The reports and datasets should have repeatable test criteria.

Read on for the full set of questions and answers.


Including Headers in Zero-Row ADF Data Flows

Mark Kromer meets a challenge:

Today, we don’t have an option in data flows in ADF to include headers when your transformations result in zero rows. But you can build the logic to handle this scenario. So, until we add a checkbox feature to include headers, you can use this technique below to achieve this.

Click through for the explanation, as well as a completed version you can take for your own.


Improving Power BI Performance

Dan Szepesi continues a series on Power BI performance tuning:

As an example, I am going to go through in detail how to use the results from the Performance Analyzer to understand the performance of your visuals. I downloaded the sample PBIX from the Power BI Documentation at Microsoft.com – https://docs.microsoft.com/en-us/power-bi/create-reports/sample-datasets and I will use the visuals from the Net Sales report in the screenshots that follow.

I am going to walk through how I would approach looking at the performance of the visuals on this report and show what we can learn from the data that the Performance Analyzer gives me.

Click through for that example as well as several helpful tips.


Vectorized R I/O in Apache Spark 3.0

Hyukjin Kwon gives us a preview of SparkR improvements in Apache Spark 3.0:

When SparkR does not require interaction with the R process, the performance is virtually identical to other language APIs such as Scala, Java and Python. However, significant performance degradation happens when SparkR jobs interact with native R functions or data types.

Databricks Runtime introduced vectorization in SparkR to improve the performance of data I/O between Spark and R. We are excited to announce that, using the R APIs from Apache Arrow 0.15.1, the vectorization is now available in the upcoming Apache Spark 3.0 with substantial performance improvements.

This blog post outlines Spark and R interaction inside SparkR, the current native implementation and the vectorized implementation in SparkR with benchmark results.

Certain operations get ridiculously faster with this change.


Troubleshooting Kafka Remote Connections

Robin Moffatt explains common errors people run into when trying to connect to remote Kafka clusters:

In this example, my client is running on my laptop, connecting to Kafka running on another machine on my LAN called asgard03:

The initial connection succeeds. But note that the BrokerMetadata we get back shows that there is one broker, with a hostname of localhost. That means that our client is going to be using localhost to try to connect to a broker when producing and consuming messages. That’s bad news, because on our client machine, there is no Kafka broker at localhost (or if there happened to be, some really weird things would probably happen).

As usual, things boil down to “Configure it correctly and it works.”
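The two-step behavior Robin describes can be sketched as a small pure-Python simulation. It does not use a real Kafka client. The `BrokerMetadata` class here is a stand-in modeled on the name in the excerpt, `connect` is a hypothetical helper, and port 9092 is an assumption (it is Kafka's conventional default). The point is only that the bootstrap address gets the client in the door, after which it uses whatever address the broker advertises (in a real broker, the `advertised.listeners` setting).

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class BrokerMetadata:
    """Stand-in for the metadata a broker returns about itself."""
    broker_id: int
    host: str
    port: int

    @property
    def address(self) -> str:
        return f"{self.host}:{self.port}"


def connect(bootstrap: str,
            advertised: List[BrokerMetadata],
            reachable: FrozenSet[str]) -> List[str]:
    """Simulate the client: bootstrap first, then use the advertised addresses."""
    # Step 1: the bootstrap server itself must be reachable from the client.
    if bootstrap not in reachable:
        raise ConnectionError(f"cannot reach bootstrap server {bootstrap}")
    # Step 2: produce/consume traffic goes to the advertised addresses,
    # not to the bootstrap address the client started with.
    return [
        f"{b.address}: " + ("ok" if b.address in reachable else "unreachable")
        for b in advertised
    ]


# From the laptop, only the remote machine is reachable (assumed port 9092);
# nothing is listening on the laptop's own localhost.
reachable_from_laptop = frozenset({"asgard03:9092"})

# Broker advertises localhost: bootstrap works, later connections fail.
misconfigured = connect("asgard03:9092",
                        [BrokerMetadata(0, "localhost", 9092)],
                        reachable_from_laptop)

# Broker advertises a hostname the client can resolve: everything works.
fixed = connect("asgard03:9092",
                [BrokerMetadata(0, "asgard03", 9092)],
                reachable_from_laptop)

print(misconfigured, fixed)
```

The misconfigured case is the confusing one in practice, because the initial connection succeeds and the failure only appears afterwards.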


Kafka + Kotlin

Unni Mana shows how to create a Kafka consumer and producer in the Kotlin language:

We are using KafkaTemplate to send the message to a topic called test_topic. This will return a ListenableFuture object from which we can get the result of this action. This approach is the easiest one if you just want to send a message to a topic.

Generally, when we talk about the Hadoop ecosystem and functional programming languages on the Java Virtual Machine, we think Scala. But this is an example showing that Kotlin is in that discussion too.
