Press "Enter" to skip to content

Day: August 24, 2020

Using Flink Stateful Functions

Igal Shilman gives us an example of where Apache Flink’s Stateful Functions come in handy:

In this blog post, we’ll take a look at a class of use cases that is a natural fit for Flink Stateful Functions: monitoring and controlling networks of connected devices (often called the “Internet of Things” (IoT)).

IoT networks are composed of many individual, but interconnected components, which makes getting some kind of high-level insight into the status, problems, or optimization opportunities in these networks not trivial. Each individual device “sees” only its own state, which means that the status of groups of devices, or even the network as a whole, is often a complex aggregation of the individual devices’ state. Diagnosing, controlling, or optimizing these groups of devices thus requires distributed logic that analyzes the “bigger picture” and then acts upon it.

A powerful approach to implement this is using digital twins: each device has a corresponding virtual entity (i.e. the digital twin), which also captures their relationships and interactions. The digital twins track the status of their corresponding devices and send updates to other twins, representing groups (such as geographical regions) of devices. Those, in turn, handle the logic to obtain the network’s aggregated view, or this “bigger picture” we mentioned before.

Read on to see where stateful functions come into play.

Comments closed

Late-Arriving Data with Spark Streaming

Sarfaraz Hussain continues a series on Spark streaming:

The size of the State (discussed in the previous blog) will continue to increase indefinitely as we really don’t know when a bucket can be closed.

But practically a query is not going to receive data 1 week late or in that matter such late-arriving data is of no use to us.

So, to specify the information when to stop considering older buckets for the streaming query we use Watermark.

Read on to see how you can design and implement a watermark.

Comments closed

A Review of Azure Synapse Analytics

Teo Lachev looks at Azure Synapse Analytics:

There is plenty to like in Azure Synapse which is the evaluation of Azure SQL DW. If you’re tasked to implement a cloud-based data warehouse, you have a choice among three Azure SQL Server-based PaaS offerings, including Azure SQL Database, Azure SQL Managed Instance, and Azure Synapse. In a nutshell, Azure SQL Database and Azure SQL MI are optimized for OLTP workloads. For example, they have full logging enabled and replicate each transaction across replicas. Full logging is usually a no-no for decent size DW workloads because of the massive ETL changes involved.

In addition, to achieve good performance, you’ll find yourself moving up the performance tiers and toward the price point of the lower Azure Synapse SKUs. Not to mention that unlike Azure SQL Database, Azure Synapse can be paused, such as when reports hit a semantic layer instead of DW, and this may offer additional cost cutting options.

Teo focuses primarily on SQL Pools and the more SQL-friendly side of things (ELT and Power BI) rather than Spark pools.

Comments closed

Finding Heaps in a Database

David Fowler has a quick one for us:

This is just going to be a quick post but I was asked the question the other day, how can I find all the tables in the database that don’t have a clustered index?

Tables without clustered indexes, known as heaps can give you quite a few different performance problems for various reasons. I don’t really want to go into those here, that could be a subject for another post but finding these heaps is actually very simple.

And David has a short script to show just how simple it is.

Comments closed

Generating Stored Procedure Mappings for Entity Framework Core

Erik Ejlskov Jensen takes us through stored procedure mapping with Entity Framework Core Power Tools:

In my previous post I showed how you can map and use stored procedures manually from EF Core, a process which involved quite a bit of code, and some changes to your derived DbContext.

With the latest release of EF Core Power Tools, you can opt-in to have any SQL Server stored procedures in your database made available to you.

Click through to learn how to do this.

Comments closed

Creating KPIs in Power BI

Alberto Ferrari takes us through key performance indicator creation in Power BI Desktop:

Starting from the July 2020 version, Power BI Desktop offers the possibility of using external tools to modify its internal Tabular model. With a tool like Tabular Editor, you can create a KPI directly in Power BI Desktop so that it can be used in any Power BI report and also by using the Analyze in Excel feature. The KPI feature was previously available only in Tabular models created in Analysis Services or Power BI Premium. This introductive article shows you how to create and consume KPIs in Power BI Desktop. A more detailed description of the available KPI graphics and the corresponding state values is the topic for an upcoming article.

Let us see the feature with a practical – though fictitious – example. Say Contoso needs to analyze the Margin % of its products. The yardstick is the overall margin, which is the Margin % over time and products with a tolerance of 2%. The overall margin of Contoso is 53%. Therefore, a category with a Margin % less than 51% is considered bad (red), over 55% is considered good (green), in between 51% and 55% is considered average (yellow). Moreover, Contoso wants to analyze the trend of Margin % compared with the previous year. For example, the margin might be red but Contoso can evaluate which action to take depending on whether it is improving or not over time.

Read on for the demonstration.

Comments closed

Semantic Search in SQL Server

Haroon Ashraf wraps up a series:

Being the final part of the article, it is going to take you to the next level of analyzing word documents stored in Windows folders, managed by File Table, and consumed by Semantic Search.

Additionally, the readers are going to gain more understanding of Semantic Search and how to make it work with MS Word documents for analysis.

This article provides a name-based analysis of the documents with equal attention to both theory and practice.

Click through for the culmination of all of this filestream work.

Comments closed