Day: August 17, 2020

Stateful Streaming with Spark

Sarfaraz Hussain continues a series on Spark Streaming:

Under the hood, Structured Streaming processes data as micro-batches (its default mode).

State is versioned between micro-batches while the streaming query runs. So as the series of incremental execution plans is generated (discussed in Part 2), each execution plan knows which version of the state it needs to read.

Each micro-batch reads the previous version of the state data (i.e., the previous running count), updates it, and creates a new version. Each of these versions gets checkpointed into the same checkpoint location that we provided in the query.

Read on to understand the implications of this and what it allows you to do.
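
To make the versioning idea concrete, here is a toy simulation in plain Python. It is not Spark's API: VersionedStateStore and run_micro_batch are hypothetical names, and a dict stands in for the checkpoint location. It shows only the pattern described in the excerpt: each micro-batch reads version N of a running count, applies its own records, and commits version N + 1, while older versions stay readable.

```python
from collections import Counter


class VersionedStateStore:
    """Hypothetical stand-in for a checkpointed state store: one snapshot per version."""

    def __init__(self):
        # version -> snapshot of the running counts after that micro-batch
        self.checkpoints = {0: {}}

    def latest_version(self):
        return max(self.checkpoints)

    def read(self, version):
        # Return a copy so a batch never mutates an already-committed version
        return dict(self.checkpoints[version])

    def commit(self, version, state):
        if version in self.checkpoints:
            raise ValueError(f"version {version} already committed")
        self.checkpoints[version] = dict(state)


def run_micro_batch(store, batch):
    """Read version N, apply this batch's counts, and commit version N + 1."""
    prev = store.latest_version()
    state = store.read(prev)
    for key, n in Counter(batch).items():
        state[key] = state.get(key, 0) + n
    store.commit(prev + 1, state)
    return prev + 1


store = VersionedStateStore()
for batch in (["a", "b", "a"], ["b", "c"], ["a"]):
    version = run_micro_batch(store, batch)
    print(version, store.read(version))
```

Because committed versions are never overwritten, a restarted run could pick up from the last committed version; that is the property the sketch illustrates, not a description of how Spark lays out its state files.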

Multi-Threaded Message Consumption with Kafka

Igor Buzatovic takes us through a fairly advanced topic in Apache Kafka:

If you are familiar with basic Kafka concepts, you know that you can parallelize message consumption by simply adding more consumers in the same group. However, that approach is more suitable for horizontal scaling where you add new consumers by adding new application nodes (containers, VMs, and even bare metal instances).

A multi-consumer approach can also be used for vertical scaling, but this requires additional management of consumer instances and accompanying consuming threads in the application code. Using multiple consumer instances introduces additional network traffic as well as more work for the consumer group coordinator since it has to manage more consumers.

While these concerns may not be strong enough reasons for switching from a thread per consumer to a multi-threaded model, there are use cases in which a multi-threaded model has compelling advantages.

Read the whole thing.
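
One common way to structure the multi-threaded model is a single polling thread that routes records to one worker per partition, so records within a partition stay in order while partitions are processed in parallel, and offsets are committed only for records that have finished. The sketch below simulates that in plain Python with threading and queue. There is no Kafka client in it: the polled records are a hard-coded list, and process, worker, and consume are hypothetical names. It is not necessarily the design the article uses.

```python
import queue
import threading
from collections import defaultdict

# Hypothetical records as (partition, offset, value); a real app would get
# these from the consumer's poll() call rather than from a hard-coded list.
POLLED = [(0, 0, "a"), (1, 0, "b"), (0, 1, "c"), (1, 1, "d"), (0, 2, "e")]


def process(value):
    # Placeholder for slow per-record work (I/O, enrichment, etc.)
    return value.upper()


def worker(partition, inbox, results, done_offsets, lock):
    """Process one partition's records in order, tracking the last finished offset."""
    while True:
        item = inbox.get()
        if item is None:  # sentinel: no more records for this partition
            break
        offset, value = item
        out = process(value)
        with lock:
            results[partition].append(out)
            done_offsets[partition] = offset


def consume(records):
    inboxes, threads = {}, []
    results = defaultdict(list)
    done_offsets = {}
    lock = threading.Lock()
    # The single "poll" thread (here: the caller) routes records by partition,
    # so ordering within a partition is preserved while partitions run in parallel.
    for partition, offset, value in records:
        if partition not in inboxes:
            inboxes[partition] = queue.Queue()
            t = threading.Thread(
                target=worker,
                args=(partition, inboxes[partition], results, done_offsets, lock),
            )
            t.start()
            threads.append(t)
        inboxes[partition].put((offset, value))
    for inbox in inboxes.values():
        inbox.put(None)
    for t in threads:
        t.join()
    # Kafka expects the committed offset to be the next one to read: last processed + 1.
    commits = {p: o + 1 for p, o in done_offsets.items()}
    return dict(results), commits


results, commits = consume(POLLED)
print(results)  # per-partition results, in offset order
print(commits)  # offsets that would be committed back to Kafka
```

In a real application only the polling thread should touch the consumer object, since the Java KafkaConsumer is not thread-safe; workers would report finished offsets back to that thread, which then commits them.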

Killing Idle Analysis Services Sessions

Shabnam Watson shows us how to kill idle SQL Server Analysis Services sessions:

Think of this method as an emergency procedure only. As always, have database backups and try this on a development server first. Always take a backup of msmdsrv.ini before you modify any server properties. The default location of the file is this: C:\Program Files\Microsoft SQL Server\MSAS15.MSSQLSERVER\OLAP\Config

If you set the timeout values too low on a server that is under resource pressure, you may not be able to get to the server properties using SSMS and change them quickly within the time you set for the timeout. For this reason, I prefer the use of XMLA in this case, which makes the process faster.

Read on to see how to do this.

Getting Started with Jupyter Notebooks

Aveek Das takes us through the most popular name in notebooks:

In this article, I am going to explain what Jupyter Notebooks are and how to install them on your machine. Further, I will demonstrate how to use these notebooks in Visual Studio Code to perform data analysis and other development activities. Jupyter is an open-source platform with which you can create and share documents that contain live code, equations, and visualizations along with formatted text. Most importantly, these notebooks can be run in the web browser just by starting a server. This open-source project is maintained by the team at Project Jupyter.

This is a fairly basic introduction to the topic, good if you have heard about notebooks but don’t know where to begin.

Working with Central Management Servers in dbatools

Mikey Bronowski continues a series on dbatools:

This built-in SSMS feature allows us to configure a group of SQL instances and run queries against multiple instances at once. With registered servers you can also build a list of SQL Servers in one place, so everyone with access to the CMS can see them. First, we will start by creating registered servers and server groups.

This is an underrated set of functionality for SQL Server and dbatools does a good job working with it.

Preparing Source Data in Power BI

Paul Turley continues a series on doing Power BI the right way:

Your options for transforming data at the source depend on the capabilities of the data source. In general, a query running closer to a data source will be more efficient than one that must cache a set of data and manipulate it downstream. If you are importing text files or Excel files, then your only choice is to import an entire file and then throw away what you don’t need, which is fine for small-ish data sets. On the right side of the previous diagram, some data morphing can be performed in DAX using calculated columns or calculated tables. In rare cases there may be good reasons to do this, but if you can perform row-level calculations in a view or in Power Query, you will be better off keeping all that query logic in the same layer of the solution.

Read on for some good advice, as well as information on query folding, filters, incremental refresh, and much more.
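
The point about running logic close to the source can be illustrated outside Power BI. This Python sketch uses an in-memory SQLite table with hypothetical data to compare pulling every row and filtering locally against pushing the filter and aggregate into the database, which is roughly what query folding does when Power Query translates applied steps into a native source query. Both approaches give the same answer; the difference is how many rows leave the source.

```python
import sqlite3

# Hypothetical sales table standing in for a relational source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("East", 100.0), ("West", 250.0), ("East", 75.0), ("North", 40.0)],
)

# Downstream approach: pull every row, then filter and aggregate locally.
all_rows = conn.execute("SELECT region, amount FROM sales").fetchall()
local_total = sum(amount for region, amount in all_rows if region == "East")

# Source-side approach: the filter and aggregate run inside the database,
# so only a single result row is returned (the idea behind query folding).
(source_total,) = conn.execute(
    "SELECT SUM(amount) FROM sales WHERE region = ?", ("East",)
).fetchone()

print(len(all_rows), local_total)  # rows transferred vs. result
print(source_total)
conn.close()
```

With four rows the difference is trivial, but the same shape of query against millions of rows is where source-side filtering pays off.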

Geometry and Geography Functions in Power BI

Chris Webb walks us through some new Power Query functionality:

In the August 2020 release of Power BI Desktop a couple of new Power Query functions were added: Geography.FromWellKnownText, Geography.ToWellKnownText, GeographyPoint.From, Geometry.FromWellKnownText, Geometry.ToWellKnownText and GeometryPoint.From. These functions (which are coming soon to Power Query in Excel too) make it easier to work with geographic and geometric data in the Well Known Text format. You can have all kinds of fun with these functions if you have a visual (like the Icon Map custom visual) that can display Well Known Text data, but I’ll leave that kind of thing for future blog posts. In this post I’ll explain the basics of how the functions actually work.

So far, it looks like it’s converting strings of latitude and longitude data (in the geography case) into individual elements for plotting, but there are no distance measures at this time.
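
As a rough illustration of what the geography pair does, the Python sketch below parses a WKT POINT string into a record-like dict and writes it back out. The helper names are hypothetical, only POINT is handled, and the Kind/Longitude/Latitude field names are an assumption modeled on the record Power Query returns rather than a guaranteed match. The detail worth noting is that WKT puts the x coordinate first, so for geographic data the order is longitude, then latitude.

```python
import re

# Matches e.g. "POINT(-0.1278 51.5074)"; plain decimals only (no exponents).
_POINT = re.compile(
    r"^\s*POINT\s*\(\s*([-+]?\d+(?:\.\d+)?)\s+([-+]?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)


def geography_from_wkt(wkt):
    """Hypothetical helper: parse a WKT POINT into a record-like dict."""
    m = _POINT.match(wkt)
    if m is None:
        raise ValueError(f"unsupported or malformed WKT: {wkt!r}")
    # WKT lists the x coordinate first, which for geography is longitude.
    lon, lat = float(m.group(1)), float(m.group(2))
    return {"Kind": "POINT", "Longitude": lon, "Latitude": lat}


def geography_to_wkt(record):
    """Hypothetical inverse of geography_from_wkt for POINT records."""
    return f"POINT({record['Longitude']} {record['Latitude']})"


rec = geography_from_wkt("POINT(-0.1278 51.5074)")
print(rec)
print(geography_to_wkt(rec))
```

Anything beyond points (lines, polygons, multi-part shapes) would need a real WKT parser; the sketch only shows the coordinate-order and round-trip idea.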