Kevin Feasel – Page 955

After exploring use cases for large message payloads, let’s clarify what Kafka is not:
Kafka is usually not the right technology to store and process large files (images, videos, proprietary files, etc.) as a whole. Products were built specifically for these use cases.
For instance, a Content Delivery Network (CDN) such as Akamai, Limelight Networks, or Amazon CloudFront distribute video streams and other software downloads across the globe. Or “big file editing and processing” (like a video processing tool). Or video editing tools from Adobe, Autodesk, Camtasia, and many other vendors are used to structure and present all video information, including films and television shows, video advertisements, and video essays.

There’s a lot of good advice in here. I think the best advice is essentially “don’t do this unless you need it” but I appreciate that Kai goes a lot further than that.

Comments closed

Speeding Up sp_helpdb

Published 2020-08-18 by Kevin Feasel

Dave Bland speeds up sp_helpdb:

When I run this on my computer, it usually takes between 500 and 1000 MS. More on this later.
Now let’s take a look what is happening behind the scenes with sp_helpdb. The first step is to populate a temporary table, #spdbdesc with database name, owner, when it was created and compatibility level. The code for the first step is below.

Watch Dave speed this up a bit.

Comments closed

Using Jupyter as an External Tool for Power BI Desktop

Published 2020-08-18 by Kevin Feasel

David Eldersveld continues a series on Power BI external tools:

Many people use Python with notebooks, so let’s take a look at one possible way to enable a Jupyter external tool for Power BI Desktop. The following stepwise approach begins with simply opening Jupyter. It then progresses to creating and opening a notebook that includes Power BI’s server and database arguments. Finally, it works its way toward downloading a notebook definition contained in a GitHub gist and connects to Power BI’s tabular model to start to make this approach more useful.
This post continues a series of posts related to Python and Power BI. The first three parts of this blog series introduced some possible uses for Python connected to a Power BI model, how to setup a basic Python external tool, and how to both use it with a virtual environment and connect to the Tabular Object Model.

This was a cool usage of Power BI’s external tool functionality and starts to give you an idea of how powerful it can be.

Comments closed

The Logic of Common Table Expressions

Published 2020-08-18 by Kevin Feasel

Itzik Ben-Gan dives into common table expressions:

The source of the term aside, common table expression, or CTE, is the commonly used term by T-SQL practitioners for the structure that is the focus of this article. So first, let’s address whether it is an appropriate term. We already concluded that the term table expression is appropriate for an expression that conceptually returns a table. Derived tables, CTEs, views and inline table valued functions are all types of named table expressions that T-SQL supports. So, the table expression part of common table expression certainly seems appropriate. As for the common part of the term, it probably has to do with one of the design advantages of CTEs over derived tables. Remember that you cannot reuse the derived table name (or more accurately the range variable name) more than once in the outer query. Conversely, the CTE name can be used multiple times in the outer query. In other words, the CTE name is common to the outer query. Of course, I’ll demonstrate this design aspect in this article.
CTEs give you similar benefits to derived tables, including enabling the development of modular solutions, reusing column aliases, indirectly interacting with window functions in clauses that don’t normally allow them, supporting modifications that indirectly rely on TOP or OFFSET FETCH with order specification, and others. But there are certain design advantages compared to derived tables, which I’ll cover in detail after I provide the syntax for the structure.

Click through for a lot of great detail. On the question of derived tables versus common table expressions, my mental taxonomy is basically APPLY > CTE > derived table, but that’s in a context-free discussion. In practice, all three are useful and the best question to answer is “Which thing helps future developers understand best my intent?”

Comments closed

Configuring Perfmon for Ongoing Data Collection

Published 2020-08-18 by Kevin Feasel

David Klee killed the radio star:

I just released a new training video (one of many to be released in the upcoming months) showing you how to set up Windows Perfmon for ongoing 24×7 collection. Recording ongoing performance information is vital to have a running history of the system state in case issues arise. More importantly, having a running stream of performance information gives you a running history that you can refer to for generating an ongoing baseline of your system’s resource consumption. Come learn how to set up Perfmon for ongoing collection!

Click through for the video and check out the Heraflux channel.

Comments closed

Stateful Stremaing with Spark

Published 2020-08-17 by Kevin Feasel

Sarfaraz Hussain continues a series on Spark Streaming:

Structured Streaming does processing under the hood as micro-batches (default nature).
A state is versioned between micro-batches while the streaming query runs. So as the series of incremental execution plans are generated (discussed in Part 2), each execution plan knows what version of the state it needs to read from.
Each micro-batch reads the previous version of the state data i.e. the previous running count then updates it and creates a new version. Each of these versions gets check-pointed into the same check-point location that we have provided in the query.

Read on to understand the implications of this and what it allows you to do.

Comments closed

Multi-Threaded Message Consuption with Kafka

Published 2020-08-17 by Kevin Feasel

Igor Buzatovic takes us through a fairly advanced topic in Apache Kafka:

If you are familiar with basic Kafka concepts, you know that you can parallelize message consumption by simply adding more consumers in the same group. However, that approach is more suitable for horizontal scaling where you add new consumers by adding new application nodes (containers, VMs, and even bare metal instances).
A multi-consumer approach can also be used for vertical scaling, but this requires additional management of consumer instances and accompanying consuming threads in the application code. Using multiple consumer instances introduces additional network traffic as well as more work for the consumer group coordinator since it has to manage more consumers.
While these concerns may not be strong enough reasons for switching from a thread per consumer to a multi-threaded model, there are use cases in which a multi-threaded model has compelling advantages.

Read the whole thing.

Comments closed

Killing Idle Analysis Services Sessions

Published 2020-08-17 by Kevin Feasel

Shabnam Watson shows us how to kill idle SQL Server Analysis Services sessions:

Think of this method as an emergency procedure only. As always, have database backups and try this on a development server first. Always take a backup of msmdsrv.ini before you modify any server properties. The default location of the file is this: C:\Program Files\Microsoft SQL Server\MSAS15.MSSQLSERVER\OLAP\Config
If you set the timeout values too low on server that is under resource pressure, you may not be able to get to the server properties using SSMS and change them quickly within the time you set for the timeout. For this reason, I prefer the user of XMLA in this case which makes the process faster.

Read on to see how to do this.

Comments closed

Getting Started with Jupyter Notebooks

Published 2020-08-17 by Kevin Feasel

Aveek Das takes us through the most popular name in notebooks:

In this article, I am going to explain what Jupyter Notebooks are and how to install the same on your machine. Further, I will demonstrate how to use these notebooks using Visual Studio Code and perform data analysis and other development activities. It is an open-source platform using which you can create and share documents that contain live code, equations, and visualizations along with the formatted text. Most importantly, these notebooks can be run on the web browser by just starting a server and using it. This open-source project is maintained by the team at Project Jupyter.

This is a fairly basic introduction to the topic, good if you have heard about notebooks but don’t know where to begin.

Comments closed

Working with Central Management Servers in dbatools

Published 2020-08-17 by Kevin Feasel

Mikey Bronowski continues a series on dbatools:

The built-in feature of the SSMS allows us to configure a group of SQL instances and run queries against multiple instances at once. With the registered servers you can also build a list of SQL Servers in one place, so everyone with access to the CMS can see them. First, we will start by creating registered servers and server groups.

This is an underrated set of functionality for SQL Server and dbatools does a good job working with it.

Comments closed

M	T	W	T	F	S	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31

Author: Kevin Feasel

Processing Larger Messages with Apache Kafka

Speeding Up sp_helpdb

Using Jupyter as an External Tool for Power BI Desktop

The Logic of Common Table Expressions

Configuring Perfmon for Ongoing Data Collection

Stateful Stremaing with Spark

Multi-Threaded Message Consuption with Kafka

Killing Idle Analysis Services Sessions

Getting Started with Jupyter Notebooks

Working with Central Management Servers in dbatools