Author: Kevin Feasel

Need?
Basically, in order to prepare the message for transmission from the producer to the broker, we use serializers. In other words, before transmitting the entire message to the broker, let the producer know how to convert the message into a byte array we use serializers. Similarly, to convert the byte array back to the object we use the deserializers by the consumer.

Click through for an example.

Comments closed

The Roadmap for Zookeeper-less Kafka

Published 2020-05-19 by Kevin Feasel

Colin McCabe explains the mechanics behind KIP-500:

So what is the problem with ZooKeeper? Actually, the problem is not with ZooKeeper itself but with the concept of external metadata management.
Having two systems leads to a lot of duplication. Kafka, after all, is a replicated distributed log with a pub/sub API on top. ZooKeeper is a replicated distributed log with a filesystem API on top. Each has its own way of doing network communication, security, monitoring, and configuration. Having two systems roughly doubles the total complexity of the result for the operator. This leads to an unnecessarily steep learning curve and increases the risk of some misconfiguration causing a security breach.
Storing metadata externally is not very efficient. We run at least three additional Java processes, and sometimes more. In fact, we often see Kafka clusters with just as many ZooKeeper nodes as Kafka nodes! Additionally, the data in ZooKeeper also needs to be reflected on the Kafka controller, which leads to double caching.

Read on to see how they’re looking to cut out Zookeeper dependencies. It’s an interesting story of post hoc dependency removal.

Comments closed

Monitoring Power BI

Published 2020-05-19 by Kevin Feasel

James Serra gives us a few options to monitor different aspects of Power BI:

Performance analyzer: Find out how each of your report elements, such as visuals and DAX formulas, are performing. Using the Performance Analyzer, you can see and record logs that measure how each of your report elements performs when users interact with them, and which aspects of their performance are most (or least) resource intensive. This is accomplished by clicking a “Start recording” button and interacting with the elements you want to test. More info at Use Performance Analyzer to examine report element performance.

Click through for several other tools, which you can combine to get a better feel for how your environment is doing.

Comments closed

Order of Operations and the Unary Negative Operator

Published 2020-05-19 by Kevin Feasel

Kenneth Fisher takes us through a weird problem:

Last but certainly not least our problem child.
SELECT -100.0/-100.0*10.0
In the absence of parenthesis and nothing but multiplication and division it should go left to right. Right? But that would give us -100.0/-100.0 returning 1.0 and finally 1.0*10.0 giving us 10.0. But that’s not what we are getting. Somehow we are ending up with 0.1.

Read the comments, where Joe Celko explains the reasoning behind this strange behavior.

I tried this out on dbfiddle and here were some quick results:

Postgres 12 — 10.0
Oracle 18c (had to query from dual) — 10
MySQL 8.0 — 10
SQLite 3.27 — 10
SQL Server — 0.1

So it’s definitely not the case everywhere. The thing I noticed for SQL Server is that they don’t call out unary operators in the operator precedence guide, as opposed to, say, Oracle.

Comments closed

Generating Scripts to a Notebook with SSMS

Published 2020-05-19 by Kevin Feasel

Taiob Ali tries out a new feature in SQL Server Management Studio:

SQL Server Management Studio (SSMS) was released on April 7th, 2020. You can download this latest version from this link. 18.5 is an update to 18.4 with these new items and bug fixes.
One of the features added in this release is to select ‘Azure Data Studio‘ Notebook as a destination for Generate Scripts wizard.
Now you can send the objects definition of Table, View, Stored Procedure, Function along with sample call, sample data, and my comments all packaged in one Azure Data Studio Notebook. I can see scope for better communication between business partners, developers, and database engineers.

Click through for an example of the process.

Comments closed

Rows Read Versus Rows Returned

Published 2020-05-19 by Kevin Feasel

Hugo Kornelis explains another issue with arrow widths in tools like SQL Server Management Studio:

The visual of the arrows in an execution plan strongly suggests that they represent the flow of rows from one operator to another. And hence, the width of that arrow strongly appears to be an indication of how many rows are passed between the connected operators. That used to be always the case. But unfortunately, this changed in December 2017 (or earlier, but that was when I first noticed the change).
The root cause is the addition of a new property, returned by Scan and Seek operators: Number of Rows Read (as well as its estimated counterpart: Estimated Number of Rows to be Read). Now don’t get me wrong, those properties are awesome and I’m super happy that they were added to execution plans. When a filter condition is pushed into a scan or seek operator as a Predicate property, I want to be able to see how effective it is, and comparing Actual Number of Rows to Number of Rows Read (or Estimated Number of Rows to Estimated Number of Rows to be Read) helps me assess just that. Which in turn can help me decide whether I should change my indexing, or try to rewrite the query. But I digress.

Click through to learn what the issue is. Hugo describes a tricky situation where there are two valuable measures but only one way to show them. If you agree with Hugo’s preferences, here’s a Feedback item for you.

Comments closed

Helping Users in Powershell Scripts

Published 2020-05-19 by Kevin Feasel

Greg Moore walks us through Powershell’s ability to display help info:

Shortly after writing my last article on Parameters, I had to update a script, and I wanted to make it easier for others to run. One of the features I wanted to add was the ability to show them what the script would do with the provided parameters without actually running the script, in other words, provide “help”.

There is a thoughtful way to do this, and Greg walks us through it, while also showing us a few false starts along the way.

Comments closed

Getting Row Counts After Execution

Published 2020-05-19 by Kevin Feasel

Grant Fritchey gives us a few ways to find row counts after a query has finished:

But, if you really want to get picky, batches and procedures frequently have multiple statements. This means that the row count you’re getting isn’t for a particular table. If you want to get that, then you need to capture the statement level stuff, sp_statement_completed or sql_statement_completed, depending on if you’re looking at procedures or batches.
However, this is one of those “be careful what you wish for” moments. Because, if you capture all statements in a system, you’re going to have a very large amount of data to deal with. This is where filtering is your friend.

Click through for Grant’s full answer.

Comments closed

ksqlDB 0.9.0

Published 2020-05-18 by Kevin Feasel

Sergo Pena walks us through what’s new in ksqlDB 0.9.0:

We’re pleased to announce the release of ksqlDB 0.9.0! This version includes support for multi-join statements, enhanced LIKE expressions, and a host of usability improvements. We’ll go through a few of the key changes, but you can see the changelog for a detailed list of all fixes and improvements.

Read on for a high-level overview.

Comments closed

Text Mining and Sentiment Analysis in R

Published 2020-05-18 by Kevin Feasel

Sanil Mhatre walks us through a sentiment analysis scenario in R:

Sentiments can be classified as positive, neutral or negative. They can also be represented on a numeric scale, to better express the degree of positive or negative strength of the sentiment contained in a body of text.
This example uses the Syuzhet package for generating sentiment scores, which has four sentiment dictionaries and offers a method for accessing the sentiment extraction tool developed in the NLP group at Stanford. The get_sentiment function accepts two arguments: a character vector (of sentences or words) and a method. The selected method determines which of the four available sentiment extraction methods will be used. The four methods are syuzhet (this is the default), bing, afinn and nrc. Each method uses a different scale and hence returns slightly different results. Please note the outcome of nrc method is more than just a numeric score, requires additional interpretations and is out of scope for this article. The descriptions of the get_sentiment function has been sourced from : https://cran.r-project.org/web/packages/syuzhet/vignettes/syuzhet-vignette.html?

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31