What To Know Before Integrating With Apache Kafka

Kevin Feasel

2019-01-28

Hadoop

Adi Polak gives us seven helpful tips to think about before building a Kafka cluster:

2 — You shouldn’t send large messages or payloads through Kafka
According to Apache Kafka, for better throughput, the max message size should be 10KB. If the messages are larger than this, it is better to check the alternatives or find a way to chop the message into smaller parts before writing to Kafka. Best practice to do so is using a message key to make sure all chopped messages will be written to the same partition.

Read the whole thing.

Spark And Splitting DataFrames

Giovanni Lanzani explains that one technique to split a data frame doesn’t quite work as expected:

Recently I was delivering a Spark course. One of the exercises asked the students to split a Spark DataFrame in two, non-overlapping, parts.

One of the students came up with a creative way to do so.

He started by adding a monotonically increasing ID column to the DataFrame. Spark has a built-in function for this, monotonically_increasing_id — you can find how to use it in the docs.

Read on to see how this didn’t quite work right, why it didn’t work as expected, and one alternative.

ASCII Block Text In Powershell

Roman Gelman shows how to use the Write-Ascii cmdlet to build ASCII art-style text in Powershell:

The ​​​Write-Ascii function from my PowerShell MS-Module module can draw custom text in the PowerShell console using ASCII fonts. This article’s cover was drawn by the Write-Ascii function too. The below code does the same, it writes end of line (EOL) character or emulates the Enter key.

Read on for examples and enjoy.

Generating Reports In Word Via Flow

Chris Webb has an example of taking a data set and generating a report in a Word document:

The idea is to loop through the rows in the Excel table and use the data on each row to populate the content controls in the template and then create a new Word document. Here’s a Flow that does this:

The steps are a bit convoluted, but they work. Chris mentions at the end why people might want to do this, and I’ll reiterate that: I’ve been in several discussions over the years where people want to embed data inside a document without manual intervention, and using tools like Reporting Services, that has not been pretty.

Powershell Core On Ubuntu Using Docker

Max Trinidad has an Ubuntu VM running Powershell Core on Docker:

While learning about Docker Container, I notice that is much easier to installed on a Linux system. In Windows, Hyper-V is a requirement to install Docker, and specially if you want to use the “Windows Subsystem in Linux” WSL feature, there’s more setup to complete. So, I’m not using Hyper-V, I’m using VMware Workstation.  To keeping simple, I created an Ubuntu 18.04 VM using VMWare Workstation.

You can find the Docker CE installation instructions in the following link.

If you’re using Ubuntu 18.04. make sure to install Curl, as it isn’t included in the OS.

Click through for instructions on how to set this up and join the three layers club (which is not quite the three commas club but close).

Emulating PERCENTILE_DISC

Kevin Feasel

2019-01-28

Syntax

Lukas Eder points out that there are two ways to use PERCENTILE_DISC() and most database platforms support only one:

Oracle has the most sophisticated implementation, which supports both the ordered set aggregate function, and the window function version:
– Aggregate function: PERCENTILE_DISC (0.5) WITHIN GROUP (ORDER BY x)
– Window function: PERCENTILE_DISC (0.5) WITHIN GROUP (ORDER BY x) OVER (PARTITION BY y)

But there are ways to calculate PERCENTILE_DISC() using a couple of window functions, so read the whole thing.

Platform Compatibility and SSDT

Ed Elliott walks us through platform compatibility in SQL Server Data Tools:

Sometimes you don’t have the perfect development environment for SQL Server, sometimes you deploy to things like SQL Azure and want to test locally, for various reasons it is possible that you want to deploy to one version of SQL Server but set the project properties to a different version of SQL Server. If you find yourself in this situation you might need to use the parameter AllowIncompatiblePlatform
 which essentially says “I know I said the project is for SQL 2017 but I am deploying to SQL 2014”, anytime you say this you also sign the contract that says “If I am deploying to a lower version then I have made sure I haven’t used any keywords or object types that didn’t exist in that version and also if everything fails then so be it, I live with my choices every day”.

The story is a little complicated, but Ed straightens it out for us.

SESSION_CONTEXT Is Case-Sensitive

Kevin Feasel

2019-01-28

T-SQL

Steve Jones notes an issue people may have when trying out the SESSION_CONTEXT() function in SQL Server:

Notice a difference? In the first query, I have SupplierID, but the second is SupplierId, with a lower case “d”. These keys are determined when you use sp_set_session_context, which takes a sysname value for the key. These are going to be case sensitive, as each one is a different identifier.

Click through for the full story. I’m not a fan of case-sensitivity in general and especially not in a system where, by default, most things are case-insensitive.

Categories

January 2019
MTWTFSS
« Dec Feb »
 123456
78910111213
14151617181920
21222324252627
28293031