
Month: October 2020

Kafka and Zookeeper: a Breakup in the Making

Gautam Goswami walks us through the situation with Apache Kafka and Apache Zookeeper:

Zookeeper is a completely separate system with its own configuration file syntax, management tools, and deployment patterns. In-depth skill and experience are necessary to manage and deploy two individual distributed systems and eventually get a Kafka cluster up and running. The person who manages both systems together should have enough troubleshooting knowledge to find issues in both.

A mistake in Zookeeper’s configuration files could bring down the Kafka cluster. Expertise in Kafka administration alone, without Zookeeper expertise, won’t be enough to get out of the crisis, especially in a production environment where Zookeeper runs in a completely isolated environment (cloud). Even to set up and configure a single-node Kafka cluster for learning and R&D, we can’t proceed without Zookeeper.

Read on for the rest of the answer, as well as how Kafka is dis-integrating Zookeeper.


Records in C# 9

Patrick Smacchia walks us through record types in C# 9:

The second core property of string and record value-based semantic is immutability. Basically, an object is immutable if its state cannot change once the object has been created. Consequently, a class is immutable if it is declared in such way that all its instances are immutable.

I remember a discussion with a developer who got nervous about immutability. It looked like an unnatural constraint to him: he wanted his object’s state to change. But he didn’t realize that something he used every day, string operations, relied on immutability. When you modify a string, a new string object actually gets created. Records behave the same way. Moreover, a clean new syntax based on the with keyword has been introduced with C# 9.

They aren’t as fancy as F# record types, but it is fun to watch C# move slowly to being a functional-friendlier language, something which has been the case since Don Syme helped implement generics in C#.
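
As a rough illustration of the immutability and non-destructive copying described in the quote, here is a minimal sketch in Python rather than C#. The `Person` type is hypothetical and not taken from the article. A frozen dataclass combined with `dataclasses.replace` plays a role loosely similar to a record and a with expression, though the two languages' features are not equivalent.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Person:
    # Hypothetical example type, used only for this sketch.
    first_name: str
    last_name: str


original = Person("Ada", "Lovelace")

# Non-destructive change: build a new object and leave the original untouched.
renamed = replace(original, last_name="Byron")

# Value-based equality: instances with the same field values compare equal.
same_values = Person("Ada", "Lovelace") == original

# Assigning to a field of a frozen instance raises FrozenInstanceError.
try:
    original.last_name = "King"
    mutation_blocked = False
except FrozenInstanceError:
    mutation_blocked = True
```

The point of the sketch is only that "changing" an immutable value means producing a new value, which is the same thing string operations do.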


Optimizing Common Table Expressions

Itzik Ben-Gan continues a series on common table expressions:

If you’re wondering why not use a much simpler solution with a grouped query and a HAVING filter, it has to do with the density of the shipperid column. The Orders table has 1,000,000 orders, and the shipments of those orders were handled by five shippers, meaning that on average, each shipper handled 20% of the orders. The plan for a grouped query computing the maximum order date per shipper would scan all 1,000,000 rows, resulting in thousands of page reads. Indeed, if you highlight just the CTE’s inner query (we’ll call it Query 3) computing the maximum order date per shipper and check its execution plan, you will get the plan shown in Figure 3.
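
To make the contrast in the quote concrete, here is a small sketch that uses SQLite through Python's `sqlite3` module as a stand-in for SQL Server. The `orderdate` column name, the `Shippers` table, the index, and the sample rows are assumptions for illustration; the article's actual queries and plans are not reproduced here. It shows a grouped query next to a per-shipper lookup that returns the same result. With a low-density column and a suitable index, the second shape can let an engine do one index lookup per shipper instead of reading every order row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Shippers (shipperid TEXT PRIMARY KEY);
    CREATE TABLE Orders (
        orderid INTEGER PRIMARY KEY,
        shipperid TEXT,
        orderdate TEXT
    );
    -- Assumed supporting index for per-shipper lookups.
    CREATE INDEX idx_sid_od ON Orders (shipperid, orderdate);
    INSERT INTO Shippers VALUES ('A'), ('B');
    INSERT INTO Orders (shipperid, orderdate) VALUES
        ('A', '2018-01-05'), ('A', '2019-03-01'),
        ('B', '2017-07-20'), ('B', '2018-11-11');
""")

# Grouped form: conceptually aggregates over all order rows.
grouped = conn.execute("""
    SELECT shipperid, MAX(orderdate)
    FROM Orders
    GROUP BY shipperid
    ORDER BY shipperid
""").fetchall()

# Per-shipper form: one MAX lookup for each row in the small Shippers table.
per_shipper = conn.execute("""
    SELECT S.shipperid,
           (SELECT MAX(O.orderdate)
            FROM Orders AS O
            WHERE O.shipperid = S.shipperid)
    FROM Shippers AS S
    ORDER BY S.shipperid
""").fetchall()
```

Both queries return the same rows here; the sketch says nothing about timings, which depend on the engine, the index, and the data volume.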

Read on for classic Itzik.


Durable Azure Functions and Azure Data Factory

Rayis Imayev wants to use Azure Functions with Azure Data Factory:

Ok, here is my problem: I have an Azure Data Factory (ADF) workflow that includes an Azure Function call to perform external operations and return an output result, which in turn is used further down my ADF pipeline. My ADF workflow (1) depends on the output result of the Azure Function call, and (2) the execution time of the Azure Function call is another factor to consider: if it hits 230 seconds or more, ADF Azure Function will fail with a time-out error message and my workflow is screwed.

This gave Rayis the impetus to try out durable functions. Read on to see how that worked out.
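
One common way around a hard request time-out like the one described in the quote is to start the work, return immediately, and let the caller poll for the result. The sketch below simulates that idea in plain Python with a background thread. It does not use the Azure Functions SDK or ADF, and `start_job`, `get_status`, `wait_for`, and `slow_square` are hypothetical names chosen for this illustration.

```python
import threading
import time
import uuid

# In-memory stand-in for a job status store (hypothetical).
_jobs = {}
_lock = threading.Lock()


def start_job(work, *args):
    """Kick off long-running work and return a job id immediately."""
    job_id = str(uuid.uuid4())
    with _lock:
        _jobs[job_id] = {"status": "Running", "output": None}

    def run():
        result = work(*args)
        with _lock:
            _jobs[job_id] = {"status": "Completed", "output": result}

    threading.Thread(target=run, daemon=True).start()
    return job_id


def get_status(job_id):
    """Cheap status check; each call returns quickly."""
    with _lock:
        return dict(_jobs[job_id])


def wait_for(job_id, poll_seconds=0.01, max_polls=500):
    """Poll until the job completes, so no single call blocks for long."""
    for _ in range(max_polls):
        status = get_status(job_id)
        if status["status"] == "Completed":
            return status["output"]
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish in time")


def slow_square(n):
    time.sleep(0.05)  # stands in for work that would exceed a request time-out
    return n * n


job = start_job(slow_square, 12)
result = wait_for(job)
```

The design choice is that the caller only ever makes short requests, so the total running time of the work is no longer bounded by a per-request limit. Error handling for failed jobs is left out of this sketch.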


Querying Multiple Data Sources in Azure Synapse Analytics

James Serra walks us through querying Data Lake Storage Gen2, Cosmos DB, and a table created in an Azure Synapse serverless Apache Spark pool:

As I was finishing up a demo script for my presentation at the SQL PASS Virtual Summit on 11/13 (details on my session here), I wanted to blog about part of the demo that shows a feature in the public preview of Synapse that is, frankly, very cool. It is the ability to query data as it sits in ADLS Gen2, a Spark table, and Cosmos DB and join the data together with one T-SQL statement using SQL on-demand (also called SQL serverless), hence making it a federated query (also known as data virtualization). The beauty of this is you don’t have to first write ETL to collect all the data into a relational database in order to be able to query it all together, and don’t have to provision a SQL pool, saving costs. Further, you are using T-SQL to query all of those data sources so you are able to use a reporting tool like Power BI to see the results.

Click through to see how.


Custom Formatting in PowerShell

Jeffrey Hicks takes us through formatting in PowerShell and uses Get-Process as an example:

One of the features I truly enjoy about PowerShell is the ability to have it present information that I need in a form that I want. Here’s a good example. Running Get-Process is simple enough and the output is pretty complete. But one thing that would make it better for me is that sometimes I want an easy way to see high-memory use properties. Yes, I can pipe Get-Process to Sort-Object and Where-Object. However, in this particular situation, what I really want is to see high-memory usage processes displayed in red. Maybe those that are getting close to my arbitrary limit I’d like to see in yellow. This isn’t that difficult to achieve using ANSI escape sequences.
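
The idea of coloring rows by threshold with ANSI escape sequences can be sketched outside PowerShell as well. The following is a minimal Python illustration, not the article's approach: the thresholds, the `colorize_memory` helper, and the sample process list are all assumptions made up for this example.

```python
# ANSI escape sequences for bright red, bright yellow, and reset.
RED = "\x1b[91m"
YELLOW = "\x1b[93m"
RESET = "\x1b[0m"

# Arbitrary thresholds in megabytes, chosen only for this sketch.
WARN_MB = 500
HIGH_MB = 1000


def colorize_memory(name, working_set_mb):
    """Format one row, wrapping it in a color when memory use is high."""
    line = f"{name:<20}{working_set_mb:>10.1f} MB"
    if working_set_mb >= HIGH_MB:
        return f"{RED}{line}{RESET}"
    if working_set_mb >= WARN_MB:
        return f"{YELLOW}{line}{RESET}"
    return line


# Made-up sample data standing in for real process information.
processes = [("code", 1250.4), ("chrome", 730.0), ("pwsh", 95.2)]
for name, mb in processes:
    print(colorize_memory(name, mb))
```

On a terminal that understands ANSI sequences, the first row prints in red, the second in yellow, and the third in the default color.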

Click through to see how.


Issues Using EF Core Database First to Reverse Engineer SQL Server Databases

Erik Ejlskov Jensen takes us through several things to watch out for when reverse engineering a SQL Server database in Entity Framework Core:

Issue

SQL Server allows blank column names in tables, but this causes the following error when scaffolding: The string argument 'originalIdentifier' cannot be empty.

Workarounds

– Use EF Core Power Tools, which contains a fix for this. (Fix will also be in EF Core 6.0)
– Rename the column 🙂

Click through for several more issues and solutions in this vein.


Spark Infer Schema vs ADF Get Metadata

Paul Andrew compares two techniques for retrieving metadata:

For file types that don’t contain their own metadata (CSV, text, etc.) we typically have to go and figure out their structure, including attributes and data types, before doing any actual transformation work. Often I’ve used the Data Factory Metadata Activity to do this with its structure option. However, while playing around with Azure Synapse Analytics, specifically creating Notebooks in C# to run against the Apache Spark compute pools, I’ve discovered that in most cases the Data Frame infer schema option basically does a better job here.

Now, I’m sure some Spark people will probably read the above and think, well der, obviously Paul! Spark is better than Data Factory. And sure, I accept for this specific situation it certainly is. I’m simply calling that out as it might not be obvious to everyone.
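
To show what inferring a schema for a metadata-free file means in practice, here is a deliberately simplified sketch in Python. It is not Spark's inference algorithm and not what the Data Factory activity does; `infer_type`, `infer_schema`, and the sample CSV are hypothetical. It reads every value in a column and picks the narrowest of int, float, or str that fits all non-empty values.

```python
import csv
import io


def infer_type(values):
    """Return the narrowest of int, float, str that fits every non-empty value."""
    for caster, name in ((int, "int"), (float, "float")):
        try:
            for value in values:
                if value != "":
                    caster(value)
            return name
        except ValueError:
            continue
    return "str"


def infer_schema(csv_text):
    """Map each header name to an inferred type name."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = rows[0].keys() if rows else []
    return {col: infer_type([row[col] for row in rows]) for col in columns}


# Made-up sample data: the price column mixes decimals, integers, and a blank.
sample = "id,price,city\n1,9.99,Leeds\n2,12,York\n3,,Bath\n"
schema = infer_schema(sample)
```

Because the sketch scans every row before deciding, a single decimal value is enough to widen a column from int to float. A column with only empty values would be reported as int here, which a real tool would handle more carefully.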

Read on for a comparison of the two techniques.


Azure Data Studio, October 2020 Edition

Alan Yu shows off this month’s changes in Azure Data Studio:

You can now deploy Azure SQL resources from the deployment wizard in Azure Data Studio. These new options sit alongside local options like SQL Server on-premises and on Big Data Clusters and hybrid options, like SQL Managed Instance on Azure Arc. The deployment wizard includes UI-assisted Notebook experiences to deploy Azure SQL virtual machines and links to the Azure portal to create SQL databases, database servers, and elastic pools (SQL managed instances are not yet included).

Click through for more information.
