2021-03-19 – Curated SQL

Kafka Replication with MirrorMaker

Published 2021-03-19 by Kevin Feasel

In this new two-part blog series we’ll turn our gaze to the newest version of MirrorMaker 2 (MM2), the Apache Kafka cross-cluster mirroring, or replication, technology. MirrorMaker 2 is built on top of the Kafka Connect framework for increased reliability and scalability, and is suitable for more demanding geo-replication use cases including migration, backup, disaster recovery and failover. In part one we’ll focus on MirrorMaker 2 theory (Kafka replication, architecture, components and terminology) and invent some MirrorMaker 2 rules. Part two will be more practical, and we’ll try out Instaclustr’s managed MirrorMaker 2 service and test the rules out with some experiments.

Go check out part 1.

Spark on Windows Subsystem for Linux 2

Published 2021-03-19 by Kevin Feasel

Gavin Campbell tries out Spark on Linux on Windows:

I’m not a frequent user of Windows, but I understand getting dependencies installed for local development can sometimes be a bit of a pain. I’m using an Azure VM, but these instructions should work on a regular Windows 10 installation. Since I’m not a “Windows Insider”, I followed the manual steps here to get WSL installed, then upgrade to WSL2. The steps are reproduced here for convenience:

Click through for the installation steps and the process.

Loading Azure Synapse Analytics using PolyBase

Published 2021-03-19 by Kevin Feasel

Gauri Mahajan needs to load some data:

Azure Synapse Analytics is Microsoft’s data warehousing offering on Azure Cloud. It supports three types of runtimes – SQL Serverless Pool, SQL Dedicated Pool, and Spark Pools. Because Azure hosts a variety of data sources, data of varying types and volumes has to be loaded into Azure Synapse pools. There are three major data ingestion approaches for loading data into Synapse. The COPY command is the most flexible and elaborate mechanism: you execute it from a SQL pool to load data from supported data repositories. It is convenient for ad hoc and small to medium-sized data loads. The second method is Bulk Insert, whose name describes how it works. For ingesting data from supported repositories into dedicated SQL pools, PolyBase is as efficient as the COPY command, and at times even more efficient. This article will help you understand the process of ingesting data into Azure Synapse Analytics using PolyBase.

Click through for the process.

Calculating Pagination Metadata in SQL

Published 2021-03-19 by Kevin Feasel

Lukas Eder has a single query which includes pagination data:

This is rather straightforward. It will give us page 2 out of N pages, with a page size of 10. But how do we calculate these values? How do we know we’re on page 2? How do we know the number of pages N? Can we calculate this without an extra round-trip e.g. to calculate the total number of actors:
-- Yuck, a second round-trip!
SELECT COUNT(*)
FROM actor
We can do it with a single SQL query and window functions, but before I explain how to do this, please consider reading this article on why OFFSET pagination is a bad thing for your performance.
If you’re still convinced OFFSET pagination is what you need, as opposed to keyset pagination, let’s look at how to calculate the above metadata with SQL.

Click through for the query, as well as Lukas’s explanation of how it works. But also heed that warning about keyset pagination, as it’s usually a lot better.
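
As a rough illustration of the single-query idea in the excerpt, here is a minimal sketch that uses window functions to return the page's rows along with the total row count, the current page number, and the number of pages. It is not Lukas's query. The table layout, the `name` column, the `fetch_page` and `demo` helpers, and the use of Python's built-in `sqlite3` module are all assumptions made for the example, and it needs a SQLite build with window function support (3.25 or later).

```python
import sqlite3

# Hypothetical query for illustration: the inner query numbers every row and
# counts the whole result set, then the outer query applies LIMIT/OFFSET.
# Because the window functions run before the LIMIT, total_rows still
# reflects the full table rather than the page.
PAGE_QUERY = """
SELECT
    t.actor_id,
    t.name,
    t.total_rows,
    ((t.row_num - 1) / :limit) + 1 AS current_page,
    (t.total_rows + :limit - 1) / :limit AS total_pages
FROM (
    SELECT
        a.actor_id,
        a.name,
        COUNT(*) OVER () AS total_rows,
        ROW_NUMBER() OVER (ORDER BY a.actor_id) AS row_num
    FROM actor AS a
) AS t
ORDER BY t.actor_id
LIMIT :limit OFFSET :offset
"""


def fetch_page(conn, limit, offset):
    """Return one page of actors plus pagination metadata in one round trip."""
    return conn.execute(PAGE_QUERY, {"limit": limit, "offset": offset}).fetchall()


def demo():
    """Build a throwaway in-memory actor table with 25 rows and fetch page 2."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO actor (actor_id, name) VALUES (?, ?)",
        [(i, f"Actor {i}") for i in range(1, 26)],
    )
    return fetch_page(conn, limit=10, offset=10)


if __name__ == "__main__":
    for row in demo():
        print(row)
```

Every returned row repeats the same metadata columns, so the caller can read them from the first row. The cost is that the database still has to count and number the full result set, which is part of why keyset pagination tends to perform better.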

Logging PowerShell Script Details

Published 2021-03-19 by Kevin Feasel

Patrick Gruenauer has logging boilerplate code for us:

So you have already created your first PowerShell scripts? Now you want to enhance these scripts with error logging? If your answer is yes, jump into this article. I will show you how to implement a custom function that captures errors and writes them to an error log file. Let’s get started.

Click through for the code, as well as an explanation of each bit.

Continuous Backup with Cosmos DB

Published 2021-03-19 by Kevin Feasel

Hasan Savran reviews a new bit of functionality in Cosmos DB:

Azure Cosmos DB announced Continuous Backup in March 2021. This feature is currently in public preview and is not recommended for production use. It gives you more options for your backup requirements. You might be using Azure Data Factory to handle your custom backup needs. Azure Data Factory is SSIS in the cloud. ETL jobs can be problematic. Backing up a database is half of the problem; the other half is restoring a database. Until now, we had to call Microsoft to restore Cosmos DB databases/accounts.
By using Continuous Backup, you can easily back up and restore your database. For now, this option is available only for the SQL API and the Mongo API. There are many limitations in this public preview version. I am sure many of these limitations will go away when it becomes generally available.

Click through for more details about the offering, as well as how to enable it. We’ll have to wait until it’s out of public preview to see how much it will cost, but it does look interesting.

Viewing SQL Server Logs on Linux

Published 2021-03-19 by Kevin Feasel

Jack Vamvas answers a question:

Question: How can I view SQL Server logging on Linux?
Answer: SQL Server on Windows logs details to the SQL Server error logs and the Windows Application log. The Windows event details are available through Event Viewer or the PowerShell Get-EventLog cmdlet.

The Windows answer is pretty easy for SQL Server DBAs, as we’ve lived in it for so long. Click through for the Linux answer.

Load Testing using SqlQueryStress

Published 2021-03-19 by Kevin Feasel

Chad Callihan walks us through the SqlQueryStress program:

Do you have a new SQL Server that you need to load test against? What about a new stored procedure that needs to be tested with various parameters? Maybe you’re just trying to punish your CPU? Whatever the reason, my favorite tool for these scenarios is Adam Machanic’s SqlQueryStress. Before we run through some examples, check out SqlQueryStress on GitHub or get SqlQueryStress from the Microsoft Store.

It’s a pretty simple program which I’ve used for well over a decade. Chad does a good job of walking us through the tool.

Day: March 19, 2021