Press "Enter" to skip to content

Month: August 2022

Databricks Extension for VSCode at 1.0

Gerhard Brueckl shares the good news:

As you probably know from my previous posts, my colleagues at paiqo.com and I are constantly working to improve our VSCode extension for Databricks. Almost every month we silently release a new version to the VSCode gallery so you get the latest features. However, as this is a special release, I am also writing a dedicated blog post for it

There’s a lot of cool stuff in here, so check it out.

Comments closed

Z-Ordering with Apache Impala

Zoltan Borok-Nagy and Norbert Luksa show off a performance improvement in Apache Impala:

So we’ll have great search capabilities against the partition columns plus one data column (which drives the ordering in the data files). With our sample schema above, this means we could specify a SORT BY “platform” to enable fast analysis of all Android or iOS users. But what if we wanted to understand how well version 5.16 of our app is doing across platforms and countries?

Can we do more? It turns out that we can. There are exotic orderings out there that can also sort data by multiple columns. In this post, we will describe how Z-order allows ordering of multidimensional data (multiple columns) with the help of a space-filling curve. This ordering enables us to efficiently search against more columns. More on that later.

It looks like a really good technique for nearly-static data, sort of like you’d see with a data warehouse which refreshes once a day.

Comments closed

Apache eCharts for Python

Mark LItwintschik looks at another charting library:

The Apache eCharts project is a web-based charting library. It was started in 2013 and built using 77.5K lines of TypeScript. It is well documented and has over 200 examples of its API’s usage. The examples allow you to toggle between light/dark mode and there is a cheat sheet and a theme builder with several tasteful presents to choose from.

This is a library I hadn’t heard of before but Mark shows it off a bit.

Comments closed

PSWorkItems: Powershell TODOs

Jeffrey Hicks has a new module:

I spend my working days living in a PowerShell console. Over the years, I’ve developed many PowerShell modules to help me manage the chaos that is my work life. One area that always demands attention is managing my tasks and To-Dos. For several years I have been using the MyTasks module. This module stored tasks and supporting information in a set of XML files. The code in the module treated the XML files as databases. I was trying to avoid a dependency on a SQL Server Express installation with the idea that would be overkill.

ln the meantime, I finally got around to finishing and publishing the MySQLite PowerShell module. This module has a set of PowerShell functions designed to simplify working with SQLite database files. This type of database has a much smaller footprint than SQL Server Express and would streamline my task management.

Click through to see how it works.

Comments closed

Why Transaction Log Backup Chains Break

Tom Collins enumerates several reasons for a transaction log backup chain breaking:

The SQL Server transaction log backup chain aka log chain is the series of sequential transaction log backups related to a database. The log backups are related to each other and are represented through LSN . Breaking the transaction log chain will limit the restore point of the backups. 

Click through for four such reasons as well as a scenario explaining how it could happen.

Comments closed

Preventing Data Exfiltration form Managed Instances

Niko Neugebauer wants to hang on to that data:

Data exfiltration is a technique that is also sometimes described as data theft or data extrusion, that describes the unauthorized extraction of data from the original source. This unauthorized extraction can be executed either manually or automatically by the malicious attacker.

As part of your Network Infrastructure, you might have tightened your security to make sure you have all the bells and whistles to lock down your Azure SQL Managed Instance to be accessed only by your application and not exposed to the Internet or any other traffic. However, this doesn’t stop a malicious admin from taking a backup or creating a linked server to another resource outside your enterprise subscription for extracting the data. This action would be data exfiltration. In a typical on-premises infrastructure, you can lock down network access completely to make sure that the data never leaves your network. However, in a cloud setup, there is a possibility that someone with elevated privileges can export data or perform some other malicious activity targeting their own resources outside your organization, compromising your enterprise data. Hence, it is very important to understand the different data exfiltration scenarios and make sure that you are taking the right steps to monitor for and prevent such activities.

Click through for a table which shows common exfiltration scenarios and things you can do to reduce the risk of exfiltration. With access, though, there’s always going to be a risk of exfiltration: even in a SCIF, you can get away with shoving records into your pants if you’re famous enough.

Comments closed

KQL Parse

Robert Cain continues a series on KQL:

The previous post in this series Fun With KQL – Extract, showed how we can use the extract operator to pull part of a string using regular expressions. I think you’d agree though, using regular expressions can be a bit tricky.

If you have a string that is well formatted with recurring text you can count on, and want to pull one or more strings from it into their own columns, Kusto provides a much easier to use operator: parse.

Robert includes a series of examples, including examples of things you cannot do.

Comments closed

Data Mesh at Netflix

Bo Lei, et al, describe their Data Mesh architecture:

Realtime processing technologies (A.K.A stream processing) is one of the key factors that enable Netflix to maintain its leading position in the competition of entertaining our users. Our previous generation of streaming pipeline solution Keystone has a proven track record of serving multiple of our key business needs. However, as we expand our offerings and try out new ideas, there’s a growing need to unlock other emerging use cases that were not yet covered by Keystone. After evaluating the options, the team has decided to create Data Mesh as our next generation data pipeline solution.

Click through for a high-level overview of the architecture.

Comments closed

Feeding Synapse Spark Info to On-Prem Kafka Clusters

Bhadreshkumar Shiyal finds a solution:

Microsoft’s official documentation for Azure Data Factory contains a tutorial which explains how to access an On-Premises SQL Server from Azure Data Factory which is inside a Managed Vnet. You can go through that article here: Access on-premises SQL Server from Data Factory Managed Vnet using Private Endpoint – Azure Data Fac….

Although based upon the article’s solution, to meet our requirements we needed to substitute On-Prem Apache Kafka for On-Prem SQL Server and instead of an Azure Data Factory inside a Managed Vnet, we used a Synapse Workspace inside a Managed Vnet. The “Forwarding Vnet” concept explained in the above tutorial remains as-is in our approach.

As soon as you turn on Data Exfiltration Protection (DEP), the lockdown is real. Click through to see what the process of exfiltrating data through an approved mechanism looks like.

Comments closed

Monitoring Log Shipping with T-SQL

Lori Brown tracks log shipping operations:

For Log Shipping, some information is only available on the primary or only on the secondary.  That means that I had to set up a linked server on the primary to connect to the secondary.  I do not want to create any OPENROWSET queries for this since that would require that AdHoc Distributed Queries be enabled.  I am not a fan of opening that up for the following reasons:

1) It can allow buffer overflow bugs to compromise systems.

2) It can allow a compromised server to connect to a non-compromised server. 

Read on to see what Lori prefers instead.

Comments closed