Press "Enter" to skip to content

Curated SQL Posts

Multivariate Anomaly Detection with ADX

Adi Eldar shows off multivariate anomaly detection in Azure Data Explorer:

Azure Data Explorer (ADX) is commonly used for monitoring cloud resources and IoT devices performance and health. This is done by continuous collection of multiple metrics emitted by these sources, and on-going analysis of the collected data to detect anomalies. The analysis is applied over time series of the relevant metrics in order to locate significant deviations of the metrics values relative to their typical normal baseline pattern.

Click through for a nice overview of the topic, including two different scenarios: one which emphasizes time series data and the other, which does not.

Comments closed

Bit Twiddling in T-SQL

Louis Davidson explains how bit operations work in T-SQL:

I expect that 99% of the people reading this looks at this probably would expect there to be a status table that contained the values of status. Seeing that this is a base 2 number, you may be in that 1% that thinks this might be a bitmask. but unless you have and eidetic memory, you probably don’t know what all of the bits mean.

A bitmask is a type of denormalization of values where instead of having a set of columns that have on or off values (no Null values), you encode it like:

Bitmasks make me break out the angry nun ruler. You can almost guarantee you’re doing something wrong if you design a bitmask as a column in a table.

Comments closed

Enterprise-Level Backups in MySQL

Lukas Vileikis continues a series on backups with MySQL:

MySQL Enterprise Backup is a known tool for everyone inside of the world of MySQL, no matter if people dealing with the tool are enterprise (business) minded or not. The tool can be considered the flagship of MySQL’s enterprise-level offerings: it comes as a part of its Enterprise tier and costs thousands to attain: is it worth your money, and perhaps more importantly, your time? We will figure that out in this blog.

Read on for Lukas’s thoughts.

Comments closed

Azure SQL Managed Instance Performance

Reitse Eskens wraps up a series on Azure SQL performance comparisons:

So far, the blogs were about the really SaaS databases; the database is deployed and you don’t have think about it anymore. This ease of use comes at a ‘price’. You’ve got no control whatsoever on files, you’ve lost the SQL Agent and a number of other features. The managed instance is a bit different. When I was testing you could see the TempDB files but not change them, since then a few changes have been made to this tier where you’re able to change settings and, Niko Neugebauer told the data community on twitter, there are more changes coming. With the managed instance, you get the agent back and you can run cross database query’s again. So you can safely say the managed instance is a hybrid between your trusty on-premises server and the fully managed Azure SQL database.

Click through for Reitse’s thoughts.

Comments closed

Defining an Analytics Engineer

Ust Oldfield defines a term:

Analytics Engineering, along with Data Engineering and Report Engineering, is a specialised subset of skills that would previously be the preserve of a Business Intelligence (BI) Developer. The BI Developer was once a generalist data developer, whose overall responsibilities have been split out and shared among specialist developers as the prevalence of data across organisation has increased and the tools and technologies used to ingest, transform, and serve data have become more specialised and loosely integrated.

In the same way that Data Engineering borrowed and took inspiration from Software Engineering for applying repeatable and scalable patterns and techniques to the pipelines that ingest and cleanse data, as well as the rigorous testing of those pipelines, Analytics Engineering has borrowed and taken inspiration from Software Engineering too.

Click through for the specifics of what an Analytics Engineer does.

Comments closed

An Intro to Azure Machine Learning

Tomaz Kastrun has a new Advent challenge:

Azure Machine Learning (or Azure Machine Learning Service and abbreviation AML) is Azure’s cloud service for creating, managing and productionalising machine learning projects. It is a collaborative tool for Data Scientists, Machine Learning Engineers, and data engineers, covering their daily and operational tasks. From creating and training to deploying and managing predictive models and machine learning solutions.

Click through for the introduction.

Comments closed

Data Lake Exploration in AWS with Athena for Spark

Pathik Shah and Raj Devnath jetski the data lake:

Amazon Athena now enables data analysts and data engineers to enjoy the easy-to-use, interactive, serverless experience of Athena with Apache Spark in addition to SQL. You can now use the expressive power of Python and build interactive Apache Spark applications using a simplified notebook experience on the Athena console or through Athena APIs. For interactive Spark applications, you can spend less time waiting and be more productive because Athena instantly starts running applications in less than a second. And because Athena is serverless and fully managed, analysts can run their workloads without worrying about the underlying infrastructure.

Data lakes are a common mechanism to store and analyze data because they allow companies to manage multiple data types from a wide variety of sources, and store this data, structured and unstructured, in a centralized repository. Apache Spark is a popular open-source, distributed processing system optimized for fast analytics workloads against data of any size. It’s often used to explore data lakes to derive insights. For performing interactive data explorations on the data lake, you can now use the instant-on, interactive, and fully managed Apache Spark engine in Athena. It enables you to be more productive and get started quickly, spending almost no time setting up infrastructure and Spark configurations.

In this post, we show how you can use Athena for Apache Spark to explore and derive insights from your data lake hosted on Amazon Simple Storage Service (Amazon S3).

This feels a lot like the Spark pool in Azure Synapse Analytics, as well as some of what Databricks does

Comments closed

Apache Ranger on ElasticMapReduce

Laurence Geng looks at Ranger:

Whether you’ve successfully made it before or not, installing and integrating Windows AD/OpenLDAP + Ranger + EMR is a very hard job, it is complicated, error-prone, and time-consuming for the following reasons:

Read on for the list of reasons, some background on Ranger, and an automated installer intended to make life a bit easier.

Comments closed

Pre-Attentive Attributes and Visualization

Alex Velez hits on an important topic:

Have you ever wondered whether the graph or the slide you created is any good? Was the time you spent choosing colors, deleting gridlines, and wordsmithing slide titles, worth it, or for naught? While the answer is certainly more nuanced than a simple yes or no, there is a quick way to gain some insight into this: the where are your eyes drawn? test, also known as WAYED. It’s a simple question, but it can help to refine your own creations, and provides a construct for giving feedback to others.

Click through for the test and a bit more information about pre-attentive attributes.

Comments closed