Press "Enter" to skip to content

February 2022

Types of Regression

The Finnstats folks talk about regression:

Basically, regression analysis involves creating an equation to describe the significant association between one or more predictors and the response variable, as well as estimating new observations.

The results of the regression reveal the direction, size, and statistical significance of the relationship between predictor and response, where the dependent variable is either continuous or discrete.

Click through for details on six types of regression. H/T R-Bloggers.
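The linked post works in R; as a language-neutral illustration of that direction / size / significance trio, here is a minimal Python sketch with statsmodels and simulated data:

```python
# Minimal sketch with simulated data: the coefficient's sign gives the
# direction, its magnitude the size, and its p-value the statistical
# significance of the relationship.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
x = rng.normal(size=100)
y = 1.0 + 2.5 * x + rng.normal(scale=0.5, size=100)  # true intercept 1.0, slope 2.5

X = sm.add_constant(x)   # adds the intercept column
model = sm.OLS(y, X).fit()

print(model.params)      # estimated intercept and slope
print(model.pvalues)     # significance of each coefficient
print(model.predict(sm.add_constant(np.array([0.0, 1.0]))))  # estimate new observations
```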

Comments closed

Multivariate Anomaly Detection in SynapseML

Louise Han has an announcement:

Today, we are excited to announce a wonderful collaboration between Multivariate Anomaly Detector and SynapseML, which join together to provide a solution for developers and customers to do multivariate anomaly detection in Synapse. This new capability allows you to detect anomalies quickly and easily in very large datasets and databases, perfectly lighting up scenarios like equipment predictive maintenance. For those who are not familiar with predictive maintenance, it is a technique that uses data analysis tools and techniques to detect anomalies in the operation, and possible defects, in equipment and processes so customers can fix them before they result in failure. Therefore, this new capability will benefit customers who have huge amounts of sensor data across hundreds of pieces of equipment, letting them do equipment monitoring, anomaly detection, and even root cause analysis.

Click through for more details and a demonstration on how to use it.
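The demo is PySpark. As a hedged sketch of the shape of the API based on the announcement (the class name FitMultivariateAnomaly, its setters, and every key, region, and path below are assumptions to verify against the post and current SynapseML docs):

```python
# Hedged sketch of SynapseML's multivariate anomaly detection API as
# shown in the announcement; names are assumptions, not verified
# against the current release.
from synapse.ml.cognitive import FitMultivariateAnomaly

anomaly_key = "<anomaly-detector-key>"   # placeholder credentials
staging_path = "<adls-staging-path>"     # placeholder intermediate storage

estimator = (FitMultivariateAnomaly()
    .setSubscriptionKey(anomaly_key)
    .setLocation("westus2")                # region of the Anomaly Detector resource
    .setStartTime("2021-01-01T00:00:00Z")  # training window bounds
    .setEndTime("2021-01-02T12:00:00Z")
    .setIntermediateSaveDir(staging_path)
    .setTimestampCol("timestamp")
    .setInputCols(["sensor_1", "sensor_2", "sensor_3"])
    .setOutputCol("results"))

model = estimator.fit(df)     # df: Spark DataFrame of aligned sensor readings
scored = model.transform(df)  # anomaly scores land in the "results" column
```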

Comments closed

The Architecture of Project Bonsai

Tsuyoshi Matsuzaki takes us through the architecture for Project Bonsai:

Project Bonsai is a reinforcement learning framework for machine teaching in Microsoft Azure.

In generic reinforcement learning (RL), data scientists will combine tools and utilities (such as Gym, RLlib, or Ray) which can be easily customized with familiar Python code and ML/AI frameworks such as TensorFlow or PyTorch.

But in engineering tasks with machine teaching for autonomous systems or intelligent controls, data scientists will not always explore and tune attributes for AI. In successful practices, the professionals for operations or engineering (non-AI specialists) will tune attributes for some specific control systems (simulations) to train in machine teaching, and data scientists will assist in cases where the problem requires advanced solutions.

Read on to see how it works.
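For contrast with the machine-teaching workflow, here is a minimal sketch of the "generic RL" loop the quote mentions, using the classic Gym API (pre-0.26; newer gym/gymnasium releases return an info dict from reset and five values from step):

```python
# Minimal "generic RL" interaction loop with the classic Gym API.
import gym

env = gym.make("CartPole-v1")
obs = env.reset()

total_reward, done = 0.0, False
while not done:
    action = env.action_space.sample()          # stand-in for a learned policy
    obs, reward, done, info = env.step(action)
    total_reward += reward

print(f"episode return: {total_reward}")
```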

Comments closed

Deleting Individual Records from Azure Data Explorer

Slavik Neimer shows how to delete records from a table in Azure Data Explorer:

Azure Data Explorer is a big data analytics platform that takes care of everything required to ensure real-time (or at least near-real-time) decision making can take place. This includes data ingestion, data querying, data visualization, and data management.

In this blog post you’ll learn how to delete individual records from a table, and how it works behind the scenes.

Of particular note is the whatif=true clause, as it’d be nice to see what you burn before you burn it.
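As a hedged sketch, here's what that dry run might look like driven from Python with the azure-kusto-data package; the cluster, database, table, and predicate are all placeholders:

```python
# Hedged sketch: previewing a record deletion in Azure Data Explorer
# from Python; cluster, database, table, and predicate are placeholders.
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(
    "https://mycluster.westus2.kusto.windows.net")
client = KustoClient(kcsb)

# whatif=true reports which records WOULD be deleted without touching them.
dry_run = """
.delete table MyTable records with (whatif=true) <|
    MyTable | where SensorId == 42 and Reading < 0
"""
preview = client.execute_mgmt("MyDatabase", dry_run)
for row in preview.primary_results[0]:
    print(row)

# Once the preview looks right, run the same command without whatif.
```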

Comments closed

Plotting Multiple Columns on a Legend in Power BI

Jason Cockington has a workaround:

At a recent training course, one of the students asked if it was possible to add two different columns to the legend of a line chart, so that when a selection is made on a second slicer, the chart splits to reveal multiple lines.

Given others in the class showed interest in the subsequent conversation, I decided to create a short blog so that everyone could benefit.

The short answer is “no” but the longer answer is more interesting.

Comments closed

Creating an Azure Integration Runtime

Andy Leonard builds out an Azure Integration Runtime:

Many Azure Data Factory developers recommend creating an Azure Integration Runtime for use with Mapping Data Flows. Why? One reason is you cannot configure all the options in the default AutoResolveIntegrationRuntime supplied when an Azure Data Factory instance is provisioned.

At the time of this writing, it’s not obvious how one creates an Azure Integration Runtime. You would think creating an integration runtime would begin with…

It turns out to be a little trickier than you might first expect.
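For those who prefer code to portal clicks, here's a hedged sketch of creating a Data Flow-ready Azure IR with the azure-mgmt-datafactory Python SDK; the model class names and every resource name below are assumptions to check against the SDK docs:

```python
# Hedged sketch with the azure-mgmt-datafactory SDK; names below are
# assumptions to verify.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    IntegrationRuntimeComputeProperties, IntegrationRuntimeDataFlowProperties,
    IntegrationRuntimeResource, ManagedIntegrationRuntime)

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# The Data Flow knobs you can't change on AutoResolveIntegrationRuntime:
# compute size, core count, and time-to-live.
ir = IntegrationRuntimeResource(properties=ManagedIntegrationRuntime(
    compute_properties=IntegrationRuntimeComputeProperties(
        location="East US",
        data_flow_properties=IntegrationRuntimeDataFlowProperties(
            compute_type="General", core_count=8, time_to_live=10))))

client.integration_runtimes.create_or_update(
    "my-resource-group", "my-data-factory", "my-azure-ir", ir)
```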

Comments closed

Visualizing High-Density Regions with R

The rOpenSci team covers the history of the gghdr package:

This was how being a newcomer to rOpenSci OzUnconf 2019 felt. It was incredible to be a part of such a diverse, welcoming and inclusive environment. I thought it would be fun to blog about how it all began, and the twists and turns we experienced along the way as we developed the gghdr package. The package provides tools for plotting highest density regions with ggplot2 and was inspired by the package hdrcde developed by Rob J Hyndman.

The highest density region approach of summarizing a distribution is useful for analyzing multimodal distributions, as an HDR can be composed of numerous disjoint subsets. For example, the histogram of the highway mileage (hwy) data from the mpg dataset (a) shows that cars with 6 cylinders (cyl) are bimodally distributed, which is reflected in the highest density region (HDR) boxplot (c) but not in the standard boxplot (b). Hence, we see that HDRs are useful in displaying multimodality in the distribution.

Read on for a short history of an interesting package.
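gghdr itself is an R package, but the underlying idea is easy to sketch: a 50% HDR is the densest set of points holding half the probability mass, and for a bimodal distribution it splits into disjoint intervals. A conceptual Python sketch with scipy (all names and data made up):

```python
# Conceptual sketch only (not the gghdr package): compute a 50% highest
# density region from a kernel density estimate.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
# Bimodal sample, in the spirit of the 6-cylinder hwy mileage example.
data = np.concatenate([rng.normal(17, 1, 300), rng.normal(25, 1, 300)])

kde = gaussian_kde(data)
grid = np.linspace(data.min() - 3, data.max() + 3, 2000)
density = kde(grid)

# The 50% HDR is {x : f(x) >= c}, with c chosen so the region holds half
# the probability mass; scan thresholds from the densest point down.
order = np.argsort(density)[::-1]
mass = np.cumsum(density[order]) * (grid[1] - grid[0])
cutoff = density[order][np.searchsorted(mass, 0.5)]

in_region = density >= cutoff
edges = np.flatnonzero(np.diff(in_region.astype(int)))
# For a well-separated bimodal density this prints two disjoint intervals.
print(grid[edges].reshape(-1, 2))
```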

Comments closed

Filtered Indexes and Functions

Eitan Blumin looks at filtered indexes:

Unfortunately, as stated in the Microsoft Docs page about Filtered Indexes, the WHERE clause of a filtered index can only support simple comparison operators.

In fact, absolutely no functions of any kind can be used within the WHERE clause of a filtered index. Not even schema-bound user-defined scalar functions.

Well, it’s not entirely true, as you CAN actually use some functions, but on two conditions:

Read the whole thing. Eitan lays out one limitation of filtered indexes and provides a couple of potential workarounds.
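To make the limitation concrete, here's a hedged sketch (hypothetical table and column names) showing a function-wrapped predicate failing and the simple-comparison rewrite succeeding, driven from Python via pyodbc:

```python
# Hedged sketch: a filtered index predicate must be a simple comparison
# against constants, so wrapping the column in a function fails at
# CREATE time. Table and column names are made up.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};SERVER=.;DATABASE=Sandbox;"
    "Trusted_Connection=yes;TrustServerCertificate=yes")
cursor = conn.cursor()

failing_ddl = """
CREATE INDEX IX_Orders_2022 ON dbo.Orders (OrderDate)
WHERE YEAR(OrderDate) = 2022;
"""

working_ddl = """
CREATE INDEX IX_Orders_2022 ON dbo.Orders (OrderDate)
WHERE OrderDate >= '20220101' AND OrderDate < '20230101';
"""

try:
    cursor.execute(failing_ddl)   # rejected: function in the WHERE clause
except pyodbc.Error as e:
    print("rejected as expected:", e)

cursor.execute(working_ddl)       # same intent as simple comparisons
conn.commit()
```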

Comments closed

Verbalizing a Chart

Alex Velez reminds us of the spoken side of communication:

I’m confident that I could overcome some of these design challenges by effectively explaining the graph to someone else. Will it be a perfect data communication? No—but sometimes, we have to deal with less-than-ideal circumstances like time limitations, or not having control over our designs. Knowing how to verbalize a graph can be a practical solution when faced with these constraints.

I should caveat this by clarifying that my intention is not to say that we shouldn’t spend time on our visualizations. But too often, we focus only on the visual. We believe that a graph or a picture is worth a thousand words. Or maybe we assume that because we created the chart, we will automatically know how to talk through it. I am super guilty of this!

Read on for some tips on vocalizing a visual.

Comments closed

Unicode and Data Length

Kevin Wilkie lays out an argument:

If you truly need the UNICODE characters in your data, go ahead and use them! If not, though, please make your DBA happy by not using them. Since UNICODE characters take up twice the amount of space that the ASCII versions do, your DBAs will recommend using the ASCII versions if you are not going to be using any UNICODE characters.

Read on for the justification. But I’m still NVARCHAR (Almost) Everywhere.
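The doubling claim is easy to see outside the database: NVARCHAR stores UTF-16, so each character in the basic plane costs two bytes, versus one byte per character for ASCII-range text in a single-byte VARCHAR collation. A quick Python check:

```python
# NVARCHAR stores UTF-16 (two bytes per BMP character), while a
# single-byte VARCHAR collation stores one byte per ASCII character.
text = "SQL Server"

print(len(text.encode("ascii")))      # 10 bytes: VARCHAR-style storage
print(len(text.encode("utf-16-le")))  # 20 bytes: NVARCHAR-style storage
```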

Comments closed