Press "Enter" to skip to content

Author: Kevin Feasel

Calibrating and Plotting a Time Series with healthyR.ts

Steven Sanderson builds a plot:

In time series analysis, it is common to split the data into training and testing sets to evaluate the accuracy of a model. However, it is important to ensure that the model is calibrated on the training set before evaluating its performance on the testing set. The {healthyR.ts} library provides a function called calibrate_and_plot() that simplifies this process.

Click through for the function’s input parameters and an example of how to use it.

Comments closed

Synapse and Azure ML Pipelines

Santosh Thomas integrates two Azure products:

As more customers standardize on the Synapse data platform, enabling machine learning workflows through Azure Machine Learning (Azure ML) becomes particularly interesting. This is especially true as more customers look to bring their data engineering and data science practices together and mature capabilities on both sides.

The goal of this blog post is to highlight how Synapse and Azure ML can work well together to deliver key insights. This is motivated by a scenario where a customer modernized their data platform on Azure Synapse but was looking to improve their data science practices through Azure ML. The focus of this blog is to expose existing functionality, and it is not a “hardened” solution with security or other cloud best practice implementations. The workflow steps also assume some level of comfort with Python and working with the Azure Python SDKs.

There was a time in which Microsoft wanted us to remain in Synapse for machine learning tasks, but that time is gone: the emphasis is definitely to do machine learning tasks in Azure ML, regardless of where the data lives…unless there’s a Spark job involved, in which case things get all weird again.

Comments closed

Aggregations and Distinct Counts in Power BI

Phil Seamark doesn’t have time to wait for Power BI to count:

This article aims to show how you can speed up distinct count calculations in Power BI using the built-in user-defined aggregations feature. The user-defined aggregation feature in Power BI is designed to work with direct query models and usually gets used for calculations such as SUM, MIN, MAX etc. However, it can also work well for distinct count calculations using the pattern shown in this article.

It’s an interesting partial aggregation approach which works really well when the distinct count is a small percentage of the total.

Comments closed

Kafka Control and Data Planes

Sanjay Garde explains how the architecture of Apache Kafka solutions has expanded over time:

With the advent of service mesh and containerized applications, the idea of the control and data plane has become popular. A part of your application infrastructure, such as a proxy or sidecar, is dedicated to aspects such controlling traffic, access, governance, security, and monitoring and is referred to as the control plane. Another part of your application infrastructure that is used purely for processing your business transactions is referred to as the data plane.

Read on to see how the concept works at an architectural level.

Comments closed

Building an Event Calendar from T-SQL and CSS

Aaron Bertrand builds some HTML:

In my last tip, I showed how to use T-SQL to generate HTML for fancy calendar visuals overlaid with event data from another table. As an extension of that tip, let’s now look at simplifying parts of that query by caching the date information in a calendar table to streamline the outer queries and avoid complications caused by different DATEFIRST settings.

This is the follow-on from a prior post (linked in the lede here), so it would make sense to read that one first if you haven’t already.

Comments closed

Principles of Synapse Security

Liliam Leme provides an overview of security options in Azure Synapse Analytics:

This blog post will provide an overview of the Synapse security environment focused on Dedicated SQL Pool, Serverless SQL Pool, and Spark.

Security has many layers and frequently it will determine how you build your process. I start this post by reviewing several important security considerations which you can later apply to your Synapse environment. 

This is a fairly lengthy post and it still only covers a moderate amount of what you’d want to do for Azure Synapse Analytics. This is the downside to having a complex interplay of several products: there’s a lot to secure and a lot to think about along the way.

Comments closed

ADX Dashboards Now Generally Available

Michal Bar provides an overview of Azure Data Explorer functionality now generally available :

Each ADX dashboard is a collection of tiles, optionally organized in pages, where each tile has an underlying query and a visual representation. Using the web UI, you can natively export Kusto Query Language (KQL) queries to a dashboard as visuals and later modify their underlying queries and visual formatting as needed. In addition to ease of data exploration, this fully integrated Azure Data Explorer dashboard experience provides improved query and visualization performance.

Read on to learn more.

Comments closed

Object Ownership in Postgres

Ryan Booz divvies up ownership rights:

In the first security article, PostgreSQL Basics: Roles and Privileges, I discussed how roles (users and groups) are created and managed in PostgreSQL Depending on your background with permissions, particularly in other database products, some of those nuances in how permissions work may have been surprising.

Understanding how roles and privileges work in Postgres is key to understanding the next, and often more confusing part of security, object ownership. Although there are many privileges that can be assigned in Postgres, object ownership comes with a specific level of privilege that cannot be transferred to other roles. Understanding the implications of this is essential to the management of your database schema, and access to the objects it contains, over time.

Click through to learn more about how Postgres handles object ownership and defines who gets access to which objects.

Comments closed

Matrix Multiplication in R with DuckDB and SQLite

Karsten Weinert compares two databases:

On my laptop with 16 GB RAM, I would like to perform a matrix-vector multiplication with a sparse matrix of around 10 million columns and 2500 rows. The matrix has approximately only 2% non-zero entries, but this are still 500 million numbers and the column/row information, too large to work comfortably in-memory.

A while ago, I tried using sqlite for this task. It kind of worked, but was too slow to be useful. This weekend, I revisited the problem and tried using duckdb.

Read on for the results. I’ve heard enough positives about DuckDB over the past few weeks that it makes me want to try it out. H/T R-Bloggers.

Comments closed

Checking if a TCP Port Is in Use

Tom Collins knocks on doors:

Question: SQL Server won’t start , so checked Event Viewer and getting the following message. 

Server TCP provider failed to listen on [ ‘any’ <ipv4> 50010]. Tcp port is already in use.

How can I check if the port is already in use and which other process or service  has locked the port and therefore not allowing SQL Server to start on designated port ?

Read on to see how you can use Powershell to find the answer on Windows.

Comments closed