
Day: January 23, 2026

Operating on Distributions in R with distionary

Vincenzo Cola announces a new R package:

After passing through rOpenSci peer review, the distionary package is now newly available on CRAN. It allows you to make probability distributions quickly – either from a few inputs or from its built-in library – and then probe them in detail.

These distributions form the building blocks that piece together advanced statistical models with the wider probaverse ecosystem, which is built to release modelers from low-level coding so production pipelines stay human-friendly. Right now, the other probaverse packages are distplyr, allowing you to morph distributions into new forms, and famish, allowing you to tune distributions to data. Developed with risk analysis use cases like climate and insurance in mind, the same tools translate smoothly to simulations, teaching, and other applied settings.

Click through for an overview of the package.


Efficient Sampling of Spark Datasets

Rajesh Vakkalagadda needs a sample:

Sampling is a fundamental process in machine learning that involves selecting a subset of data from a larger dataset. This technique is used to make training and evaluation more efficient, especially when working with massive datasets where processing every data point is impractical.

However, sampling comes with its own challenges. Ensuring that samples are representative is crucial to prevent biases that could lead to poor model generalization and inaccurate evaluation results. The sample size must strike a balance between performance and resource constraints. Additionally, sampling strategies need to account for factors such as class imbalance, temporal dependencies, and other dataset-specific characteristics to maintain data integrity.
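To make the class-imbalance point concrete, here is a minimal sketch of stratified sampling in plain Python. It is not the linked article's Scala solution; the function name `stratified_sample`, the `label` field, and the toy data are all assumptions made for illustration. The idea is to sample the same fraction from every class so that a rare class is not lost by chance.

```python
import random
from collections import defaultdict


def stratified_sample(rows, key, fraction, seed=42):
    """Draw about `fraction` of the rows from each class (hypothetical helper).

    `key` maps a row to its class label. Each class contributes at least
    one row so that rare classes survive the sampling step.
    """
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible

    # Group rows by class label.
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)

    # Sample each class separately, without replacement.
    sample = []
    for members in groups.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Toy, imbalanced dataset (an assumption): 50 "rare" rows and 950 "common" rows.
rows = [{"id": i, "label": "rare" if i % 20 == 0 else "common"} for i in range(1000)]

sample = stratified_sample(rows, key=lambda r: r["label"], fraction=0.1)
rare_count = sum(1 for r in sample if r["label"] == "rare")
print(len(sample), rare_count)
```

In Spark itself, the DataFrame `stat.sampleBy` method offers per-class sampling fractions, though it samples each row independently, so the resulting counts are approximate rather than exact as they are in this sketch.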

Click through for an answer in Scala. The Python implementation would be very similar.


LOB Data and Replication in SQL Server

Mark Beaumont diagnoses an error:

Recently, one of our clients encountered an issue while running a data update in SQL Server. The operation failed immediately with a configuration error, specifically targeting Large Object (LOB) data:

Length of LOB data (169,494) to be replicated exceeds configured maximum 65,536. Use the stored procedure sp_configure to increase the configured maximum value for max text repl size option, which defaults to 65,536. A configured value of -1 indicates no limit, other than the limit imposed by the data type.

The tricky part was that the client wasn’t using replication. Read on to learn about the culprit.


A Primer on Cognitive Perception

Paul Turley thinks about how we think:

You can be the greatest report designer on the planet, but if your report doesn’t meet the needs of the report consumer, it’s all for nothing. In this section, I break down the most important considerations for identifying your audience and their information needs. These are all factors to consider before you jump in and start designing your report.

Paul hits on quite a few of the foundational concepts around how humans perceive visual stimuli and tells some interesting stories along the way.


Granular REST API Support for OneLake Security Role Management

Aaron Merrill announces a new preview offering:

Microsoft Fabric continues to expand the OneLake security surface with new granular REST API support for role management, giving developers and platform teams far more control over how security policies are created, retrieved, and managed programmatically. In addition to the existing batch role API, Fabric now offers discrete Create, Get, and Delete role APIs, making it easier to build incremental, automation-friendly security workflows that align with modern DevOps and governance practices.

Click through for a quick explanation of how things worked before and how they will work going forward.


Tips for Teaching Technical Topics

John Deardurff shares some advice:

After 25 years as a Microsoft Certified Trainer (MCT), one thing I have learned is that teaching technical content requires more than just subject‑matter expertise. Great technical instructors create an environment where learners feel comfortable, engaged, and motivated to explore complex concepts at their own pace. 

Click through for ten such tips. I tend to follow seven of them pretty well, though the three around questions are where I’m weakest.
