Press "Enter" to skip to content

Month: September 2024

Random Walks in R with RandomWalker

Steven Sanderson is going for a walk (not the after-dinner kind):

Welcome to the world of ‘RandomWalker’, an innovative R package designed to simplify the creation of various types of random walks. Developed by myself and my co-author, Antti Rask, this package is in its experimental phase but promises to be a powerful tool for statisticians, data scientists, and financial analysts alike. With a focus on Tidyverse compatibility, ‘RandomWalker’ aims to integrate seamlessly into your data analysis workflows, offering both automatic and customizable random walk generation.

Read on to learn more about the package, including why you might want to use it and the functionality you can get out of it.

Comments closed

Cloning Tables in Databricks

Chen Hirsh hogs the photocopier:

The simplest use case to explain why table cloning is helpful is this: Let’s say you have a large table, and you want to test some new process on it, but you don’t want to ruin the data for other processes, so you need a clean copy of your table (or multiple tables) to play with. Coping a large table might take time (Databricks does it very fast, but if it’s a big table it still takes time to copy the data) ,and what happens if you then need to change your code? you have to drop the target table, copy the source table again, and so on.

here is where cloning can be your friend.

Read on to learn about three cloning techniques. H/T Madeira Data Solutions blog.

Comments closed

Change Management for the DBA

Terri Hurley moved our cheese:

If you have never been involved in Change Management processes but now find yourself part of one, it may seem a bit overwhelming or confusing. However, the reasons for and benefits of these processes are simple and straightforward.

This article will explain Change Management and how DBAs are involved and can benefit from it.

Read on to learn more about how change management can work for a DBA. I’ve worked for several organizations as they’ve moved from a philosophy of “just do it” toward proper change management, typically for regulatory reasons like Sarbanes-Oxley compliance.

Comments closed

Working with Always Encrypted Data in SSIS

Rod Edwards continues a series on Always Encrypted:

So now, lets see how it plays with another one of those common toolsets that you may use alongside your Encrypted data. In this post, i’ll be talking about accessing and importing data using SSIS, nothing fancy, just reading data from an Excel sheet, and piping into our Always Encrypted table, encrypting as we go.

I’m not saying to use Excel for housing confidential data either!… as no one does that…oh no, not anywhere, ever….</sarcasm>.

As previously, this focuses on using Azure Key Vault for securing Encryption keys required.

Considering that all corporate data is in Excel someplace (some variant of which may eventually become Feasel’s Second Law), of course that sensitive and confidential data will be in a plain Excel file that people e-mail around.

Comments closed

Checking Cumulative Update Status on SQL Server Instances

Steve Jones reminds you to check those cumulative updates:

How can I quickly get a CU patch for a system that’s out of date? I’ll discuss that situation.

You might think you get to patch every instance every few months, and you may be able to. But most of us have laggards in any decent-sized estate. Someone always wants to avoid patching, or skip patching on the day you’ve scheduled every other system.

Steve’s solution involves using Redgate Monitor, though you could also do this on your own or using something like dbatools to get information about your estate.

Comments closed

Fixing Implicit Conversion without Changing Queries

Vlad Drumea solves a challenge:

Why wouldn’t you be able to change the query?

The two most common scenarios I’ve ran into are:

  • the software vendor does not want to change the code
  • a legacy application that’s no longer maintained and nobody has access to the code base

Read on for Vlad’s solution to a fairly common problem. The real fix, of course, is to use NVARCHAR everywhere and not have to worry about VARCHAR to NVARCHAR conversion. The secondary fix is to get your queries right and make sure your data types are consistent.

Comments closed

Building a GitHub Codespace Configuration for Polyglot Notebooks

Matt Eland makes some recommendations:

In order to get Polyglot Notebooks to work with GitHub Codespaces, you’ll need to match the current requirements of the Polyglot Notebooks extension and its underlying .NET Interactive kernels.

This relies on two files in your .devcontainer directory:

  • Dockerfile which describes the Docker container the Codespace will run in
  • devcontainer.json which describes how the dev container is configured in terms of extensions and ports

Read on to learn more. Also, Matt has a brand new book available on the topic of polyglot notebooks, so check that out.

Comments closed

Schema Validation in MongoDB

Robert Sheldon makes me bite my tongue to prevent making schema quality jokes:

In the previous article in this series, I introduced you to schema validation in MongoDB. I described how you can define validation rules on a collection and how those rules validate document inserts and updates. In this article, I continue the discussion by explaining how to apply validation rules to documents that already exist in a collection. Before you start in on this article, however, I recommend that you first review the previous article for an introduction into schema validation.

The examples in this article demonstrate various concepts for working with schema validation, as it applies to existing documents. I show you how to find documents that conform and don’t conform to the validation rules, as well as how to bypass schema validation when inserting or updating a document. I also show you how to update and delete invalid documents in a collection. Finally, I explain how you can use validation options to override the default schema validation behavior when inserting and updating documents.

Read on to learn more about how you can perform some after-the-fact schema validation.

Comments closed

The State of the ANY Aggregate Transformation

Paul White covers an aggregate operator:

SQL Server provides a way to select any one row from a group of rows, provided you write the statement using a specific syntax. This method returns any one row from each group, not the minimum, maximum or anything else. In principle, the one row chosen from each group is unpredictable.

The general idea of the required syntax is to logically number rows starting with 1 in each group in no particular order, then return only the rows numbered 1. The outer statement must not select the numbering column for this query optimizer transformation (SelSeqPrjToAnyAgg) to work.

Read on for information about this internal operator, a bug that existed in it for a long time, and the current state of fixes.

Comments closed