Month: October 2023

Exploring Poker Hands in R

Benjamin Smith sorts and deals:

Recently, I have been reading “Mathematical Statistics” by Professor Keith Knight and I noticed an interesting passage he mentions when discussing finite sample spaces:

*In some cases, it may be possible to enumerate all possible outcomes, but in general such enumeration is physically impossible; for example, enumerating all possible 5 card poker hands dealt from a deck of 52 cards would take several months under the most favourable conditions.* (Knight 2000)

While this quote is taken out of context, with the advent of modern computing this is a task which is definitely possible to do computationally!

Click through to see how you can do this in R, at least for 5-card stud. 5-card draw would have the same number of final combinations, though if you also track intermediary combinations, it would grow rather considerably.
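
To give a sense of scale, here is a minimal sketch in Python rather than R, and not Benjamin's code. It builds a 52-card deck, counts every 5-card hand, and tallies how many are flushes (straight flushes included). The card encoding and the `is_flush` helper are my own assumptions for illustration.

```python
from itertools import combinations
from math import comb

# Assumed encoding: each card is a two-character string, rank then suit.
RANKS = "23456789TJQKA"
SUITS = "cdhs"
DECK = [rank + suit for rank in RANKS for suit in SUITS]


def is_flush(hand):
    """Return True when all five cards share a suit (straight flushes included)."""
    return len({card[1] for card in hand}) == 1


# Closed-form count of 5-card hands: 52 choose 5.
total_hands = comb(len(DECK), 5)

# Brute-force enumeration of every hand, the task the quote calls impractical by hand.
flush_hands = sum(1 for hand in combinations(DECK, 5) if is_flush(hand))

print(total_hands, flush_hands)
```

The enumeration walks roughly 2.6 million hands, which a modern laptop handles in seconds.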

Microsoft Fabric’s Reflex as Watchdog

Tom Martens brings home a junkyard dog:

Reflex is many things next to one of the workloads of Microsoft Fabric. Before I delve into these things in more detail in later articles (yes, maybe this is the birth of another series of articles), I want to say this: Reflex is cool. It was never that simple to watch your data in your Power BI datasets (and this is only one of the capabilities of Reflex).

Because I need images whenever I try to understand things, I start with a simple image of Reflex: I consider Reflex a watchdog! Reflex is watching something and alarms me or someone else when something happens – a defined condition is met.

Read on for an example of how this works using a real dataset.

Postgres Performance Tuning via work_mem

Salman Ahmed explains what working memory is in Postgres and the effects of changing the work_mem value:

PostgreSQL, by default, is configured to run everywhere with minimum resource utilization. To achieve maximum performance under specific scenarios, PostgreSQL’s parameters can be tuned to enhance performance. One such parameter that can impact performance in PostgreSQL is work_mem.

In this blog we will discuss how work_mem can be used to optimize performance in PostgreSQL.

Click through for that discussion.

Debugging SQLPackage Issues in PowerShell

Jose Manuel Jurado Diaz simplifies SQLPackage output:

Handling massive SQLPackage diagnostic logs, like those spanning over 4 million rows, can be an overwhelming task when troubleshooting support cases. This article introduces a PowerShell script designed to efficiently parse through SQLPackage diagnostic logs, extract error messages, and save them to a separate file, thus simplifying the review process and enhancing the debugging experience.

Click through for a PowerShell script that can help.
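
The general idea, streaming a huge log and keeping only the error lines, can be sketched in a few lines. This is a Python stand-in, not Jose's PowerShell script, and it assumes error lines contain the word "error"; the real diagnostic log format may call for a different pattern. The function names and file paths are hypothetical.

```python
import re
from typing import Iterable, Iterator

# Assumption: an error line contains the standalone word "error" (any casing).
ERROR_PATTERN = re.compile(r"\berror\b", re.IGNORECASE)


def extract_errors(lines: Iterable[str]) -> Iterator[str]:
    """Lazily yield only the lines that match the assumed error pattern."""
    for line in lines:
        if ERROR_PATTERN.search(line):
            yield line.rstrip("\n")


def filter_log(source_path: str, target_path: str) -> int:
    """Copy matching lines to a separate file and return how many were written."""
    written = 0
    with open(source_path, encoding="utf-8", errors="replace") as source, \
            open(target_path, "w", encoding="utf-8") as target:
        # Iterating the file object reads one line at a time, so a log with
        # millions of rows never has to fit in memory.
        for line in extract_errors(source):
            target.write(line + "\n")
            written += 1
    return written
```

A call such as `filter_log("diagnostics.log", "errors.log")` (both names made up) would leave a much shorter file to review.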

Building Diagrams in Mermaid

Michael Bourgon tries out Mermaid:

Just found out about this the past month. 

I like diagrams for my documentation, and I detest making it. I also would like to build it via script, since that’s more useful.

I used Mermaid to create a series of architectural diagrams a couple years back. It was a reasonably good experience, although you have to keep in mind that you don’t get pixel-perfect designs and certain concepts can be difficult to represent. Even so, it’s quite alright for straightforward diagrams and includes support for icon sets for a variety of cloud and on-premises environments.
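
Building diagrams via script mostly comes down to emitting Mermaid's text syntax. Here is a minimal sketch in Python that turns a list of edges into a flowchart definition. The `mermaid_flowchart` helper and the node names are hypothetical, and it assumes node identifiers are simple words without spaces.

```python
from typing import Iterable, Tuple


def mermaid_flowchart(edges: Iterable[Tuple[str, str]], direction: str = "LR") -> str:
    """Render (source, target) pairs as Mermaid flowchart text.

    Assumes node identifiers contain no spaces or special characters.
    """
    lines = [f"flowchart {direction}"]
    for source, target in edges:
        lines.append(f"    {source} --> {target}")
    return "\n".join(lines)


# Hypothetical architecture: an app talks to an API, which talks to a database.
diagram = mermaid_flowchart([("App", "Api"), ("Api", "Database")])
print(diagram)
```

Paste the printed text into any Mermaid renderer to get the picture, and regenerate it whenever the edge list changes.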

Caching: In-Database and External

Adron Hall talks caches:

All aboard the Data Express! Let’s imagine our database as this massive train station. The trains are packed with information – from passengers’ details to the schedules. Every time you want to know when the next train to DevOps Land is, you have to ask the station master (the database). If too many folks keep asking the same question, the station master will get tired, slowing down the whole operation. So, what do we do? Enter: Caching!

Read on for different caching mechanisms in several major relational databases, various reasons for external caches (like Redis and memcached) to exist, and four patterns for external caching. I’ve found that database people tend not to care much about external caches, leaving that to application developers. But there can be good reasons to store high-read, low-write data in caches, reducing some of the strain on those expensive database servers.
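
As an illustration of the high-read, low-write case, here is a minimal sketch of cache-aside, one widely used external caching pattern (I am not claiming it matches any of the four in Adron's post). A plain dictionary stands in for Redis or memcached, and the `CacheAside` class, its `loader` callback, and the TTL value are all assumptions for the example.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class CacheAside:
    """Check the cache first; on a miss, load from the database and store the result."""

    def __init__(self, loader: Callable[[Hashable], Any], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader          # stand-in for the expensive database query
        self._ttl = ttl_seconds        # how long an entry stays fresh
        self._clock = clock
        self._store: Dict[Hashable, Tuple[Any, float]] = {}  # stand-in for Redis
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> Any:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            self.hits += 1
            return entry[0]
        # Miss or expired entry: go to the source and repopulate the cache.
        self.misses += 1
        value = self._loader(key)
        self._store[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key: Hashable) -> None:
        """Drop an entry after the underlying row changes."""
        self._store.pop(key, None)
```

The station master only gets asked once per TTL window for each key; every other question is answered from the cache.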

Making a Time Series Stationary in R

Steven Sanderson puts a halt to things:

When working with time series data, one common challenge is dealing with non-stationary data. Non-stationary time series can be a headache for analysts, but fear not, because we have a handy tool to make your life easier. Say hello to the auto_stationarize() function from the {healthyR.ts} package.

Read on to learn why you want stationary data for time series analysis and how the auto_stationarize() function works.
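
For intuition, first differencing is one of the standard ways to remove a trend. The sketch below is plain Python, not the {healthyR.ts} implementation, and I am not asserting that it mirrors what auto_stationarize() does internally. The `difference` helper and the sample series are made up for illustration.

```python
from typing import List, Sequence


def difference(series: Sequence[float], lag: int = 1) -> List[float]:
    """Return series[t] - series[t - lag] for every t where both values exist."""
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


# A series with a steady upward trend has a mean that changes over time,
# so it is not stationary.
trend = [2 * t + 5 for t in range(10)]

# After one round of differencing, the trend collapses to a constant.
diffed = difference(trend)
print(diffed)
```

Real data will not flatten this neatly, which is why a statistical test for stationarity is normally run after each transformation.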

Capturing a TCP Dump in an Azure Databricks Notebook

Stithi Panigrahi does some troubleshooting:

Due to the potential impact on performance and storage costs, Azure Databricks clusters don’t capture networking logs by default. Follow the below instructions if you need to capture tcpdump to investigate multiple networking issues related to the cluster. These steps will capture a TCP dump on each cluster node–both driver and workers during the entire lifetime of the cluster.

Click through for an init script, which generates the actual script, which itself generates the TCP dumps.

Data Activator in Microsoft Fabric

Johnny Winter takes a look at Data Activator:

It activates data right? Err… not sure that’s even a thing. The one liner I’d give it, is that it acts ON your data.

The concept is that in this day and age, taking action on the insights in your data is still a very manual effort. So why not automate the monitoring of that data and have Data Activator take that action for you? In my mind it’s Microsoft’s attempt to bring Robotic Process Automation (RPA) closer to your data.

So how does it work and what actions can you take?

That’s where you’ll have to read the whole thing—this post is just a trailer, after all.

Setting up Ola’s Index Maintenance with Azure Runbooks and Terraform

Josephine Bush builds on prior work:

Yes, you still need to do some work to maintain indexes in Azure SQL Database. This post will walk you through setting up statistic updates and index maintenance using Terraform.

Thanks to Tracy Boggiano for her directions for setting up the runbooks. If you want to do this manually instead of with Terraform, Tracy’s post walks you through it step by step. I only modified the role assignment so it had read to the entire subscription level to loop through every DB in the subscription.

Thanks to Kendra for blogging about index maintenance in Azure SQL. Her post helped me decide on index maintenance thresholds.

Click through for a link to Josephine’s GitHub repo and a walkthrough of how it all works.
