Press "Enter" to skip to content

Month: November 2023

Fitting Distributions to Datasets in R

Steven Sanderson tests a distribution fit:

There are two main ways to fit a gamma distribution to a dataset in R:

  1. Maximum likelihood estimation (MLE): This method estimates the parameters of the gamma distribution that are most likely to have produced the observed data.
  2. Method of moments: This method estimates the parameters of the gamma distribution by equating the sample mean and variance to the theoretical mean and variance of the gamma distribution.

Click through to see which technique Steven uses and an example of how it all works.
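If you want to experiment before clicking through, here is a rough sketch of both approaches in R on simulated data (illustrative only, and not necessarily the code from Steven's post):

# Toy dataset drawn from a known gamma distribution
set.seed(123)
x <- rgamma(500, shape = 2, rate = 0.5)

# 1. Maximum likelihood estimation via MASS::fitdistr()
library(MASS)
mle_fit <- fitdistr(x, densfun = "gamma")
mle_fit$estimate                 # estimated shape and rate

# 2. Method of moments: for a gamma distribution, mean = shape/rate and
#    variance = shape/rate^2, so solve those two equations using the
#    sample mean and variance.
m <- mean(x)
v <- var(x)
c(shape = m^2 / v, rate = m / v)

Both should land near the true shape = 2 and rate = 0.5, though MLE is generally the more statistically efficient of the two.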


From Probabilities to Odds

Bryan Shalloway explains how odds and probabilities intertwine:

However, human understanding of odds predates our formal understanding of probability. You can find references to odds dating back to Shakespeare:

Knew that we ventured on such dangerous seas
That if we wrought out life ’twas ten to one;
– Shakespeare’s Henry IV, Part II, 1597

Yet, in most common settings, modern society has largely supplanted odds with probabilities. You can imagine that, if Shakespeare were writing today, the line might end “’twas ten out of eleven.”

Read the whole thing.
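To make the arithmetic concrete, odds of a to b convert to a probability of a / (a + b), which is where “ten out of eleven” comes from. A two-line R helper (a sketch, not taken from Bryan's post):

odds_to_prob <- function(odds) odds / (1 + odds)   # odds as a ratio, e.g. 10 for "ten to one"
prob_to_odds <- function(p) p / (1 - p)

odds_to_prob(10)     # 0.909... = 10/11, i.e. "ten out of eleven"
prob_to_odds(10/11)  # back to 10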


Creating and Connecting to an Azure Postgres Cluster

Louis Davidson shares some notes:

As I have dealt with other platforms, PostgreSQL has stood out to me as the one I am most interested in, because it feels like the platform most competitive with SQL Server’s. (Oracle is out there too, as are MySQL and many others, but PostgreSQL strikes a balance of affordability and features, and it feels similar enough to SQL Server to get started.)

There are a few high-level differences that can be confusing. First, a cluster is really just a server (or, in SQL Server terms, an instance). Second, the way you execute a batch of code is very different, and sometimes this depends on the tool you are using. As you dig into how PostgreSQL works, some things will feel really normal, and some will be very different from the other servers you have used.

Read on for the first post in the series, covering setup and connection.
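If your client of choice happens to be R, a minimal connection sketch via DBI and RPostgres looks something like the following (the server name and credentials are placeholders, and this is not the tooling Louis walks through):

library(DBI)

con <- dbConnect(
  RPostgres::Postgres(),
  host     = "example-server.postgres.database.azure.com",  # hypothetical server name
  port     = 5432,
  dbname   = "postgres",
  user     = "your_admin_user",
  password = Sys.getenv("PGPASSWORD"),
  sslmode  = "require"   # Azure enforces encrypted connections by default
)

dbGetQuery(con, "SELECT version();")   # confirm which PostgreSQL build you reached
dbDisconnect(con)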


Building Connect 4 with T-SQL

Tomaz Kastrun figures out a useful way to play a game while looking like you’re doing important work:

Connect 4 is a classic board game, consisting of a board of 6 rows and 7 columns and 42 tokens (in two colours, 21 per player). The first player to connect four tokens in a row, a column, or a diagonal wins!

Simple game rules:
1. only one token can be added at a time
2. each player has their own colour (sign) of tokens
3. the game ends when the first player connects four tokens in a row, a column, or a diagonal

Actually, I take that back: Tomaz has figured out a useful way for two people to play a game while looking like they’re doing important work. That’s twice as good.
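Tomaz does the whole thing in T-SQL; purely to make the win condition concrete, here is a quick R sketch of the four-in-a-row check (illustrative, not Tomaz's code):

connect4_win <- function(board, token) {
  # board: a 6x7 character matrix with "." for empty cells; token: "X" or "O"
  hit <- board == token
  nr <- nrow(hit); nc <- ncol(hit)
  for (r in 1:nr) for (c in 1:nc) {
    if (c + 3 <= nc && all(hit[r, c:(c + 3)])) return(TRUE)                 # horizontal
    if (r + 3 <= nr && all(hit[r:(r + 3), c])) return(TRUE)                 # vertical
    if (r + 3 <= nr && c + 3 <= nc &&
        all(hit[cbind(r:(r + 3), c:(c + 3))])) return(TRUE)                 # diagonal, down-right
    if (r + 3 <= nr && c - 3 >= 1 &&
        all(hit[cbind(r:(r + 3), c:(c - 3))])) return(TRUE)                 # diagonal, down-left
  }
  FALSE
}

b <- matrix(".", nrow = 6, ncol = 7)
b[6, 2:5] <- "X"
connect4_win(b, "X")   # TRUE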


Notes for Trying out Postgres

Ryan Booz offers some guidance:

PostgreSQL continues to be all the rage in 2023, whether in “vanilla” form of the fully open-source distribution or a variant like Amazon RDS, Neon, Yugabyte, and others. If you’re interested in trying PostgreSQL but only have experience with another database like SQL Server, it can feel a bit daunting to get started.

In this small series, we’ll walk you through the process of connecting to a version of PostgreSQL locally or in the cloud. Next, we’ll discuss some options for sample databases and show you how to restore them. And finally, we’ll finish by demonstrating a few ways to connect to and query the database.

Click through for the article.
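Once you have an instance running and a sample database restored, querying it from R takes only a few lines with DBI (the dvdrental database and its film table are just one common sample here; adjust names to whatever you restored):

library(DBI)

con <- dbConnect(
  RPostgres::Postgres(),
  host = "localhost", port = 5432,
  dbname = "dvdrental",              # illustrative sample database
  user = "postgres",
  password = Sys.getenv("PGPASSWORD")
)

dbListTables(con)                                       # what did the restore create?
dbGetQuery(con, "SELECT count(*) AS films FROM film;")  # a quick sanity-check query
dbDisconnect(con)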


Run-Length Encoding in Power BI

Chris Webb performs some experiments:

Recently I was involved in an interesting discussion on X (formerly Twitter) about how partitioning a table in a Power BI semantic model can affect compression and therefore its size and query performance. You can read the thread here. It got me thinking: is there a way to get more detail on how well compression, and in particular run-length encoding (RLE from hereon), is working for a column when you’re using Import mode or Direct Lake? After a bit of research I found out there is, so let’s see some examples that illustrate what I learned.

The results are interesting and Chris explains why we see what we do.
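Chris's post is about how the VertiPaq engine applies RLE, but if you want a feel for run-length encoding itself, base R ships an rle() function (a toy illustration, unrelated to Power BI internals):

x <- c("A", "A", "A", "B", "B", "A", "C", "C", "C", "C")
rle(x)         # 4 runs: lengths 3 2 1 4, values "A" "B" "A" "C"
rle(sort(x))   # 3 runs: lengths 4 2 4, values "A" "B" "C"

The second call hints at why sort order matters so much for RLE: the same values in a friendlier order compress into fewer, longer runs.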


Deploying SQL Server with Availability Groups via HPE Serviceguard for Linux and Ansible

Amit Khandelwal packs a lot into a post:

It’s time for a new blog on how Ansible can simplify SQL Server deployment, configuration, and availability. If you’ve read my previous blogs on Ansible for SQL Server installation and configuration, and the pacemaker-based Always On availability group, you know how powerful Ansible can be. Now, let’s talk about HPE Serviceguard for Linux (SGLX), a high availability/disaster recovery solution that provides business continuity for critical applications like SQL Server.

Deploying SQL Server Always On availability groups on HPE SGLX is a fully supported solution for production workloads.

Today, let’s look at how you can configure an Always On availability group on HPE SGLX via Ansible. We have collaborated with our friends at HPE to enable Ansible-based deployment of HPE SGLX with SQL Server. This feature is now available for HPE SGLX 15.10. For this demonstration, you can download the evaluation bits from ‘My HPE Software Center’. The Ansible bits, along with the scripts, are available on GitHub.

Read on for instructions and what you need to make it all work.


Triangular Distributions and R

Steven Sanderson unleashes the power of the triangle:

The Triangular Distribution is a continuous probability distribution with a triangular shape, hence the name. It is defined by three parameters: min, max, and mode. These parameters determine the range of values the distribution can take and the most likely value within that range. In mathematical terms, the probability density function (PDF) of the Triangular Distribution is given by:

Read on to see the definition, as well as how you can use the four functions around the Triangular Distribution.

By the way, the best-known case of the Triangular Distribution is the sum of two fair, six-sided dice, which peaks at 7 (1/6 of the time) and falls off symmetrically from there, so p(6) = p(8), p(5) = p(9), and so on.
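That dice example is easy to confirm by simulation in R (a quick sketch, separate from the functions Steven covers):

set.seed(42)
n <- 100000
rolls <- sample(1:6, n, replace = TRUE) + sample(1:6, n, replace = TRUE)
round(prop.table(table(rolls)), 3)
# The proportion at 7 comes out near 1/6 (0.167), tapering off symmetrically on either side.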


Optimizing Shared Buffers in Postgres

Salman Ahmed explains how shared buffers work in Postgres:

PostgreSQL is known for its robustness and performance right out of the box. However, different applications and different scales of data demand fine-tuning of various parameters to achieve optimum performance. One such significant parameter is shared_buffers, which, when configured correctly, can notably enhance the performance of your PostgreSQL database.

Read on to see what shared buffers are, why they are so important to Postgres, and how to figure out the right value to use.
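If you just want to see what a running instance is currently using before tuning anything, the setting is easy to inspect. A small R/DBI sketch (connection details are placeholders; psql's SHOW command works just as well):

library(DBI)

con <- dbConnect(
  RPostgres::Postgres(),
  host = "localhost", dbname = "postgres",
  user = "postgres", password = Sys.getenv("PGPASSWORD")
)

dbGetQuery(con, "SHOW shared_buffers;")   # e.g. '128MB', the stock default
dbGetQuery(con, "SELECT name, setting, unit FROM pg_settings WHERE name = 'shared_buffers';")
dbDisconnect(con)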


Automated Alerts for Query Store Usage in Azure SQL DB

Jose Manuel Jurado Diaz doesn’t want to run out of disk space:

In this article, we will introduce a PowerShell script that helps monitor the usage of Query Data Store (QDS) in SQL Server databases. The script automatically sends an email alert when the storage space used by the Query Data Store reaches or exceeds 80% of its maximum allocated space. This can be particularly useful for database administrators to proactively manage and optimize their database storage.

Before we dive into the details, it’s essential to note that the script provided is a recommendation and should be adapted according to your specific environment and requirements. The responsibility of ensuring its compatibility and the impact on your system lies solely with the user.

Click through for the script. A quick glance at the code makes me think it will also work with on-premises SQL Server.
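Jose's alerting is in PowerShell, but the underlying check is a single DMV query. Here is the same idea expressed from R via DBI/odbc (a sketch; server, database, and credentials are placeholders):

library(DBI)

con <- dbConnect(
  odbc::odbc(),
  Driver   = "ODBC Driver 18 for SQL Server",
  Server   = "yourserver.database.windows.net",   # hypothetical
  Database = "YourDatabase",
  UID      = "your_user",
  PWD      = Sys.getenv("SQL_PASSWORD")
)

qds <- dbGetQuery(con, "
  SELECT current_storage_size_mb, max_storage_size_mb
  FROM sys.database_query_store_options;")

pct_used <- 100 * qds$current_storage_size_mb / qds$max_storage_size_mb
if (pct_used >= 80) message("Query Store is at ", round(pct_used, 1), "% of its maximum size")
dbDisconnect(con)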
