
Day: June 6, 2024

Book Review of Bernoulli’s Fallacy

John Mount reviews a book:

First the conclusion: this is a well researched and important book. My rating is a strong buy, and Bernoulli’s Fallacy is already influencing how I approach my work.

My initial “judge the book by its back cover” impression of Bernoulli’s Fallacy was negative. The back cover writes some very large checks that I was initially (and wrongly) doubtful that “its fists could cash.” The thesis is that frequentist statistics (the dominant statistical practice) is far worse than is publicly admitted, and that Bayesian methods are the fix. However, other reviews and the snippets by people I respect (such as Andrew Gelman and Persi Diaconis) convinced me to buy and read the book. And I am glad that I read it. The back cover was, in my revised opinion, fully justified.

Read on for John’s full review of a book that criticizes frequentist statistics and argues for Bayesian methods instead, which already makes the book a winner for me.

Comments closed

Distribution Parameter Wrangling in TidyDensity

Steven Sanderson introduces a new set of functions:

Greetings, fellow data enthusiasts! Today, we’re thrilled to unveil a fresh wave of functionalities in the ever-evolving TidyDensity package. Buckle up, as we delve into the realm of distribution statistics!

This update brings a bounty of new functions that streamline the process of extracting key parameters from various probability distributions. These functions adhere to the familiar naming convention util_distribution_name_stats_tbl(), making them easily discoverable within your R workflow.

Read on for the list and an example of how to use them.


Cleaning Up Large System Databases

Josephine Bush doesn’t need enormous system databases:

Always set this on your SQL Servers so you don’t have this problem in the first place. This is in the SQL Server Agent settings. I remember having some agent jobs that used to serve this function that ran on a schedule, which may have been required in older versions of SQL Server.

Josephine focuses on SQL Agent history and database backup history, both of which are good places to start. If you have an older version of SQL Server or are using the package deployment model, there may be an explosion of SSIS information in msdb that you’d want to manage. Also, check whether any of the system databases use the full recovery model; if so, ensure that the script you use for transaction log backups actually covers system databases.


The securityadmin Role in SQL Server

Jeff Iannucci talks about a role that might as well be sysadmin:

Based on the name, you probably can guess that members of the securityadmin role can make dangerous changes to the permissions of other server principals. What many folks don’t realize is that this role is simultaneously less dangerous and more dangerous than you might think.

Allow me to explain, or better yet show you what that means.

Click through for the explanation.


Auditing a SQL Server: Discovery and Documentation

Ben Johnston begins a new series:

Inheriting a server, whether as an inexperienced user or an experienced DBA, has many challenges. It’s very helpful to evaluate the servers, document issues, and record the current configuration. It can also be beneficial to evaluate the current state of servers you have owned since they were built or even in preparation for a formal audit. The discovery and documentation phase of an audit will set you up for later detailed audits, or it may serve as the complete scope of the audit.

This is the first part of a series on evaluating and auditing SQL Server and Azure SQL Database. Auditing SQL is a very broad topic, so I have broken it down into several sections. This section will cover the major categories that should happen in a basic SQL Server discovery audit. An initial examination of your environment is primarily documentation and looking for critical issues. This includes basic server and SQL engine configuration, physical configuration items such as disk and memory, critical items such as backup state, database configuration, basic code smells, application integration, and high-level security configuration.

Read on for some of the things Ben looks at.


UNISTR() and || in Azure SQL Database

Abhiman Tiwari announces a new function and a new operator:

We are excited to announce that the UNISTR intrinsic function and ANSI SQL concatenation operator (||) are now available in public preview in Azure SQL Database. The UNISTR function allows you to escape Unicode characters, making it easier to work with international text. The ANSI SQL concatenation operator (||) provides a simple and intuitive way to combine characters or binary strings. These new features will enhance your ability to manipulate and work with text data. 

Click through to learn more about both. Honestly, I’d rather stick with CONCAT() than use ||, because CONCAT() handles NULL without me having to check every operand first.
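To make the NULL concern concrete, here is a minimal sketch of what "checking every operand" looks like with the || operator. It runs against SQLite through Python's standard library, purely so it is self-contained; SQLite is a stand-in and not Azure SQL Database, and the names and the `scalar` helper are made up for illustration. In SQLite, a NULL operand makes the whole || expression NULL, so each nullable operand needs its own COALESCE guard.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in engine for the || operator.
conn = sqlite3.connect(":memory:")


def scalar(sql, params=()):
    """Run a query that returns one row with one column and return that value."""
    return conn.execute(sql, params).fetchone()[0]


# Hypothetical data: the middle name is missing (NULL in SQL, None in Python).
first_name, middle_name, last_name = "Ada", None, "Lovelace"

# Unguarded concatenation: one NULL operand turns the whole result into NULL.
unguarded = scalar(
    "SELECT ? || ' ' || ? || ' ' || ?",
    (first_name, middle_name, last_name),
)

# Guarded concatenation: every nullable operand is wrapped in COALESCE by hand.
guarded = scalar(
    "SELECT COALESCE(?, '') || ' ' || COALESCE(?, '') || ' ' || COALESCE(?, '')",
    (first_name, middle_name, last_name),
)

print(repr(unguarded))
print(repr(guarded))
```

SQL Server's CONCAT() treats NULL arguments as empty strings, which is why it avoids the per-operand guards shown in the second query.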


Getting the Top N Results in a PySpark Notebook

Gilbert Quevauvilliers only needs the top 1:

How to get the TopN rows using Python in Fabric Notebooks

When working with data there are sometimes weird and wonderful requirements which must be created in order to get to the desired solution.

In today’s blog post I had a situation where I wanted to get a single row with the highest duration.

Gilbert uses the Python function variant of Spark SQL. You could also write a Spark SQL query that sorts with ORDER BY and keeps the top rows with the LIMIT clause.
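As a rough sketch of the query-based alternative, the example below sorts by duration and keeps the top row with ORDER BY and LIMIT. It is not Gilbert's code: it runs on SQLite through Python's standard library rather than Spark so that it needs no cluster, and the `job_runs` table, its columns, and the sample rows are all hypothetical. In a Spark notebook, a query of the same shape would typically be passed to `spark.sql` with the row count written into the query text.

```python
import sqlite3

# Hypothetical sample data: (job name, duration in seconds).
rows = [
    ("load_sales", 120),
    ("load_customers", 340),
    ("load_products", 95),
]

# In-memory SQLite database standing in for a Spark table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job_runs (job_name TEXT, duration INTEGER)")
conn.executemany("INSERT INTO job_runs VALUES (?, ?)", rows)


def top_n_by_duration(connection, n):
    """Return the n rows with the highest duration, longest first."""
    query = (
        "SELECT job_name, duration "
        "FROM job_runs "
        "ORDER BY duration DESC "
        "LIMIT ?"
    )
    return connection.execute(query, (n,)).fetchall()


# Top 1: the single row with the highest duration.
top_one = top_n_by_duration(conn, 1)
print(top_one)
```

Sorting descending before applying LIMIT is what makes this a top-N query; without the ORDER BY, LIMIT would return an arbitrary row.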


Environment Variables in SSIS

Andy Brownsword continues a series on SSIS:

Yep it’s more SSIS again this week. Here we’ll be looking at using Environment configuration within the SSIS catalog. This allows sets of parameters to be defined and used across multiple projects and packages which share common values.

This approach can either be used as a central point for configuration, or you could use multiple configurations for the same packages.

Read on for some examples of how you might use them, as well as the process to create one.
