Press "Enter" to skip to content

Day: September 8, 2023

Finding Omitted Variables in Logistic Regression

John Mount picks up on a prior post:

For this note, let’s work out how to directly try and overcome the omitted variable bias by solving for the hidden or unobserved detailed data. We will work our example in R. We will derive some deep results out of a simple set-up. We show how to “un-marginalize” or “un-summarize” data.

This is an interesting dive into a common problem, and something which we can easily work around in linear regression, but not in logistic regression.

Comments closed

Reviewing Postgres SQL Statement History

Muhammad Ali does some performance testing:

pg_stat_statements is a built-in PostgreSQL extension that keeps track of SQL statements executed by the database server. It records details about query execution counts, total execution time and I/O related info. By capturing this information, database administrators can identify bottlenecks and optimize queries for better overall performance. In this blog, we’ll take a  look at how pg_stat_statements module can be utilized to enhance the performance of your PostgreSQL database.  

Read on to see how you can enable this extension, configure it, and use it to find the worst performers in the bunch.

Comments closed

Creating a Function in Snowflake

Kevin Wilkie creates a function:

Sometimes, you’ll need to create functions that do a particular thing. They’re not always pretty. Some of them look like they should have been thrown out with the bathwater.

Unlike SQL Server, user-defined functions in Snowflake can be done in several different languages. They can be done in Java, JavaScript, Python, Scala, or plain-old SQL.

Which means that Postgres users will be a bit more comfortable here than SQL Server users.

Comments closed

Choosing a Data Serialization Format

Rathish Kumar says more than “JSON and Parquet”:

In the world of software, we often work with different types of data like lists, tables, and more. These data structures are designed to be fast and efficient when our computer programs use them. However, sometimes we need to move this data out of our computer’s memory, like when we want to save it to a file or send it over the internet. To do this, we have to change the data into a special format made up of 0s and 1s, which is quite different from data structures. This process is what we call encoding or serialization. 

In this article, we’ll explore the world of encoding and decoding, which is the reverse process of turning that special format back into usable data. We’ll also take a look at different ways to do encoding and decoding, as well as important things to think about when choosing the right method for your software projects.

Sadly, ORC (Optimized Row Columnar) doesn’t make the cut, as Parquet ended up taking over that market.

Comments closed

A Primer on Database Constraints in MySQL

Robert Sheldon creates some keys:

MySQL supports six basic types of constraints for ensuring data integrity: PRIMARY KEYNOT NULLDEFAULTCHECKUNIQUE, and FOREIGN KEY. In this article, I introduce you to each constraint type and provide examples for how they work. The examples include a series of CREATE TABLE statements that demonstrate different ways to incorporate constraints into your table definitions. If you’re not familiar with the CREATE TABLE statement or how to create tables in a MySQL database, refer to an earlier article in this series that introduces you to the statement.

In short, they support the same set that SQL Server users are used to. But do read on to see the nuances behind each of these.

Comments closed

Don’t Try These with SQL MI and Private Endpoints

Zoran Rilak wraps up a series on Azure SQL Managed Instance and its support for private endpoints:

The first two installments of this mini-series discussed a couple of basic and advanced scenarios involving private endpoints. Today we’ll look at some ways private endpoints cannot be used to implement scenarios where one might expect otherwise.

Read on for four of these in total, laying out things you cannot do via private endpoint to a SQL Managed Instance. In fairness, Zoran also provides what I would consider reasonable work-arounds for each of those: have a VM jumpbox in the same virtual network for DAC connections, peer your virtual networks for replication, and so on.

Comments closed