Press "Enter" to skip to content

Month: December 2023

Plotting Time Series in R

Steven Sanderson builds some charts:

Our Flight Plan:

  1. Loading Up with Data: Grabbing our trusty dataset, AirPassengers.
  2. Taking Off with Base R: Creating a basic time series plot using base R functions.
  3. Soaring with ggplot2: Crafting a visually stunning time series plot using the ggplot2 library.
  4. Navigating Date Formatting: Customizing axis labels with scale_x_date() for clarity.
  5. Landing with Your Own Exploration: Encouraging you to take the controls and create your own time series plots!

Click through to see each of these steps in action.

Comments closed

Advent of Code Day 8 T-SQL Solution

Kevin Wilkie continues a series on this year’s advent of code. Day 8:

To make life a little simpler for all of us, I used a variable that I just pasted my instructions into. In my case, I called it @FollowThis allowing it to be up to 300 characters in length. Hopefully, no one’s instructions list is over that length – if so you’ll need to make that adjustment.

I also created a variable called @BackUpOfFollowThis that is strictly a copy of the original @FollowThis variable. (No muss, no fuss)

Once I place all of my inputs into the dbo.AOCDay8 table, we can easily start slicing and dicing.

And the more complex form for day 8:

First, we find out that we’re now starting with multiple positions – anything that ends with an A – and finishing anywhere that ends in a Z.

To make this happen, we’re going to want to create another table that will have all of our positions that end with the letter A.

Click through for Kevin’s solutions.

Comments closed

The Value of Data Lineage

Chisom Kanu explains why data lineage matters:

Data lineage is a component of modern data management that helps organizations understand the origins, transformations, and movement of their data. It is like a road map that shows us where our data has been, how it has changed, and where it is going, just like tracking the journey of a package: from the person who sent it (the source) to the places it passes through, and finally to the person who receives it.

The concept of data lineage has been around for many years, but it has become increasingly important in recent years due to the growth of big data and the increasing complexity of data processing systems.

Read on to learn more about data lineage.

Comments closed

DAX in Review: 2023

Marco Russo gives us a review:

Despite the large number driven by the new INFO functions, we did not see big changes in DAX, but rather a consolidation of the new window functions.

After three years, in 2023 the composite models reached the general availability! More important, the official name is now “composite models”, dropping the previous “DirectQuery for Power BI datasets and Analysis Services” name used in preview. Add to this that “datasets” have been renamed to “semantic models” (a change we fully support at SQLBI), and it seems this year we made peace with the Microsoft naming departments (there should be more than one).

Click through for more on DAX, as well as what Marco and Alberto have been up to this year.

Comments closed

Monitoring Checkpoints and the Background Writer in Postgres

Muhammad Ali keeps an eye on things:

In PostgreSQL, a checkpoint is an operation during which the database flushes/syncs all pending modifications(dirty buffers) inside memory to the actual data files on the disk.

This is important for two primary reasons. Firstly, it guarantees that all committed transactions are permanently stored, thereby safeguarding against data loss in the event of a system failure. Secondly, it works closely with the database recovery mechanism. If a crash occurs, PostgreSQL begins processing WAL logs starting from the last successful checkpoint(It gets this information from the pg_control file located in the PG data directory) during recovery. Additionally, this process allows for the fine-tuning of performance through a variety of parameters, adaptable to specific workload requirements which are discussed below.

Read on to learn more about how checkpoints work in Postgres, how the background writer works, and things to keep in mind.

Comments closed

Dynamic SQL in Stored Procedures

Erik Darling gets serious:

I’ve spent a lot of time on this blog telling you different reasons why you should, when you should, and how you should use dynamic SQL.

In this post, I’m going to go into some of the finer points of how I approach dynamic SQL to avoid issues — not performance issues — more procedural issues.

Things you should use to protect yourself from wonky object names, string truncation, object identification, and more.

Read on for Erik’s tips for handling dynamic SQL.

Comments closed

What’s New in .NET 8

Thao Nguyen keeps us up to date on .NET:

Microsoft has announced .NET 8 recently. It emphasized the cloud, performance, full-stack Blazor and .NET MAUI as major highlights of the latest edition of the company’s free, cross-platform, open-source developer platform.

Of special importance to enterprises, .NET 8 is a long-term support (LTS), which means it will be supported and patched for three years as opposed to 18 months for a standard term support (STS) release.

Click through for the list, as it’s good to keep at least one eye on the developers lest they run around unopposed.

Comments closed

Creating a Time Series in R

Steven Sanderson says it’s time:

The ts() function in R is a fundamental tool for handling time series data. It takes four main arguments:

  1. data: A vector or matrix of time series values.
  2. start: The time of the first observation.
  3. end: The time of the last observation.
  4. frequency: The number of observations per unit of time.

Read on for an example of how this all works, as well as a function in the TidyDensity package to convert data into the R time series format.

Comments closed

Eager Aggregation in SQL Queries

Boris Novikov talks about an uncommon topic:

In this article we discuss one type of query transformation that most optimizers do not use. Because of this, it can be beneficial for you to rewrite a query to help the optimizer order operations in a way that can be beneficial.

An analytical query is supposed to produce some kind of summary generalizing properties of huge amounts of data but at the same time should be compact and easy for humans to understand. In terms of the SQL query language this means that any analytical query extracts and combines large number of rows and then uses aggregate functions with or even without GROUP BY clause. More specifically, we consider queries that contain many JOIN operations followed by aggregation. Usually, queries are written in this way and, surprisingly, the optimizers choose the best order of joins but leave the aggregation as the last step.

Read on for more information, including a minor lamentation that the various relational database optimizers tend not to perform this kind of operation. In SQL Server, I have an example of this pre-aggregation using the APPLY operator (with demo code here) and a simple but realistic example of how drastic the savings can be.

Comments closed

Distributed Data in Postgres

Umair Shahid explains how postgres_fdw works:

Bridging Data Silos and Accelerating Insights

In today’s data-driven world, organizations often grapple with data residing in multiple, disparate databases. This fragmentation can hinder seamless analysis and decision-making. However, PostgreSQL offers a powerful tool to address this challenge: postgres_fdw.

What is postgres_fdw?

postgres_fdw, short for PostgreSQL Foreign Data Wrapper, is a built-in extension that allows you to seamlessly access and query data stored in external PostgreSQL databases as if it were local to your current database. This means you can create views, join tables, and perform complex queries across multiple databases without the need for manual data integration or replication.

Read on for more information about the extension.

Comments closed