August 2023 – Curated SQL

How does this package depend on this other package? pak::pkg_deps_explain()

The pak package by Gábor Csárdi makes installing packages easier. If I need to start working on a package, I clone it, then run pak::pak() to install and update its dependencies. It’s a “convenience function” that is convenient for sure! Bye bye remotes::install_deps().

Read on for an example of this, as well as details on two other functions in different packages. H/T R-Bloggers.

Comments closed

Building Correlation Heatmaps in R

Published 2023-08-31 by Kevin Feasel

Steven Sanderson shows two packages for building heatmaps in R:

Data visualization is a powerful tool for understanding the relationships between variables in a dataset. One of the most common and insightful ways to visualize correlations is through heatmaps. In this blog post, we’ll dive into the world of correlation heatmaps using R, using the mtcars and iris datasets as examples. By the end of this post, you’ll be equipped to create informative correlation heatmaps on your own.

Read on to see how to build heatmaps with the corrplot and ggcorrplot packages.
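For a sense of what sits underneath a correlation heatmap, here is a minimal sketch in Python rather than R. It is not Steven's code and does not use corrplot or ggcorrplot. The column names echo mtcars, but the values are made-up toy numbers, and the helper names `pearson` and `correlation_matrix` are my own.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Pairwise correlations as a nested dict: matrix[a][b]."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up toy values for illustration only (not the real mtcars data).
toy = {
    "mpg": [21.0, 22.8, 18.7, 14.3, 30.4],
    "wt": [2.6, 2.3, 3.4, 3.6, 1.6],
    "hp": [110, 93, 175, 245, 52],
}

matrix = correlation_matrix(toy)

# Print the grid of values that a heatmap would turn into colors.
for name, row in matrix.items():
    print(f"{name:>4}", " ".join(f"{row[other]:+.2f}" for other in matrix))
```

A heatmap package takes a grid like this and maps each cell to a color, which is why the diagonal always shows the strongest shade.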


Restoring a Database with Standby in SQL Server

Published 2023-08-31 by Kevin Feasel

Steve Jones stands by for station identification:

Sometimes you want to restore part of your data, but you still want the option to continue restores. A classic example of this is when you are restoring a number of transaction logs and want to check the data to find a place where certain values haven’t been changed.

Suppose someone deletes a bunch of data between 10am and 11am from the supplier table. You know that they added “Acme” to this table before the delete. You might restore up to 10am and check the supplier table for the old data and look for Acme. If it’s not there, maybe you restore the 10:05am log backup and check again. If it’s not there, then the 10:10am log, etc.

Click through to see how you can do that.
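As a rough sketch of the restore-and-check loop described above (not Steve's script), the following Python helper builds the T-SQL statements in order. The database name, backup paths, and undo file path are hypothetical placeholders, and `standby_restore_script` is a name I made up for illustration.

```python
def standby_restore_script(database, full_backup, log_backups, undo_file):
    """Build RESTORE statements that leave the database readable after each step.

    WITH STANDBY keeps the database read-only between restores, so you can
    query it and still apply the next log backup afterward.
    """

    def quote(path):
        # Escape single quotes for a T-SQL Unicode string literal.
        return "N'" + path.replace("'", "''") + "'"

    statements = [
        f"RESTORE DATABASE [{database}] FROM DISK = {quote(full_backup)} "
        f"WITH STANDBY = {quote(undo_file)};"
    ]
    for log_backup in log_backups:
        statements.append(
            f"RESTORE LOG [{database}] FROM DISK = {quote(log_backup)} "
            f"WITH STANDBY = {quote(undo_file)};"
        )
    return statements


# Hypothetical names and paths, for illustration only.
script = standby_restore_script(
    database="Sales",
    full_backup=r"D:\Backups\Sales_full.bak",
    log_backups=[r"D:\Backups\Sales_1005.trn", r"D:\Backups\Sales_1010.trn"],
    undo_file=r"D:\Backups\Sales_undo.dat",
)

for statement in script:
    print(statement)
```

The idea is to run one statement, query the table, and only then decide whether to run the next. Once the data looks right, RESTORE DATABASE ... WITH RECOVERY brings the database online and ends the restore sequence.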


Executing Transactions in PostgreSQL

Published 2023-08-31 by Kevin Feasel

Salman Ahmed rolls it back:

Transactions, like any other database, are a key component of PostgreSQL. A transaction is a sequence of one or more database operations that are executed as a single unit of work. These operations can be queries (e.g. SELECT, INSERT, UPDATE and DELETE) that modify the database’s state.

A transaction’s main purpose is to combine multiple statements into an atomic, all-or-nothing process. It ensures that either all operations within a transaction are fully completed, or none of them are executed at all. Concurrent transactions cannot see each other’s unfinished changes. Updates from ongoing transactions remain hidden until completion, at which point all changes become visible simultaneously.

This is very similar to SQL Server, except that PostgreSQL’s savepoints actually work the way they’re supposed to.
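To make the all-or-nothing behavior and savepoints concrete, here is a small sketch. It uses Python's built-in sqlite3 module against an in-memory SQLite database rather than PostgreSQL, purely so it runs without a server. The BEGIN, SAVEPOINT, ROLLBACK TO SAVEPOINT, and COMMIT statements have the same shape in PostgreSQL. The table and savepoint names are hypothetical.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# explicit BEGIN / COMMIT / ROLLBACK statements below control transactions.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY)")

# All-or-nothing: both inserts vanish when the transaction rolls back.
conn.execute("BEGIN")
conn.execute("INSERT INTO account VALUES ('alice')")
conn.execute("INSERT INTO account VALUES ('bob')")
conn.execute("ROLLBACK")
rows_after_rollback = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]

# Savepoint: undo only the work done after the savepoint, then commit the rest.
conn.execute("BEGIN")
conn.execute("INSERT INTO account VALUES ('alice')")
conn.execute("SAVEPOINT before_bob")
conn.execute("INSERT INTO account VALUES ('bob')")
conn.execute("ROLLBACK TO SAVEPOINT before_bob")
conn.execute("COMMIT")
names = [row[0] for row in conn.execute("SELECT name FROM account ORDER BY name")]

print(rows_after_rollback)  # 0
print(names)                # ['alice']
```

The partial rollback keeps the first insert and discards the second, which is the behavior you would expect a savepoint to give you.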


Multiple Workspaces and Microsoft Fabric Git Integration

Published 2023-08-31 by Kevin Feasel

Kevin Chant can’t stop at one:

In this post I want to cover working with Microsoft Fabric Git integration and multiple workspaces. By highlighting one method that you can use in the real-world.

I must admit that I have been very keen to test this particular way of working with Microsoft Fabric Git integration and multiple workspaces.

By the end of this post, you will know one way that you can work with Microsoft Fabric Git integration and multiple workspaces. Based on real-world working practices. Including multiple branches and pull requests.

Click through to see what Kevin has in mind using Azure DevOps.


Storing Log Analytics Data in the Microsoft Fabric Lakehouse

Published 2023-08-31 by Kevin Feasel

Gilbert Quevauvilliers needs a place to store this data:

Following on in my series, in this blog post I am going to use the dataflow Gen2 in Microsoft Fabric to load the data into a lake house table.

By doing this, it will allow me to store the data in a delta lake table.

In this series I am going to show you all the steps I did to have the successful outcome I had with my client.

Click through for links to the first two parts of the series, as well as a step-by-step guide for part 3.


Using IS DISTINCT FROM in SQL Server 2022

Published 2023-08-31 by Kevin Feasel

Chad Callihan is distinguished:

One feature introduced with SQL Server 2022 that I’ve recently been playing around with is IS [NOT] DISTINCT FROM. This new feature can help when it comes to dealing with NULL value comparisons.

Read on for examples. Do note that x IS NOT DISTINCT FROM y does not provide a performance benefit over its equivalent of x=y OR (x IS NULL AND y IS NULL).
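The logical equivalence mentioned above can be checked with a small model of SQL's three-valued logic. This is a sketch in Python, not T-SQL: None stands in for NULL (and for UNKNOWN), and the function names are my own.

```python
def sql_eq(x, y):
    """SQL '=': UNKNOWN (None) if either side is NULL."""
    if x is None or y is None:
        return None
    return x == y


def sql_and(a, b):
    """Three-valued AND: FALSE dominates, then UNKNOWN."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def sql_or(a, b):
    """Three-valued OR: TRUE dominates, then UNKNOWN."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def is_not_distinct_from(x, y):
    """NULL-safe equality: always TRUE or FALSE, never UNKNOWN."""
    if x is None or y is None:
        return x is None and y is None
    return x == y


def longhand(x, y):
    """x = y OR (x IS NULL AND y IS NULL), evaluated with three-valued logic."""
    return sql_or(sql_eq(x, y), sql_and(x is None, y is None))


values = [None, 1, 2]
for x in values:
    for y in values:
        # A WHERE clause keeps a row only when the predicate is TRUE,
        # so UNKNOWN from the longhand form filters the same as FALSE.
        print(x, y, is_not_distinct_from(x, y), longhand(x, y) is True)
```

One wrinkle the model surfaces: when exactly one side is NULL, the longhand form yields UNKNOWN rather than FALSE. As a WHERE filter the two behave the same, but they differ if you negate the predicate.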


Flink Streaming Use Cases for Kafka Users

Published 2023-08-30 by Kevin Feasel

Jean-Sebastien Brunner gives us some use cases:

In Part One of our “Inside Flink” blog series, we explored the critical role of stream processing and why developers are increasingly choosing Apache Flink® over other frameworks.

In this second installment, we’ll showcase how innovative teams across every industry and size are putting stream processing into practice – from streaming data pipelines to train ML models or more timely analytics to fraud detection in finance and real-time inventory management in retail. We’ll also discuss how Flink is uniquely suited to support a wide spectrum of use cases and helps teams uncover immediate insights in their data streams and react to events in real time.

This article stays more at the “art of the possible” level rather than drilling into how we can do it.


Structured Programming in R with Logic and Flow Control

Published 2023-08-30 by Kevin Feasel

Adrian Tam continues a primer on R:

R is a procedural programming language. Therefore, it has the full set of flow control syntax like many other languages. Indeed, the flow control syntax in R is similar to Java and C. In this post, you will see some examples of using the flow control syntax in R.

Read on for examples of flow control (if/else, for, etc.) and creating functions.


Visualizing when Lower is Better

Published 2023-08-30 by Kevin Feasel

Alex Velez inverts a common experience:

When quickly scanning, I wonder why the direct and indirect sales teams underperformed in 2022. Mostly, they fell below the goal of 90 days, exceeding their target only three times.

Now, pausing to think more critically about the context of this scenario, I realize I’ve misread the graph—specifically the goal line. Targets and goals are often seen as minimum thresholds, not maximum limits. But in the sales industry, the goal is to close a deal as quickly as possible. In this visual, below the goal line is actually a good thing!

This graph challenges my standard construct of targets and goals, which could lead to confusion or, worse, the wrong conclusions if I’m not careful.

Read on for five alternative ways to display this graph and (hopefully) reduce confusion.



Month: August 2023

Interesting R Functions for Package Dependencies and File Analysis
