2023-07-19 – Curated SQL

Finding Duplicate Rows and Values in R

Published 2023-07-19 by Kevin Feasel

Steven Sanderson de-duplicates, starting with values:

In data analysis and programming, it’s common to encounter situations where you need to identify duplicate values within a dataset. Whether you’re a beginner or an experienced programmer, knowing how to find duplicate values is a fundamental skill. In this blog post, we will explore two different approaches to accomplish this task using base R functions and the dplyr package in R. By the end, you’ll have a clear understanding of how to detect and manage duplicate values in your own datasets.

From there, we get to see various ways to de-duplicate rows in R:

In data analysis and manipulation tasks, it’s common to encounter situations where we need to identify and handle duplicate rows in a dataset. In this blog post, we will explore three different approaches to finding duplicate rows in R: the base R method, the dplyr package, and the data.table package. We’ll compare their performance using the benchmark function and provide insights on when to use each approach. So, grab your coding gear, and let’s dive in!

Finding duplicate values is the relatively tricky part, with duplicate rows being much easier.
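The linked posts work in R, so treat the following as a language-neutral sketch rather than Steven's code. It uses only the Python standard library to show the two tasks side by side: which values occur more than once, and which rows repeat an earlier row. The function names and sample data are made up for illustration.

```python
from collections import Counter


def duplicate_values(values):
    """Return the distinct values that occur more than once, in first-seen order."""
    counts = Counter(values)
    return [value for value in counts if counts[value] > 1]


def duplicate_rows(rows):
    """Return every row that repeats an earlier row.

    This is similar in spirit to flagging rows with R's duplicated():
    the first occurrence is kept out, later copies are reported.
    """
    seen = set()
    repeats = []
    for row in rows:
        key = tuple(row)  # rows must be hashable to be tracked in a set
        if key in seen:
            repeats.append(row)
        else:
            seen.add(key)
    return repeats


# Hypothetical sample data
values = [1, 2, 2, 3, 3, 3, 4]
rows = [("a", 1), ("b", 2), ("a", 1), ("c", 3), ("b", 2)]

print(duplicate_values(values))  # [2, 3]
print(duplicate_rows(rows))      # [('a', 1), ('b', 2)]
```

Note that the two functions answer slightly different questions: the first reports each duplicated value once, while the second reports every repeated copy of a row.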

Data Syncs between Azure SQL DB and Amazon RDS

Published 2023-07-19 by Kevin Feasel

Joey D’Antoni crosses clouds:

A while back, a client who hosts user-facing databases in Azure SQL Database had a novel problem. One of their customers had all of their infrastructure in AWS and wanted to be able to access my client’s data in an RDS instance. There aren’t many options for doing this: replication doesn’t work with Azure SQL Database as a publisher because there’s no SQL Agent. Managed Instance would have been messy from a network perspective, as well as cost prohibitive compared to Azure SQL DB serverless. Even using an ETL tool like Azure Data Factory would have worked, but would have required a rather large amount of dev cycles to check for changed data. Enter Azure Data Sync.

Read on to see what Azure Data Sync is and how it helps solve this problem.

Multi-Source Replication in MySQL

Published 2023-07-19 by Kevin Feasel

Aisha Bukar continues a series on replication in MySQL:

MySQL’s multi-source replication allows a replica server to receive data from multiple source servers. Let’s say you have a replica server at your workplace and there are multiple source servers in different locations. You need a way to receive data directly from these source servers on your replica server. This is where the multi-source replication technique comes into play. It allows you to efficiently gather data from various sources and consolidate it on your replica server.

Note that this is quite different from merge replication or peer-to-peer replication in SQL Server and there are some limits to its capabilities. That said, I could see this being really useful for performing ELT into a warehouse: use replication to keep the staging tables in sync and then run a job to perform transformations into facts and dimensions periodically.

End of Month and Time Slice Functions in Snowflake

Published 2023-07-19 by Kevin Feasel

Kevin Wilkie is waiting for the calendar to change:

In SQL Server, we’re used to finding the end of the month via a few different methods. We can always use the DateAdd and DateDiff functions to get our data – which sometimes takes a bit of work – or we can use the EOMonth function.

Read on to see what tools are available for Snowflake users.
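For anyone who wants the two ideas from the quote side by side before clicking through, here is a rough sketch in Python rather than T-SQL or Snowflake SQL. It is not Kevin Wilkie's code: the function names are hypothetical, and the second function only mirrors the general "date arithmetic" style of the DateAdd and DateDiff approach, not any specific query.

```python
import calendar
from datetime import date, timedelta


def end_of_month(d):
    """Direct lookup, comparable in spirit to a built-in end-of-month function."""
    last_day = calendar.monthrange(d.year, d.month)[1]  # number of days in the month
    return d.replace(day=last_day)


def end_of_month_by_arithmetic(d):
    """Date arithmetic: jump to the first day of next month, then step back one day."""
    # Day 28 exists in every month, and adding 4 days always lands in the next month.
    first_of_next = (d.replace(day=28) + timedelta(days=4)).replace(day=1)
    return first_of_next - timedelta(days=1)


print(end_of_month(date(2023, 7, 19)))                # 2023-07-31
print(end_of_month_by_arithmetic(date(2024, 2, 10)))  # 2024-02-29
```

Both functions return the same date for any input; the arithmetic version just takes a bit more work, which is the point the quote makes.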

Migrating Column-Level Encryption to Azure SQL MI

Published 2023-07-19 by Kevin Feasel

Keshav Kiran performs a migration:

One of our customers came up with a requirement where they wanted to migrate an on-prem database to Azure SQL Managed Instance. The databases had traditional column-level encryption enabled.

He restored the database on the SQL Managed Instance using the backup/restore approach. When he then tried to read the encrypted column on the destination database, it showed NULL values after decryption.

Read on for the solution.

Viewing the Power BI Format Pane during On-Object Interaction

Published 2023-07-19 by Kevin Feasel

Gilbert Quevauvilliers is missing something:

I have enabled the new On-Object Interaction for the formatting pane in Power BI and while it is constantly improving there are times when I would like to have the good old formatting pane available.

I have also found that sometimes when you create a new visual there is no option to format it as shown below.

There’s a workaround to this, so check it out.

The Internals of SQL Server Backups

Published 2023-07-19 by Kevin Feasel

Andy Yun has started a new series:

Recently, I had the pleasure of delivering a new presentation called How to Accelerate Your Database Backups for MSSQLTips.com. This blog series is intended to be a companion piece, particularly for those who prefer to read content instead of watching a video.

In this first post, Andy describes the threads and queues involved with taking a backup.
