
Day: July 19, 2023

Finding Duplicate Rows and Values in R

Steven Sanderson de-duplicates, starting with values:

In data analysis and programming, it’s common to encounter situations where you need to identify duplicate values within a dataset. Whether you’re a beginner or an experienced programmer, knowing how to find duplicate values is a fundamental skill. In this blog post, we will explore two different approaches to accomplish this task using base R functions and the dplyr package in R. By the end, you’ll have a clear understanding of how to detect and manage duplicate values in your own datasets.

From there, we get to see various ways to de-duplicate rows in R:

In data analysis and manipulation tasks, it’s common to encounter situations where we need to identify and handle duplicate rows in a dataset. In this blog post, we will explore three different approaches to finding duplicate rows in R: the base R method, the dplyr package, and the data.table package. We’ll compare their performance using the benchmark function and provide insights on when to use each approach. So, grab your coding gear, and let’s dive in!

Finding duplicate values is the relatively tricky one, with rows being much easier.
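
To make the distinction between the two tasks concrete, here is a minimal sketch. The linked posts work in R (base R, dplyr, and data.table); this version is in Python with only the standard library, and the function names `duplicated_values` and `duplicated_rows` are hypothetical ones chosen for illustration. The row version marks any row that repeats an earlier one, which is roughly the behavior of base R's `duplicated()`.

```python
from collections import Counter


def duplicated_values(values):
    """Return each value that appears more than once, in first-seen order."""
    counts = Counter(values)
    return [v for v in counts if counts[v] > 1]


def duplicated_rows(rows):
    """Return the indices of rows that repeat an earlier row."""
    seen = set()
    dup_idx = []
    for i, row in enumerate(rows):
        key = tuple(row)  # rows must be hashable to be stored in a set
        if key in seen:
            dup_idx.append(i)
        else:
            seen.add(key)
    return dup_idx


# Illustrative data, not taken from the linked posts.
values = [1, 2, 2, 3, 3, 3]
rows = [("a", 1), ("b", 2), ("a", 1), ("c", 3), ("b", 2)]

print(duplicated_values(values))  # values seen more than once
print(duplicated_rows(rows))      # positions of repeated rows
```

The first function answers "which values repeat?", while the second answers "which rows are repeats of something already seen?", so the first occurrence of each row is not flagged.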


Data Syncs between Azure SQL DB and Amazon RDS

Joey D’Antoni crosses clouds:

A while back, a client, who host user-facing databases in Azure SQL Database, had a novel problem. One of their customers, had all of their infrastructure in AWS, and wanted to be able to access my client’s data in an RDS instance. There aren’t many options for doing this–replication doesn’t work with Azure SQL Database as a publisher because there’s no SQL Agent. Managed Instance would have been messy from a network perspective, as well as cost prohibitive compared to Azure SQL DB serverless. Even using an ETL tool like Azure Data Factory would have worked, but would have required a rather large amount of dev cycles to check for changed data. Enter Azure Data Sync.

Read on to see what Azure Data Sync is and how it helps solve this problem.


Multi-Source Replication in MySQL

Aisha Bukar continues a series on replication in MySQL:

MySQL’s multi-source replication allows a replica server to receive data from multiple source servers. Let’s say you have a replica server at your workplace, and there are multiple source servers in different locations, you need a way to directly receive data from these source servers to your replica server. This is where the multi-source replication technique comes into play. It allows you to efficiently gather data from various sources and consolidate it on your replica server.

Note that this is quite different from merge replication or peer-to-peer replication in SQL Server, and there are some limits to its capabilities. That said, I could see this being really useful for performing ELT into a warehouse: use replication to keep the staging tables in sync and then periodically run a job to perform transformations into facts and dimensions.


Migrating Column-Level Encryption to Azure SQL MI

Keshav Kiran performs a migration:

One of our customers came up with a requirement where they wanted to Migrate On-prem Database to Azure SQL Managed instance. The databases had traditional column level encryption enabled.

He has restored the database on the SQL Managed instance by Backup/Restore approach. Now when he was trying to read the encrypted column on the destination database, It was showing NULL values after decryption.

Read on for the solution.


Viewing the Power BI Format Pane during On-Object Interaction

Gilbert Quevauvilliers is missing something:

I have enabled the new On-Object Interaction for the formatting pane in Power BI and while it is constantly improving there are times when I would like to have the good old formatting pane available.

I have also found that sometimes when you create a new visual there is no option to format it as shown below.

There’s a workaround to this, so check it out.
