Data Cleansing In SQLite

Allison Tharp wants to clear out kinda-sorta duplicates from a SQLite table:

However, now I have a lot of database entries that are unneeded.  I thought I would take the time to clean this up (even though I’ll no longer use the data and could easily just delete the tables).  For the BGG Hotness, I have the tables: hotgame, hotperson, and hotcompany.  I have 7,350 rows in each of those tables, since I collected data on 50 rankings every hour for just over 6 days.  However, since the BGG hotness rankings only update daily, I really only need 300 rows (50 rankings * 6 days = 300 rows).

I now think the rankings update between 3 and 4 AM, so I want to keep only the entries from 4:00 AM.  I use the following SELECT statement to make sure I'm in the ballpark of the data I want to keep:
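A check along these lines would do the job in SQLite, assuming (hypothetically) that the timestamps are stored as ISO-8601 text in a column named collection_date:

-- Sketch of the ballpark check: pull only rows whose hour-of-day is 04
SELECT *
FROM hotgame
WHERE strftime('%H', collection_date) = '04'
ORDER BY collection_date;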

There are several ways to solve this problem; this one is easy and works.  The syntax won’t work for all database platforms, but does the trick for SQLite.
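As a rough sketch of that kind of cleanup, under the same hypothetical collection_date column, the delete comes down to removing every row whose hour isn't 4, repeated for each of the three tables:

-- Keep only the 4:00 AM snapshots; run the same statement for hotperson and hotcompany
DELETE FROM hotgame
WHERE strftime('%H', collection_date) <> '04';

-- Sanity check afterward: row count should drop to roughly 300
SELECT COUNT(*) FROM hotgame;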

