Have you ever been tasked with creating test data for an application, only to run into performance problems once the application moved to production?
Many of us manage databases or applications that contain regulated data that can’t leave a production environment. This means that we need to “clean” the data if it’s going to be used in QA or development work. One common way to de-identify the data is to update columns like firstname and lastname using a simple format, “firstname” + counter. However, this makes every value unique and sequential: firstname1, firstname2, firstname3, …
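To make the effect concrete, here is a minimal sketch of the counter-based scrub described above. The sample names and the `naive_deidentify` helper are made up for illustration; a real scrub would more likely run as an UPDATE against the database.

```python
from collections import Counter


def naive_deidentify(names, prefix="firstname"):
    """Replace each value with prefix + counter (hypothetical helper)."""
    return [f"{prefix}{i}" for i, _ in enumerate(names, start=1)]


# Hypothetical production-like column: real names repeat.
production_like = ["Maria", "John", "Maria", "Wei", "John", "Maria"]
scrubbed = naive_deidentify(production_like)

# Count distinct values before and after the scrub.
original_distinct = len(set(production_like))
scrubbed_distinct = len(set(scrubbed))

# Most frequent value and its count, before and after.
most_common_original = Counter(production_like).most_common(1)[0]
most_common_scrubbed = Counter(scrubbed).most_common(1)[0]

print(f"distinct before: {original_distinct}, after: {scrubbed_distinct}")
print(f"most common before: {most_common_original}, after: {most_common_scrubbed}")
```

In this toy sample the original column has three distinct values with one name appearing three times, while the scrubbed column has six distinct values that each appear exactly once. The duplicates are gone, and so is the skew.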
Data like this isn’t good for getting like-for-like results with a production database, where names repeat and some values are far more common than others. It can lead to questions we’ve all heard in the workplace, like “Why didn’t we catch this in QA?”
This works reasonably well, though you’d want to be sure to seed in edge cases and the like. But if you just need to generate some realistic-ish data pretty quickly, this is one option that can work.