Phil Factor talks about data generation:
Data generation is the science and art of providing data for database development work that is as realistic and controllable as possible. The skills of generating realistic data are an essential part of being a database developer. It is important: you need plenty of data of exactly the right type, size and verisimilitude in order to develop and test a database. It has got to look right. If you are launching a corporate, data-driven application, you’ll need it for training as well as testing and development.
Where things get really tough is when you also want to get the distribution of data right. Is a uniform distribution (e.g., outputs from a random number generator) okay? Or do you have a few large accounts that take up the majority of your data? That difference could affect query plans and if you have nice and uniform data, you might never see some of the cardinality estimation problems until they pop up in production.
Comments closed