Generating Fake Data

Rich Benner shows us how to use the Faker library in Python to generate test data:

There are far more options when using Faker. Looking at the official documentation you’ll see the list of different data types you can generate as well as options such as region specific data.

Go have fun trying this, it’s a small setup for a large amount of time saved.

These types of tools can be great for generating a bunch of data but come with a couple of risks. One is that in the fake addresses Rich shows, ZIP codes don’t match their states at all, so if your application needs valid combos, it can cause issues. The other problem comes from distributions: generated data often gets created off of a uniform distribution, so you might not find skewness-related problems (e.g., parameter sniffing issues) strictly in your test data.

That said, easily generating test data is powerful and I don’t want to let the good be the enemy of the great.

Related Posts

Reviewing the Stack Overflow Developer Survey

Michael Toth looks at the recently-released 2019 Stack Overflow Developer Survey: Since 2011, Stack Overflow has been surveying their users each year to answer questions about the technologies they use, their work experience, their compensation, and their satisfaction at work. Given Stack Overflow’s place in the broader programming world, they are able to draw quite […]

Read More

K-Nearest Neighbors in Python

Hardik Jaroli shows how to use the k-Nearest Neighbors algorithm using scikit-learn: K Nearest Neighbors is a classification algorithm that operates on a very simple principle. It is best shown through example! Imagine we had some imaginary data on Dogs and Horses, with heights and weights. Training Algorithm:1. Store all the Data Prediction Algorithm:1.Calculate the […]

Read More

Categories

February 2019
MTWTFSS
« Jan Mar »
 123
45678910
11121314151617
18192021222324
25262728