Generating Realistic-Looking Data With Markov Chains

Phil Factor shows how to use Markov chain generation in T-SQL to generate realistic-looking country names:

How did we do this? We started with a table that took each word, added two spaces at the beginning and a |, followed by two subsequent spaces, at the end. This allowed us to map the frequency of each three-letter combination in a collection of words. Any language is made up of common combinations of characters with a few wild exceptions. For words to look right, they must follow this distribution. This distribution will change in various parts of a word, so you need all this information.

So what would happen if, instead of feeding the name of countries into the batch, we do the names of people?

My favorite name from the list was Kuwatian Samoa.
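
For readers who want to see the idea outside T-SQL, here is a minimal Python sketch of the three-letter-combination approach the excerpt describes. It is not Phil Factor's code: the function names `build_model` and `generate` are made up for illustration, it pads only with two leading spaces and a trailing `|` (the excerpt also adds trailing spaces), and it ignores where in the word a combination occurs, which the original tracks.

```python
import random
from collections import defaultdict

START = "  "  # two leading spaces mark the start of a word
END = "|"     # end-of-word marker (assumed not to occur in the input words)


def build_model(words):
    """Map each two-character context to every character seen after it.

    Followers are stored with repetition, so a uniform random pick from
    the list reproduces the observed three-letter frequencies.
    """
    model = defaultdict(list)
    for word in words:
        padded = START + word.lower() + END
        for i in range(len(padded) - 2):
            model[padded[i:i + 2]].append(padded[i + 2])
    return model


def generate(model, rng, max_len=20):
    """Walk the chain from the start context until the end marker appears."""
    context, out = START, []
    while len(out) < max_len:
        nxt = rng.choice(model[context])
        if nxt == END:
            break
        out.append(nxt)
        context = context[1] + nxt  # slide the two-character window
    return "".join(out).title()


if __name__ == "__main__":
    # A tiny hand-picked sample; a real run would use a full list of names.
    countries = ["Kuwait", "Samoa", "Somalia", "Croatia", "Mauritania", "Romania"]
    model = build_model(countries)
    rng = random.Random()
    for _ in range(5):
        print(generate(model, rng))
```

Swapping the list of countries for a list of people's names is all it takes to try the experiment the excerpt asks about. With a sample this small the output will mostly echo the inputs; larger lists give more surprising blends.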

