Benford’s Law

Kevin Feasel

2016-10-31

Data, R

Tomaz Kastrun is starting a series on fraud analysis and starts with Benford’s Law:

One of the samples Microsoft provided with release of new SQL Server 2016 was using simple logic of Benford’s law. This law works great with naturally occurring numbers and can be applied across any kind of problem. By naturally occurring, it is meant a number that is not generated generically such as a page number in a book, incremented number in your SQL Table, sequence number of any kind, but numbers that are occurring irrespective from each other, in nature (length or width of trees, mountains, rivers), length of the roads in the cities, addresses in your home town, city/country populations, etc. The law calculates the log distribution of numbers from 1 to 9 and stipulates that number one will occur 30% of times, number two will occur 17% of time, number three will occur 12% of the time and so on. Randomly generated numbers will most certainly generate distribution for each number from 1 to 9 with probability of 1/9. It might also not work with restrictions; for example height expressed in inches will surely not produce Benford function. My height is 188 which is 74 inches or 6ft2. All three numbers will not generate correct distribution, even though height is natural phenomena.

Tomaz includes SQL Server R Services code, so check it out.

Related Posts

Solving The Monty Hall Problem With R

Miroslav Rajter builds a Monty Hall problem simulator using R: The original and most simple scenario of the Monty Hall problem is this: You are in a prize contest and in front of you there are three doors (A, B and C). Behind one of the doors is a prize (Car), while behind others is […]

Read More

Control Table Keys In cdata

John Mount announces a new feature in the cdata package: In our cdata R package and training materials we emphasize the record-oriented thinking and how to design a transform control table. We now have an additional exciting new feature: control table keys.The user can now control which columns of a cdata control table are the keys, including now using composite keys (that is […]

Read More

Categories

October 2016
MTWTFSS
« Sep Nov »
 12
3456789
10111213141516
17181920212223
24252627282930
31