More Isn’t Better With Data Collection

Andy Leonard argues that more data is not better data:

The Problem I am Trying To Solve

Is more data better? In his 2012 book, Antifragile, Nassim Nicholas Taleb (fooledbyrandomness.com | @nntaleb) – the first data philosopher I encountered – states:

“The fooled-by-data effect is accelerating. There is a nasty phenomenon called ‘Big Data’ in which researchers have brought cherry-picking to an industrial level. Modernity provides too many variables (but too little data per variable), and the spurious relationships grow much, much faster than real information, as noise is convex and information is concave.” – Nassim Nicholas Taleb, Antifragile, p. 416

According to Taleb, there’s a bias for error embedded in big data; more is not better, it’s worse. I’ve experienced this with business intelligence solutions and spoken about data quality in data warehouse solutions, saying:

“The ratio of good:bad data in a useless / inaccurate data warehouse is surprisingly high; almost always north of 95% and often higher than 99%.”

Taleb states more data includes a disproportionate amount of bad data, and that bigger data results in more spurious correlations. In other words, more is not better – it’s worse.

It’s an idea worth grappling with. The other side of the argument is that for some problems, you won’t know which data you need until you need it.
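Taleb's claim that spurious relationships grow faster than real information can be illustrated with a small simulation. The sketch below is not from Andy's post or from Antifragile. It assumes every column is independent Gaussian noise, so any strong correlation it finds is spurious by construction. The function names, the 20-row sample size, and the 0.5 cutoff are arbitrary choices for illustration.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_spurious_pairs(n_variables, n_rows, threshold=0.5, seed=0):
    """Count pairs of independent noise columns whose |r| exceeds threshold.

    Every column is pure noise (an assumption of this sketch), so each
    pair counted here is a spurious relationship.
    """
    rng = random.Random(seed)
    columns = [
        [rng.gauss(0, 1) for _ in range(n_rows)] for _ in range(n_variables)
    ]
    return sum(
        1
        for a, b in itertools.combinations(columns, 2)
        if abs(pearson(a, b)) > threshold
    )


if __name__ == "__main__":
    # Few rows per variable, more and more variables: Taleb's scenario.
    for n_variables in (10, 50, 200):
        pairs = n_variables * (n_variables - 1) // 2
        hits = count_spurious_pairs(n_variables, n_rows=20)
        print(f"{n_variables:>4} variables, {pairs:>6} pairs, {hits:>5} with |r| > 0.5")
```

The number of candidate pairs grows quadratically with the number of variables, so with a fixed number of rows the count of strong-looking but meaningless correlations climbs much faster than the variable count. Raising `n_rows` shrinks that count, which is the "too little data per variable" half of the quote.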

