Data Quality

Kevin Feasel



Milind Paradkar discusses clean data:

We decided to do a quick check and took a sample of 143 stocks listed on the National Stock Exchange of India Ltd (NSE). For these stocks, we downloaded the 1-minute intraday data for the period 1/08/2016 – 19/08/2016. The aim was to check whether Google finance captured every 1-minute bar during this period for each of the 143 stocks.

NSE’s trading session starts at 9:15 am and ends at 15:30 pm IST, thus comprising of 375 minutes. For 14 trading sessions, we should have 5250 data points for each of these stocks. We wrote a simple code in R to perform the check.

I like this post because it exposes a data quality issue people don’t tend to think about very often:  when all of the data is legitimate and correctly-structured, but there are gaps in the available data set.  This is one of many data quality problems you’ll run into, so it may be important to have a plan in place in case you hit this scenario.

Related Posts

Taking A Random Walk

Dan Goldstein describes the basics of Brownian motion: I was sitting in a bagel shop on Saturday with my 9 year old daughter. We had brought along hexagonal graph paper and a six sided die. We decided that we would choose a hexagon in the middle of the page and then roll the die to […]

Read More

Introduction To Neural Nets

Ben Gorman has a two-part series introducing neural networks.  First, the basics behind neural networks: We can solve both of the above issues by adding an extra layer to our perceptron model. We’ll construct a number of base models like the one above, but then we’ll feed the output of each base model as input into another perceptron. This […]

Read More


September 2016
« Aug Oct »