Probabilistic Debugging

Adrian Colyer summarizes a fascinating academic paper:

This program has a bug. When given an already encoded input, it encodes it again (replacing % with ‘%25’). For example, the input https://example.com/t%20c?x=1 results in the output https://example.com/t%2520c?x=1, whereas in fact the output should be the same as the input in this case.

Let’s put our probabilistic thinking caps on and try and to debug the program. We ‘know’ that the url printed on line 19 is wrong, so we can assign low probability (0.05) to this value being correct. Likewise we ‘know’ that the input url on line 1 is correct, so we can assign high probability (0.95). (In probabilistic inference, it is standard not to use 0.0 or 1.0, but values close to them instead). Initially we’ll set the probability of every other program variable being set to 0.5, since we don’t know any better yet. If we can find a point in the program where the inputs are correct with relatively high probability, and the outputs are incorrect with relatively high probability, then that’s an interesting place!

Since url on line 19 has a low probability of being correct, this suggests that url on line 18, and purl_str at line 12 are also likely to be faulty. PI Debugger actually assigns these probabilities of being correct 0.0441 and 0.0832 respectively. Line 18 is a simple assignment statement, so if the chances of a bug here are fairly low. Now we trace the data flow. If purl_str at line 12 is likely to be faulty then s at line 16 is also likely to be faulty (probability 0.1176).

I’m interested to see someone create a practical implementation someday.

Related Posts

Using R To Hit Azure ML From Power BI

Leila Etaati shows how you can use R to hit an Azure ML endpoint to populate a data set in Power BI: You need to create a model in Azure ML Studio and create a web service for it. The traditional example in Predict a passenger on Titanic ship is going to survived or not? […]

Read More

Image Clustering With Keras And R

Shirin Glander shows us how to use R to extract learned features from Keras and cluster those features: For each of these images, I am running the predict() function of Keras with the VGG16 model. Because I excluded the last layers of the model, this function will not actually return any class predictions as it would normally […]

Read More

Categories

June 2018
MTWTFSS
« May Jul »
 123
45678910
11121314151617
18192021222324
252627282930