
Detecting Hard-to-Classify Data

Kaushal Mukherjee takes us through a new Python package:

The article explains the algorithm behind the recently introduced Python package named PyHard, based on the concept of Instance Space Analysis. It helps in assessing the quality of a dataset and identifying which instances are hard or easy to classify. With the help of this algorithm, we can separate out noisy instances. It also provides an interactive visualization tool to deep dive into the instance space.
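To make "hard to classify" a bit more concrete, here is a minimal sketch of one simple instance-hardness measure, k-Disagreeing Neighbors (kDN). An instance's score is the fraction of its k nearest neighbors that carry a different label, so a score near 1.0 suggests a point that is hard to classify or possibly mislabeled. This is not PyHard's API or its Instance Space Analysis pipeline. The function names, the toy data, and the 0.5 cut-off for flagging points are illustrative assumptions.

```python
import math
from typing import List, Sequence


def k_disagreeing_neighbors(
    X: Sequence[Sequence[float]], y: Sequence[int], k: int = 3
) -> List[float]:
    """Hypothetical helper: kDN score per instance.

    Score = fraction of the k nearest neighbors (Euclidean distance)
    whose label differs from the instance's own label.
    0.0 means all neighbors agree (easy); 1.0 means none do (hard).
    """
    n = len(X)
    if not 0 < k < n:
        raise ValueError("k must be between 1 and len(X) - 1")
    scores = []
    for i in range(n):
        # Sort every other instance by distance to instance i (ties by index).
        dists = sorted((math.dist(X[i], X[j]), j) for j in range(n) if j != i)
        neighbors = [j for _, j in dists[:k]]
        disagreeing = sum(1 for j in neighbors if y[j] != y[i])
        scores.append(disagreeing / k)
    return scores


def flag_hard(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Hypothetical helper: indices whose score exceeds an assumed threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]


# Toy data: two well-separated clusters plus one point (index 8) that sits
# inside the class-0 cluster but is labeled 1, i.e. a likely noisy label.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6), (0.5, 0.5)]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1]

scores = k_disagreeing_neighbors(X, y, k=3)
print([round(s, 2) for s in scores])  # [0.33, 0.33, 0.33, 0.33, 0.0, 0.0, 0.0, 0.0, 1.0]
print(flag_hard(scores))              # [8]
```

The mislabeled point gets the maximum score, while its class-0 neighbors pick up a smaller penalty simply for being near it. That spillover is one reason a single measure is a blunt tool and why an approach that looks at many instances in context, like the one the article describes, can be more informative.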

Click through for the details. I’m going to wait for PyHard 2: PyHarder. Or maybe PyHardWithAVengeance. But it’ll all go downhill by the time we get to PyHard 5.