Measuring Semantic Relatedness

Sandipan Dey re-works a university assignment on semantic relatedness in Python:

Let’s define the semantic relatedness of two WordNet nouns x and y as follows:

  • A = set of synsets in which x appears
  • B = set of synsets in which y appears
  • distance(x, y) = length of shortest ancestral path of subsets A and B
  • sca(x, y) = a shortest common ancestor of subsets A and B

This is the notion of distance that we need to use to implement the distance() and sca() methods in the WordNet data type.

It looks like a helpful assignment for understanding natural language processing a little better.

Related Posts

Reviewing The Team Data Science Process

I am starting a new series on launching a data science project, and my presentation quickly veers into a pessimistic place: The concept of “clean” data is appealing to us—I have a talk on the topic and spend more time than I’m willing to admit trying to clean up data.  But the truth is that, in a […]

Read More

Methods To Improve Model Accuracy

Tristan Robinson shows how to go back to the drawing board when your model’s accuracy isn’t cutting it: One of the reoccurring principles that appears with machine learning is that of Ockham’s razor, which states that the best models are simple models that fit the data well; this is not an irrefutable principle of logic, but […]

Read More


October 2017
« Sep Nov »