Non-English Natural Language Processing

The folks at BNOSAC have announced a new natural language processing toolkit for R:

BNOSAC is happy to announce the release of the udpipe R package, which is a Natural Language Processing toolkit that provides language-agnostic ‘tokenization’, ‘parts of speech tagging’, ‘lemmatization’, ‘morphological feature tagging’ and ‘dependency parsing’ of raw text. Next to text parsing, the package also allows you to train annotation models based on data of ‘treebanks’ in ‘CoNLL-U’ format as provided at

The package provides direct access to language models trained on more than 50 languages.
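The annotations udpipe produces and trains on follow the CoNLL-U layout used by Universal Dependencies treebanks. In that layout, each token is one line of ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), comment lines start with `#`, and a blank line ends a sentence. The following is a minimal Python sketch that reads that layout, to show what the tokenization, lemma, part-of-speech and dependency fields look like. It does not use udpipe or its R API. The function name `parse_conllu` and the sample sentence are hypothetical, made up for illustration.

```python
from typing import Dict, List

# The ten CoNLL-U columns, in the order defined by the Universal Dependencies format.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def parse_conllu(text: str) -> List[List[Dict[str, str]]]:
    """Hypothetical helper: split CoNLL-U text into sentences of token dicts."""
    sentences: List[List[Dict[str, str]]] = []
    current: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Comment lines hold sentence metadata such as "# text = ...".
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        token = dict(zip(FIELDS, cols))
        # Skip multiword-token ranges ("1-2") and empty nodes ("8.1"),
        # keeping only the syntactic words that carry the annotation.
        if "-" in token["id"] or "." in token["id"]:
            continue
        current.append(token)
    if current:  # the input may not end with a blank line
        sentences.append(current)
    return sentences


# Hypothetical hand-written sample: "Dogs bark." with lemmas, UPOS tags and a dependency tree.
sample = "\n".join([
    "# text = Dogs bark.",
    "\t".join(["1", "Dogs", "dog", "NOUN", "NNS", "Number=Plur", "2", "nsubj", "_", "_"]),
    "\t".join(["2", "bark", "bark", "VERB", "VBP", "Mood=Ind|Tense=Pres", "0", "root", "_", "SpaceAfter=No"]),
    "\t".join(["3", ".", ".", "PUNCT", ".", "_", "2", "punct", "_", "_"]),
    "",
])

if __name__ == "__main__":
    for sent in parse_conllu(sample):
        # (form, lemma, part of speech, head index, dependency relation)
        print([(t["form"], t["lemma"], t["upos"], t["head"], t["deprel"]) for t in sent])
```

Here HEAD is the ID of each word's governor, with `0` marking the root of the sentence. That single column is enough to rebuild the dependency parse that udpipe reports.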

Click through to check it out.

Related Posts

ElasticMapReduce And RStudio

Tanzir Musabbir demonstrates how to set up Amazon ElasticMapReduce to include an RStudio edge node: RStudio Server provides a browser-based interface for R and a popular tool among data scientists. Data scientists use Apache Spark clusters running on Amazon EMR to perform distributed training. In a previous blog post, the author showed how you can install RStudio Server on Amazon […]

Mutating Data Frames Without dplyr

John Mount points out that there is a built-in function to mutate data frames in R: The notation we used above is the “explicit argument” variation we recommend for readability. What a lot of dplyr users do not seem to know is: base-R already has this functionality. The function is called transform(). To demonstrate this, let’s first detach dplyr to show that […]

January 2018