Neural Topic Models On Amazon SageMaker

David Ping, et al, show off topic modeling on Amazon SageMaker:

Topic Modeling is used to organize a corpus of documents into “topics” which is a grouping based on a statistical distribution of words within the documents themselves. Amazon Comprehend, our fully managed text analytics service, provides a pre-configured topic modeling API that is best suited for the most popular use cases like organizing customer feedback, support incidents or workgroup documents. Amazon Comprehend is the suggested topic modeling choice for customers as it removes a lot of the most routine steps associated with topic modeling like tokenization, training a model and adjusting parameters. Amazon SageMaker’s Neural Topic Model (NTM) caters to the use cases where a finer control of the training, optimization, and/or hosting of a topic model is required, such as training models on text corpus of particular writing style or domain, or hosting topic models as part of a web application. While Amazon SageMaker NTM provides a starting point of state-of-the-art topic modeling, customers have the flexibility to modify the network architecture as well as hyperparameters to accommodate the idiosyncrasies of their data sets as well as to tune the trade-off between a multitude of metrics such as document modeling accuracy, human interpretability and granularity of the learned topics, based on their applications. In addition, Amazon SageMaker NTM leverages the full power of the Amazon SageMaker platform: easily configurable training and hosting infrastructure, automatic hyperparameter optimization, and fully-managed hosting with auto-scaling.

They walk through the entire topic modeling process, so check it out.

Related Posts

Executing ML Services Scripts From Jupyter Notebooks

Kyle Weller has an inception moment with Python and SQL Server Machine Learning Services: While this example is trivial with the Iris dataset, imagine the additional scale, performance, and security capabilities that you now unlocked. You can use any of the latest open source R/Python packages to build Deep Learning and AI applications on large […]

Read More

Testing Spatial Equilibrium Concepts With tidycensus

Ignacio Sarmiento Barbieri walks us through the concept of spatial equilibrium and tests using data from the tidycensus package: Let’s take the model to the data and reproduce figures 2.1. and 2.2 of “Cities, Agglomeration, and Spatial Equilibrium”. The focus are two cities, Chicago and Boston. These cities are chosen because both differ in how easy […]

Read More


June 2018
« May Jul »