nltk (Natural Language Toolkit)

Setup

Supporting Libraries / Dependencies

To work with NLTK, verify that the Python environment has the supporting libraries installed, either via pip or through the Anaconda distribution.

Required Libraries:

  • regex
  • tqdm
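One way to confirm that the libraries listed above are available is to ask Python whether it can locate them. The following is a minimal sketch using only the standard library; the helper name missing_modules is hypothetical and chosen for illustration.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


# Supporting libraries listed in this section.
required = ["regex", "tqdm"]

missing = missing_modules(required)
if missing:
    print("Missing libraries:", ", ".join(missing))
else:
    print("All supporting libraries are installed.")
```

Any name reported as missing can then be installed with pip or through Anaconda before continuing.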

Documentation

Natural Language Toolkit - NLTK 3.5 documentation

NLTK is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, wrappers for industrial-strength NLP libraries, and an active discussion forum.

NLTK Downloader Shell

Process

Running the following in a Jupyter notebook cell opens the NLTK Downloader Shell.

import nltk

nltk.download_shell()

The shell is driven by the single-letter commands shown in its menu. Enter l (List) to see the available Packages & Collections noted below. To download a specific package, enter d (Download) and then the identifier of the intended package, such as stopwords in this case. Once the package has been downloaded and installed to the reported directory, it appears in the list marked with an * to show that it is already installed.
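The same download can also be done without the interactive shell, which may be convenient in scripts or when re-running a notebook. The sketch below assumes NLTK is installed and that the stopwords package is stored under the resource path corpora/stopwords. The helper name ensure_package is hypothetical; nltk.data.find and nltk.download are the library calls it relies on.

```python
def ensure_package(resource_path, package_id):
    """Download an NLTK data package only if it is not already installed.

    resource_path: location inside nltk_data, e.g. "corpora/stopwords" (assumed layout).
    package_id: identifier as shown in the downloader list, e.g. "stopwords".
    """
    import nltk  # imported here so the helper can be defined before NLTK is needed

    try:
        # Raises LookupError when the resource is not found in any nltk_data directory.
        nltk.data.find(resource_path)
        return "already installed"
    except LookupError:
        # nltk.download returns True on success and False on failure.
        ok = nltk.download(package_id, quiet=True)
        return "downloaded" if ok else "failed"


# Example call (requires NLTK, and network access on the first run):
# ensure_package("corpora/stopwords", "stopwords")
```

After a successful run, the package should appear with an * in the downloader list, the same as if it had been installed through the shell.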

Packages & Collections