Penn TreeBank

Various functions for accessing the PTB dataset used by Mikolov et al., 2010.

dynn.data.ptb.download_ptb(path='.', force=False)

Downloads the PTB from “http://www.fit.vutbr.cz/~imikolov/rnnlm”

Parameters:
  • path (str, optional) – Local folder (defaults to “.”)
  • force (bool, optional) – Force the redownload even if the files are already at path
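The interaction between path and force can be illustrated with a small standard-library sketch. This is not dynn's implementation: the helper name download_if_missing, its url, filename and fetch arguments, and its boolean return value are all hypothetical and introduced only for illustration.

```python
import os
from urllib.request import urlretrieve


def download_if_missing(url, filename, path=".", force=False, fetch=urlretrieve):
    """Fetch `url` into `path/filename` unless the file is already there.

    Hypothetical helper (not part of dynn). Returns True if a download
    happened and False if it was skipped.
    """
    os.makedirs(path, exist_ok=True)
    target = os.path.join(path, filename)
    # Skip the download when the file exists and no refresh was requested
    if os.path.isfile(target) and not force:
        return False
    # `fetch` defaults to urlretrieve(url, target); it is injectable so the
    # skip/force logic can be exercised without network access
    fetch(url, target)
    return True
```

With force=False a second call on the same folder is a no-op, whereas force=True always fetches the file again and overwrites the local copy.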
dynn.data.ptb.load_ptb(path, eos=None)

Loads the PTB dataset

Returns the train, dev and test splits as a dictionary keyed by split name. Each split is a list of strings.

Parameters:
  • path (str) – Path to the folder containing the simple-examples.tar.gz file
  • eos (str, optional) – Optionally append an end of sentence token to each line
Returns:

dictionary mapping the split name to a list of strings

Return type:

dict
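As a rough picture of the returned structure, the sketch below reads one text file per split out of a tar.gz archive using only the standard library. It is not dynn's code. The member names in SPLIT_FILES and the helper name load_splits are assumptions, as is the choice to represent each sentence as a single string with the eos token appended after a space.

```python
import tarfile

# Assumption: member names inside the archive (hypothetical, adjust as needed)
SPLIT_FILES = {
    "train": "simple-examples/data/ptb.train.txt",
    "dev": "simple-examples/data/ptb.valid.txt",
    "test": "simple-examples/data/ptb.test.txt",
}


def load_splits(archive, eos=None):
    """Return {split name: list of sentence strings} from a tar.gz archive."""
    suffix = "" if eos is None else " " + eos
    data = {}
    with tarfile.open(archive, "r:gz") as tar:
        for split, member in SPLIT_FILES.items():
            # extractfile returns a binary file object for a regular member
            with tar.extractfile(member) as handle:
                lines = handle.read().decode("utf-8").splitlines()
            # One sentence per non-empty line, optionally followed by eos
            data[split] = [ln.strip() + suffix for ln in lines if ln.strip()]
    return data
```

Under these assumptions, load_splits(archive)["train"] would be the list of training sentences, mirroring the "split name to list of strings" mapping described above.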

dynn.data.ptb.read_ptb(split, path, eos=None)

Iterates over the PTB dataset

Example:

from dynn.data.ptb import read_ptb
for sent in read_ptb("train", "/path/to/ptb"):
    train(sent)  # train: user-defined training step
Parameters:
  • split (str) – Either "train", "dev" or "test"
  • path (str) – Path to the folder containing the simple-examples.tar.gz file
  • eos (str, optional) – Optionally append an end of sentence token to each line
Returns:

the sentences of the requested split, one per iteration

Return type:

iterator
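The lazy, one-sentence-at-a-time behaviour and the effect of eos can be sketched with a plain generator. The function iter_sentences below is hypothetical and not part of dynn. It assumes each sentence is yielded as a list of whitespace-separated tokens, which the documentation above does not state.

```python
def iter_sentences(lines, eos=None):
    """Lazily yield one tokenised sentence per non-empty line.

    Hypothetical helper: `lines` is any iterable of strings, for example
    an open text file.
    """
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        if eos is not None:
            tokens.append(eos)  # end-of-sentence marker as the last token
        yield tokens


# Usage: nothing is read or tokenised until the loop asks for the next item
corpus = ["the cat sat", "", "on the mat"]
for sent in iter_sentences(corpus, eos="<eos>"):
    print(len(sent), sent[-1])
```

Because the function is a generator, the whole split never has to be held in memory, which is the main reason to prefer iteration over loading everything into a dictionary first.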