Stanford Sentiment TreeBank

Various functions for accessing the SST dataset.

dynn.data.sst.download_sst(path='.', force=False)

Downloads the SST from https://nlp.stanford.edu/sentiment/

Parameters:
  • path (str, optional) – Local folder (defaults to “.”)
  • force (bool, optional) – Force a redownload even if the files already exist at path
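
The effect of force can be sketched with the standard library alone. This is a minimal illustration, not dynn's implementation: needs_download is a hypothetical helper, and it assumes the downloaded archive is the trainDevTestTrees_PTB.zip file that load_sst expects to find in path.

```python
import os
import tempfile

# File name taken from the load_sst documentation; that download_sst
# writes exactly this file is an assumption.
ARCHIVE_NAME = "trainDevTestTrees_PTB.zip"


def needs_download(path=".", force=False):
    """Hypothetical helper: True if the archive should be (re)downloaded."""
    archive = os.path.join(path, ARCHIVE_NAME)
    # Download when forced, or when the archive is not in the folder yet
    return force or not os.path.isfile(archive)


with tempfile.TemporaryDirectory() as folder:
    print(needs_download(folder))              # no archive yet
    open(os.path.join(folder, ARCHIVE_NAME), "wb").close()
    print(needs_download(folder))              # archive present, force=False
    print(needs_download(folder, force=True))  # archive present, force=True
```

With force=False an existing archive is left alone, which makes repeated calls cheap; force=True is useful when a previous download was interrupted or corrupted.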
dynn.data.sst.load_sst(path, terminals_only=True, binary=False)

Loads the SST dataset

Returns the train, dev and test sets in a dictionary, each as a tuple containing the trees and the labels.

Parameters:
  • path (str) – Path to the folder containing the trainDevTestTrees_PTB.zip file
  • terminals_only (bool) – Only return the terminals and not the tree
  • binary (bool) – Binary SST (only positive and negative labels). Neutral labels are discarded
Returns:

Dictionary containing the train, dev and test sets

(tuple of tree/labels tuples)

Return type:

dict
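
The returned structure can be unpacked as in the sketch below. It uses a toy dictionary rather than real SST data, and it assumes that the keys are "train", "dev" and "test" (the split names accepted by read_sst), that terminals are lists of tokens, and that labels are integers; none of these details are confirmed by the documentation above.

```python
# Toy stand-in with the documented shape: split -> (trees, labels).
# The sentences and labels are made up for illustration.
toy_sst = {
    "train": (["a fine film".split(), "dull and slow".split()], [1, 0]),
    "dev": (["quite good".split()], [1]),
    "test": (["rather bad".split()], [0]),
}


def split_sizes(dataset):
    """Return the number of examples in each split."""
    sizes = {}
    for split, (trees, labels) in dataset.items():
        # Each split pairs one label with one tree
        assert len(trees) == len(labels)
        sizes[split] = len(trees)
    return sizes


print(split_sizes(toy_sst))
```

With dynn installed, the same unpacking would presumably apply to the result of load_sst("/path/to/sst").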

dynn.data.sst.read_sst(split, path, terminals_only=True, binary=False)

Iterates over the SST dataset

Example:

from dynn.data.sst import read_sst
# `train` stands in for user-defined training code
for tree, label in read_sst("train", "/path/to/sst"):
    train(tree, label)

Parameters:
  • split (str) – Either "train", "dev" or "test"
  • path (str) – Path to the folder containing the trainDevTestTrees_PTB.zip file
  • terminals_only (bool) – Only return the terminals and not the tree
  • binary (bool) – Binary SST (only positive and negative labels). Neutral labels are discarded
Returns:

tree, label

Return type:

tuple
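
To make terminals_only and binary concrete, the sketch below parses one PTB-style SST line by hand. It is not dynn's code: parse_tree, terminals and to_binary are hypothetical helpers, and the tree representation (nested tuples) is an assumption. The binary mapping follows the usual SST convention, where the five labels 0 to 4 collapse to negative (0, 1) and positive (3, 4), and neutral (2) is dropped.

```python
def parse_tree(line):
    """Parse one PTB-style line into nested (label, children) tuples."""
    tokens = line.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "("
        label = int(tokens[pos + 1])
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())      # nested subtree
            else:
                children.append(tokens[pos])  # terminal word
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return node()


def terminals(tree):
    """Collect the words at the leaves, left to right."""
    _, children = tree
    words = []
    for child in children:
        if isinstance(child, str):
            words.append(child)
        else:
            words.extend(terminals(child))
    return words


def to_binary(label):
    """Map a 0-4 label to 0 (negative) or 1 (positive); None for neutral."""
    if label == 2:
        return None
    return int(label > 2)


tree = parse_tree("(3 (2 It) (3 (2 's) (3 good)))")
print(terminals(tree), to_binary(tree[0]))
```

In this sketch, terminals_only=True corresponds to keeping only the output of terminals, and binary=True corresponds to skipping every example for which to_binary returns None.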