Amazon elec dataset

Various functions for accessing the Amazon Reviews dataset.

dynn.data.amazon.download_amazon(path='.', force=False)

Downloads the Amazon dataset from “http://riejohnson.com/software/”

Parameters:
  • path (str, optional) – Local folder (defaults to “.”)
  • force (bool, optional) – Force the redownload even if the files are already at path
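
The skip-unless-forced behaviour described by the force parameter can be sketched as follows. This is a minimal illustration, not dynn's actual implementation: the helper name download_if_missing and its fetch argument are hypothetical, and the default file name elec2.tar.gz is taken from the load_amazon description.

```python
import os
from urllib.request import urlretrieve


def download_if_missing(url, path=".", filename="elec2.tar.gz",
                        force=False, fetch=urlretrieve):
    """Fetch `url` into `path/filename` unless the file is already there.

    Hypothetical helper: returns True if a download happened and
    False if it was skipped because the file already exists.
    """
    os.makedirs(path, exist_ok=True)
    target = os.path.join(path, filename)
    if os.path.isfile(target) and not force:
        return False  # file already present and no forced redownload
    fetch(url, target)  # fetch is injectable so the logic can be tested offline
    return True
```

With the library itself, the equivalent call would presumably be download_amazon(path="/path/to/amazon", force=True) to refresh files that are already present.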

dynn.data.amazon.load_amazon(path, tok=True, size='200k')

Loads the Amazon dataset

Returns the train, dev and test sets in a dictionary, each as a tuple containing the reviews and the labels.

Parameters:
  • path (str) – Path to the folder containing the elec2.tar.gz file
Returns: Dictionary containing the train and test sets (dictionary of review/labels tuples)
Return type: dict

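
As a sketch of how the returned dictionary might be consumed, the snippet below unpacks each split into its reviews and labels and reports basic counts. The helper summarize_splits is hypothetical, and the toy data (token lists with 0/1 labels) is made up to stand in for the real return value, whose exact review and label types are not specified here.

```python
from collections import Counter


def summarize_splits(data):
    """Return {split: (num_reviews, label_counts)} for a dict of
    (reviews, labels) tuples keyed by split name."""
    summary = {}
    for split, (reviews, labels) in data.items():
        # Reviews and labels are parallel sequences, so lengths must match.
        if len(reviews) != len(labels):
            raise ValueError(
                f"{split}: {len(reviews)} reviews but {len(labels)} labels"
            )
        summary[split] = (len(reviews), Counter(labels))
    return summary


# Toy stand-in for the dictionary returned by load_amazon (made-up data).
toy = {
    "train": ([["great", "phone"], ["broke", "fast"], ["works", "well"]],
              [1, 0, 1]),
    "test": ([["ok"]], [1]),
}

for split, (count, label_counts) in summarize_splits(toy).items():
    print(split, count, dict(label_counts))
```

With the real dataset, the toy dictionary would be replaced by the result of load_amazon("/path/to/amazon").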
dynn.data.amazon.read_amazon(split, path, tok=True, size='200k')

Iterates over the Amazon dataset

Example:

from dynn.data.amazon import read_amazon

for review, label in read_amazon("train", "/path/to/amazon"):
    train(review, label)  # train() stands in for your own training step

Parameters:
  • split (str) – Either "train", "dev" or "test"
  • path (str) – Path to the folder containing the elec2.tar.gz file
Returns: review, label
Return type: tuple
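
Because read_amazon yields one (review, label) pair at a time, a common pattern is to group the stream into minibatches without loading everything into memory. The sketch below shows one way to do this; batches and fake_read are hypothetical names, and fake_read only imitates the iterator with made-up data so that the example runs on its own.

```python
from itertools import islice


def batches(pairs, batch_size):
    """Group an iterable of (review, label) pairs into
    (reviews, labels) lists of at most batch_size items."""
    iterator = iter(pairs)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # the underlying iterator is exhausted
        reviews, labels = zip(*batch)
        yield list(reviews), list(labels)


def fake_read(split):
    # Hypothetical stand-in for read_amazon(split, path); data is made up.
    for i in range(5):
        yield f"review {i}", i % 2


for reviews, labels in batches(fake_read("train"), batch_size=2):
    print(len(reviews), labels)
```

In real use, fake_read("train") would be replaced by read_amazon("train", "/path/to/amazon"). The last batch may be smaller than batch_size when the number of reviews is not a multiple of it.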