WikiText

Various functions for accessing the WikiText datasets (WikiText-2 and WikiText-103).

dynn.data.wikitext.download_wikitext(path='.', name='2', force=False)

Downloads the WikiText dataset from “http://www.fit.vutbr.cz/~imikolov/rnnlm”

Parameters:
  • path (str, optional) – Local folder (defaults to “.”)
  • name (str, optional) – Either "2" or "103" (defaults to "2")
  • force (bool, optional) – Force the redownload even if the files are already at path
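
As a rough sketch, the check below tests whether an archive is already in place before calling download_wikitext. It assumes the archive sits directly in path under the name wikitext-{2|103}-v1.zip, which is the name the load_wikitext documentation refers to. archive_present and ensure_wikitext are hypothetical helpers for illustration, not part of dynn.

```python
import os


def archive_present(path, name="2"):
    """Check whether wikitext-{name}-v1.zip exists in `path` (assumed file name)."""
    return os.path.isfile(os.path.join(path, "wikitext-%s-v1.zip" % name))


def ensure_wikitext(path=".", name="2"):
    """Download the archive only when it is missing (hypothetical helper)."""
    if not archive_present(path, name):
        os.makedirs(path, exist_ok=True)
        # Imported lazily so the check above also works without dynn installed.
        from dynn.data.wikitext import download_wikitext
        download_wikitext(path=path, name=name)
    return path
```

Since force=False suggests that download_wikitext already skips existing files, a guard like this is mainly useful when you want to know in advance whether a network access would be needed.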
dynn.data.wikitext.load_wikitext(path, name='2', eos=None)

Loads the WikiText dataset

Returns the train, validation and test sets, each as a list of sentences (each sentence is a list of words)

Parameters:
  • path (str) – Path to the folder containing the wikitext-{2|103}-v1.zip file
  • name (str, optional) – Either "2" or "103" (defaults to "2")
  • eos (str, optional) – Optionally append an end of sentence token to each line
Returns:

dictionary mapping the split name to a list of sentences (each sentence is a list of words)

Return type:

dict
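
A minimal sketch of working with the returned dictionary, assuming its keys are the split names "train", "valid" and "test" and each value is a list of sentences (lists of words). summarize_splits and load_and_summarize are hypothetical helpers, and the toy dictionary stands in for real data so the example runs without the archive.

```python
def summarize_splits(data):
    """Return {split: (number of sentences, number of tokens)}."""
    return {
        split: (len(sents), sum(len(sent) for sent in sents))
        for split, sents in data.items()
    }


def load_and_summarize(path, name="2", eos="<eos>"):
    """Load WikiText with dynn and summarize it (needs dynn and the archive)."""
    from dynn.data.wikitext import load_wikitext
    return summarize_splits(load_wikitext(path, name=name, eos=eos))


# Toy stand-in for the returned dictionary, as if loaded with eos="<eos>".
toy = {
    "train": [["the", "cat", "<eos>"], ["a", "dog", "barks", "<eos>"]],
    "valid": [["hello", "<eos>"]],
    "test": [],
}
print(summarize_splits(toy))
```

Note that when eos is given, the end of sentence token counts toward each sentence's length.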

dynn.data.wikitext.read_wikitext(split, path, name='2', eos=None)

Iterates over the WikiText dataset, one sentence (a list of words) at a time

Example:

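from dynn.data.wikitext import read_wikitext

for sent in read_wikitext("train", "/path/to/wikitext"):
    train(sent)  # train is a placeholder for your own training step

Parameters:
  • split (str) – Either "train", "valid" or "test"
  • path (str) – Path to the folder containing the wikitext-{2|103}-v1.zip files
  • name (str, optional) – Either "2" or "103" (defaults to "2")
  • eos (str, optional) – Optionally append an end of sentence token to each line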
Returns:

list of words

Return type:

list
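
Because read_wikitext is consumed one sentence at a time, statistics can be gathered without first building the whole split as a list. The sketch below counts word frequencies from any iterable of sentences. count_words and train_vocab are hypothetical helpers, and min_count is an illustrative parameter, not a dynn option.

```python
from collections import Counter


def count_words(sentences):
    """Count word occurrences over an iterable of sentences (lists of words)."""
    counts = Counter()
    for sent in sentences:
        counts.update(sent)
    return counts


def train_vocab(path, name="2", min_count=1):
    """Sorted vocabulary of the train split (needs dynn and the archive)."""
    from dynn.data.wikitext import read_wikitext
    counts = count_words(read_wikitext("train", path, name=name))
    return sorted(w for w, c in counts.items() if c >= min_count)


# Works with any iterator of sentences, here a small generator.
toy_sentences = (s.split() for s in ["the cat sat", "the dog sat down"])
print(count_words(toy_sentences).most_common(2))
```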