IWSLT

Various functions for accessing the IWSLT translation datasets

dynn.data.iwslt.download_iwslt(path='.', year='2016', langpair='de-en', force=False)

Downloads the IWSLT dataset from “https://wit3.fbk.eu/archive/”

Parameters:
  • path (str, optional) – Local folder (defaults to “.”)
  • year (str, optional) – IWSLT year (for now only 2016 is supported)
  • langpair (str, optional) – src-tgt language pair (for now only {de,fr}-en are supported)
  • force (bool, optional) – Force the redownload even if the files are already at path
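
As an illustration of the parameters above, here is a minimal sketch of a wrapper around download_iwslt. The helper name fetch_iwslt and the step that creates the folder beforehand are assumptions made for this example and are not part of dynn; only the call to download_iwslt follows the signature documented here.

```python
import os


def fetch_iwslt(folder, langpair="de-en"):
    """Download IWSLT 2016 into `folder` (hypothetical helper, not part of dynn)."""
    # Imported lazily so this snippet can be loaded without dynn installed
    from dynn.data.iwslt import download_iwslt

    # Create the target folder first (a precaution; dynn may already handle this)
    os.makedirs(folder, exist_ok=True)
    # force=False skips the download when the files are already at `folder`
    download_iwslt(path=folder, year="2016", langpair=langpair, force=False)
    return folder
```

Calling fetch_iwslt("data/iwslt", langpair="fr-en") would fetch the French-English pair instead; pass force=True to download_iwslt directly when a fresh copy is needed.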
dynn.data.iwslt.load_iwslt(path, year='2016', langpair='de-en', src_eos=None, tgt_eos='<eos>')

Loads the IWSLT dataset

Returns the train, dev and test set, each as lists of source and target sentences.

Parameters:
  • path (str) – Path to the folder containing the .tgz file
  • year (str, optional) – IWSLT year (for now only 2016 is supported)
  • langpair (str, optional) – src-tgt language pair (for now only {de,fr}-en are supported)
  • src_eos (str, optional) – Optionally append an end of sentence token to each source line.
  • tgt_eos (str, optional) – Optionally append an end of sentence token to each target line.
Returns:

train, dev and test sets

Return type:

tuple
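
The returned tuple can be unpacked along the following lines. This sketch assumes that each split is a pair (source sentences, target sentences), which is how the description above reads; the helper split_sizes and the toy data are made up for illustration and stand in for the real output of load_iwslt.

```python
def split_sizes(splits):
    """Map each split name to its number of sentence pairs.

    `splits` is assumed to look like the tuple returned by load_iwslt:
    (train, dev, test), each a (source_sentences, target_sentences) pair.
    """
    sizes = {}
    for name, (src_sents, tgt_sents) in zip(("train", "dev", "test"), splits):
        # Source and target sides are parallel, so their lengths should match
        if len(src_sents) != len(tgt_sents):
            raise ValueError(f"{name}: source/target length mismatch")
        sizes[name] = len(src_sents)
    return sizes


# Toy stand-in for load_iwslt("/path/to/iwslt"); not real IWSLT data
toy = (
    (["ein Haus", "ein Hund"], ["a house <eos>", "a dog <eos>"]),
    (["eine Katze"], ["a cat <eos>"]),
    (["ein Baum"], ["a tree <eos>"]),
)
print(split_sizes(toy))
```

With the real dataset, the same helper would be called as split_sizes(load_iwslt("/path/to/iwslt")).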

dynn.data.iwslt.read_iwslt(split, path, year='2016', langpair='de-en', src_eos=None, tgt_eos='<eos>')

Iterates over the IWSLT dataset

Example:

from dynn.data.iwslt import read_iwslt
# train() is a placeholder for your own training step
for src, tgt in read_iwslt("train", "/path/to/iwslt"):
    train(src, tgt)
Parameters:
  • split (str) – Either "train", "dev" or "test"
  • path (str) – Path to the folder containing the .tgz file
  • year (str, optional) – IWSLT year (for now only 2016 is supported)
  • langpair (str, optional) – src-tgt language pair (for now only {de,fr}-en are supported)
  • src_eos (str, optional) – Optionally append an end of sentence token to each source line.
  • tgt_eos (str, optional) – Optionally append an end of sentence token to each target line.
Returns:

Source sentence, Target sentence

Return type:

tuple
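
Because read_iwslt iterates over the data rather than returning it all at once, a few pairs can be inspected without reading the whole split. The sketch below shows one way to do that with itertools.islice. The helper peek and the generator toy_pairs are hypothetical and only mimic the (source, target) tuples described above; whether the real sentences are strings or token lists is not specified here, and the helper works with either.

```python
from itertools import islice


def peek(pairs, n=3):
    """Return the first n (source, target) pairs from an iterator of pairs."""
    # islice stops after n items, so the rest of the iterator is left unread
    return list(islice(pairs, n))


def toy_pairs():
    """Toy stand-in for read_iwslt("train", "/path/to/iwslt"); not real data."""
    yield "guten Morgen", "good morning <eos>"
    yield "danke", "thank you <eos>"
    yield "ein Hund", "a dog <eos>"


for src, tgt in peek(toy_pairs(), n=2):
    print(src, "->", tgt)
```

With dynn installed, peek(read_iwslt("dev", "/path/to/iwslt"), n=5) would show the first five development pairs in the same way.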