IWSLT
Various functions for accessing the IWSLT translation datasets
dynn.data.iwslt.download_iwslt(path='.', year='2016', langpair='de-en', force=False)
Downloads the IWSLT dataset from “https://wit3.fbk.eu/archive/”.
Parameters:
dynn.data.iwslt.load_iwslt(path, year='2016', langpair='de-en', src_eos=None, tgt_eos='<eos>')
Loads the IWSLT dataset.
Returns the train, dev and test sets, each as lists of source and target sentences.
Parameters:
- path (str) – Path to the folder containing the .tgz file
- year (str, optional) – IWSLT year (for now only 2016 is supported)
- langpair (str, optional) – src-tgt language pair (for now only {de,fr}-en are supported)
- src_eos (str, optional) – Optionally append an end of sentence token to each source line.
- tgt_eos (str, optional) – Optionally append an end of sentence token to each target line.
Returns: train, dev and test sets
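To make the effect of src_eos and tgt_eos concrete, here is a minimal sketch. It is not dynn's implementation: append_eos is a hypothetical helper, the sentence pair is made-up data, and representing a sentence as a list of whitespace-separated tokens is an assumption that the text above does not confirm.

```python
def append_eos(tokens, eos=None):
    """Return a copy of `tokens`, with `eos` appended unless it is None.

    Hypothetical helper for illustration only; not part of dynn.
    """
    return list(tokens) if eos is None else list(tokens) + [eos]


# Toy sentence pair standing in for one line of the corpus (made-up data).
src = "das ist ein Test".split()
tgt = "this is a test".split()

# Mirror the documented defaults: src_eos=None, tgt_eos='<eos>'.
src_out = append_eos(src, eos=None)
tgt_out = append_eos(tgt, eos="<eos>")

print(src_out)
print(tgt_out)
```

With the documented defaults, only the target side would gain an end of sentence token; passing a string as src_eos would mark the source side as well.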
dynn.data.iwslt.read_iwslt(split, path, year='2016', langpair='de-en', src_eos=None, tgt_eos='<eos>')
Iterates over the IWSLT dataset.
Example:
for src, tgt in read_iwslt("train", "/path/to/iwslt"):
    train(src, tgt)
Parameters:
- split (str) – Either "train", "dev" or "test"
- path (str) – Path to the folder containing the .tgz file
- year (str, optional) – IWSLT year (for now only 2016 is supported)
- langpair (str, optional) – src-tgt language pair (for now only {de,fr}-en are supported)
- src_eos (str, optional) – Optionally append an end of sentence token to each source line.
- tgt_eos (str, optional) – Optionally append an end of sentence token to each target line.
Returns: Source sentence, Target sentence
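The pair-by-pair iteration pattern can be tried without the dataset on disk. The sketch below is a stand-in and not the library's code: iter_pairs is a hypothetical generator that works on in-memory lines instead of the .tgz file, and it assumes that source and target are aligned line by line and tokenized on whitespace.

```python
def iter_pairs(src_lines, tgt_lines):
    """Lazily yield (source_tokens, target_tokens) from aligned lines.

    Hypothetical stand-in for an iterator over a parallel corpus;
    not part of dynn.
    """
    for src_line, tgt_line in zip(src_lines, tgt_lines):
        yield src_line.split(), tgt_line.split()


# Made-up aligned sentences used only to exercise the generator.
src_lines = ["guten Morgen", "danke"]
tgt_lines = ["good morning", "thank you"]

pairs = []
for src, tgt in iter_pairs(src_lines, tgt_lines):
    # A real training loop would consume each pair here.
    pairs.append((src, tgt))

print(pairs)
```

Because the function is a generator, pairs are produced one at a time, so a consumer such as a training loop does not need the whole split in memory.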