IWSLT
Various functions for accessing the IWSLT translation datasets
dynn.data.iwslt.download_iwslt(path='.', year='2016', langpair='de-en', force=False)
Downloads the IWSLT dataset from “https://wit3.fbk.eu/archive/”
Parameters:
dynn.data.iwslt.load_iwslt(path, year='2016', langpair='de-en', src_eos=None, tgt_eos='<eos>')
Loads the IWSLT dataset
Returns the train, dev and test sets, each as lists of source and target sentences.
Parameters:
- path (str) – Path to the folder containing the .tgz file
- year (str, optional) – IWSLT year (for now only 2016 is supported)
- langpair (str, optional) – src-tgt language pair (for now only {de,fr}-en are supported)
- src_eos (str, optional) – Optionally append an end of sentence token to each source line.
- tgt_eos (str, optional) – Optionally append an end of sentence token to each target line.
Returns: train, dev and test sets
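The description above says each split holds lists of source and target sentences. As a minimal sketch of that shape, assuming (the documentation does not confirm this) that each split is a (sources, targets) pair of aligned lists, the hypothetical helper collect_split below turns an iterator of sentence pairs into two such lists. It is an illustration on toy data, not dynn's implementation.

```python
def collect_split(pairs):
    """Hypothetical helper: turn (src, tgt) pairs into two aligned lists."""
    src_data, tgt_data = [], []
    for src, tgt in pairs:
        src_data.append(src)
        tgt_data.append(tgt)
    return src_data, tgt_data


# Toy data standing in for one split of the dataset (assumed token lists)
toy_pairs = [(["Hallo"], ["Hello", "<eos>"]), (["Danke"], ["Thanks", "<eos>"])]
train_src, train_tgt = collect_split(toy_pairs)
print(len(train_src), len(train_tgt))
```

Under this assumption, the i-th source sentence and the i-th target sentence of a split are translations of each other.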
dynn.data.iwslt.read_iwslt(split, path, year='2016', langpair='de-en', src_eos=None, tgt_eos='<eos>')
Iterates over the IWSLT dataset
Example:
    for src, tgt in read_iwslt("train", "/path/to/iwslt"):
        train(src, tgt)
Parameters:
- split (str) – Either "train", "dev" or "test"
- path (str) – Path to the folder containing the .tgz file
- year (str, optional) – IWSLT year (for now only 2016 is supported)
- langpair (str, optional) – src-tgt language pair (for now only {de,fr}-en are supported)
- src_eos (str, optional) – Optionally append an end of sentence token to each source line.
- tgt_eos (str, optional) – Optionally append an end of sentence token to each target line.
Returns: Source sentence, Target sentence
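The effect of src_eos and tgt_eos can be illustrated without the dataset. The sketch below is not dynn's implementation: iter_parallel is a hypothetical stand-in that works on in-memory lines and assumes sentences are whitespace-tokenized lists of tokens. It only shows how an optional end of sentence token might be appended to each side, using the same defaults as the signature above.

```python
def iter_parallel(src_lines, tgt_lines, src_eos=None, tgt_eos="<eos>"):
    """Hypothetical stand-in for read_iwslt working on in-memory lines."""
    for src_line, tgt_line in zip(src_lines, tgt_lines):
        # Assumption: sentences are tokenized on whitespace
        src = src_line.split()
        tgt = tgt_line.split()
        # Append the end of sentence token only when one is given
        if src_eos is not None:
            src.append(src_eos)
        if tgt_eos is not None:
            tgt.append(tgt_eos)
        yield src, tgt


pairs = list(iter_parallel(["Guten Morgen"], ["Good morning"]))
print(pairs)  # [(['Guten', 'Morgen'], ['Good', 'morning', '<eos>'])]
```

With the defaults, only the target side receives a token; passing src_eos adds one to the source side as well, and tgt_eos=None leaves the target untouched.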