Functions for Dataset Caching

dynn.data.caching.cached_to_file(filename)

Decorator to cache the output of a function to a file

Sometimes your workflow will contain functions that are executed only once but take a lot of time (typically data preprocessing). This can be annoying when, e.g., running multiple experiments with different parameters. This decorator provides a solution by running the function once, then saving its output to a file. The next time you call this function, unless the file in question has been deleted, the function will simply read its result from the file instead of recomputing everything.
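To make the mechanism concrete, here is a minimal sketch of how such a decorator could be implemented. This is an illustration of the idea (cache hit unless the file is missing or `update_cache=True` is passed), not the actual dynn source:

```python
import functools
import os
import pickle

def cached_to_file(filename):
    """Sketch of a file-caching decorator (illustrative, not the dynn code)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, update_cache=False, **kwargs):
            # Reuse the pickled result unless asked to refresh the cache
            if os.path.exists(filename) and not update_cache:
                with open(filename, "rb") as f:
                    return pickle.load(f)
            # Otherwise run the function and cache its output
            result = func(*args, **kwargs)
            with open(filename, "wb") as f:
                pickle.dump(result, f)
            return result
        return wrapper
    return decorator
```

Note how `update_cache` is consumed by the wrapper and never forwarded to the wrapped function, which is why the decorated function itself must not take an argument of that name.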

Caveats:
  • By default, if you call the decorated function with different arguments, this will still load the cached output from the first function call with the original arguments. You need to add the update_cache=True keyword argument to force the function to be rerun. Incidentally, the decorated function should not have an argument named update_cache.
  • The serialization is done with pickle, so:
    1. it isn’t super secure (if you care about these things)
    2. it only handles functions whose outputs can be pickled (for now). Typically this wouldn’t work for dynet objects.

Example usage:

@cached_to_file("preprocessed_data.bin")
def preprocess(raw_data):
    # do a lot of preprocessing
    ...

# [...] do something else

# This first call will run the function and pickle its output to
# "preprocessed_data.bin" (and return the output)
data = preprocess(raw_data)

# [...] do something else, or maybe rerun the program

# This will just load the output from "preprocessed_data.bin"
data = preprocess(raw_data)

# [...] do something else, or maybe rerun the program

# This will force the function to be rerun and the cached output to be
# updated. You should do that if, for example, the arguments of
# `preprocess` are expected to change
data = preprocess(raw_data, update_cache=True)
Parameters:
    filename (str) – Name of the file where the cached output should be saved.