Cache
How to use Cache
- Cache is typically used as a decorator function @cache around functions or regular class methods
- Cache works in code and in Python notebooks with %autoreload
How the Cache works
- Cache tracks changes in the source code of the wrapped function
  - For performance reasons, it checks the code only once, unless the pointer to the function changes, e.g., in notebooks
- By default, it uses two levels of caching:
  - Memory level
  - Disk level
- When a call is made to the wrapped function:
  - First, the Memory level is checked
  - If there's no hit in the Memory level, the Disk level is checked
  - If there's no hit in the Disk level, the wrapped function is called
  - The result is then stored in both the Disk and Memory levels
- Cache is equipped with a get_last_cache_accessed() method that reports whether the last call hit the cache and, if so, on which level
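
The lookup order above can be illustrated with a small, self-contained sketch. This is not the actual Cache implementation: the class TwoLevelCache, the helper code_fingerprint(), the clear_memory() method, and the level labels "mem", "disk", and "no_cache" are all hypothetical names chosen for this example. It also assumes that arguments have a stable repr(), that a disk hit is copied into memory, and that hashing the function's bytecode is an acceptable stand-in for tracking its source code.

```python
import functools
import hashlib
import os
import pickle
import tempfile


def code_fingerprint(func):
    # Assumption: hash bytecode, constants and names as a stand-in for source text.
    code = func.__code__
    payload = code.co_code + repr((code.co_consts, code.co_names)).encode()
    return hashlib.sha256(payload).hexdigest()


class TwoLevelCache:
    """Toy two-level cache: a dict in memory, pickle files on disk."""

    def __init__(self, disk_dir):
        self._disk_dir = disk_dir
        self._mem = {}
        self._last_level = None

    def get_last_cache_accessed(self):
        # Level that served the most recent call: "mem", "disk" or "no_cache".
        return self._last_level

    def clear_memory(self):
        # Hypothetical helper: drops the memory level only (e.g., a new process).
        self._mem.clear()

    def __call__(self, func):
        # The code is inspected once per function object, at decoration time.
        fingerprint = code_fingerprint(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            raw_key = repr(
                (func.__qualname__, fingerprint, args, sorted(kwargs.items()))
            )
            key = hashlib.sha256(raw_key.encode()).hexdigest()
            path = os.path.join(self._disk_dir, key + ".pkl")
            # 1) Memory level.
            if key in self._mem:
                self._last_level = "mem"
                return self._mem[key]
            # 2) Disk level (assumption: a disk hit is also copied into memory).
            if os.path.exists(path):
                with open(path, "rb") as f:
                    value = pickle.load(f)
                self._mem[key] = value
                self._last_level = "disk"
                return value
            # 3) Miss: call the wrapped function and store the result in both levels.
            value = func(*args, **kwargs)
            with open(path, "wb") as f:
                pickle.dump(value, f)
            self._mem[key] = value
            self._last_level = "no_cache"
            return value

        return wrapper


cache = TwoLevelCache(tempfile.mkdtemp())
calls = []


@cache
def slow_square(x):
    calls.append(x)
    return x * x


levels = []
for _ in range(2):
    slow_square(3)
    levels.append(cache.get_last_cache_accessed())
cache.clear_memory()
slow_square(3)
levels.append(cache.get_last_cache_accessed())

# Re-defining the function with a different body (as when a notebook cell is
# edited and re-run) yields a new fingerprint, so the old entries are not reused.
@cache
def slow_square(x):
    calls.append(x)
    return x ** 2


slow_square(3)
level_after_edit = cache.get_last_cache_accessed()
```

In this sketch the first call misses both levels, the second is served from memory, and the call after clear_memory() is served from disk. The redefined function misses again because its fingerprint is part of the cache key.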
Disk level
- The Disk level is implemented via joblib.Memory
Memory level
- Initially, the idea was to use functools.lru_cache for the memory cache
  - Pros:
    - Standard library implementation
    - Quite fast in-memory implementation
  - Cons:
    - Only hashable arguments are supported
    - No access to the cache: it's not possible to check whether an item is in the cache or not
    - It does not work properly in notebooks
- Because the Cons outweighed the Pros, we decided to implement the Memory level as joblib.Memory over tmpfs
  - In this way we reuse the same code as the Disk level cache, but over a RAM-based disk
  - This implementation overcomes the Cons listed above, although it is slightly slower than the pure functools.lru_cache approach
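
The first two Cons can be reproduced with the standard library alone. The snippet below is only an illustration of functools.lru_cache behavior, and the function total() is a made-up example.

```python
import functools


@functools.lru_cache(maxsize=None)
def total(values):
    return sum(values)


# Hashable argument (a tuple): the second call is served from the cache.
total((1, 2, 3))
total((1, 2, 3))
info = total.cache_info()  # aggregate counters only: hits=1, misses=1

# Unhashable argument (a list): lru_cache raises TypeError while building
# the cache key, before the wrapped function is ever called.
try:
    total([1, 2, 3])
    unhashable_error = None
except TypeError as e:
    unhashable_error = e

# The public API exposes cache_info() and cache_clear(), but no method to ask
# whether a specific set of arguments is currently cached.
public_api = [name for name in dir(total) if name.startswith("cache_")]
```

Here cache_info() reports only totals, so a caller cannot tell whether a given call was a hit without comparing the counters before and after it.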
Global cache
- By default, all cached functions save their cached values in the default global cache
- The cache is "global" in the sense that:
  - It is unique per-user and per Git client
  - It serves all the functions of a Git client
- The cached data is stored in a folder $GIT_ROOT/tmp.cache.{mem,disk}.[tag]
- This global cache is managed via global functions named *_global_cache, e.g., set_global_cache()
- TODO(gp): maybe a better name is
  - Global -> local_cache, client_cache
  - Function_specific -> global or shared
Tagged global cache
- A global cache can be specific to different applications (e.g., for unit tests vs normal code)
  - It is controlled through the tag parameter
  - The default global cache corresponds to tag = None
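
As a rough illustration of how a tag could map to a cache folder, the sketch below builds a path following the $GIT_ROOT/tmp.cache.{mem,disk}.[tag] pattern. The helper global_cache_dir() is hypothetical and not part of the library, and it assumes that the .[tag] suffix is simply omitted when tag is None. The real code may resolve the Git root and name the folder differently.

```python
import os
from typing import Optional


def global_cache_dir(git_root: str, cache_type: str, tag: Optional[str] = None) -> str:
    """Hypothetical helper: build the global cache folder for one cache level."""
    if cache_type not in ("mem", "disk"):
        raise ValueError(f"Invalid cache_type='{cache_type}'")
    name = f"tmp.cache.{cache_type}"
    # Assumption: no suffix for the default global cache (tag = None).
    if tag is not None:
        name += f".{tag}"
    return os.path.join(git_root, name)


# "/path/to/repo" is a placeholder for the root of the Git client.
default_disk_dir = global_cache_dir("/path/to/repo", "disk")
unit_test_mem_dir = global_cache_dir("/path/to/repo", "mem", tag="unit_test")
```

With this naming, a unit-test run and normal code in the same Git client write to different folders and cannot see each other's cached values.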
Function-specific cache
- It is possible to create function-specific caches, e.g., to share the result of a function across clients and users
- In this case the client needs to set the disk_cache_directory and / or mem_cache_directory parameters in the decorator or in the Cached class constructor
- If a cache is set for the function, it can be managed with the .set_cache_directory(), .get_cache_directory(), .destroy_cache() and .clear_function_cache() methods