dol
snippets
#44
Copy data from one store to another
src = ...  # make source store
targ = ...  # make target store
targ.update(src)
Note that here, targ will use the keys and values of src as they're provided.
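To make that note concrete, here is a minimal stdlib-only sketch. `UpperKeyStore` is a hypothetical stand-in for a wrapped store (it is not part of dol); the point is only that `update` reads keys and values as the source provides them and writes them through the target's own `__setitem__`.

```python
from collections.abc import MutableMapping


class UpperKeyStore(MutableMapping):
    """Hypothetical store that upper-cases keys on write (illustration only)."""

    def __init__(self):
        self._data = {}

    def __setitem__(self, key, value):
        self._data[key.upper()] = value

    def __getitem__(self, key):
        return self._data[key]

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


src = {'a': 1, 'b': 2}  # source store: a plain dict
targ = UpperKeyStore()  # target store with its own write logic
targ.update(src)  # MutableMapping.update writes through targ.__setitem__

print(dict(targ))
```

If the keys or values of src are not what targ should receive, wrap one of the two stores first, as the next snippet does.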
Copy pickled data into gzipped csv data
We'll work with dictionaries instead of files here, so we can test more easily. Say you have a source backend that has pickles of some lists-of-lists-of-strings, under keys with the .pkl extension, and you want to copy them into a target backend as gzipped csv data.
import pickle
src_backend = {
'file_1.pkl': pickle.dumps([['A', 'B', 'C'], ['one', 'two', 'three']]),
'file_2.pkl': pickle.dumps([['apple', 'pie'], ['one', 'two'], ['hot', 'cold']]),
}
targ_backend = dict()
Here's how you can do it:
from dol import ValueCodecs, KeyCodecs, Pipe
# decoder here will unpickle data and remove the .pkl extension from the key
src_wrap = Pipe(KeyCodecs.suffixed('.pkl'), ValueCodecs.pickle())
# encoder here will convert the lists to csv string, the string into bytes, and the bytes will be gzipped.
# ... also, we'll add .csv.gz on write.
targ_wrap = Pipe(
KeyCodecs.suffixed('.csv.gz'),
ValueCodecs.csv() + ValueCodecs.str_to_bytes() + ValueCodecs.gzip()
)
# Let's wrap our backends:
src = src_wrap(src_backend)
targ = targ_wrap(targ_backend)
# and copy src over to targ
print(f"Before: {list(targ_backend)=}")
targ.update(src)
print(f"After: {list(targ_backend)=}")
From the point of view of src and targ, you see the same thing:
assert list(src) == list(targ) == ['file_1', 'file_2']
assert (
src['file_1']
== targ['file_1']
== [['A', 'B', 'C'], ['one', 'two', 'three']]
)
But the backend of targ is different:
src_backend['file_1.pkl']
# b'\x80\x04\x95\x19\x00\x00\x00\x00\x00\x00\x00]\x94(]\x94(K\x01K\x02K\x03e]\x94(K\x04K\x05K\x06ee.'
targ_backend['file_1.csv.gz']
# b'\x1f\x8b\x08\x00*YWe\x02\xff3\xd41\xd21\xe6\xe52\xd11\xd51\xe3\xe5\x02\x00)4\x83\x83\x0e\x00\x00\x00'
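For readers who want to see the value transformation without dol, here is a stdlib-only sketch of the same chain (rows to csv text, text to bytes, bytes to gzip, and back). The helper names `rows_to_gzipped_csv` and `gzipped_csv_to_rows` are made up for this illustration; this is not how dol's ValueCodecs are implemented, only a rough equivalent of what they achieve.

```python
import csv
import gzip
import io
import pickle


def rows_to_gzipped_csv(rows):
    """Encode a list of lists of strings as gzipped csv bytes."""
    buffer = io.StringIO(newline='')
    csv.writer(buffer).writerows(rows)
    return gzip.compress(buffer.getvalue().encode('utf-8'))


def gzipped_csv_to_rows(data):
    """Decode gzipped csv bytes back into a list of lists of strings."""
    text = gzip.decompress(data).decode('utf-8')
    return list(csv.reader(io.StringIO(text, newline='')))


rows = [['A', 'B', 'C'], ['one', 'two', 'three']]
pickled = pickle.dumps(rows)

# pickle bytes -> python object -> gzipped csv bytes
gzipped = rows_to_gzipped_csv(pickle.loads(pickled))

assert gzipped[:2] == b'\x1f\x8b'  # gzip magic number
assert gzipped_csv_to_rows(gzipped) == rows  # the round trip preserves the rows
```

Note that csv only stores strings, so a round trip like this preserves lists of strings but would turn numbers into their string form.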
A path getter with Pipe
from operator import methodcaller, attrgetter, itemgetter
from dol import Pipe
def simple_path_getter(*keys, getter=itemgetter):
    """
    Make a function that gets a path of keys from an object.

    >>> path_getter = simple_path_getter('a', 'b', 'c')
    >>> path_getter({'a': {'b': {'c': 1}}})
    1
    """
    return Pipe(*map(getter, keys))
# a function that does obj['a']['b']['c']
path_getter = simple_path_getter('a', 'b', 'c')
d = {'a': {'b': {'c': 1}}}
assert path_getter(d) == 1
# function that does obj.x (attrgetter('x') would do, but just for illustration)
x_attr_getter = simple_path_getter('x', getter=attrgetter)
from collections import namedtuple
Point = namedtuple('Point', ['x', 'y'])
p = Point(1, 2)
assert x_attr_getter(p) == 1
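If it helps to see what the composition amounts to, here is a small sketch that does the same thing with the standard library only. The `compose` function is a hypothetical stand-in for Pipe written for this illustration; it applies functions left to right and makes no claim about how dol implements Pipe.

```python
from functools import reduce
from operator import itemgetter


def compose(*funcs):
    """Return a function that applies funcs left to right (stand-in for Pipe)."""

    def composed(obj):
        return reduce(lambda acc, func: func(acc), funcs, obj)

    return composed


# itemgetter('a'), then itemgetter('b'), then itemgetter('c')
get_abc = compose(*map(itemgetter, ('a', 'b', 'c')))
assert get_abc({'a': {'b': {'c': 1}}}) == 1
```

Unlike this closure-based stand-in, the snippet further down in this thread describes Pipe compositions as picklable, which is one reason to prefer Pipe in practice.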
Renaming files to a normalized form
tl;dr:
# files: mapping to your files (names and content)
# rename_mapping: your old_name: new_name dict
for old, new in rename_mapping.items():
    # gets the contents of the file (and deletes the (old) file) and writes the contents under the new file name
    files[new] = files.pop(old)
Having AI make
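Here is a runnable sketch of the loop above, using a dict in place of a real file store. The `normalize` rule (lowercase, runs of non-alphanumerics replaced by an underscore) is only an assumption chosen for illustration; substitute whatever normal form you need.

```python
import re


def normalize(name):
    """Hypothetical rule: lowercase, non-alphanumeric runs become '_'."""
    stem, dot, ext = name.rpartition('.')
    if not dot:  # no extension at all
        stem, ext = name, ''
    stem = re.sub(r'[^a-z0-9]+', '_', stem.lower()).strip('_')
    return stem + dot + ext.lower()


# A dict stands in for a mapping to your files (names and content)
files = {'My Report (final).TXT': b'report', 'notes.md': b'notes'}

# Only keep the names that actually change
rename_mapping = {old: normalize(old) for old in files if normalize(old) != old}

# Refuse to proceed if two files would end up with the same name
new_names = list(rename_mapping.values())
assert len(set(new_names)) == len(new_names), "normalization causes collisions"
assert not set(new_names) & set(files), "a new name is already taken"

for old, new in rename_mapping.items():
    files[new] = files.pop(old)

print(sorted(files))
```

Checking for collisions before renaming matters because `files[new] = ...` would silently overwrite an existing file.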
Acquire (download, copy...) multiple content items
See also medium post: A tiny flexible data acquisition python function and gist.
This module provides tools for downloading and managing multiple content items from various sources, such as files or URLs, into a chosen storage system. At its core, the acquire_content function takes a uri_to_content callable, a mapping of keys to URIs, and a store. It is configurable with a save_condition parameter to validate content before saving, making it adaptable to a variety of data sources. Error handling is also considered; for example, requests can be used to handle HTTP errors gracefully when working with URLs. A flexible decorator approach is demonstrated to simplify function creation, so that a developer can easily create specialized content acquisition functions. This is especially useful for repetitive tasks like downloading data science resources or cheat sheets.
import os
import requests
from typing import Dict, Union, MutableMapping, KT, VT, TypeVar, Callable, Any
from functools import partial
DFLT_STORE_DIR = os.environ.get('DFLT_DOL_DOWNLOAD_DIR', '~/Downloads')
URI = VT
Dirpath = str
ContentType = TypeVar('ContentType')
StoreFunc = Callable[[KT, ContentType], None]
def is_not_none(x):
    return x is not None
def acquire_content(
    uri_to_content: Callable[[URI], ContentType],
    uris: Dict[KT, URI] = None,
    store: Union[Dirpath, MutableMapping, StoreFunc] = DFLT_STORE_DIR,
    *,
    save_condition: Callable[[Any], bool] = is_not_none
):
    """
    Downloads and stores content from a given set of URIs.

    uri_to_content is a callable that takes a URI and returns content. This is usually:
    - a function that reads file content, like `open(filepath).read()`
    - a function that fetches URL content, like `requests.get(url).content`

    >>> from pathlib import Path
    >>> files_uri_to_content = lambda filepath: Path(filepath).read_text()
    >>> urls_uri_to_content = lambda url: requests.get(url).content  # doctest: +SKIP

    Here, we use a simple `str.upper` to not have to deal with actual IO during tests.
    Also, we'll use a dict as a store, for test simplicity purposes.
    Usually, though, you'll want to use a directory or a MutableMapping as store,
    or a function that stores content in a specific way.

    >>> store = {}
    >>> uris = {'example1': 'hello', 'example2': 'world'}
    >>> acquire_content(str.upper, uris, store)  # uri_to_content here is str.upper, to simulate content acquisition.
    >>> store
    {'example1': 'HELLO', 'example2': 'WORLD'}

    Note that often you want to just fix the uri_to_content function and sometimes store.
    The acquire_content acts as a function factory for your convenience. If you don't
    specify uris (but at least specify `uri_to_content`), you get a function that takes
    uris as the first argument, and stores the content therefrom.

    >>> content_acquirer = acquire_content(str.upper, store=store)  # doctest: +ELLIPSIS
    >>> content_acquirer({'example3': 'foo', 'example4': 'bar'})
    >>> store
    {'example1': 'HELLO', 'example2': 'WORLD', 'example3': 'FOO', 'example4': 'BAR'}

    # Examples that would be typical for uri_to_content:
    # acquire_content(lambda filepath: open(filepath, 'rb').read(), uris, store)  # Reads file content +SKIP
    # acquire_content(lambda url: requests.get(url).content, uris, store)  # Fetches URL content +SKIP
    """
    store = ensure_store_func(store)
    # if uris is None, we're parametrizing the acquire_content function
    if uris is None:
        assert callable(uri_to_content), "uri_to_content must be a callable if uris is None"
        # pass save_condition along, so that a custom condition is not lost
        return partial(
            acquire_content, uri_to_content, store=store, save_condition=save_condition
        )
    # Loop through uris and store the processed content
    for key, uri in uris.items():
        content = uri_to_content(uri)
        if save_condition(content):
            store(key, content)
def ensure_store_func(store: Union[Dirpath, MutableMapping, Callable]) -> StoreFunc:
    """
    Ensures a store function is returned based on the type of 'store' argument provided.

    - If store is a callable, it returns store directly.
    - If store is a directory path, it creates a Files object (using dol) to manage file storage in that directory.
    - If store is a MutableMapping, it returns the __setitem__ method of the store.
    - If none of these types match, a ValueError is raised.

    Examples:

    >>> store = {}
    >>> func = ensure_store_func(store)
    >>> func('key', 'value')  # should store the value in the dictionary
    >>> assert store == {'key': 'value'}

    The next example is skipped in tests, since its output depends on whether the directory exists:

    >>> store = '~/Downloads'
    >>> try:  # doctest: +SKIP
    ...     func = ensure_store_func(store)
    ... except ValueError:
    ...     print("Directory does not exist, as expected.")  # Simulates an invalid directory check

    >>> ensure_store_func(lambda k, v: print(f"Storing {k}: {v}"))  # doctest: +ELLIPSIS
    <function <lambda> at ...>

    # Note: For Files store handling, you'll need a valid directory:
    # >>> ensure_store_func("/valid/directory/path")  # Requires dol.Files +SKIP
    """
    if callable(store):
        return store
    elif isinstance(store, str):
        dirpath = os.path.expanduser(store)
        if os.path.isdir(dirpath):
            from dol import Files
            return Files(dirpath).__setitem__
        else:
            raise ValueError(f"The directory path {dirpath} does not exist.")
    elif isinstance(store, MutableMapping):
        # If store is a MutableMapping, we'll use its __setitem__ method
        store_obj = store
        return store_obj.__setitem__
    else:
        raise ValueError("store must be a callable, or MutableMapping, or a dir path")
# A few useful uri_to_content functions, elegantly defined as (picklable) function compositions
from dol import Pipe
from pathlib import Path
import requests
from operator import methodcaller, attrgetter
path_to_bytes = Pipe(Path, methodcaller('read_bytes'))
path_to_string = Pipe(Path, methodcaller('read_text'))
url_to_bytes = Pipe(requests.get, attrgetter('content'))
Examples
acquire_url_bytes = acquire_content(url_to_bytes)
memory_hacks = {
"Memory Improvement in Context.pdf": "https://link.springer.com/content/pdf/10.1007/978-1-4612-2760-1_12.pdf",
"Predicting and Improving Memory Retention.pdf": "https://home.cs.colorado.edu/~mozer/Research/Selected%20Publications/reprints/MozerLindsey2017.pdf",
"A New Look at Memory Retention and Forgetting.pdf": "https://memorylab.nd.edu/assets/512320/2022_radvansky_doolen_pettijohn_ritchey_jep_lmc_.pdf",
"Exercises to Work Memory.pdf": "https://neuronup.us/wp-content/uploads/2021/05/5-free-printable-memory-exercises-downloadable.pdf"
}
acquire_url_bytes(memory_hacks)  # will download all four files to the default download directory
That's nice, and it works, but url_to_bytes does nothing about urls that don't work. The function below checks for request errors and returns None when a download fails, so that the default save_condition skips it. Also, in the code below, see how we use acquire_content as a decorator.
@acquire_content
def more_robust_url_acquisition(url: URI, verbose: int = 2) -> bytes:
    verbose = int(verbose)
    try:
        response = requests.get(url)
        response.raise_for_status()  # Check for HTTP errors
        if verbose >= 2:
            print(f"Successfully downloaded and stored contents from {url}")
        return response.content
    except requests.exceptions.RequestException as e:
        if verbose >= 1:
            print(f"Failed to download from {url}: {e}")
more_cheat_sheets = {
"Scikit-Learn Cheat Sheet: Python Machine Learning.pdf": "https://www.datacamp.com/cheat-sheet/scikit-learn-cheat-sheet-python-machine-learning",
"Machine Learning Cheat Sheet.pdf": "https://www.datacamp.com/cheat-sheet/machine-learning-cheat-sheet",
"Scikit-Learn Cheat Sheet for Machine Learning.pdf": "https://www.kdnuggets.com/publications/sheets/Scikit-Learn_Cheatsheet_for_Machine_Learning.pdf",
"The Complete Collection of Data Science Cheat Sheets.pdf": "https://www.kdnuggets.com/publications/sheets/The_Complete_Collection_of_Data_Science_Cheatsheets_KDnuggets.pdf",
# note: the next key repeats "Machine Learning Cheat Sheet.pdf", so its url replaces the datacamp one above
"Machine Learning Cheat Sheet.pdf": "https://raw.githubusercontent.com/soulmachine/machine-learning-cheat-sheet/master/machine-learning-cheat-sheet.pdf",
"Data Science Cheat Sheets.pdf": "https://github.com/fralfaro/DS-Cheat-Sheets",
"Data Science Cheatsheet.pdf": "https://github.com/aaronwangy/Data-Science-Cheatsheet",
"Cheat Sheet for Python Machine Learning and Data Science.pdf": "https://github.com/ghimiresunil/Cheat-Sheet-for-Python-Machine-Learning-and-Data-Science",
"Machine Learning and Data Science Cheat Sheet.pdf": "https://www.datasciencecentral.com/data-science-cheat-sheet/",
"Machine Learning Interview Cheat Sheets.pdf": "https://tinniaru3005.github.io/ExploreCS/notes/machine%20learning.pdf",
"ML Cheatsheet Documentation.pdf": "https://buildmedia.readthedocs.org/media/pdf/ml-cheatsheet/latest/ml-cheatsheet.pdf",
"Scikit-Learn CheatSheet: Python Machine Learning Tutorial.pdf": "https://elitedatascience.com/wp-content/uploads/2018/05/Python-Machine-Learning-Cheatsheet.pdf"
}
download_to_folder = '/Users/thorwhalen/Dropbox/_odata/ai_contexts/misc/ml_cheat_sheets'
more_robust_url_acquisition(more_cheat_sheets, store=download_to_folder)  # will download the files that can be fetched into download_to_folder
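To try the save_condition idea without any network access, here is a self-contained sketch. Both `acquire` (a simplified stand-in for acquire_content that only accepts a mapping as store) and `fake_fetch` are hypothetical names written for this illustration; they mimic the "return None on failure, skip on None" behaviour of the code above.

```python
def acquire(uri_to_content, uris, store, *, save_condition=lambda x: x is not None):
    """Simplified stand-in for acquire_content: store only accepted content."""
    for key, uri in uris.items():
        content = uri_to_content(uri)
        if save_condition(content):
            store[key] = content


def fake_fetch(uri):
    """Pretend fetcher: returns None for 'bad' uris instead of raising."""
    return None if uri.startswith('bad') else uri.upper().encode()


uris = {'ok.txt': 'hello', 'broken.txt': 'bad://nowhere', 'empty.txt': ''}

# Default condition: only None (a failed fetch) is skipped
store = {}
acquire(fake_fetch, uris, store)

# Stricter condition: skip failed fetches and empty content alike
strict_store = {}
acquire(fake_fetch, uris, strict_store, save_condition=bool)

print(sorted(store), sorted(strict_store))
```

The same pattern should apply to acquire_content itself: pass a stricter save_condition when "not None" is too weak a check for your content.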
A collection of random dol code snippets.