Deprecate self.store in favor of self.storage
frthjf committed Oct 9, 2020
1 parent 59d0a18 commit fe54cad
Showing 10 changed files with 149 additions and 206 deletions.
5 changes: 3 additions & 2 deletions CHANGELOG.md
@@ -2,9 +2,10 @@

<!-- Please add changes under the Unreleased section that reads 'No current changes' otherwise -->

# Unreleased
# v2.6.0

- Support static host info methods in Registration
- Deprecates self.store in favor of self.storage
- Support static host info methods in registration
- New `find_experiments` method to simplify recursive search for experiments in a given directory

## v2.5.2
31 changes: 13 additions & 18 deletions docs/guide/components.md
@@ -98,13 +98,13 @@ For convenience, the dict interface can be accessed using the `.` object notatio

Flags are configuration values that are associated with a particular execution, for example random seeds or worker IDs. They are accessible via the `self.flags` object, which supports the `.` object notation. You can add your own flags through basic assignment, e.g. ``self.flags.counter = 1``. To avoid name collisions, all native machinable flags use UPPERCASE (e.g. ``self.flags.SEED``).

## self.store
## self.storage

The interface `self.store` allows for the storing of data and results of the components. Note that you don't have to specify where the data is being stored. machinable will manage unique directories automatically. The data can later be retrieved using the [Storage](./storage.md) interface.
`self.storage` provides access to the storage directory of the component. Each component directory name is unique and managed automatically, so you don't have to specify where the data is being stored. The data can later be retrieved using the [storage interfaces](./storage.md).
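
Existing code that still uses `self.store` keeps working because the old name is kept as an alias of `self.storage`. The following is a minimal sketch of that pattern, assuming a hypothetical stand-in `Component` class rather than machinable's actual one. Unlike the plain alias in this commit, the sketch also emits a `DeprecationWarning`, which is an addition made here for illustration only.

```python
import warnings


class Component:
    """Hypothetical stand-in, not machinable's actual Component class."""

    def __init__(self):
        self._storage = None

    @property
    def storage(self):
        return self._storage

    @storage.setter
    def storage(self, value):
        self._storage = value

    @property
    def store(self):
        # Deprecated alias: warn (an assumption of this sketch),
        # then forward to the new attribute.
        warnings.warn(
            "self.store is deprecated, use self.storage instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.storage


component = Component()
component.storage = {"url": "mem://"}

# Capture warnings so the deprecation can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    alias_value = component.store
```

Reading through the alias returns the same object as `component.storage`, so callers can migrate gradually.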

**Log**

`self.store.log` or `self.log` provides a standard logger interface that outputs to the console and a log file.
`self.storage.log` or `self.log` provides a standard logger interface that outputs to the console and a log file.

``` python
self.log.info('Component created')
@@ -113,7 +113,7 @@ self.log.debug('Component initialized')

**Records**

`self.store.record` or `self.record` provides an interface for tabular logging, that is, storing recurring data points at each iteration. The results become available as a table where each row represents each iteration.
`self.storage.record` or `self.record` provides an interface for tabular logging, that is, storing recurring data points at each iteration. The results become available as a table where each row represents one iteration.

``` python
for iteration in range(10):
@@ -129,27 +129,22 @@ for iteration in range(10):

If you use the `on_execute_iteration` event, the iteration number is recorded and `record.save()` is called automatically at the end of each iteration.

Sometimes it is useful to have multiple tabular loggers, for example to record training and validation performance separately. You can create custom record loggers using `self.store.get_record_writer(scope)` which returns a new instance of a record writer that you can use just like the main record writer.
Sometimes it is useful to have multiple tabular loggers, for example to record training and validation performance separately. You can create custom record loggers using `self.storage.get_record_writer(scope)` which returns a new instance of a record writer that you can use just like the main record writer.
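
To illustrate the idea of separate record scopes, here is a small self-contained sketch. The `RecordWriter` and `Storage` classes below are hypothetical stand-ins written for this example; they only mimic the item-assignment and `save()` usage shown above and are not machinable's implementation.

```python
class RecordWriter:
    """Hypothetical stand-in for a tabular record writer."""

    def __init__(self, scope):
        self.scope = scope
        self.rows = []  # saved rows, one per save() call
        self._current = {}  # values of the row being assembled

    def __setitem__(self, key, value):
        self._current[key] = value

    def save(self):
        # Close the current row and start a new one.
        self.rows.append(self._current)
        self._current = {}


class Storage:
    """Hypothetical stand-in that hands out record writers by scope."""

    def get_record_writer(self, scope):
        return RecordWriter(scope)


storage = Storage()
train = storage.get_record_writer("train")
validation = storage.get_record_writer("validation")

for iteration in range(3):
    train["loss"] = 1.0 / (iteration + 1)
    train.save()
    # Validate only on every second iteration.
    if iteration % 2 == 0:
        validation["accuracy"] = 0.5 + 0.1 * iteration
        validation.save()
```

Because each scope keeps its own rows, the training table and the validation table can have different lengths and columns.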

**Store**
**Custom data**

You can use `self.store.write()` to write any other Python object, for example:
Any other data can be stored in the `data/` subdirectory.

```python
self.store.write('final_accuracy', [0.85, 0.92])
```
Note that to protect unintended data loss, overwriting will fail unless the ``overwrite`` argument is explicitly set.

For larger data structures, it can be more suitable to write data in specific file formats by appending a file extension, i.e.:
You can use `self.storage.save_data()` to write any other Python object, for example:

``` python
self.store.write('data.txt', 'a string')
self.store.write('data.p', generic_object)
self.store.write('data.json', jsonable_object)
self.store.write('data.npy', numpy_array)
self.storage.save_data('data.txt', 'a string')
self.storage.save_data('data.p', generic_object)
self.storage.save_data('data.json', jsonable_object)
self.storage.save_data('data.npy', numpy_array)
```

Refer to the store [reference](./components.md#store) for more details.
To protect against unintended data loss, you can set `overwrite=False`.
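
The extension-based behaviour and the `overwrite` guard can be sketched as follows. This is a hypothetical helper written for illustration, assuming a local directory and only the `.txt`, `.p` and `.json` cases; it is not machinable's `save_data` implementation, and the function signature is an assumption.

```python
import json
import os
import pickle
import tempfile


def save_data(directory, name, data, overwrite=True):
    """Hypothetical helper: pick a serializer from the file extension."""
    path = os.path.join(directory, "data", name)
    # Refuse to replace an existing file when overwrite is disabled.
    if not overwrite and os.path.exists(path):
        raise FileExistsError(f"{name} already exists")
    os.makedirs(os.path.dirname(path), exist_ok=True)

    extension = os.path.splitext(name)[1]
    if extension == ".json":
        with open(path, "w") as f:
            json.dump(data, f)
    elif extension == ".txt":
        with open(path, "w") as f:
            f.write(str(data))
    elif extension == ".p":
        with open(path, "wb") as f:
            pickle.dump(data, f)
    else:
        raise ValueError(f"Unsupported extension: '{extension}'")
    return path


with tempfile.TemporaryDirectory() as directory:
    save_data(directory, "data.json", {"final_accuracy": [0.85, 0.92]})
    with open(os.path.join(directory, "data", "data.json")) as f:
        restored = json.load(f)
    # A second write with overwrite=False is rejected.
    try:
        save_data(directory, "data.json", {}, overwrite=False)
        refused = False
    except FileExistsError:
        refused = True
```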

## Config methods

115 changes: 46 additions & 69 deletions src/machinable/core/component.py
@@ -144,9 +144,9 @@ def unserialize(cls, serialized):

@staticmethod
def save(component):
if component.node is not None or component.store is None:
if component.node is not None or component.storage is None:
return False
component.store.write(
component.storage.write(
"state.json",
{
"component": component.component_state.serialize(),
@@ -185,7 +185,7 @@ def __init__(self, config: dict = None, flags: dict = None, node=None):

self._node: Optional[Component] = node
self._components: Optional[List[Component]] = None
self._store: Optional[Store] = None
self._storage = None
self._events = Events()
self._actor_config = None
self._storage_config = None
@@ -261,25 +261,30 @@ def components(self, value):
self._components = value

@property
def store(self) -> Store:
if self._store is None and isinstance(self.node, Component):
def store(self):
# deprecated alias
return self.storage

@property
def storage(self):
if self._storage is None and isinstance(self.node, Component):
# forward to node storage if available
return self.node.store
return self._store
return self.node.storage
return self._storage

@store.setter
def store(self, value):
self._store = value
@storage.setter
def storage(self, value):
self._storage = value

@property
def record(self) -> Record:
"""Record writer instance"""
return self.store.record
return self.storage.record

@property
def log(self) -> Log:
"""Log writer instance"""
return self.store.log
return self.storage.log

@property
def events(self) -> Events:
@@ -308,26 +313,25 @@ def dispatch(
self.on_init_storage(storage_config)

self._storage_config = storage_config
self.store = Store(component=self, config=storage_config)
self.storage = Store(component=self, config=storage_config)

if not storage_config["url"].startswith("mem://"):
OutputRedirection.apply(
self._storage_config["output_redirection"],
self.store.get_stream,
self.storage.get_stream,
"output.log",
)

if not self.store.exists("host.json", _meta=True):
self.store.write("host.json", get_host_info(), _meta=True)
if not self.store.exists("component.json", _meta=True):
self.store.write("component.json", self.serialize(), _meta=True)
if not self.store.exists("components.json", _meta=True):
self.store.write(
if not self.storage.has_file("host.json"):
self.storage.save_file("host.json", get_host_info())
if not self.storage.has_file("component.json"):
self.storage.save_file("component.json", self.serialize())
if not self.storage.has_file("components.json"):
self.storage.save_file(
"components.json",
[component.serialize() for component in self.components]
if self.components
else [],
_meta=True,
)
self.component_state.save(self)

@@ -442,12 +446,12 @@ def execute(self):
if self.on_after_execute_iteration(iteration) is not False:
# trigger records.save() automatically
if (
self.store
and self.store.has_records()
and not self.store.record.empty()
self.storage
and self.storage.has_records()
and not self.storage.record.empty()
):
self.store.record["_iteration"] = iteration
self.store.record.save()
self.record["_iteration"] = iteration
self.record.save()
except (KeyboardInterrupt, StopIteration):
callback = StopIteration

@@ -500,9 +504,7 @@ def refresh_status(self, log_errors=False):
"""
try:
self.component_status["heartbeat_at"] = str(pendulum.now())
self.store.write(
"status.json", self.component_status, overwrite=True, _meta=True
)
self.storage.save_file("status.json", self.component_status)
except (IOError, Exception) as ex:
if log_errors:
self.log.error(
@@ -513,29 +515,6 @@ def refresh_status(self, log_errors=False):

return True

def get_url(self, append=""):
"""Returns the storage URL of the component"""
return os.path.join(
self._storage_config["url"],
os.path.join(
self._storage_config.get("directory", ""),
self._storage_config["experiment"],
self._storage_config.get("component", ""),
append,
),
)

def local_directory(self, append=""):
"""Returns the local storage filesystem path, or False if non-local
# Returns
Local filesystem path, or False if non-local
"""
if not self._storage_config["url"].startswith("osfs://"):
return False

return os.path.join(self.get_url().split("osfs://")[-1], append)

def set_seed(self, seed=None) -> bool:
"""Applies a global random seed
@@ -564,16 +543,16 @@ def save_checkpoint(self, path: str = None, timestep=None) -> Union[bool, str]:
timestep: int = len(self.component_state.checkpoints)

if path is None:
if not self.store:
if not self.storage:
raise ValueError("You need to specify a checkpoint path")

fs_prefix, basepath = self.store.config["url"].split("://")
fs_prefix, basepath = self.storage.config["url"].split("://")
if fs_prefix != "osfs":
# todo: support non-local filesystems via automatic sync
raise NotImplementedError(
"Checkpointing to non-os file systems is currently not supported."
)
checkpoint_path = self.store.get_path("checkpoints", create=True)
checkpoint_path = self.storage.get_path("checkpoints", create=True)
path = os.path.join(os.path.expanduser(basepath), checkpoint_path)

checkpoint = self.on_save(path, timestep)
@@ -591,8 +570,8 @@ def restore_checkpoint(self, checkpoint):
# Arguments
checkpoint: Checkpoint filepath
"""
if self.store is not None:
self.store.log.info(f"Restoring checkpoint `{checkpoint}`")
if self.storage is not None:
self.log.info(f"Restoring checkpoint `{checkpoint}`")
return self.on_restore(checkpoint)

def serialize(self):
@@ -811,26 +790,24 @@ def dispatch(self, components_config, storage_config, actor_config=None):
payload["components"] = [
config_map(component) for component in components
]
elif key == "store" or key == "_store":
if key == "_store":
payload["store"] = storage_config
elif key == "storage" or key == "_storage":
if key == "_storage":
payload["storage"] = storage_config
else:
store = Store(component=self, config=storage_config)
store.write("host.json", get_host_info(), _meta=True)
store.write(
"component",
storage = Store(component=self, config=storage_config)
storage.save_file("host.json", get_host_info())
storage.save_file(
"component.json",
{"config": self.node["config"], "flags": self.node["flags"]},
_meta=True,
)
store.write(
"components",
storage.save_file(
"components.json",
[
{"config": c["config"], "flags": c["flags"]}
for c in components
],
_meta=True,
)
payload["store"] = store
payload["storage"] = storage
else:
raise ValueError(
f"Unrecognized argument: '{key}'. "