Ideas for examples, recipes, and utils

Thor Whalen edited this page Oct 23, 2019 · 1 revision

Write caches

Table rows

The data items are "table rows" (could be tuples, dicts, pandas.Series, namedtuples...) and you are saving them in some table format (e.g. numpy 2d array, pandas dataframe, csv, xlsx). In all these cases, you don't want to be opening and closing a file every time you write a row. Instead, you accumulate rows in a buffer, aggregate them, and write (flush) them out in batches. (Maybe I should call it CumulAggregFlush...)
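A minimal sketch of the idea, assuming a CSV target and a simple size-triggered flush (the class name and parameters here are illustrative, not part of py2store):

```python
import csv


class BufferedRowWriter:
    """Accumulate rows in memory and flush them to a CSV file in batches,
    so the file isn't opened and closed on every single append."""

    def __init__(self, filepath, max_buffer_size=1000):
        self.filepath = filepath
        self.max_buffer_size = max_buffer_size
        self._buffer = []

    def append(self, row):
        self._buffer.append(row)
        if len(self._buffer) >= self.max_buffer_size:
            self.flush()

    def flush(self):
        # one file open per batch instead of one per row
        if self._buffer:
            with open(self.filepath, 'a', newline='') as f:
                csv.writer(f).writerows(self._buffer)
            self._buffer = []

    # context-manager support, so pending rows are flushed on exit
    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.flush()
```

Usage would look like `with BufferedRowWriter('out.csv', max_buffer_size=500) as w: w.append(row)`; the same cumul/flush skeleton works for a numpy array or dataframe target by swapping the body of `flush`.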

Bulk inserts

Most DBs have some "cached writes" functionality. In Mongo and SQL this is called "bulk insert". This is especially useful when the DB is hosted remotely: you don't want to incur the communication overhead for every small write. Examples: Mongo: https://docs.mongodb.com/manual/reference/method/Bulk.insert/, SQL: https://codingsight.com/sql-server-bulk-insert-part-1/. In fact, these are cases that will probably find their way into actual utils in the py2store core.
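A sketch of how such a buffering wrapper could look, kept backend-agnostic by taking the bulk-write function as a parameter (the class and its names are illustrative; with pymongo you could pass a collection's `insert_many` method as the bulk writer):

```python
class BulkInserter:
    """Buffer documents and hand them to a bulk-write function in batches,
    amortizing the per-write round-trip cost of a remote DB."""

    def __init__(self, bulk_write_func, batch_size=500):
        # bulk_write_func takes a list of documents and writes them all,
        # e.g. a pymongo collection's insert_many
        self.bulk_write_func = bulk_write_func
        self.batch_size = batch_size
        self._docs = []

    def insert(self, doc):
        self._docs.append(doc)
        if len(self._docs) >= self.batch_size:
            self.flush()

    def flush(self):
        # one round trip for the whole batch instead of one per document
        if self._docs:
            self.bulk_write_func(self._docs)
            self._docs = []
```

For instance, `BulkInserter(mongo_collection.insert_many, batch_size=1000)` would send at most one network request per thousand inserts (plus a final `flush()` for the remainder).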

Aggregations

Abstract use case: You have a commutative and associative operation that you use to make hierarchical aggregations. Concrete use case: You aggregate seconds-granularity numerical time series at minute, hour, day, and week levels, recording for each bucket the count, sum, and sum of squares (for variance). It is easier to assume that data will come ordered, but one could also handle the case where it is "mostly ordered" (that is, where we allow items to arrive out of order within some limited range). See https://github.com/thorwhalen/ut/blob/master/dpat/buffer.py
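The concrete use case can be sketched like this, assuming ordered input and integer second-timestamps (the function names are illustrative). Because the (count, sum, sum-of-squares) merge is commutative and associative, a level can be built either from the raw data or by merging the buckets of the level below, with the same result:

```python
from collections import defaultdict


def empty_agg():
    return {'count': 0, 'sum': 0.0, 'sum2': 0.0}


def update_agg(agg, value):
    # fold a single raw value into a bucket's aggregate
    agg['count'] += 1
    agg['sum'] += value
    agg['sum2'] += value * value


def merge_aggs(a, b):
    # commutative and associative: merging two buckets' aggregates
    # yields the aggregate of their union
    return {k: a[k] + b[k] for k in a}


def aggregate_by_minute(timestamped_values):
    """Aggregate (seconds-timestamp, value) pairs into minute buckets."""
    buckets = defaultdict(empty_agg)
    for t, v in timestamped_values:
        update_agg(buckets[t // 60], v)
    return dict(buckets)


def roll_up(buckets, factor=60):
    """Roll a level up to the next (e.g. minutes -> hours) by merging."""
    out = defaultdict(empty_agg)
    for key, agg in buckets.items():
        out[key // factor] = merge_aggs(out[key // factor], agg)
    return dict(out)
```

From any bucket, the variance is recoverable as `sum2 / count - (sum / count) ** 2`, which is why count, sum, and sum of squares are enough to keep per bucket.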