Skip to content

Latest commit

 

History

History
539 lines (387 loc) · 22.4 KB

overview.adoc

File metadata and controls

539 lines (387 loc) · 22.4 KB

Documentation

Note
the documentation is not complete yet. I am still writing it…​

Overview

This library offers a way to represent optional (or nullable) objects (objects that may not have a value) of type T by encoding the no-value state in one of the states of T. This is an alternative to boost::optional that offers a limited interface and a guarantee that the size of optional T is the same as the size of T.

In order to create an optional object type for your type T, you first need to decide how you want to represent the the no-value state in T. Suppose you want to use type int and want -1 to represent the no-value. Now you have to define an marked value policy: a type that formally reflects how the no-value is represented. Luckily, for this common case the library comes with a predefined class template that you can use: mark_int<int, -1>. With this policy, you can define the type of the optional object, and use it:

markable<mark_int<int, -1>> oi; // optional int
static_assert(sizeof(oi) == sizeof(int), "no size penalty");

Instances of markable have a very small interface. They are copyable and movable. The default constructor initializes an object with the value indicated by the marked-value policy. Another explicit constructor takes the value of T:

using opt_int = markable<mark_int<int, -1>>;
opt_int oi;      // internal value initialized to -1 (as per policy)
opt_int o1 {0};  // internal value initialized to 0
opt_int oN {-1}; // internal value initialized to -1

There are also three observer functions:

  • has_value() that checks if the currently stored value is that indicated in the marked-value policy (MP);

  • value() that extracts the stored value, with the precondition that the value is different than the one indicated in MP;

  • representation_value() that extracts the value as stored internally, with no precondition.

// continuing the previous example
assert (!oi.has_value());
assert ( o0.has_value());
assert (!oN.has_value());

assert (o0.value() == 0);
// oi.value() is UB
// oN.value() is UB

assert (oi.representation_value() == -1);
assert (o0.representation_value() ==  0);
assert (oN.representation_value() == -1);

As you can see, there are two ways to set the special marked value: either by default construction, or by providing it explicitly.

It is not possible to change the stored value through function value(): it returns a non-mutable reference or value (based on the policy).

In order to 'reset' the optional object you can either move-assign a new value, or use assign member functions:

// continuing the previous example
o0 = {};                      // reset to marked value
oN = opt_int{2};              // new value

o0.assign(2);                 // assign value 2
oN.assign_representaiton(2);  // also results in value 2
oN.assign_representation(-1); // results in no-value state

Each instance of markable also provides three nested types:

  • value_type - value we want to represent,

  • reference_type - what function value returns: in most cases it is const value_type&,

  • representation_type - type of the value that is used for representing values and the no-value: in most of the cases it is same as value_type.

Representation value

The type of the object that markable<MP> uses for representing the value or the information about the lack of value is MP::representaiton_type. Sometimes this type is same as MP::value_type but it is not the rule. For instance, when representing optional enum class E with underlying type int, MP::value_type is E whereas MP::representation_type is int:

enum class E { V0, V1 };

using OptE = markable<mark_enum<E, -1>>;
static_assert(std::is_same_v<OptE::value_type, E>);
static_assert(std::is_same_v<OptE::representation_type, int>);

Whatever value, or the no-value, markable<MP> stores, it is always representable as a value of type MP::representation_type:

OptE o;
assert(o.representation_value() == -1);

o.assign_representation(1);
assert(o.representation_value() == 1);

A representation value can always be reinterpreted as a (potentially missing) value.

OptE o;
assert(!o.has_value());

o.assign_storage(1);
assert(o.has_value());
assert(o.value() == E::V1);

Thus, markable<MP> provides two interfaces: one for modifying and observing the (possibly missing) MP::value_type, and the other for modifying and observing the (always present) MP::representation_type.

Marked-value policies

What type is being stored and how the marked value is encoded is controlled by marked-value policy. You can either define your own, or use one of the policies provided with this library:

mark_int

template <typename Integral, Integral EV> struct mark_int;

Can be used with all types that can be used as template non-type parameters, including built-in integral arithmetic types and pointers.

Integral represents the stored type.

EV is the marked value representation.

mark_fp_nan

template <typename FPT> struct mark_fp_nan;

A policy for floating-point types, where the marked value is encoded as quiet NaN.

FPT needs to be a floating-point scalar type.

mark_value_init

template <typename T> struct mark_value_init;

A policy for storing any Regular type, the marked value is represented by a value-initialized T.

T must meet the requirements of Regular: be default constructible, copyable, moveable, and EqualityComparable.

mark_stl_empty

template <typename Cont> struct mark_stl_empty;

Marked value is created using value initialization. Marked value is checked by calling member function empty(). Useful for STL containers where we want the empty container to represent the no-value.

T must be default constructible and have a member function empty().

mark_bool

struct mark_bool;

This is the policy for storing an optional bool in a compact way, such that the size of markable<mark_bool> is 1.

mark_enum

template <typename Enum, std::underlying_type_t<Enum> Val> struct mark_enum;

A policy for storing any enum, the marked value is represented by the indicated integral value Val, which can be outside the range of valid enum values.

Enum must be an enumeration type.

mark_optional

template <typename Optional> struct mark_optional;

This policy is used for types that cannot (or do not want to) spare any value to indicate the marked state. We logically represent optional T, but store it in boost::optional<T> or std::experimental::optional<T>.

Optional must be an instance of either boost::optional or std::experimental::optional.

Defining a custom marked-value policy

In order to provide a custom marked-value policy to store a given type T, we need to provide a class that derive it from markable_type<T> and implements two static member functions: marked_value and is_marked_value:

struct mark_string_with_0s : markable_type<std::string>
{
  static representation_type marked_value() {
    return std::string("\0\0", 2);
  }
  static bool is_marked_value(const representation_type& v) {
    return v.compare(0, v.npos, "\0\0", 2) == 0;
  }
};

Base class markable_type<T> defines all the necessary nested types and some house-keeping functions. With it, we are declaring what type we will be storing.

Function marked_value returns a value of the "representation" type (representation_type) that represents the marked value.

Function is_marked_value returns true iff the the given value is recognized as the marked value.

In a less likely case where we want to store the represent an optional value of type T, but store it internally in a different type, we need to provide more arguments to markable_type<T>. Suppose we want to implement the policy for storing type bool in a storage of size 1 (the same way that mark_bool does). We need three states: no-value, true, and false. We cannot store it in type bool because it only has two states. So, for storage we will use type char. We will use value 2 ('\2') to represent the marked state, value 0 to represent value false and 1 to represent true. Now, apart from defining how the marked state is encodes, we also need to provide a recipe on how to encode a bool in a char, and how to extract the bool value from char storage. We need to define additional two static member functions: access_value and store_value:

struct compact_bool_policy : markable_type<bool, char, bool> // see below
{
  static representation_type marked_value() { return char(2); }
  static bool is_marked_value(representation_type v) { return v == 2; }

  static reference_type access_value(const storage_type& v) { return bool(v); }
  static storage_type store_value(const value_type& v) { return v; }
};

The three types passed to markable_type denote respectively:

  1. value_type — the type we are modeling.

  2. representation_type — the type we use to internally represent all values plus the no-value state.

  3. reference_type — what type function value() should return.

Class template markable_type also provides the fourth type: storage_type, which in this case defaults to representation_type. Because function value() should return a bool and we are storing no bool we have to create a temporary value, and return it by value: therefore type reference_type is not really a reference.

Default marked-value policies

There is also a possibility to select a default mark-value policy for a given type.

default_markable<int> iN, i0(0);_

Alias default_markable selects the marked policy and uses it in markable template. This mechanism tries to pick the best policy:

  • mark_int<T, std::numeric_limits<T>::max()> for built-in integral types,

  • mark_fp_nan<T> for floating-point types,

  • mark_bool for bool,

  • mark_enum<T, std::numeric_limits<typename std::underlying_type<T>::type>::max() for enumerations,

  • mark_value_init<T> otherwise.

It is possible that the default policy selection will create an ill-formed type. Also, note that the policy selected by default may not work for your use case. Manual policy selection might be preferred, because you are in control of what is actually selected.

default_markable also selects order_by_value as the ordering policy. This is slightly slower, but safer than and sometimes more intuitive than order_by_representation.

Using dual storage for marking values

Sometimes there is no spare value of T, but there is a spare combination of member values in DUAL<T>, where DUAL<T> a type layout-compatible with T but without invariants. Consider the following type representing a range of integers:

class range
{
  int min_, max_;
  bool invariant() const { return min_ <= max_; }

public:
  range(int min, int max) : min_(min), max_(max) { assert (invariant()); }
  int min() const { assert (invariant()); return min_; }
  int max() const { assert (invariant()); return max_; }
  ~range() { assert (invariant()); }
};

It is guaranteed and enforced with assertions that min_ is never greater than max_. This leaves many spare combinations of values, e.g., {0, -1}. But if we try to use them, we violate the invariant, and trigger assertion failure. To address such cases, markable provides the dual storage. This will only work if your type is standard-layout. You need to define a type layout-compatible with range:

struct range_representation
{
  int min_, max_;
};

Now you can request a "dual storage". It is a union that holds either a real type or its weaker counterpart:

union
{
  range                value_;
  range_representation representation_;
};

Now, either we are storing a value (first member is active), or we are storing the row type (second member is active), in which we can encode the value impossible in value type. We do not know which member of the union is active at a given moment, but it is always safe to inspect the value of member representation_. This is guaranteed by the common initial sequence feature of unions in C++. When we observe the special value, we know that the second member is active. Otherwise we know that the active member is value_.

In order to define the mark policy, you have to inherit from markable_dual_storage_type and define the special value:

struct mark_range : markable_dual_storage_type<mark_range, range, range_representation>
{
  static representation_type marked_value() noexcept { return {0, -1}; }
  static bool is_marked_value(const representation_type& v) noexcept { return v.min_ > v.max_; }
};

The first argument is the type of the policy we are defining. (We are using the CRTP.) The second is the logical value type, and the third is its "raw" layout-compatible conterpart.

Warning
However, there is a certain danger involved when using dual storage. Types T and DualT passed to markable_dual_storage_type need to preserve certain relation: they have to be layout-compatible. While this can be enforced statically in C20 with type traits `std::is_layout_compatible`, there is no way to enforce it in earlier revisions of C, so you have to make sure this is the case. Otherwise you are risking UB.

To guarantee that two Standard Layout types are layout-compatible is difficult, especially if at some point you have to add a member datat to T. It requires that the members of two types decompose to similar basic types with the same layout. In order to minimize the risk of the types becoming non layout compatible, we recommend te following technique. Define the struct only for storing members. Than have both T and its representation counterpart inferit from te member-storing struct. The inheriting types should not define any members:

struct range_members
{
  int min_, max_;
};

class range : private range_members
{
  bool invariant() const { return min_ <= max_; }

public:
  range(int min, int max) : range_members{min, max} { assert (invariant()); }
  int min() const { assert (invariant()); return min_; }
  int max() const { assert (invariant()); return max_; }
  ~range() { assert (invariant()); }
};

struct range_representation : range_members
{
  range_representation() noexcept : range_members{0, -1} {};
};

This way there is only one place that defines members for both types.

Using markable_dual_storage_type requires that the dual storage has a noexcept move constructor and that the marked-value policy function marked_value() is also noexcept. This requirement is necessary to guarantee exceptions safety for markable object. If this requirement is not met the attempt to instantiate a markable object will result in a compile-time error.

If either of the functions is not noexcept in your case but you still want to use the policy and take the risk that neither of the two functions will ever throw, you can use class template markable_dual_storage_type_unsafe instead. This type does not enforce noexcept statically, but will call std::terminate should any of the two functions throw when performing operations on markable in critical places.

Relational operators

There are no relational operations provided by default, as it is not obvious how the no-value should compare against other values. If you need to compare markable values, you have two options:

  • provide a custom comparator, where you explicitly state how the marked state is treated, or

  • select ordering policy in the second template parameter to markable.

This library offers three ordering policies, for rel-ops and std::hash:

  • order_none (the default), no relational operators or std::hash.

  • order_by_value, where marked-value is treated as the smallest possible value, distinct from any other value (much like in boost::optional).

  • order_by_representation, where marked values are ordered by their representation.

using markable_int = markable<
                       mark_int<int, INT_MAX>,  // marked-value policy
                       order_by_value           // ordering policy
                     >;

markable_int mX{}, m0{0};
assert(m0 == m0);
assert(mX < m0);
assert(std::hash<markable_int>{}(m0) == std::hash<int>(0));

Ordering includes not only the six relational operators, but also the specializations of std::hash.

Ordering policy order_by_value is a good safe default. It is slower than order_by_representation because it has to perform a branch, but it works well with marked-value policies that mark more than one representation state:

struct mark_negative : markable_type<int>
{
  static bool is_marked_value(const int& v) { return v < 0; } // any negative value is marked
  static int marked_value() { return -1; }                    // but we can set only one: -1
};

using opt_int_A = markable<mark_negative, order_by_value>;
assert (opt_int_A{} == opt_int_A{-2});                        // ok

using opt_int_B = markable<mark_negative, order_by_representation>;
assert (opt_int_B{} == opt_int_B{-1});                        // ok
assert (opt_int_B{} == opt_int_B{-2});                        // UB

If you are using marked-value policy that marks more than one state, and use order_by_representation ordering policy, you have to make sure that the when comparing two such markables, if their state is marked, its representation must be exactly MP::marked_value(). Otherwise, you will get UB (Undefined Behavior).

It is also possible to define a custom ordering policy:

struct order_custom
{
  template <typename Markable>
  static bool equal(const Markable& l, const Markable& r) { // enables operators == and !=
    // your implementation goes here
  }

  template <typename Markable>
  static bool less(const Markable& l, const Markable& r) { // enables operators <, >, <=, >=
    // your implementation goes here
  }
};

using opt_int = markable<mark_int<int, -1>, order_custom>;

This library also comes with a set of predefined comparator objects, that can be used in STL containers and algorithms without having to make your markable type ordered:

using opt_int = markable<mark_int<int, -1>>; // not comparable

std::set<opt_int, less_by_value>                    osv;
std::set<opt_int, less_by_representation>           osr;
std::unordered_set<opt_int, hash_by_value>          usv;
std::unordered_set<opt_int, hash_by_representation> usr;

Opaque-typedefed markables

Because markable uses policies, you can get the opaque typedef for your markable types for free.

Suppose you have two conceptually different types Count and Num, but because they are markable types using the same policy, they end um being one and the same type:

using Count = markable<mark_int<int, -1>>;
using Num   = markable<mark_int<int, -1>>;

static_assert(std::is_same<Count, Num>::value, "same type");

In order to make the two markable types distinct, you can alter the type of the policies (but not their semantics) by inheriting from them:

struct mark_count : mark_int<int, -1> {};
struct mark_num   : mark_int<int, -1> {};

using Count = markable<mark_count>;
using Num   = markable<mark_num>;

static_assert(!std::is_same<Count, Num>::value, "different types!");

Comparison with Boost.Optional

This library is not a replacement for boost::optional. While there is some overlap, both libraries target different use cases.

Genericity

boost::optional is really generic: It will work practically with any T, to the extent that you can use optional<optional<T>>. You just give the type T and you get the optional object wrapper. In contrast, in markable from the outset you have to make a choice case-by-case how you want to store the 'marked' value T. The policy for managing the marked state is part of the contract, part of semantics, part of the type. Having a nested markable is technically possible, but requires sparing two values of T.

Some type T may not have a 'spare' value to indicate the marked state. In such case, markable cannot help you. In contrast, boost::optional<T> will work just fine: the additional marked state is stored separately. In a way, boost::optional<T> can be thought as boost::variant<boost::none_t, T>.

Life-time management

boost::optional<T> gives you a useful guarantee that if you initialize it to a no-value state, no object of type T is created. This is useful for run-time performance reasons and allows a two phase initialization. In contrast, markable upon construction, immediately constructs a T. markable simply contains T as subobject. In contrast, boost::optional only provides a storage for T and creates and destroys the contained object as needed.

No container-like semantics

boost::optional<T> is almost a container, capable of storing 0 or 1 elements. When it stores one element, you can alter its value. It is impossible in markable: its value and the "container’s size" are one thing. But in exchange, the latter, because it only provides a non-mutable access to the contained value, it can build the return value on the fly, similarly to the proxy reference in std::vector<bool>, but because we only provide a non-mutable reference, it is much simpler and safer. This is why we can provide a markable for type bool (which has no spare value) with the sizeof of a single char.

Expressiveness vs non-ambiguity

markable has a minimalistic interface: this is also to avoid any surprises. As a cost it is less expressive and convenient. There are no implicit conversions, no overloaded operators; unlike boost::optional, it does not have operator== by default: you have to specify a comparison policy and decide yourself how you want to compare the no-value states.

Intended use

In general, it is expected that in the first pass of the implementation of your program, you will use boost::optional<T> as a simple ready-to-use solution. Later, if you determine that boost::optional kills your performance, you can think of replacing it with markable and how you want to represent the no-value state.

Another use case is when you are currently using objects of scalar types with encoded special values (like type std::string::size_type with value std::string::npos) and you want to change it into something safer but be sure you are adding no runtime overhead. You can change your type to markable<mark_int<string::size_type, string::npos>>.