Skip to content

Latest commit

 

History

History
391 lines (287 loc) · 19 KB

README.md

File metadata and controls

391 lines (287 loc) · 19 KB

Small

C++ Small Containers

Small


  • Applications usually contain many auxiliary small data structures for each large collection of values. Container implementations often include several optimizations for the case when they are small.
  • These optimizations cannot usually make it to the STL because of ABI compatibility issues. Users might need to reimplement these containers or rely on frameworks that include these implementations.
  • Depending on large library collections for simple containers might impose a cost on the user that's higher than necessary and hinder collaboration on the evolution of these containers.
  • This library includes independent implementations of the main STL containers optimized for the case when they are small.

Build Status Latest Release Documentation Discussions


Facebook QZone Weibo Reddit Twitter LinkedIn WhatsApp Line.me Telegram.me HackerNews


Table of Contents

Quickstart

Integration:

=== "CMake"

=== "Add subdirectory"

    ```bash
    git clone https://github.com/alandefreitas/small/
    ```

    ```cmake
    add_subdirectory(small)
    # ...
    add_executable(your_target main.cpp)
    target_link_libraries(your_target PUBLIC small::small)
    ```

=== "Fetch content"

    ```cmake
    include(FetchContent)
    
    FetchContent_Declare(small
        GIT_REPOSITORY https://github.com/alandefreitas/small
        GIT_TAG origin/master # or whatever tag you want
    )

    FetchContent_GetProperties(small)
    if(NOT small_POPULATED)
        FetchContent_Populate(small)
        add_subdirectory(${small_SOURCE_DIR} ${small_BINARY_DIR} EXCLUDE_FROM_ALL)
    endif()

    # ...
    add_executable(your_target main.cpp)
    target_link_libraries(your_target PUBLIC small::small)
    ```

=== "External package"

    ```cmake
    # Follow installation instructions and then... 
    find_package(small REQUIRED)
    if(NOT small_FOUND)
        # Throw or put your FetchContent script here
    endif()

    # ...
    add_executable(your_target main.cpp)
    target_link_libraries(your_target PUBLIC small::small)
    ```

=== "Install"

!!! note

    Get the binary package from the [release section](https://github.com/alandefreitas/small/releases). 

    These binaries refer to the latest release version of small.

!!! hint
    
    If you need a more recent version of `small`, you can download the binary packages from the CI artifacts or build the library from the source files.

=== "Build from source"

!!! note

    Ensure your C++ compiler and CMake are up-to-date and then:

=== "Ubuntu + GCC"

    ```bash
    # Create a new directory
    mkdir build
    cd build
    # Configure
    cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="-O2"
    # Build
    sudo cmake --build . --parallel 2 --config Release
    # Install 
    sudo cmake --install .
    # Create packages
    sudo cpack .
    ```

=== "Mac Os + Clang"

    ```bash
    # Create a new directory
    mkdir build
    cd build
    # Configure
    cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="-O2"
    # Build
    cmake --build . --parallel 2 --config Release
    # Install 
    cmake --install .
    # Create packages
    cpack .
    ```

=== "Windows + MSVC"

    ```bash
    # Create a new directory
    mkdir build
    cd build
    # Configure
    cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="/O2"
    # Build
    cmake --build . --parallel 2 --config Release
    # Install 
    cmake --install .
    # Create packages
    cpack .
    ```

!!! hint "Parallel Build"
    
    Replace `--parallel 2` with `--parallel <number of cores in your machine>`

!!! note "Setting C++ Compiler"

    If your C++ compiler that supports C++17 is not your default compiler, make sure you provide CMake with the compiler location with the DCMAKE_C_COMPILER and DCMAKE_CXX_COMPILER options. For instance:

    ```bash
    cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="-O2" -DCMAKE_C_COMPILER=/usr/bin/gcc-8 -DCMAKE_CXX_COMPILER=/usr/bin/g++-8
    ```

=== "File amalgamation"

!!! note

    Because containers are header-only, you can directly copy the contents from the `source` directory into your project.

!!! hint

    In that case, you are responsible for setting the appropriate target include directories and any compile definitions you might require.

Once the library is properly integrated, you can create containers from the namespace small like any other STL container:

--8<-- "examples/main.cpp"

All containers are optimized for the case when they're small but also efficient when they are large. The containers mix the common techniques found in other small container libraries:

  • Inline allocation for small containers
  • Custom expected sizes
  • Identification of relocatable types
  • Custom growth factors with better defaults
  • Communication with system memory allocators
  • Explicit consideration of CPU cache sizes and branch prediction

Most applications have many small lists and sets of elements. These containers avoid spending a lot of time with large containers that contain just a few elements. Small containers usually try to use the stack before dynamically allocating memory and try to represent associative containers with stack arrays, unless these sets are very large.

The following containers are available:

  • small::vector
  • small::max_size_vector
  • small::string
  • small::set
  • small::max_size_set
  • small::multiset
  • small::max_size_multiset
  • small::unordered_set
  • small::max_size_unordered_set
  • small::unordered_multiset
  • small::max_size_unordered_multiset
  • small::map
  • small::max_size_map
  • small::multimap
  • small::max_size_multimap
  • small::unordered_map
  • small::max_size_unordered_map
  • small::unordered_multimap
  • small::max_size_unordered_multimap
  • small::stack
  • small::queue
  • small::priority_queue

Although many compilers support small string optimization (SSO) already, this library will ensure all strings support SOO, custom inline sizes, relocation, and unicode.

Vectors

This small vector implementation includes:

  • Inline allocation for small vectors
  • Custom expected size
  • Special treatment of relocatable types
    • Relocatable types can be moved with memcpy, bypassing destructors and constructors.
    • Relocatable types are defined by default for POD types and aggregate types of PODs
    • The small::is_relocatable traits can be used as an extension point for custom types
  • Better growth factors
  • Consider the cache line size in allocations
  • Heap allocations can be disabled with small::max_size_vector

When there are fewer elements than a given threshold, the elements are kept in a stack buffer for small vectors. Otherwise, the vector works as usual. However, if you are 100% sure you will never need more than N elements, you can use a max_size_vector, where elements are always inline.

The default number of elements in a small vector is usually the number of elements we can already fit inline in a vector. For larger data types, the default_inline_storage trait can be used as an extension point where one can define how many elements a small vector of that type should contain by default.

--8<-- "examples/default_inline_storage.cpp"

When there's a reasonable default for the number of inline elements, this strategy avoids multiple vector type instantiations for different inline storage sizes.

This small vector implementation used folly, abseil, and LLVM as a reference.

Strings

The small string includes all the common optimizations for small vectors, and a custom template parameter to set how large we expect the string to be (in bytes).

However, when strings are representing text, if there's one thing that makes them not small is not supporting UTF8. In addition to the common interface for strings, small::string includes extra functions to identify and work with UTF8 code points with random access.

--8<-- "examples/unicode_strings.cpp"

The problem of supporting UTF8 is easier to explain than it is to solve. Programming languages tend to solve this problem by (1) forbidding byte or substring access, and/or (2) allowing only access to code points with O(n) cost, where n is the number of code points. Because anything that forbids byte access would be incompatible with a C++ string, we allow direct byte access, and strings are allowed to be malformed unicode, which we can check with small::is_malformed.

All capacity and access functions contain extra overloads that accept codepoint indexes, defined as a strong type, rather than byte indexes. By using these functions, one can ensure the string is never malformed. It's up to the user to decide whether these access functions are useful and worth it in a particular application.

Access to codepoints is provided with an inline lookup-table trick that allows us to access codepoints in O(log m) time, where m is the number of multibyte code points in the strings. When there are no multibyte codepoints in the string, the string works as usual and no extra memory is required for the table.

Sets and Maps

The small set/map classes use a more cache-friendly flat set/map and all other optimizations mentioned above for internal algorithms. As with other small containers, a custom template parameter can be used to define the number of inline elements in the container.

The small::default_inline_storage and small::is_relocatable trait can also be defined for custom types, and all the usual set/map, ordered/unordered, uni/multi variants are also provided:

--8<-- "examples/associative.cpp"

Unlike a small::vector or small::string, the asymptotic time complexities of flat sets/maps are very different from their std:: counterparts and should only be used when they are small. Because they are internally implemented as arrays, manipulating these containers costs O(n).

For large containers, you can use std containers with custom allocators. Or for efficient large containers, you can use the abseil containers, implemented as B+-trees.

Contributing

Get Involved

  • Discussions are concentrated on our GitHub discussions page. Don't refrain from asking questions and proposing ideas. If this library helps you create something interesting, please divulge it with the community.
  • If you are a programmer with good ideas, please share these ideas with us.
  • Academic collaboration is more than welcome. It'd be great to see this library help people write papers.

Ideas and Roadmap

Feel free to contribute new features to this library. For complex features and changes, consider getting feedback from the community first. Contributing to an existing code base with its conventions might seem obscure at first but please don't let that discourage you from sharing your ideas.

There are many ways in which you can contribute to this library:

  • Testing the library in new environments see 1, 2, 3
  • Contributing with interesting examples see 1
  • Finding problems in this documentation see 1
  • Finding bugs in general see 1, 2, 3, 4
  • Whatever idea seems interesting to you

The only thing we ask you is to make sure your contribution is not destructive. Some contributions in which we are not interested are:

  • "I don't like this optional feature, so I removed/deprecated it"
  • "I removed this feature to support older versions of C++" but have not provided an equivalent alternative
  • "I removed this feature, so I don't have to install/update ______" but have not provided an equivalent alternative
  • "I'm creating this high-cost promise that we'll support ________ forever" but I'm not sticking around to keep that promise

In doubt, please open a discussion first

Guidelines

If contributing with code, please leave all warnings ON (-DSMALL_BUILD_WITH_PEDANTIC_WARNINGS=ON), use cppcheck, and clang-format.

If contributing to the documentation, please edit README.md directly, as the files in ./docs are automatically generated with mdsplit.

Contributors

alandefreitas
Alan De Freitas
tearfur
Yat Ho
tomasz-szyszko
Tomasz-szyszko
marcosfpr
Marcos Pontes

References

These are some references we used for this work: