
Keyman Code Style Guide

Marc Durdin edited this page Oct 6, 2024 · 27 revisions

This guide introduces a consistent style for documenting Keyman source code using Doxygen, and code format preferences. For now, existing code may not follow these guidelines, but new code should.

When making pull requests, it may be necessary to edit existing code that doesn't match the styling. If you are making a minor change, then don't rewrite the entire function to match the new style. However, any significant new blocks of code should follow this guide. It may be appropriate to open two PRs: one to regularise the format of the code, and a follow-up that actually makes the code change.

We'll be using this style guide for:

  • C/C++
  • Swift
  • Java
  • Python
  • Delphi (Object Pascal)
  • TypeScript
  • PHP
  • Bash scripts
  • And anywhere else that makes sense.

At this time, we will not be extracting code comments with Doxygen to generate documentation; we are only following this style guide.

The Guide to the Style Guide

We don't want to get into a huge, deep style guide fight. The priority is to format your code so that it is easy to read.

The few items listed are hopefully sufficient. We're not religious on line lengths, nor even on brace style (the shorter form is more common these days, so we've leaned towards that). Tabs do cause trouble across platforms and systems though.

Style guides can end up becoming a stupid PR comment fight and we are not interested in that. If you come across something that is hard to format, aim for readability ahead of consistency -- or find an alternative way to write it if you are fighting the language for readability.

Presentation on Developer Documentation (devdocs)

Content

File header

Each source file should contain the following header lines:

/*
 * Keyman is copyright (C) SIL Global. MIT License.
 * 
 * Created by <author> on yyyy-mm-dd
 * 
 * (Optional description of this file)
 */

Note that this comment is not a Doxygen comment; use /*, not /**. The 'Created by' line is optional.

Code Format

(Formatting options for some languages, such as C/C++, are defined in .clang-format and are picked up by the VS Code formatter.)

In brief:

  • Spaces, not tabs

  • 2-space indents

  • Braces on the same line:

      if(foo) {
        bar();
      } else {
        baz();
      }
  • Wrap long lines (> 130 characters)

  • Always use braces, and always put each statement on its own line:

    NO:

     if(foo) bar();
     if(itWorked) { baz(); return 0; }

    YES:

      if(foo) {
        bar();
      }
      if(itWorked) {
        baz();
        return 0;
      }
  • Function parameter lists:

    Short function signatures:

    YES:

      void short_func(int param) {

    Long signatures:

    NO:

      km_core_state_debug_item const *km_core_state_debug_items(km_core_state const *state, size_t *num_items)
      {

    NO:

      km_core_state_debug_item const *km_core_state_debug_items(km_core_state const *state, 
                                                              size_t *num_items)
      {

    YES:

      km_core_state_debug_item const *
      km_core_state_debug_items(
        km_core_state const *state,
        size_t *num_items
      ) {

    or YES:

      km_core_state_debug_item const *km_core_state_debug_items(
        km_core_state const *state,
        size_t *num_items
      ) {

Source Code Documentation Syntax

We follow the Javadoc syntax to mark comment blocks. These have the general form:

/**
 * Brief summary.
 *
 * Detailed description. More detail.
 * @see Some reference
 *
 * @param  <name>  Parameter description.
 * @return         Return value description.
 */

Example:

/**
 * Returns a compressed version of a string.
 *
 * Compresses an input string using the foobar algorithm.
 *
 * @param   uncompressed  The input string.
 * @return                A compressed version of the input string.
 */
std::string compress(const std::string& uncompressed);

Doxygen Tags

This is the allowed set of Doxygen tags (note that we use @ rather than \ for improved readability).

  • @param Describes function parameters.

    Recommend using two spaces on either side of the parameter names and lining up descriptions for greater readability.

    We don't normally mark input parameters, but [in,out] and [out] parameters should be marked:

    /**
     * @param[in,out]  modifiers  The modifier bitmap
     */
  • @return Describes return values.

  • @see Describes a cross-reference to classes, functions, methods, variables, files or URL.

    Example:

    /**
     * Available kinds of implementations.
     *
     * @see process::network::PollSocketImpl
     */
  • @file Describes a reference to a file. It is required when documenting global functions, variables, typedefs, or enums in separate files.

  • @link and @endlink Describes a link to a file, class, or member.

  • @example Describes source code examples.

  • @image Describes an image.
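As a rough sketch of how several of these tags can sit together, the fragment below documents two small helpers with @file, @link/@endlink, @param, @return and @see. The file name example_text.h and both function names are hypothetical and exist only for illustration.

```cpp
#include <cstddef>

/**
 * @file example_text.h
 *
 * Hypothetical helpers for null-terminated strings (illustration only).
 */

/**
 * Returns the number of chars before the null terminator.
 *
 * @param  text  The string to measure; must not be null.
 * @return       The length of the string, in chars.
 */
inline std::size_t example_length(const char *text) {
  std::size_t length = 0;
  while(text[length] != '\0') {
    ++length;
  }
  return length;
}

/**
 * Tests whether a string is empty.
 *
 * This is a shortcut for comparing the result of
 * @link example_length the length helper @endlink with zero.
 *
 * @param  text  The string to test; must not be null.
 * @return       true if the string has no chars before the terminator.
 * @see example_length
 */
inline bool example_is_empty(const char *text) {
  return example_length(text) == 0;
}
```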

Wrapping

We wrap long descriptions by aligning with previous line:

/**
 * @param  uncompressed  The input string that requires a very long
 *                       description and an even longer description on this
 *                       line as well.
 */

Constants and Variables

Example:

/**
 * Prefix used to name Keyman keyboards in order to distinguish
 * them from other Javascript objects.
 */
extern const std::string KEYMAN_KEYBOARD_NAME_PREFIX;

Fields

Example:

/**
 * Buffer storing the current context, text before the input caret.
 * The buffer is null terminated, and CurContext[0] is furthest from
 * the caret.
 */
WCHAR CurContext[MAXCONTEXT]; 

Or, if you have a short description, you can use ///< after the field:

WCHAR CurContext[MAXCONTEXT]; ///< Current context, null terminated

Functions and Methods

Example:

/**
 * Returns a pointer to last n characters in the current context buffer.
 *
 * Returns a pointer to the character in the current context buffer which will
 * have at most n valid characters remaining until the null terminating
 * character. e.g. it will be one code unit less than bufsize if that would
 * have meant splitting a surrogate pair.
 *
 * @param[in]  n   The maximum number of valid characters - (code points) not
 *                 WCHAR size (code units)
 * @return         Pointer to the start position for a buffer of maximum n
 *                 characters
 */
WCHAR *Buf(int n);

Classes and Structs

Example:

/**
 * Provides an interface between Keyman Core input processing and the 
 * application text store. This is an abstract base class with common
 * core functionality and context cache management.
 */
class AppContext
{

Credit to: http://mesos.apache.org/documentation/latest/doxygen-style-guide/

C++ Specific Style Guide

The "rules" in this section may not apply to other programming languages used in the Keyman code base.

Platform Code

Windows™ is the original platform on which Keyman was written. As a result, there are legacy code styles, conventions, and language extensions that are not consistent with common coding style as we move towards integrating common code across multiple platforms. Windows, Linux, and macOS platform code will inevitably use extensions and other constructs needed to integrate with the native platforms, breaking the guidelines for the common C++ code.

C++ Level

As of 25-Feb-2022 the toolchain supports C11 and C++14.

Datatypes

Use C or C99 types and calling conventions for any API boundary, because C has the broadest foreign function interface (FFI) support across languages. Within a module's implementation, more complex data types are permitted.
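A minimal sketch of this split, assuming a hypothetical function that counts spaces: the implementation uses std::string freely, while the exported entry point uses C linkage and only C-compatible types. None of these names come from the Keyman code base.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <string>

// Hypothetical internal implementation: free to use C++ types.
namespace example {
inline size_t count_spaces(const std::string &text) {
  size_t count = 0;
  for(char ch : text) {
    if(ch == ' ') {
      ++count;
    }
  }
  return count;
}
}  // namespace example

// Hypothetical API boundary: C linkage, C-compatible types only.
extern "C" uint32_t example_count_spaces(const char *text) {
  if(text == nullptr) {
    return 0;
  }
  return static_cast<uint32_t>(example::count_spaces(text));
}
```

Keeping std::string out of the exported signature means the function can be called from C, or through the FFI of another language, without that caller knowing anything about the C++ standard library.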

Standard Library

Use of the C++ Standard Template Library, where possible, is preferred over home-grown or additional library data types.

Exceptions

Don’t use exceptions.
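One common alternative to exceptions is to report failure through the return value. The sketch below assumes a hypothetical status enum and copy function, invented for illustration; it is not the error-handling scheme of any particular Keyman module.

```cpp
#include <cstddef>

// Hypothetical status codes, for illustration only.
enum example_status {
  EXAMPLE_STATUS_OK = 0,
  EXAMPLE_STATUS_INVALID_ARGUMENT = 1,
  EXAMPLE_STATUS_BUFFER_TOO_SMALL = 2
};

/**
 * Copies a null-terminated string into a caller-supplied buffer.
 *
 * @param       source       The string to copy.
 * @param[out]  buffer       Receives the copy, null terminated.
 * @param       buffer_size  Size of buffer, in chars.
 * @return                   EXAMPLE_STATUS_OK on success, otherwise an
 *                           error code; nothing is thrown.
 */
example_status
example_copy_string(
  const char *source,
  char *buffer,
  std::size_t buffer_size
) {
  if(source == nullptr || buffer == nullptr) {
    return EXAMPLE_STATUS_INVALID_ARGUMENT;
  }
  std::size_t length = 0;
  while(source[length] != '\0') {
    ++length;
  }
  if(length + 1 > buffer_size) {
    return EXAMPLE_STATUS_BUFFER_TOO_SMALL;
  }
  // Copy the characters and the terminator.
  for(std::size_t i = 0; i <= length; ++i) {
    buffer[i] = source[i];
  }
  return EXAMPLE_STATUS_OK;
}
```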

Preprocessor Macros

It is not a hard no, but prefer inline functions, enums, and const variables to macros.

Macros are used for including libraries; for example, KMN_API in keyboardprocessor_bits.h.
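To illustrate the preference, the sketch below shows a constant, an enum, and an inline function, each next to the macro it might replace. All names are hypothetical.

```cpp
#include <cstddef>

// Instead of: #define MAX_CONTEXT_EXAMPLE 64
const std::size_t MAX_CONTEXT_EXAMPLE = 64;

// Instead of: #define EXAMPLE_MODE_OFF 0 / #define EXAMPLE_MODE_ON 1
enum example_mode {
  EXAMPLE_MODE_OFF = 0,
  EXAMPLE_MODE_ON = 1
};

// Instead of: #define CLAMP_TO_CONTEXT(n) ((n) > 64 ? 64 : (n))
// The inline function is type checked and evaluates its argument once.
inline std::size_t clamp_to_context(std::size_t n) {
  return n > MAX_CONTEXT_EXAMPLE ? MAX_CONTEXT_EXAMPLE : n;
}
```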

Template Metaprogramming

Templates are permitted but should be used in private implementation detail and not for APIs or public interfaces. Ensure template code is well commented.
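One way to read this rule is sketched below: the template lives in a private detail namespace, and callers only see an ordinary, non-template function. The namespace and function names are hypothetical.

```cpp
namespace example {
namespace detail {

/**
 * Clamps a value to the inclusive range [low, high].
 *
 * Private helper: T only needs to support operator<.
 */
template <typename T>
T clamp_value(T value, T low, T high) {
  if(value < low) {
    return low;
  }
  if(high < value) {
    return high;
  }
  return value;
}

}  // namespace detail

/**
 * Public, non-template interface built on the private template.
 *
 * @param  value  Any integer.
 * @return        The value limited to the range 0..100.
 */
inline int clamp_percentage(int value) {
  return detail::clamp_value(value, 0, 100);
}

}  // namespace example
```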

Namespace

The Keyman Common Core uses the namespace km::core.

The KMX keyboard processor is nested within that namespace using kmx, that is, km::core::kmx. Other keyboard processor implementations can follow the same pattern; for example, LDML can use km::core::ldml.
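A minimal sketch of what this nesting looks like in a source file. The function inside is hypothetical; only the namespace names come from the text above.

```cpp
// C++14 has no `namespace km::core::kmx { ... }` shorthand (that arrived
// in C++17), so the namespaces are nested explicitly.
namespace km {
namespace core {
namespace kmx {

// Hypothetical function, for illustration only.
inline int example_answer() {
  return 42;
}

}  // namespace kmx
}  // namespace core
}  // namespace km
```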

Proprietary Language Extensions

Proprietary language extensions are not permitted in the core cross-platform code. For platform integration components of Keyman, it may be necessary to use extensions, in which case that is acceptable.

Naming

Previously there was no naming convention, as long as naming was consistent within each logical module or project.

File Names

Follow the convention of the project you are adding a file to. Otherwise, filenames should be all lowercase and can include underscores (_).

Type Names

Use snake case.

std::string my_string;  

Constants

All upper case.

Variable Names

Use snake case. All lowercase, with underscores between words.

Class Data Members

Data members of classes use the same naming as variables (snake case), but with an m_ prefix.??

Note: Keyman Core has used a leading underscore (_); however, this is not recommended in C++, although a leading underscore followed by a lowercase letter should be OK. See: Using leading underscore C++

Struct Data Members

Data members of structs, both static and non-static, are named like ordinary nonmember variables.

Function Names

Use snake_case.
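Taken together, the naming rules above might look like the following sketch. Every identifier is hypothetical, and the m_ prefix for class data members follows the rule as currently written, which the guide itself marks as an open question.

```cpp
// Constant: all upper case.
const int MAX_EXAMPLE_ITEMS = 8;

// Struct: data members are named like ordinary variables.
struct example_point {
  int x_offset;
  int y_offset;
};

// Class: type name in snake case, data members with the m_ prefix.
class example_counter {
public:
  example_counter() : m_item_count(0) {
  }

  // Function names use snake_case.
  bool add_item() {
    if(m_item_count >= MAX_EXAMPLE_ITEMS) {
      return false;
    }
    ++m_item_count;
    return true;
  }

  int item_count() const {
    return m_item_count;
  }

private:
  int m_item_count;
};
```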

Include Guards

Header files shall contain include guards to avoid including the same header multiple times. For example:

#ifndef CLASSNAME_H
#define CLASSNAME_H

#endif