Skip to content
This repository has been archived by the owner on May 18, 2021. It is now read-only.

Establish convention for docstrings #133

Open
jcohenadad opened this issue Jun 18, 2020 · 22 comments
Open

Establish convention for docstrings #133

jcohenadad opened this issue Jun 18, 2020 · 22 comments

Comments

@jcohenadad
Copy link
Member

jcohenadad commented Jun 18, 2020

Context

Python's autodoc tools such as sphinx parse the docstrings of functions/classes to generate API docs for convenient use. Importantly, things such as input arguments, output arguments, example usage, follow a given convention, such that the generate doc is consistent.

An example of convention for python is the Google docstring, displayed here for convenience:

def function_with_types_in_docstring(param1, param2):
    """Example function with types documented in the docstring.

    `PEP 484`_ type annotations are supported. If attribute, parameter, and
    return types are annotated according to `PEP 484`_, they do not need to be
    included in the docstring:

    Args:
        param1 (int): The first parameter.
        param2 (str): The second parameter.

    Returns:
        bool: The return value. True for success, False otherwise.

While there seem to be less consensus for docstrings convention in Matlab, I believe we should still try to impose some rules, so that the doc (via matlab's help) and generated doc (via the HelpDocMd repos) look reasonably good.

Potential solutions

Googling around on the Matlab website, I've come across this documented function, which does look quite nice, in that: description, input arguments are parsed out and displayed.

Couldn't we come up with something similar?

Once we agreed on a convention, it should be documented under the contributing doc

Things to consider

  • For each sub-function to be listed in matlab's browser, the definition needs to be followed by "%%" and description.
  • There should not be any other use of "%%" anywhere else in the code.

Useful links

From Mathworks:

From external contributers:

@rtopfer
Copy link
Contributor

rtopfer commented Jun 18, 2020

There is the beginning of a representative example/template in the helpDocMd repo but it's only a draft

@gaspardcereza
Copy link
Member

gaspardcereza commented Jun 18, 2020

I will adapt my doc following Ryan's drafted template for now (it seems more adapted to what we do). Previously I was trying to follow the imutils/read_nii.m example because @po09i told me that it was the best documented so far. It would indeed be great to have an exhaustive convention !

@jcohenadad
Copy link
Member Author

Currently, the __TEXT__ is not formatted by matlab's help or doc:

>> help imutils.get_nii_coordinates
 get_nii_coordinates Return voxel position "world-coordinates" in mm
      
      [x, y, z] = get_nii_coordinates( niiFile )
      [x, y, z] = get_nii_coordinates( niiInfo )
  
  Returns `x, y, z`: three 3-D arrays of voxel positions in mm.
 
  The single input can be given either as:
 
  1. `niiFile`—A path string to the NIfTI file of interest; or, 
 
  2. `niiInfo`—A struct of the form `niiInfo = niftiinfo( niiFile )`
 
  __NOTE__
  The function is implemented as
  ```
  [i, j, k] = ndgrid( [0:niiInfo.ImageSize(1)-1], [0:niiInfo.ImageSize(2)-1], [0:niiInfo.ImageSize(3)-1] ) ;
  [x, y, z] = niiInfo.Transform.transformPointsForward( i, j, k ) ;  
  ```
  __TODO__
  * function should be further tested.

is there a reason for having chosen this markdown? @rtopfer @po09i ?

@jcohenadad
Copy link
Member Author

So far, the following template has been used.

@rtopfer
Copy link
Contributor

rtopfer commented Jun 19, 2020

Currently, the __TEXT__ is not formatted by matlab's help or doc:

is there a reason for having chosen this markdown? @rtopfer @po09i ?

Apart from highlighting and linking to function names, I don't think the printed help output in MATLAB does any additional formatting (i.e. not even with their own Markup)

So far it's just trial-and-error, experimenting with that looks decent

  • in the MATLAB terminal, via help, vs
  • in the MATLAB browser (via doc), vs
  • on the site once rendered as html...

@rtopfer
Copy link
Contributor

rtopfer commented Jun 19, 2020

I should add: The particular Markdown formatting is indeed informed by MATLAB's Markup rules, which is very similar to Markdown but there are discrepancies.

For now, their Markup only seems to apply to Live scripts/functions, but I'd assume that MATLAB will eventually support Markup rendering within their own help and doc displays for regular .m files (i.e. without having to first convert to HTML...)

@jcohenadad
Copy link
Member Author

jcohenadad commented Jun 19, 2020

So far it's just trial-and-error, experimenting with that looks decent

right. my current strategy is the following:

  • understand how to properly format with matlab's browser;
  • then, adapt header to obtain reasonably good output with the 'help' command, while trying to be as close as possible to this template
  • then, modify the HelpDocMd parser to figure out what needs to be figured out (regex to find SYNTAX, INPUTS, etc.)

EDIT:

For now, their Markup only seems to apply to Live scripts/functions, but I'd assume that MATLAB will eventually support Markup rendering within their own help and doc displays for regular .m files (i.e. without having to first convert to HTML...)

that is also my assumption, so sticking to the "browser" rendering syntax seems wise

@jcohenadad
Copy link
Member Author

For now, the publish function uses Matlab's markdown, which is extremely limited. There is a possibility to modify the stylesheet when generating the html.

@rtopfer
Copy link
Contributor

rtopfer commented Jun 19, 2020

Re: Linking to references at the end of the documentation header, I found a hack yesterday worth noting + probably worth including in a template:

MATLAB automatically adds the links when function/script/class entries are provided in a list where the line begins with "See also", e.g.

% See also
% struct
% Documentor

Ideally, this reference section should work with local links to help entries while running MATLAB and weblinks: After considerable trial and error I realised that by adding spaces between the sq. brackets both forms work, e.g

% See also
% [ struct ](https://www.mathworks.com/help/matlab/ref/struct.html)
% [ Documentor ](https://github.com/shimming-toolbox/helpDocMd)

When displayed in the MATLAB terminal, you can click on struct or Documentor and it should immediately print the local help entry in the terminal; But on the right, you can still click on the URL to pull up the browser. Conveniently, when published to .md for the website, it will just be the weblink!

Ultimately we would still want a parsing scheme for links in the autodoc process (e.g. to add/validate relative links for the Mkdocs build) but that will be a bigger undertaking.

@jcohenadad
Copy link
Member Author

jcohenadad commented Jun 19, 2020

A few trials & errors:

Header at the beginning of the file

The following:

%% dicom_to_nifti
% Converts dicoms into niftis with dcm2bids output.

function dicom_to_nifti( unsortedDicomDir, niftiPath )
%% dicom_to_nifti
% converts dicoms into niftis with dcm2bids output 
%
% *SYNTAX*
% ...

will look nice on the browser (see below):
image

but will render a truncated doc this 'help':

>> help imutils.dicom_to_nifti
  dicom_to_nifti
  Converts dicoms into niftis with dcm2bids output.

So, it should be avoided

Header for function

the traditional matlab's:

function dicom_to_nifti( unsortedDicomDir, niftiPath )
%%DICOM_TO_NIFTI converts dicoms into niftis with dcm2bids output 

will unfortunately not render in the browser. So a reasonable alternative is the following (notice the space between "%%" and "dicom_to_nifti"):

function dicom_to_nifti( unsortedDicomDir, niftiPath )
%% dicom_to_nifti
% converts dicoms into niftis with dcm2bids output 

Matlab hyperlinks

Unfortunately, @rtopfer's suggestion will not render in the browser:
image

@rtopfer
Copy link
Contributor

rtopfer commented Jun 19, 2020

Works for me?
via help
Screen Shot 2020-06-19 at 6 15 11 PM
via doc
Screen Shot 2020-06-19 at 6 15 40 PM

Try adding a comma after the end parenthesis ")" and 2x spaces after each line in % See also section

(+ the previous example assumes Documentor or whatever file you're hoping to link to is on the MATLAB path)

@rtopfer
Copy link
Contributor

rtopfer commented Jun 19, 2020

I thought maybe you meant in the webbrowser, eg. github, but this still works for me?
Google

@jcohenadad
Copy link
Member Author

hehe... that's where things get complicated: there are "two" doc commands:

  • matlab's "browser" equivalent of help (a nice formatted version of help):
    image

  • matlab's "live browser" (which i've been working on, and been referencing to):
    image

the rule of thumb: what works for 'help' works for the first doc.

@rtopfer
Copy link
Contributor

rtopfer commented Jun 19, 2020

Ah I see... If the latter will ultimately display the rendered HTML there could be a step added in the doc process remove the inner-bracket spaces

Granted, it might not be worth trying to accommodate this very hacky hack, I just wanted to avoid having to repeat references, e.g. a section for normal weblinks followed by the usual See also part...

@rtopfer
Copy link
Contributor

rtopfer commented Jun 20, 2020

I don't have a strong opinion either way but Re: SYNTAX and DESCRIPTION sections, while it would be useful to include the sections explicitly in a template for consistency + as a reminder to include them in a specific order, I find they're rather redundant: it's pretty obvious what the sections are without the added text, which is probably why MATLAB doesn't display section headers in the printed help for their own functions (though they're always present in browser version)

@jcohenadad
Copy link
Member Author

jcohenadad commented Jun 20, 2020

After struggling with Matlab's markup for rendering in the "new doc" (2nd option in this comment), I realize there are intrinsic incompatibility that makes this plan impossible.

So, my updated strategy is the following:

  • make a header that looks good with the 'help' command (and a fortiori the "old doc"), while trying to be as close as possible to this template
  • modify helpDocMd to:
    • Accommodate Matlab's doc formatting syntax (i.e., add section heading, put "INPUTS" in bold, create html table for all inputs/outputs, update hyperlinks, etc.)
    • add a call to publish for each .m file
    • run builddocsearchdb()
  • get rid of external/genToolboxHelp

@rtopfer @po09i comments welcome

@jcohenadad
Copy link
Member Author

jcohenadad commented Jun 20, 2020

Following up on the plan above, these are the exceptions/compromises to be made:

  • hyperlinks in headers will be encoded as html , so they will be automatically parsed for the "new doc".

@po09i
Copy link
Member

po09i commented Jun 22, 2020

I have not worked on helpDocMd enough to know how feasible adding a new formatting output is as it was designed with markdown output from the beginning. Hopefully this can easily be done as it seems a really good idea. What do you think @rtopfer

@rtopfer
Copy link
Contributor

rtopfer commented Jun 22, 2020

The motivation/context isn't entirely clear to me. Is the idea simply to have .html documentation for viewing internally within the Matlab browser?

What would the advantage to using publish be (e.g. vs. taking the .html output from the Mkdocs build?)? publish doesn't really work with classes, which, indeed, was/is the basic reason for the helpDocMd repo. (Granted, at this point we could probably just print the current class .md output to .m files and call publish on those, i.e. as if they were merely scripts consisting entirely of comments)

Re: tabling inputs and outputs, i find Markdown tables a bit easier to work with and inherently more readable than html tables. Though I don't think md tables support multi-line entries for a single cell, I don't think that's such a bad thing as it naturally encourages brevity (e.g. use a concise description in the table, and include a subsequent NOTES or ETC section w/extra details when necessary). We could, for instance, incorporate the text table into the doc template, and then parse it + reformat it to html for the matlab browser display.

Matlab has the advantage of having some introspection tools built-in, so that helps autogenerate some doc content, but in terms of parsing/reformatting (or file management, for that matter) it would probably be much easier to do some of this in a different environment/language (e.g. sed, awk, perl).

(Sidenote: this could be entirely specific to me, but it's rare that I ever use the matlab browser for viewing documentation: if the terminal help display isn't enough, I usually defer to google. so, at least for my 'workflow' the basic terminal formatting and the rendering on the toolbox site would be the main priorities/go-to's)

@jcohenadad
Copy link
Member Author

The motivation/context isn't entirely clear to me. Is the idea simply to have .html documentation for viewing internally within the Matlab browser?

yes, indeed.

What would the advantage to using publish be (e.g. vs. taking the .html output from the Mkdocs build?)? publish doesn't really work with classes, which, indeed, was/is the basic reason for the helpDocMd repo. (Granted, at this point we could probably just print the current class .md output to .m files and call publish on those, i.e. as if they were merely scripts consisting entirely of comments)

the advantages I see with publish are:

Re: tabling inputs and outputs, i find Markdown tables a bit easier to work with and inherently more readable than html tables.

are you referring to matlab's markdown tables? do you have an example? i really like html table outputs, each argument appears very clearly, and we can use columns for: name, type, description, e.g. here

Though I don't think md tables support multi-line entries for a single cell, I don't think that's such a bad thing as it naturally encourages brevity (e.g. use a concise description in the table, and include a subsequent NOTES or ETC section w/extra details when necessary). We could, for instance, incorporate the text table into the doc template, and then parse it + reformat it to html for the matlab browser display.

not sure i understand what you have in mind. The "NOTES" and "ETC" would not be part of the "INPUTS", right? (therefore, would not be included in the tables). To make it clear: i was suggesting the use of tables only for input arguments (section "INPUTS")

Matlab has the advantage of having some introspection tools built-in, so that helps autogenerate some doc content, but in terms of parsing/reformatting (or file management, for that matter) it would probably be much easier to do some of this in a different environment/language (e.g. sed, awk, perl).

hum... from what i've seen from the HelpDocMd, i feel we are very close from achieving this. You already parse everything. The only bit missing is support for another output format. But i admit i haven't dig deep into the code.

(Sidenote: this could be entirely specific to me, but it's rare that I ever use the matlab browser for viewing documentation: if the terminal help display isn't enough, I usually defer to google. so, at least for my 'workflow' the basic terminal formatting and the rendering on the toolbox site would be the main priorities/go-to's)

ok, well. We don't absolutely need to bother with a matlab doc. TBH, i'd rather not, given how poorly-developed their auto-doc environment is, and how fast it evolves across matlab releases, with minimum concerns for cross-compatibility. I started exploring this path, having in mind a "proper matlab toolbox", with all the associated features. But i am certainly not married to this idea. And at this point, we definitely have other priorities.

So let's revisit this idea later, and for now we can focus on fixing the remaining issues on HelpDocMd, and especially agreeing on a header for .m files.

@rtopfer
Copy link
Contributor

rtopfer commented Jun 22, 2020

the advantages I see with publish are:

* accessing the documentation from within Matlab for conveniency

* getting closer to a proper Matlab toolbox #99

👍 just to underline my previous suggestion "print the current class .md output to .m files and call publish on those, i.e. as if they were merely scripts consisting entirely of comments"—this would be trivial to implement (though, whether it would be adequate in itself is another question)

are you referring to matlab's markdown tables? do you have an example?

I mean like this which then renders on the site like

hum... from what i've seen from the HelpDocMd, i feel we are very close from achieving this. You already parse everything.

It doesn't really parse anything 😬 e.g. I initially spent quite a bit of time trying to parse Matlab's suggested html format for links before getting frustrated and realising the standard [name](url) format works fine too. (I'm sure someone w/a cursory understanding of regex could parse everything easily within Matlab)

So let's revisit this idea later, and for now we can focus on fixing the remaining issues on HelpDocMd, and especially agreeing on a header for .m files.

👍

@jcohenadad
Copy link
Member Author

It doesn't really parse anything 😬 e.g. I initially spent quite a bit of time trying to parse Matlab's suggested html format for links before getting frustrated and realising the standard name format works fine too. (I'm sure someone w/a cursory understanding of regex could parse everything easily within Matlab)

ah! indeed. Now i understand why there are all those funky characters in the header. Well in that case, since we are "close" to a nice solution, let's not spend too much time developing a parser. That could be a summer intern CS project.

On the other hand, we could still tweak the format of the fields using simple solution, as discussed here.

ok, so let's move on with the convention. My only strong opinion is about how we display input arguments. In this example, only the "option" flag is displayed in a table. I would like all input arguments to be displayed in a table. For struct, we could refer to a table below. Example:


INPUTS:

Name Type Description
arg1 str Does amazing things.
arg2 float Does horrible things. Don't use it
arg3 struct Additional parameters to set. See details below.

arg3

Name Type Description
.field1 str Awesome field.
.field2 bool 0: Not to be, 1: To be.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants