Description
Currently clang will pass any structure with more than a single element indirectly (as LLVM byval). Many other ABIs pass small structures in registers, potentially improving performance. We could do the same thing, but choosing the upper limit on the size (and therefore the number of registers to use) is a little more interesting, since we don't know how many physical registers the underlying machine will have (let alone have available for passing arguments). But it's also true that linear memory references may be more expensive if there is bounds checking.
It seems worthwhile to at least use 2 params since most calling conventions are likely to be able to handle that in most situations, and pairs are fairly common.
If we do this it will make the ABI a lot more complex to specify, since there will be classification rules about when this would apply (both AMD64 SysV and AAPCS are fairly complex in this regard). But it seems worth it. We would probably need slightly different rules for promotion than those architectures since we have both 32-bit and 64-bit argument types.
I'll think a little more about the details, but does anyone have strong opinions about whether this is likely to be a good idea?