
Commit

Committee has decided not to elide excess brackets in character classes
This affects the existing [[:rname:^*=]]... and the new [[:print:]].
jmarshall committed Jan 28, 2025
1 parent 4560a37 commit 3692643
Showing 1 changed file with 4 additions and 5 deletions.
9 changes: 4 additions & 5 deletions SAMv1.tex
@@ -33,7 +33,8 @@
\newcommand*{\firstbytebox}[2]{\byteboxAux{#1}{#2}{\put(0,0){\line(0,1){\bytetotalheight}}}}
\newcommand*{\bytebox}[2]{\byteboxAux{#1}{#2}{}}

-\newcommand*{\cclass}[1]{{\rm\sf :#1:}}
+\newcommand*{\cclass}[1]{[{\rm\sf :#1:}]}
+\newcommand*{\cclassexcept}[2]{[{\rm\sf :#1:}\caret #2]}
\newcommand*{\caret}{\textsuperscript{$\wedge$}}

\newcommand*{\memlimited}{\textcolor{gray}{\footnotesize\it limited}}
@@ -81,7 +82,6 @@ \section{The SAM Format Specification}
For example, floating-point values in SAM always use `{\tt .}' for the decimal-point character.

The regular expressions in this specification are written using the POSIX\,/\,IEEE Std 1003.1 extended syntax.
-For brevity, named character classes are written as~{\tt [\cclass{class}]} without an additional pair of brackets.

\subsection{An example}\label{sec:example}
Suppose we have the following alignment with bases in lowercase
@@ -213,9 +213,7 @@ \subsubsection{Character set restrictions}\label{sec:charset}
{\tt [\verb"0-9A-Za-z!#$%&+./:;?@^_|~-"][\verb"0-9A-Za-z!#$%&*+./:;=?@^_|~-"]*}
\end{center}
-% Pedantically this should be [[:rname:]^*=][[:rname:]]*, but we take advantage
-% of POSIX (Issue 7) section 9.3.5/8 to elide the excess brackets for clarity.
-\newcommand*{\rnameRegexp}{[\cclass{rname}\caret*=][\cclass{rname}]*}
+\newcommand*{\rnameRegexp}{[\cclassexcept{rname}{*=}][\cclass{rname}]*}
\noindent
For clarity, elsewhere in this specification we write this set of allowed characters as a character class~{\tt [\cclass{rname}]} and extend the POSIX regular expression notation to use {\tt\caret *=} to indicate the omission of `{\tt *}' and `{\tt =}' from the character class.
@@ -305,6 +303,7 @@ \subsection{The header section}
These alternative names are not used elsewhere within the SAM file;
in particular, they must not appear in alignment records' {\sf RNAME}
or~{\sf RNEXT} fields.
+\newline
\emph{Regular expression}: \emph{name}{\tt (,}\emph{name}{\tt )*}
where \emph{name} is {\tt\rnameRegexp}\\\cline{2-3}
& {\tt AS} & Genome assembly identifier. \\\cline{2-3}
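
The effect of the new definitions can be checked in isolation with a minimal document. This is only a sketch: the macro bodies are copied from the changed lines above, while the \documentclass choice and the two test lines are assumptions made for illustration and are not part of SAMv1.tex.

```latex
% Minimal sketch, not part of SAMv1.tex: the macros as defined after this commit.
\documentclass{article}  % assumed class; the real spec uses its own preamble
\newcommand*{\caret}{\textsuperscript{$\wedge$}}
\newcommand*{\cclass}[1]{[{\rm\sf :#1:}]}
\newcommand*{\cclassexcept}[2]{[{\rm\sf :#1:}\caret #2]}
\newcommand*{\rnameRegexp}{[\cclassexcept{rname}{*=}][\cclass{rname}]*}
\begin{document}
% Call sites that already wrap the macro in brackets now show both pairs:
{\tt [\cclass{rname}]}\par   % typesets [[:rname:]]
% The reference-name pattern, with the caret set as a superscript wedge:
{\tt\rnameRegexp}\par        % typesets [[:rname:^*=]][[:rname:]]*
\end{document}
```

Because \cclass now supplies the inner pair of brackets itself, existing call sites of the form {\tt [\cclass{...}]} pick up the doubled brackets without being edited, which would explain why the diff touches only the macro definitions and the explanatory text.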

0 comments on commit 3692643
