% !TEX root = thesis.tex
\section{Case Studies}
\label{sec:case-studies}
Having broadly reviewed open source code for low resource languages above, here I look more closely at the resources available for two languages in particular: Scottish Gaelic and Naskapi. Both are living languages with speaking communities, although they differ in size, coverage by academic research, and political situation. Searching for resources for a specific language is probably the most common use case for users interested in LRLs, as the majority of LRL researchers work with a single language or a suite of languages that they use themselves, rather than on quantitative studies of languages in general. A closer look should illuminate how open source methodologies can drive language development.
\subsection{Scottish Gaelic}
\label{sec:gaelic}
Scottish Gaelic (G\`aidhlig is the autonym) is a Celtic language spoken by roughly 60,000 people mainly in the United Kingdom and to a lesser extent in Canada. Gaelic - sometimes called Scots Gaelic, simply Gaelic, or the Gaelic - is a Goidelic or Q-Celtic language, along with Manx and Irish (also sometimes called Irish Gaelic, but here always referred to as Irish). This means that, while related to the Brythonic languages of Welsh, Cornish and Breton, it is different enough that it cannot benefit from the many resources available in Welsh, which, while endangered, has a much stronger academic interest and presence in the United Kingdom, with roughly half a million speakers. Gaelic has traditionally been heavily repressed, both politically and culturally, which has led to its usage in largely restricted or rural areas, and in the domains of the house, church, and family \citep{mackinnon1991past}.
The 2011 Scottish Census indicates that out of the total number of Gaelic speakers, only around half - 32,191 persons to be exact - read and write in Gaelic.\footnote{\href{http://www.scotlandscensus.gov.uk/}{http://www.scotlandscensus.gov.uk/}. \last{April~27}} A further 6,218 speak and read the language, but do not write it, while 4,646 can read it, but do not speak or write it. Gaelic is not officially a national language, although it is afforded certain protections under the European Charter for Regional or Minority Languages\footnote{\href{https://www.coe.int/en/web/conventions/full-list/-/conventions/treaty/148}{https://www.coe.int/en/web/conventions/full-list/-/conventions/treaty/148}. \last{April~27}} (a Council of Europe treaty rather than an EU one, so Britain's exit from the European Union should not by itself affect its ratification). The Gaelic Language (Scotland) Act of 2005 (GLS) gave Gaelic status as an official language of Scotland,\footnote{\href{http://www.legislation.gov.uk/asp/2005/7}{http://www.legislation.gov.uk/asp/2005/7}. \last{April~27}} and set up the B\`ord na G\`aidhlig\footnote{\href{http://www.gaidhlig.scot/}{http://www.gaidhlig.scot/}. \last{April~27}} as a language development body tasked with protecting and vitalising the Gaelic language.
The B\`ord officially is tasked with promoting and facilitating educational materials, but the initial charter makes no mention of language technology. The National Gaelic Language Plan 2018-2023\footnote{\href{http://www.gaidhlig.scot/launch-of-the-new-national-gaelic-language-plan/}{http://www.gaidhlig.scot/launch-of-the-new-national-gaelic-language-plan/}. \last{April~27}} \citep{bord2018national} mentions the Digital Archive of Scottish Gaelic (DASG),\footnote{\href{http://dasg.ac.uk/en}{http://dasg.ac.uk/en}. \last{April~27}} the largest corpus project for Gaelic, but does not specify other language technology being developed (excepting a brief mention of working with Ireland and Nova Scotia developing shared technology and resources). There are some primary and secondary schools, as well as various Gaelic Language and Studies degrees at English-speaking universities, as well as one Gaelic-speaking university Sabhal M\`or Ostaig\footnote{\href{http://www.smo.uhi.ac.uk/en/}{http://www.smo.uhi.ac.uk/en/}. \last{April~27}} on Skye; educational material from the B\`ord is mainly focused in these areas.
\subsubsection{Language vitality status}
\label{sec:gaelic-vitality-status}
Gaelic has an EGIDS rating of 2, as it is a provincial language given the 2005 GLS Act.\footnote{\href{https://www.ethnologue.com/language/gla}{https://www.ethnologue.com/language/gla}. \last{April~27}} \citet{lewis2009ethnologue} note regarding language use that "Resurgence of interest in Scottish Gaelic in 1990s. A number of children learn the language but there are serious problems in language maintenance even in the core areas \citep{salminen2007endangered}. Home, church, community." UNESCO judges it to be {\it definitely endangered}.\footnote{\href{http://www.unesco.org/languages-atlas/en/atlasmap/language-iso-gla.html}{http://www.unesco.org/languages-atlas/en/atlasmap/language-iso-gla.html}. \last{April~27}} The Endangered Languages Project describes it as Threatened or Vulnerable, depending on the source,\footnote{\href{http://endangeredlanguages.com/lang/3049}{http://endangeredlanguages.com/lang/3049}. \last{April~27}} as \citet{salminen2007europe} gives a much smaller population number of 20k for speakers than the other census-based data. \citepos{kornai2013digital} rating declares it as {\it Living}.\footnote{\href{https://hlt.bme.hu/en/dld/language/4656}{https://hlt.bme.hu/en/dld/language/4656}. \last{April~27}} These ratings are summarized in Table~\ref{table:gaelic}.
\begin{table}
\centering
\begin{tabular}{|p{5cm}|p{5cm}|} \hline
{\bf Scale} & {\bf Grade} \\ \hline
UNESCO & Definitely endangered\\ \hline
Ethnologue (EGIDS) & 2 (Provincial) \\ \hline
LEI & Threatened or Vulnerable \\ \hline
Kornai & Living \\ \hline
\end{tabular}
\caption{Language vitality ratings for Scottish Gaelic}
\label{table:gaelic}
\end{table}
\subsubsection{Language resources}
\label{subsec:gaelic-resources}
Gaelic has a long, written history. Today, there are a plethora of written, audio, and video resources. Some of these have been bundled into linguistic corpora.
\citet{bauer2014salt} gives an overview of how to get language development bootstrapped by using a translator, a lexicographer, and a software developer along with a dedicated roadmap of intentions and goals. In this paper, he gives a good overview of the state of Gaelic digitally.
\begin{quote}
The first digital Gaelic tool - the St\`{o}r-d\`{a}ta,\footnote{\href{http://www2.smo.uhi.ac.uk/gaidhlig/faclair/sbg/lorg.php}{http://www2.smo.uhi.ac.uk/gaidhlig/faclair/sbg/lorg.php}. \last{May~4}} an online termbase - appeared in 1994. Between 1994 and 2008, about a dozen other tools appeared, most of which then fell dormant for a period (the Opera web browser,\footnote{\href{https://www.opera.com/}{https://www.opera.com/}. \last{May~4}} OpenOffice.org\footnote{\href{http://www.openoffice.org/}{http://www.openoffice.org/}. \last{May~4}} and the Ning-based social network AbairThusa) or died off when funding/support ran out or the localizer moved on. Since the end of 2009 however, just over 50 additional programs and tools have appeared, ranging from games and web-apps through predictive texting tools to operating systems (Ubuntu,\footnote{\href{https://www.ubuntu.com/}{https://www.ubuntu.com/}. \last{May~4}} Windows\footnote{\href{https://www.microsoft.com/}{https://www.microsoft.com/}. \last{May~4}} and the upcoming Mozilla OS\footnote{This has since been shipped, and discontinued. \href{https://support.mozilla.org/en-US/products/firefox-os}{https://support.mozilla.org/en-US/products/firefox-os}. \last{May~4}}), allowing users to conduct a large percentage of their daily IT through the medium of Gaelic.
These were almost all created by two (largely) unpaid part-time localizers and two (largely) unpaid part-time developers. Their time involvement is difficult to quantify but an estimate puts it at 1.5 FTE\footnote{Full Time Equivalent, or the amount of time an employee spends in a year.} of localizer and lexicographer time and 0.5 of a FTA\footnote{This most likely meant FTE, or full time equivalent, as well.} of developer time over the last four years.
\end{quote}
Outside of these efforts, there are language resources available. The DASG is the largest corpus for Gaelic available on the web; however, it is not permissively licensed for modification, distribution, or reproduction, and so cannot be considered open source (although it is open access).\footnote{\href{http://dasg.ac.uk/about/terms/en}{http://dasg.ac.uk/about/terms/en}. \last{April~27}} OLAC has 26 resources for Gaelic, including large multilingual corpora, as well.\footnote{\href{http://www.language-archives.org/language/gla}{http://www.language-archives.org/language/gla}. \last{April~27}} A large corpus compiled by An Crub\'ad\'an is available online \footnote{\href{http://crubadan.org/languages/gd}{http://crubadan.org/languages/gd}. \last{April~27}} \citep{scannell2007crubadan}. WALS has 61 typological features listed for Gaelic,\footnote{\href{http://wals.info/languoid/lect/wals_code_gae}{http://wals.info/languoid/lect/wals\_code\_gae}. \last{April~27}} and Glottolog 35 references.\footnote{\href{http://glottolog.org/resource/languoid/id/scot1245}{http://glottolog.org/resource/languoid/id/scot1245}. \last{April~27}} ODIN has 59 IGT entries for Scottish Gaelic.\footnote{\href{http://odin.linguistlist.org/}{http://odin.linguistlist.org/}. \last{April~27}}
Some of these corpora are annotated - for instance, the Annotated Reference Corpus of Scottish Gaelic (ARCOSG)\footnote{\href{https://datashare.is.ed.ac.uk/handle/10283/2011}{https://datashare.is.ed.ac.uk/handle/10283/2011}. \last{April~27}} \citep{ARCOSG2016, lamb2014scottish}, which used an Irish POS tagger \citep{ui2006part} to project annotations, and which was funded by the B\`ord na G\`aidhlig. This resource was used to automatically derive categorial grammars \citep{batchelor2016automatic}, and to develop POS taggers directly for Gaelic \citep{lamb2014developing}. A dependency-structure corpus is being developed \citep{batchelor2014gdbank}, as are word-embedding models \citep{lamb2016developing}. The source code for \citet{batchelor2014gdbank, batchelor2016automatic} is available on GitHub.\footnote{\href{https://github.com/colinbatchelor/gdbank/}{https://github.com/colinbatchelor/gdbank/}. \last{April~27}} Some of these papers were presented at the first Celtic Language Technology Workshop in Dublin in 2014. The number of resources shows clearly that Gaelic is not entirely on the fringe of academic research, although it is generally considered a low resource language.
\citet{scannell2007crubadan} and contributors\footnote{\href{http://crubadan.org/acknowldegments}{http://crubadan.org/acknowldegments}. \last{May~2}} used the Cr\'ubad\'an corpus to create an open source Hunspell spellchecker,\footnote{\href{https://github.com/kscanne/hunspell-gd}{https://github.com/kscanne/hunspell-gd}. \last{April~27}} which is the spellchecker for "LibreOffice, OpenOffice.org, Mozilla Firefox 3 and Thunderbird, Google Chrome, and it is also used by proprietary software packages, like macOS, InDesign, memoQ, Opera and SDL Trados."\footnote{\href{https://hunspell.github.io/}{https://hunspell.github.io/}. \last{April~27}} This spellchecker was built with the help of Michael Bauer of \citet{bauer2014salt}, an independent Gaelic technologist who runs a small Gaelic technology consultancy called Am Faclair Beag,\footnote{\href{http://www.faclair.com/}{http://www.faclair.com/}. \last{April~27}} and also has ports for OpenOffice directly\footnote{\href{https://extensions.openoffice.org/en/project/faclair-afb}{https://extensions.openoffice.org/en/project/faclair-afb}. \last{April~27}} and a Firefox extension.\footnote{\href{https://addons.mozilla.org/ga-IE/firefox/addon/scottish-gaelic-spell-checker/}{https://addons.mozilla.org/ga-IE/firefox/addon/scottish-gaelic-spell-checker/}. \last{April~27}} Am Faclair Beag also offers an online dictionary with over 85k words\footnote{\href{http://www.faclair.com/GaelicDictionaryAbout.html\#About}{http://www.faclair.com/GaelicDictionaryAbout.html\#About}. \last{April~27}}
(and almost a million forms\footnote{\href{http://www.faclair.com/News.html}{http://www.faclair.com/News.html}. \last{April~27}} \citep{bauer2014salt}) and an in-built lemmatizer.\footnote{\href{http://www.faclair.com/News.html}{http://www.faclair.com/News.html}. \last{April~27}} Another spellchecker exists on GitHub,\footnote{\href{https://github.com/gooselinux/hunspell-gd}{https://github.com/gooselinux/hunspell-gd}. \last{April~27}} which was developed by the B\`ord and the European Language Initiative and Microsoft.\footnote{\href{http://www.sealgar.co.uk/spell.jsp}{http://www.sealgar.co.uk/spell.jsp}. \last{June~6, 2018}} It is also freely available on Microsoft's website.\footnote{\href{https://www.microsoft.com/en-gb/download/details.aspx?id=35400}{https://www.microsoft.com/en-gb/download/details.aspx?id=35400}. \last{June~6, 2018}} It has not been worked on recently.
More complicated, higher level technology has been developed. Previous academic work on Gaelic text-to-speech systems (TTS) stretches back at least 20 years; a diphone text-to-speech system for Gaelic was developed, for instance, in 1997, by \citet{wolters1997diphone}, although that is not open source. Today, there is a proprietary synthetic TTS system called Ceitidh\footnote{\href{https://www.cereproc.com/en/CereProc_Gaelic_Synthetic_Voice_Ceitidh}{https://www.cereproc.com/en/CereProc\_Gaelic\_Synthetic\_Voice\_Ceitidh}. \last{April~27}} (pronounced `Katie'), created by a private Gaelic company together with funding from the Scottish Government and the B\`ord na G\`aidhlig. Although Ceitidh is available to developers and students at a reduced fee or free of charge, it is not entirely open source. There are almost no open source sound resources. The main reason is that there is no overall quality assurance for Gaelic sound uploaded online. For large languages, this is not a problem; however, for smaller languages, the size of the corpus means that much of the content may come from only a few sources, none of which may be ideal. This issue may involve general lack of relevance of sound files, or poor quality recordings, or any dialect or non-mainstream features slipping in. Ceitidh was based on original audio files from Kirsteen MacDonald (in Gaelic, Kirsteen NicDh\`{o}mhnaill), some of whose content (while not vetted by an independent linguist) is available on LearnGaelic.scot,\footnote{\href{https://learngaelic.scot/}{https://learngaelic.scot/}. \last{April~27}} which could be hypothetically used to build an open source TTS system. However, quality assurance would be an arduous step.
Navigating resources to identify what is open source and what is not is difficult. As mentioned in Section~\ref{subsec:where-is-open-source-code}, one of the OSI's definitions for open source is that it be well publicised. This cannot be said to be the case for coding resources for Gaelic; there is no central location for viewing tools. The LRE Map has no Gaelic resources, although a POS Tagger, two corpora, a tokenizer, and Babouk corpus tool resource are mentioned for Irish.\footnote{\href{http://www.resourcebook.eu/searchll.php}{http://www.resourcebook.eu/searchll.php}. \last{April~27}} Linghub returns 30 entries - not many, considering it is an aggregator.\footnote{\href{http://linghub.org/search/?query=Gaelic}{http://linghub.org/search/?query=Gaelic}. \last{April~27}} GitHub returns 62 repositories that mention Gaelic,\footnote{\href{https://github.com/search?q=gaelic}{https://github.com/search?q=gaelic}. \last{April~27}} although it is unclear if these are for Irish.
The best resource is arguably Kornai's lab page\footnote{\href{https://hlt.bme.hu/en/dld/language/4656}{https://hlt.bme.hu/en/dld/language/4656}. \last{April~27}} (again, in development). While not linking directly, it does give some information. It notes that there are: several language packs at the OS level for Ubuntu and Windows input, but not one for Mac, probably because Gaelic uses the Roman alphabet and a UK keyboard suffices for most needs;\footnote{I use the US International Keyboard with OSX to type Gaelic accents, myself, and have never needed another keyboard layout for this.} a large Wikipedia; a Hunspell checker; OLAC texts (with marginally out of date numbers); a large Cr\'ubad\'an corpus (1,541,302 words and 17,308 documents), as well as a large Indigenous Tweets corpus with half a million words; and general coverage in Omniglot,\footnote{\href{http://omniglot.com/writing/gaelic.htm}{http://omniglot.com/writing/gaelic.htm}. \last{April~27}} bible.org,\footnote{\href{https://bible.org/}{https://bible.org/}. \last{April~27}} Panlex,\footnote{\href{https://panlex.org/}{https://panlex.org/}. \last{April~27}} and the Leipzig corpora \citep{goldhahn2012building}.\footnote{\href{http://wortschatz.uni-leipzig.de/en/download/}{http://wortschatz.uni-leipzig.de/en/download/}. \last{April~27}} Some of the stats are dubious. For instance, 15k Wikipedia users seems odd for a language where there is a total population of 30k literate speakers; and it incorrectly states that there is no Gaelic typological information in WALS. However, in general, this gives a better overview than any other source.
As far as I am aware, the largest number of directly linked, open source code resources for Gaelic is found in the corpus {\tt low-resource-languages}, described in \citet{CCURL} and in Section~\ref{sec:solutions}. There are six resources mentioned in the list,\footnote{\href{https://github.com/RichardLitt/low-resource-languages\#scottish-gaelic}{https://github.com/RichardLitt/low-resource-languages\#scottish-gaelic}. \last{April~27}} which was largely sourced by manually inspecting each of the GitHub repositories mentioning "Gaelic", and also through personal curation during general research for this paper.
Ideally, researchers would start to open source more of their code involving Gaelic. However, there are so few researchers and language communities currently working on Gaelic HLT that this may be a na\"{i}ve wish. Indeed, the main two researchers over the past decade for Gaelic releases have released most of their code publicly, and are generally willing to collaborate with anyone who wants to put in the effort to help develop them further.
\citet{bauer2014salt} ends with these takeaways:
\begin{quote}
\begin{itemize}
\item Dissemination of information, user support and promotion must be considered at an early stage, as such tools will not simply disseminate through their mere existence.
\item FOSS is harder to `sell' to everyday users but ultimately the only really sustainable model for small and medium languages in most cases.
\item It is nonetheless very doable, as since 2009 Gaelic has acquired a lot of new SALT (Speech and Language Technology) through the work of small group of people and any language development agency should seriously consider supporting or setting up such a group.
\end{itemize}
\end{quote}
All of these are important messages, particularly regarding open source and LRL research.
One possible solution for the problem of finding developers interested in working on Gaelic would be to implement a Scottish Gaelic computational linguistics course at one of the major Scottish universities, such as the University of Edinburgh, Glasgow, St. Andrews, or potentially at Sabhal M\`or Ostaig. This option would reward further lines of inquiry.
\subsection{Naskapi}
\label{sec:naskapi}
In October 2017 I travelled to Kawawachikamach and informally interviewed linguists working on a Naskapi Bible, visited the school and talked to teachers at length about language efforts there, and talked to individual Naskapi speakers about their thoughts on the language and how it is used. Below, I give a brief overview of Naskapi, note how it would be rated according to the metrics covered in Section~\ref{subsec:metrics}, and discuss language resource development. \citet{jancewicz2002applied} is the main source of published information on Naskapi computational developments; I give an update, 15 years on, given my experience in Kawawachikamach. %I was unfortunately unable to meet Bill Jancewicz, the SIL missionary there, at that time.
\subsubsection{Language background}
\label{sec:naskapi-language-background}
Naskapi (autonymically \sylla{naskapi} naskapi or \sylla{iyuw iyimuun} iyuw iyimuun) is a Cree language in the Algonquian family spoken in central Quebec \citep{MacKenzie-and-Jancewicz-1994}. Virtually the entire population of around 900 Naskapi live within the reservation Kawawachikamach, around 10 miles from Schefferville, QC. There is another Naskapi community on the Labrador coast, who speak another dialect known as Mushuau Innu, which is out of scope of this paper. Schefferville is only accessible by train or plane, and contains another local tribe called the Innu (which has more than 17,000 members, scattered among Quebec and Labrador\footnote{\href{https://en.wikipedia.org/wiki/Innu}{https://en.wikipedia.org/wiki/Innu}. \last{April~27}}), who live on their own reservation and who speak Montagnais or Innu-aimun, a related language. The two languages are similar, and the Naskapi youth are often diglossic in Montagnais (but the Innu are often not) \citep{macKenzie1980towards}.
The Naskapi speak English as a first or second language, while the Innu speak French (and some speak three or all four languages). They moved to Kawawachikamach in the 1980s, after initially being resettled in Schefferville in 1956. Some of the elders still remember being a nomadic people who followed caribou and were raised in the bush. However, half of the population is under the age of 16, and nationally the First Nations population is the largest growing population in Canada.\footnote{\href{http://www12.statcan.gc.ca/census-recensement/2016/dp-pd/index-eng.cfm}{http://www12.statcan.gc.ca/census-recensement/2016/dp-pd/index-eng.cfm}. \last{May~2}}
All of the Naskapi speak their own language regularly, in all contexts - excepting, perhaps, digitally. In the schools, there are Naskapi-only classes held until Grade 8 \citep{llewellyn2017oral}. While there are a few social workers, teachers, and nurses who speak solely English, most jobs in Kawawachikamach are held by Naskapi. There has been a long tradition of missionaries, and almost all of the Naskapi are Protestant. At church, they use contemporary Naskapi language hymnals and a Naskapi Bible \citep{naskapi-new-testament}, published in 2007. Prior to this publication, they used a Moose Cree New Testament (written in syllabics), with James Bay Cree and Moose Cree prayer books and hymnals \citep{jancewicz2013grammar}.
Until about 1996, there were no Naskapi scriptures at all. Between 2002 and 2007 the community transitioned to using an increasing amount of its own translated materials in church and school, and since the publication of the New Testament the use of contemporary Naskapi language materials in church has continued to grow.
\subsubsection{Language vitality status}
\label{sec:naskapi-vitality-status}
\citet{lewis2009ethnologue} classifies Naskapi as Level 4 (educational), and notes that "Literacy rate in L1: Western Naskapi: 50\%. Literacy rate in L2: 50\%. Ongoing community language program in Western Naskapi. All children through in [sic] kindergarten through grade 6 can read and write in the language (2017 N. Jancewicz).\footnote{This was gathered through personal communication with Norma Jean Jancewicz, one of the SIL missionary married couples together with Bill Jancewicz (SIL personal communication, 2018).} Taught in primary schools in Western Naskapi. Dictionary. Grammar. NT: 2007. "\footnote{\href{https://www.ethnologue.com/language/nsk}{https://www.ethnologue.com/language/nsk}. \last{April~27}} UNESCO defines it as {\it vulnerable}.\footnote{\href{http://www.unesco.org/languages-atlas/en/atlasmap/language-id-2354.html}{http://www.unesco.org/languages-atlas/en/atlasmap/language-id-2354.html}. \last{April~27}} \citepos{kornai2013digital} digital vitality index awkwardly declares it to be {\it dead}.\footnote{\href{https://hlt.bme.hu/en/dld/language/5651}{https://hlt.bme.hu/en/dld/language/5651}. \last{April~27}} Naskapi does not appear at all on the Endangered Languages Project. These ratings are displayed in Table~\ref{table:naskapi}.
\begin{table}
\centering
\begin{tabular}{|p{5cm}|p{5cm}|} \hline
{\bf Scale} & {\bf Grade} \\ \hline
UNESCO & Vulnerable \\ \hline
Ethnologue (EGIDS) & 4 (Educational) \\ \hline
LEI & -- \\ \hline
Kornai & Dead \\ \hline
\end{tabular}
\caption{Language vitality ratings for Naskapi}
\label{table:naskapi}
\end{table}
The {\it dead} terminology used to describe Naskapi by \citepos{kornai2013digital} metric reflects the metric being only applied to online corpora (which are minimal), and, regardless of the insensitivity of the nomenclature, it does have some merit here. When looking at the resources listed, there are no language packs for software, no Wikipedia articles, no Hunspell, no primary texts listed in OLAC, only 2415 words listed in the Cr\'ubad\'an corpus, no Indigenous tweets, no Swadesh lists, and only a brief mention in Panlex translations (90 words), and in Omniglot. Once again the data may not be perfect - this source lists the EGIDS rating at Level 5 (which I would disagree with, placing it in Level 4, as "The language is in vigorous use, with standardisation and literature being sustained through a widespread system of institutionally supported education.")
ODIN has exactly one IGT entry for Naskapi, from \citet{richards2004syntax}. This means that Example~\ref{igt1} may double the size of ODIN's entries, being only the second IGT example for Naskapi (this lexeme is mentioned in \citet{macKenzie1980towards}).\footnote{This might also be transliterated as `wabush', although it would not match the Naskapi phonological inventory. Wabush is the name of a town in Labrador, which I was told meant `hare' or `rabbit'.}
\begin{exe}
\ex
\gll wa:pus\\
hare\\
\trans `hare'
\label{igt1}
\end{exe}
Regardless of this paucity of data, there are certainly literary resources in Naskapi (see the next section) - if not many digitally. In the case of Naskapi, the Emergent level proposed by \citet{gibson2016assessing} may be more fitting than either Dead or Vital.
\subsubsection{Orthography}
Naskapi has two scripts: Latin and the Unified Canadian Aboriginal Syllabics \citep{wals-141}, which were added to Unicode in 1999.\footnote{\href{https://www.unicode.org/standard/supported.html}{https://www.unicode.org/standard/supported.html}. \last{April~27}} The Syllabics were introduced by missionaries in the 19th century, and quickly adopted by all Cree language communities, who approached near universal literacy \citep{bennett1991cree}. In Kawawachikamach and Schefferville (and on the train there), there are many examples of writing in syllabics. Naskapi also has its own standard orthographical conventions for Roman characters. For instance, a circumflex, as in \^u, is used in place of a double \emph{uu} to indicate vowel length.
\citet{jancewicz2002applied} give an insightful overview of computational technology in Naskapi. They note that Naskapi often were not involved in typesetting literature in syllabics, and that few became typists when the first syllabic typewriters were introduced. Jancewicz is particularly well placed as an author of this paper, as he and his wife were some of the earliest missionaries sent from SIL to the Naskapi community; MacKenzie is similarly well placed, having worked for decades with Cree communities as well as with the Naskapi. (Two prior SIL missionaries had worked on a Naskapi grammar \citep{martens1983practical}.) Jancewicz worked with the Band Office (the local council) installing the first word processing system for syllabics, trained Naskapi speakers, and created the first Naskapi TrueType font.
Jancewicz also helped to install Keyman,\footnote{\href{https://keyman.com/}{https://keyman.com/}. \last{April~27}} "a keyboarding utility ... that allowed the programming of custom keyboard input for various languages and character sets." \citep[85]{jancewicz2002applied} Keyman is now free, open source software available on GitHub.\footnote{\href{https://github.com/keymanapp}{https://github.com/keymanapp}. \last{April~27}} It allows a user to type Roman letters which are converted to the right phrase in Syllabics, and is forgiving for phonemic variants. For instance, "ju", "chu", "tchu" and so on might all be interpreted and replaced by the appropriate syllabic \sylla{co}. % TODO Ask if this is the right syllabic
Keyman must be installed manually on each computer that uses it, which represents a considerable upfront investment of time for Jancewicz. Indeed, the importance of this support to Naskapi digital ascendancy cannot be overstated (although Jancewicz himself perhaps understates it):
\begin{quote}
"Since 1988, the resident linguist has maintained all of his own language learning materials and language data on computer. He has also provided the local technical support that is needed in a small, isolated community, especially with regard to the esoteric development of computer programs that allow syllabic word processing. While it is not impossible to use computers in Native language work without a full-time, on-site computer resource person, it has been an obvious asset to have such a person available to provide training and technical support." \citep[86]{jancewicz2002applied}
\end{quote}
Currently, the school has a computer lab with over a dozen computers, but no in-house computer technician. One of the Wycliffe translators needed to visit the school to check on Keyman updates, and the students are not regularly trained in how to set up Keyman on their own, or how to set it up on their phones or other portable devices, although there have been efforts to train key teachers in how to teach computational use of Naskapi \citep{jancewicz1998developing}. While Facebook and other online platforms are increasingly popular, the majority of written conversation takes place in Naskapi written in Roman characters, or in English.
However, it is crucial that development and education regarding computational literacy continue to be mandated and improved. "Using a computer for mother-tongue language work raises speakers' assessment of the worth of their own language, as well as provides an avenue for sharing their work and ideas through reproduction and publication." \citep{jancewicz2002applied}
\citet{jancewicz2002applied} was written before wide adoption of the Unicode standard by browsers, and before the internet and smartphones became ubiquitous. \citet{jancewicz2012cree} gives an update on fonts available for Cree languages, including Naskapi. It also mentions Languagegeek,\footnote{\href{http://www.languagegeek.com/algon/naskapi/naskapi.html}{http://www.languagegeek.com/algon/naskapi/naskapi.html}. \last{May~3}} a website that has useful information on downloading fonts for Naskapi.
\begin{quote}
One of the most important sources for Cree Unicode fonts is the LanguageGeek website by Chris Harvey. Chris Harvey developed "Aboriginal Serif Unicode", which has gone through some changes and improvements. His current strategy is to serve logical regions of syllabic users with fonts that contain subsets of the UCAS block, rather than one font that contains them all. His work is very impressive and professional but some readers may find it difficult to read because of somewhat close letter- and especially word-spacing. \citep[17]{jancewicz2012cree}
\end{quote}
\subsubsection{Corpora creation}
In recent years, the Naskapi Development Council (NDC), which works with translators provided by the Band, has produced a Naskapi to English bilingual dictionary in three volumes \citep{MacKenzie-and-Jancewicz-1994}. The NDC is largely staffed by linguists from the Summer Institute of Linguistics, funded by Wycliffe Bible Translators and private fundraising from Christian communities.\footnote{\href{https://www.wycliffe.org/}{https://www.wycliffe.org/}. \last{April~27}} Today, the SIL linguists are a team of six: two long-term linguists, and two husband-and-wife pairs who are training to work as Bible translators in this community before moving on to work with other Cree communities in Canada.
Naskapi does not have a complete Bible. A New Testament, begun in the 1970s, was recently published \citep{naskapi-new-testament}. Genesis, Exodus, and Psalms have also been translated, and several children's stories and books of oral legends from an elder have been produced. In addition, \citet{jancewicz2002applied} note the creation of a monthly newsletter, a history, and translations of official business of the administrations (which may provide excellent multilingual corpora). There are two full-time translators: a young woman in her mid-twenties, and an older man of around fifty years of age. At times, elders also contribute to the Bible translation effort by marking up pre-publication drafts, which they then go over with the translators.
When there is a need to come up with a new term, the elders are consulted, and they agree on an appropriate translation. For instance, {\it grill} is translated as `metal-net'. Naskapi has no pre\"{e}xisting word for `grill', but it does have one for `net', and it is easy to imagine a grill on which meat is cooked as a metal net. However, these coinages are rarely used outside of the Bible. Likewise, when a term needs to be invented at the school, the teachers there decide on an appropriate one - for instance, for occasions like Halloween, where `Frankenstein' may need to be translated into a local alternative. These decisions are largely one-off, although they may be reused from year to year, and are only informally recorded in their respective domains.
The linguists use the Fieldworks Language Explorer (FLEx)\footnote{\href{https://software.sil.org/fieldworks/}{https://software.sil.org/fieldworks/}. \last{April~27}} to document new linguistic terms. FLEx was developed by SIL International, and provides linguists with an out-of-the-box solution for recording linguistic terms using interlinear glossed text. It is also open source, and available on GitHub.\footnote{\href{https://github.com/sillsdev/FieldWorks}{https://github.com/sillsdev/FieldWorks}. \last{April~27}} Users can export the dictionary as a PDF (among other file formats), or export words to an online interface known as Webonary.\footnote{\href{https://www.webonary.org/configuring-the-dictionary-in-flex/}{https://www.webonary.org/configuring-the-dictionary-in-flex/}. \last{April~27}} This allows language workers to automatically create a usable, free dictionary for members of the community.
In addition, the Naskapi community has some resources on the Algonquian Linguistic Atlas,\footnote{\href{http://www.atlas-ling.ca/}{http://www.atlas-ling.ca/}. \last{June~16}} mentioned above in Section~\ref{subsec:mapping}, and in the related Algonquian Dictionaries Project.\footnote{\href{https://resources.atlas-ling.ca/}{https://resources.atlas-ling.ca/}. \last{June~16}} The Naskapi information in the Project comes from the NDC and the linguists working with them (Jancewicz 2018, personal communication). The Project as a whole is funded by several Social Sciences and Humanities Research Council grants, and claims to be open source.\footnote{\href{https://resources.atlas-ling.ca/about/?lang=en}{https://resources.atlas-ling.ca/about/?lang=en}. \last{June~16}} However, I was unable to find the code for the project on the website; it is possible that communication with the team behind it would be fruitful, but this had not yet been done at the time of writing.
Naskapi, while far smaller than Gaelic, is judged by UNESCO and others to be less endangered, in part because of all of these efforts. The community speaks Naskapi daily in all areas of life, and there is ongoing work to produce more corpora and literacy materials. This year, a fibre-optic cable for high-speed internet is being laid to Kawawachikamach. Hopefully, better connectivity will allow the Naskapi to access resources that have so far been out of reach, which may increase digital ascendancy in the language.
\subsection{Summary}
I have looked at Gaelic and Naskapi through the lens of a language researcher interested in finding every available open source resource, with an understanding of the language community. In both cases, I have visited the areas where the language is spoken, noticed literature in the language, and seen how it is used on a day-to-day basis. However, linguistic use today does not necessarily lead to a vibrant language tomorrow, and for both Gaelic and Naskapi there is clearly work to be done, especially with applications and websites used by young people, to ensure that the languages digitally ascend. While this is not the only metric by which to judge survival, it may be a bellwether for the vitality of a language. For both languages, I have heard variations of the phrase "I am proud to be Naskapi, and proud to speak my language". As clear markers of identity, the languages should be preserved. Any work which can be done to expedite this process - for instance, through open source - is welcome.