From df1da6785f5f8b11e42eec862903ddc811e0955c Mon Sep 17 00:00:00 2001
From: r12a
Date: Thu, 15 Dec 2022 19:05:52 +0000
Subject: [PATCH] Apply https://github.com/r12a/scripts/issues/123

---
 arabic/arb.css  |  16 ++
 arabic/arb.html | 690 ++++++++++++++++++++++++++----------------------
 2 files changed, 388 insertions(+), 318 deletions(-)

diff --git a/arabic/arb.css b/arabic/arb.css
index 15493d15f..24fe6581e 100755
--- a/arabic/arb.css
+++ b/arabic/arb.css
@@ -29,3 +29,19 @@ .mapItem .charExample span[lang] { font-size: 200%; }
 @media print {
 	#freeText { font-size: 24px; }
 	}
+
+
+
+
+
+.useBlockExamples .charExample .ex {
+	font-size: 3.3rem;
+	line-height: 1.2;
+	}
+.useBlockExamples .charExample.inline .ex {
+	font-size: 1.4rem;
+	}
+
+
+
+
diff --git a/arabic/arb.html b/arabic/arb.html
index cb6540b7c..77225e9f9 100755
--- a/arabic/arb.html
+++ b/arabic/arb.html
@@ -54,7 +54,7 @@

Contents

Updated - 23 November, 2022 + 15 December, 2022

@@ -117,18 +117,26 @@

Usage & history

Basic features

-

The Arabic script is an abjad. This means that in normal use the script represents only consonant and long vowel sounds. This approach is helped by the strong emphasis on consonant patterns in Semitic languages, however the Arabic script is also used for other kinds of language (such as Urdu and Uighur). See the table to the right for a brief overview of the features of Standard Arabic.

-

Arabic text runs right-to-left in horizontal lines, but numbers and embedded Latin text are read left-to-right.

-

The script is cursive, and some basic letter shapes change radically, depending on what they join to. It is also very common for adjacent characters to ligate and to stretch to fill available space. Many of the characters share a common base form, and are distinguished by the number and location of dots or other small diacritics, called i'jam. For example, س ‎ش ‎ݜ ‎ ݰ ‎ݽ ‎ݾ ‎ڛ ‎ښ ‎ڜ ‎ۺ.

-

There is no case distinction.

+ +

The Arabic script is an abjad. This means that in normal use the script represents only consonant and long vowel sounds. This approach is helped by the strong emphasis on consonant patterns in Semitic languages, however the Arabic script is also used for other kinds of language (such as Urdu and Uighur). See the table to the right for a brief overview of the features of Standard Arabic.

+ +

Arabic text runs right-to-left in horizontal lines, but numbers and embedded Latin text are read left-to-right. ❯ direction

+ +

The script is cursive, and some basic letter shapes change radically, depending on what they join to. It is also very common for adjacent characters to ligate and to stretch to fill available space. Many of the characters share a common base form, and are distinguished by the number and location of dots or other small diacritics, called i'jam. For example, س ‎ش ‎ݜ ‎ ݰ ‎ݽ ‎ݾ ‎ڛ ‎ښ ‎ڜ ‎ۺ. ❯ shaping

+ +

There is no case distinction.

+

Words are separated by spaces (except some very short, usually 1-letter conjunctions and prepositions, which attach to the following word).

-

Modern Standard Arabic has 28 letters in its alphabet, but regularly uses 8 more. Most of those involve representations of the hamza, for which the usage is complicated. This page also lists -3 letters for foreign sounds -, and 6 others which are used infrequently.

-

The script hides short vowels, however these and other phonetic information can be written where needed using diacritics. There are 3 basic vowel diacritics, but 4 more and 1 letter are occasionally used. Long vowel locations are marked by matres lectionis (consonants indicating vowel locations), which also take diacritics in vowelled text.

-

In vowelled text, there is a diacritic to indicate the absence of a vowel in consonant clusters, and another diacritic to indicate gemination.

-

A mandatory ligature has to be used for combinations of lam + alif.

-

Arabic uses both European and native digits, and has local forms for several of the more common punctuation marks.

+ +

Modern Standard Arabic has 28 letters in its alphabet, but regularly uses 8 more. Most of those involve representations of the hamza, for which the usage is complicated. This page also lists 3 letters for foreign sounds, and 6 others which are used infrequently. ❯ consonants

+ +

A mandatory ligature has to be used for combinations of lam + alif.

+ +

The script hides short vowels, however these and other phonetic information can be written where needed using diacritics. There are 3 basic vowel diacritics, but 4 more and 1 letter are occasionally used. Long vowel locations are marked by matres lectionis (consonants indicating vowel locations), which also take diacritics in vowelled text. ❯ vowels

+ +

In vowelled text, there is a diacritic to indicate the absence of a vowel in consonant clusters, and another diacritic to indicate gemination. ❯ novowel

+ +

Arabic uses both European and native digits, and has local forms for several of the more common punctuation marks. ❯ numbersinline

@@ -160,7 +168,7 @@

Basic consonants

Extended consonants

-
ڤ␣پ␣چ
+
ڤ␣پ␣چ
@@ -174,7 +182,7 @@

Vowels

Other

ء␣آ␣أ␣إ␣ؤ␣ئ␣ة
-
ٱ␣ڢ␣ڧ␣ࢲ␣ـ␣ﷲ␣ﷺ␣ﷻ␣﷽
+
ٱ␣ڢ␣ڧ␣ࢲ␣ـ␣ﷲ␣ﷺ␣ﷻ␣﷽
@@ -190,12 +198,12 @@

Combining marks

Vowels

-
َ␣ُ␣ِ␣ً␣ٌ␣ٍ␣ْ␣ٰ
+
َ␣ُ␣ِ␣ً␣ٌ␣ٍ␣ْ␣ٰ

Other

-
ٓ␣ٔ␣ٕ␣ّ
+
ٓ␣ٔ␣ٕ␣ّ

The first 3 items only occur in decomposed text.

@@ -229,9 +237,9 @@

Punctuation

Show -
٫␣٬␣٪␣؉␣‰␣–␣‐␣،␣؛␣—␣”␣“␣’␣‘␣«␣»␣‹␣›␣…
+
٫␣٬␣٪␣؉␣‰␣–␣،␣؛␣—␣”␣“␣’␣‘␣«␣»␣‹␣›␣…
-
﴾␣﴿␣؍␣٭
+
﴾␣﴿␣؍␣٭

ASCII

@@ -665,6 +673,181 @@

Vowels

+ + +
+

The script hides short vowels, however these and other phonetic information can be written where needed using diacritics. There are 3 basic vowel diacritics, but 4 more and 1 letter are occasionally used. Long vowel locations are marked by matres lectionis (consonants indicating vowel locations), which also take diacritics in vowelled text.

+ +

In vowelled text, there is a diacritic to indicate the absence of a vowel in consonant clusters, and another diacritic to indicate gemination.

+
+ + + + + +
+

Ijam and tashkil

+ +

The Unicode Standard makes an important distinction between ijam and tashkil diacritics, which is particularly relevant for this section about vowels.

+

An ijam is a diacritic in the Arabic script that is considered to be an integral part of a basic letter form: ځ [U+0681 ARABIC LETTER HAH WITH HAMZA ABOVE] is an example of a letter with an ijam that represents the consonant dz in the Pashto orthography. (Additional diacritics are added as combining characters to express vowel sounds, etc.) Unicode encodes letter+ijam combinations as separate, atomic characters, which are never given decompositions in the standard.u,366 Ijam generally take the form of one-, two-, three- or four-dot markings above or below the basic letter skeleton, although other diacritic forms occur, especially in extensions of the Arabic script in Central and South Asia and in Africa.

+

A tashkil (تَشْكِيل) is an Arabic script mark that indicates vocalization of text or other types of phonetic guide that indicate pronunciation. حٔ [U+062D ARABIC LETTER HAH + U+0654 ARABIC HAMZA ABOVE] is an example of a letter plus tashkil combination. Tashkil are separately encoded as combining marks. These include several subtypes: harakat (short vowel marks), tanwin (postnasalized or long vowel marks), shaddah (consonant gemination mark), and sukun (to mark lack of a following vowel). A basic Arabic letter plus any of these types of marks is never encoded as a separate, precomposed character, but must always be represented as a sequence of letter plus combining mark.u,366 Additional marks invented to indicate non-Arabic vowels, used in extensions of the Arabic script, are also encoded as separate combining marks.

+

This distinction between using a character with ijam instead of combining a letter with a tashkil becomes important when choosing which Unicode characters to use because (as can be seen in the examples above) the visual forms can be identical. Using the wrong character can change the meaning of the text, affecting the results of text search, font rendering, text to speech, etc.

+

There are, however, some very common combinations of diacritic and base that can be represented using precomposed characters or decomposed sequences that are canonically equivalent. For those the standard encourages the use of the precomposed form, but the fact that the forms are canonically equivalent removes concerns about changes in meaning.
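The distinction between an atomic letter with ijam and a letter-plus-tashkil sequence can be checked with Unicode normalization. The following is a minimal sketch using Python's standard `unicodedata` module; the variable names are illustrative only and do not come from the page.

```python
import unicodedata

# Atomic letter with ijam: it has no decomposition in the standard.
hah_with_hamza = "\u0681"        # ARABIC LETTER HAH WITH HAMZA ABOVE
# Base letter plus tashkil: may look the same, but is two code points.
hah_plus_hamza = "\u062D\u0654"  # ARABIC LETTER HAH + ARABIC HAMZA ABOVE

# The two spellings are not canonically equivalent, so they still
# differ after normalization.
same_after_nfc = (
    unicodedata.normalize("NFC", hah_with_hamza)
    == unicodedata.normalize("NFC", hah_plus_hamza)
)

# ALEF WITH HAMZA ABOVE, by contrast, is canonically equivalent to
# ALEF + HAMZA ABOVE, so both spellings normalize to the same string.
alef_hamza_precomposed = "\u0623"
alef_hamza_decomposed = "\u0627\u0654"
equivalent = (
    unicodedata.normalize("NFC", alef_hamza_decomposed)
    == alef_hamza_precomposed
)

print(same_after_nfc, equivalent)
```

A search or comparison routine that normalizes its input will therefore treat the two alef spellings as identical, but will not match U+0681 against HAH followed by U+0654.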

+
+ + + + + + + + +
+

Matres lectionis

+

+ +
ا␣و␣ي
+ +

In Arabic, the consonants listed just above may indicate the location of a long vowel, eg. قلوب تاريخ They are always visible, whether or not the text shows vowel diacritics.

+ +

These characters, especially ا [U+0627 ARABIC LETTER ALEF], may also be used with a number of other small marks, such as hamza, for particular effects. (see hamza).

+

The letter alef cannot actually represent a consonant sound on its own (unlike the other two). In most cases it is really only a support for a vowel and/or diacritic, or an indicator of vowel length, but in word-final position it commonly either represents a short a, eg. أنا or is silent, eg. +رَسْمِيًا +كَتَبُوا +

+
+ + + + + + +
+

Short vowels

+ +

In situations where it is necessary to unambiguously indicate the underlying vowel sounds, short vowels can be expressed using diacritics called harakat, eg. العَرَبِيَّة +

+

However for languages such as Arabic, Persian and Urdu they are typically not used unless there is a particular need to help the reader understand the pronunciation. The previous example would therefore usually be written العربية +

+

On the other hand, when the script is used for some other languages (such as Uighur, Kashmiri, or Hausa), all vowels are shown, as a matter of course. These diacritics are also used in the Quran (though not originally), to reduce ambiguity.

+ + + + + +
+

Basic harakat

+ +

The basic short vowel marks in the Arabic language repertoire are:

+ +
َ␣ُ␣ِ
+ +

Although the phonemic distinctions for Arabic involve only 3 vowel sounds, the phonetic realisation often varies with context. For example, vowel_mappings includes e and o sounds, which can be found in a few foreign loan words.

+
+ + + + + +
+

Tanwīn

+

Tanwin refers to the secondary set of vowel diacritics with origins in Classical Arabic, where indefinite nouns and adjectives were marked by a final n-sound, called تنوين tænwiːn or, in English, 'nunation'. This is indicated by visually doubling the vowel diacritic, but there are Unicode characters for each combination.

+ +
ً␣ٌ␣ٍ
+ +

In modern text this is particularly common for adverbs.jm,51

+

ً [U+064B ARABIC FATHATAN] is often used in the combination ◌ًا [U+064B ARABIC FATHATAN + U+0627 ARABIC LETTER ALEF] , where the ALEF is silent and the ending is pronounced -an, eg. +فَوْرًا +

+

The same applies before TEH MARBUTA, eg. +أَفْعًى +

+

If it appears as ◌َةً [U+064E ARABIC FATHA + U+0629 ARABIC LETTER TEH MARBUTA + U+064B ARABIC FATHATAN] the pronunciation is -atan, eg. +عَادَةً +

+

After a final YEH, the pronunciation has an extra j sound,jm,51 ie. -iːjan, eg. +رَسْمِيًا +

+

In modern Arabic printing the fathatan may be dropped, but the alef is retained.

+

The other two diacritics are much less common.jm,51

+
+ + + + + + +
+

Word-initial vowels

+ +

Word-initial vowels use alef as a carrier, eg. +اِسْم +

+ +

Commonly, word-initial vowels are actually preceded by a glottal stop, in which case the vowel diacritic is applied to one of أ [U+0623 ARABIC LETTER ALEF WITH HAMZA ABOVE] or إ [U+0625 ARABIC LETTER ALEF WITH HAMZA BELOW].

+
+
+ + + + + + + + +
+

Superscript alef

+ +
ٰ
+ +

ٰ  [U+0670 ARABIC LETTER SUPERSCRIPT ALEF] is used in only a few Arabic words, however they tend to be commonly used words, eg. +هٰذَا +اللّٰه +

+
+ + + + + + +
+

Alef maksura

+ +
ى
+ +

ى [U+0649 ARABIC LETTER ALEF MAKSURA] represents the long a-vowel at the end of many words when it is written with YEH instead of an ALEF. In this case the YEH is typically printed without dots, to avoid confusion (although an ordinary YEH may also sometimes be written dotless). This spelling only occurs with certain words, and only when the final sound is long, eg. +معنى

+

If any suffix is added, the spelling reverts to the normal alef, eg. +معناهم mæʕnaː-hum +

+
+ + + + +
+

Diphthongs

+

The 2 diphthongs aj and aw are written using a combination of short a with the semivowels ي [U+064A ARABIC LETTER YEH] and و [U+0648 ARABIC LETTER WAW],wp,#Vowels eg. عَيْن عَوْد

+
+ + + + +
+

Vowel absence

+ +
ْ
+ +

When text is vowelled, ْ   [U+0652 ARABIC SUKUN] can be used over a consonant to indicate that it is not followed by a vowel sound, eg. مَكْتَب

+
+ + + + + + +

Vowel sounds to characters

@@ -909,174 +1092,219 @@

Diphthongs

+ -
+ +
+

Consonants

-
-

Ijam and tashkil

-

The Unicode Standard makes an important distinction between ijam and tashkil diacritics, which is particularly relevant for this section about vowels.

-

An ijam is a diacritic in the Arabic script that is considered to be an integral part of a basic letter form: ځ [U+0681 ARABIC LETTER HAH WITH HAMZA ABOVE] is an example of a letter with an ijam that represents the consonant dz in the Pashto orthography. (Additional diacritics are added as combining characters to express vowel sounds, etc.) Unicode encodes letter+ijam combinations as separate, atomic characters, which are never given decompositions in the standard.u,366 Ijam generally take the form of one-, two-, three- or four-dot markings above or below the basic letter skeleton, although other diacritic forms occur, especially in extensions of the Arabic script in Central and South Asia and in Africa.

-

A tashkil (تَشْكِيل) is an Arabic script mark that indicates vocalization of text or other types of phonetic guide that indicate pronunciation. حٔ [U+062D ARABIC LETTER HAH + U+0654 ARABIC HAMZA ABOVE] is an example of a letter plus tashkil combination. Tashkil are separately encoded as combining marks. These include several subtypes: harakat (short vowel marks), tanwin (postnasalized or long vowel marks), shaddah (consonant gemination mark), and sukun (to mark lack of a following vowel). A basic Arabic letter plus any of these types of marks is never encoded as a separate, precomposed character, but must always be represented as a sequence of letter plus combining mark.u,366 Additional marks invented to indicate non-Arabic vowels, used in extensions of the Arabic script, are also encoded as separate combining marks.

-

This distinction between using a character with ijam instead of combining a letter with a tashkil becomes important when choosing which Unicode characters to use because (as can be seen in the examples above) the visual forms can be identical. Using the wrong character can change the meaning of the text, affecting the results of text search, font rendering, text to speech, etc.

-

There are, however, some very common combinations of diacritic and base that can be represented using precomposed characters or decomposed sequences that are canonically equivalent. For those the standard encourages the use of the precomposed form, but the fact that the forms are canonically equivalent removes concerns about changes in meaning.

-
+
+

Modern Standard Arabic has 28 letters in its alphabet, but regularly uses 8 more. Most of those involve representations of the hamza, for which the usage is complicated. This page also lists 3 letters for foreign sounds, and 6 others which are used infrequently.

+

A mandatory ligature has to be used for combinations of lam + alif.

+
+
+

Basic consonant letters

-
-

Matres lectionis

-

+

The main Unicode Arabic block contains 153 letters, with 77 more in the extended blocks. As shown in the previous section, only a small subset of those are used to write a given language. The others represent special characters added to the repertoire for one or other of the many languages for which the Arabic script is used.

+ + + + +

The vast majority of letters represent consonants. A few represent long vowels.

+

The following letters are those generally recognised as constituting the alphabet for the Standard Arabic language.

-
ا␣و␣ي
-

In Arabic, the consonants listed just above may indicate the location of a long vowel, eg. قلوب تاريخ They are always visible, whether or not the text shows vowel diacritics.

+
ا
-

These characters, especially ا [U+0627 ARABIC LETTER ALEF], may also be used with a number of other small marks, such as hamza, for particular effects. (see hamza).

-

The letter alef cannot actually represent a consonant sound on its own (unlike the other two). In most cases it is really only a support for a vowel and/or diacritic, or an indicator of vowel length, but in word-final position it commonly either represents a short a, eg. أنا or is silent, eg. -رَسْمِيًا -كَتَبُوا -

-
+
ب␣ت␣د␣ط␣ض␣ك␣ق
+
ف␣ث␣ذ␣س␣ص␣ز␣ظ␣ش␣ج␣خ␣غ␣ه␣ح␣ع
+
م␣ن
+
و␣ر␣ل␣ي
-
-

Short vowels

-

In situations where it is necessary to unambiguously indicate the underlying vowel sounds, short vowels can be expressed using diacritics called harakat, eg. العَرَبِيَّة -

-

However for languages such as Arabic, Persian and Urdu they are typically not used unless there is a particular need to help the reader understand the pronunciation. The previous example would therefore usually be written العربية -

-

On the other hand, when the script is used for some other languages (such as Uighur, Kashmiri, or Hausa), all vowels are shown, as a matter of course. These diacritics are also used in the Quran (though not originally), to reduce ambiguity.

+

Of those, as mentioned earlier, some letters represent long vowel locations or combinations of consonant plus vowel.

+
-
-

Basic harakat

-

The basic short vowel marks in the Arabic language repertoire are:

+
+

Supplementary letters

-
َ␣ُ␣ِ
+

Other Unicode letters regularly used in Arabic include:

-

Although the phonemic distinctions for Arabic involve only 3 vowel sounds, the phonetic realisation often varies with context. For example, vowel_mappings includes e and o sounds, which can be found in a few foreign loan words.

-
+
ء␣آ␣أ␣إ␣ؤ␣ئ␣ى␣ة
+

Most of the above letters with diacritics decompose in Unicode Normalization Form D (NFD), however ة [U+0629 ARABIC LETTER TEH MARBUTA] does not.
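The decomposition behaviour described above can be inspected directly. The following is a small sketch using Python's standard `unicodedata` module; the constant and helper names are hypothetical, chosen for illustration.

```python
import unicodedata

# The supplementary letters listed above, written as escapes:
# hamza, alef madda, alef hamza above, alef hamza below,
# waw hamza, yeh hamza, alef maksura, teh marbuta.
SUPPLEMENTARY = "\u0621\u0622\u0623\u0625\u0624\u0626\u0649\u0629"

def nfd_code_points(ch):
    """Return the NFD form of ch as a list of U+XXXX strings."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)]

# Map each letter to its decomposed sequence (or to itself if atomic).
decompositions = {f"U+{ord(ch):04X}": nfd_code_points(ch) for ch in SUPPLEMENTARY}

for ch in SUPPLEMENTARY:
    key = f"U+{ord(ch):04X}"
    print(key, unicodedata.name(ch), "->", " ".join(decompositions[key]))
```

Letters that map to a single code point, such as U+0629, are unchanged by NFD; the others split into a base letter followed by a combining madda or hamza.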

+
-
-

Tanwīn

-

Tanwin refers to the secondary set of vowel diacritics with origins in Classical Arabic, where indefinite nouns and adjectives were marked by a final n-sound, called تنوين tænwiːn or, in English, 'nunation'. This is indicated by visually doubling the vowel diacritic, but there are Unicode characters for each combination.

-
ً␣ٌ␣ٍ
+
+

Alef

-

In modern text this is particularly common for adverbs.jm,51

-

ً [U+064B ARABIC FATHATAN] is often used in the combination ◌ًا [U+064B ARABIC FATHATAN + U+0627 ARABIC LETTER ALEF] , where the ALEF is silent and the ending is pronounced -an, eg. -فَوْرًا -

-

The same applies before TEH MARBUTA, eg. -أَفْعًى -

-

If it appears as ◌َةً [U+064E ARABIC FATHA + U+0629 ARABIC LETTER TEH MARBUTA + U+064B ARABIC FATHATAN] the pronunciation is -atan, eg. -عَادَةً -

-

After a final YEH, the pronunciation has an extra j sound,jm,51 ie. -iːjan, eg. -رَسْمِيًا -

-

In modern arabic printing the fathatan may be dropped, but the alef is retained.

-

The other two diacritics are much less common.jm,51

+

Formally speaking, ا [U+0627 ARABIC LETTER ALEF] has no sound of its own. It is really a vowel lengthener and carrier. Its main uses in Arabic orthography are:

+ +

That said, its presence usually indicates the location of a vowel.

+

It also has one or two minor functions such as in conjunction with tanwin (nunation) (see U+064B ARABIC FATHATAN ً ).

+

Certain parts of the Arabic verb end in a long u-vowel that is conventionally written with a following alef that has no effect on pronunciation, eg. كتبوا ktbwɑ kætæbuː The alef is omitted if a suffix is added, eg. كتبوها ktbwhɑ kætæbuː-haa

+
+

Hamza

-
-

Word-initial vowels

+
ء␣أ␣إ␣ؤ␣ئ␣آ
+
ٔ␣ٕ
-

Word-initial vowels use alef as a carrier, eg. -اِسْم -

- -

Commonly, word-initial vowels are actually preceded by a glottal stop, in which case the vowel diacritic is applied to one of أ [U+0623 ARABIC LETTER ALEF WITH HAMZA ABOVE] or إ [U+0625 ARABIC LETTER ALEF WITH HAMZA BELOW].

-
-
+

ء [U+0621 ARABIC LETTER HAMZA] represents the glottal stop sound. For historical reasons, it is treated as an orthographic sign rather than as a letter of the alphabet. It sometimes stands alone, but usually appears with a 'carrier' letter - ALEF, WAW, or YEH for which separate precomposed characters are available in Unicode ( أ إ ؤ ئ ). Examples of use include أنكر نائم بناء

+

In modern printed Arabic, the hamza is rarely shown when it occurs at the beginning of a word, but may appear in conjunction with another character. When the hamza is above another character you should typically use ٔ [U+0654 ARABIC HAMZA ABOVE] with the appropriate base character, although there are a number of exceptions, and for the Arabic language all the needed combinations are available as precomposed characters. For more details, see the character description.

+

Classical Arabic distinguishes between 'cutting' and 'joining' hamza. 'Cutting' means always pronounced, 'joining' means frequently elided. The joining hamza is of little practical importance in modern Arabic pronounced without the old case endings. When it does appear in modern Arabic, ٱ [U+0671 ARABIC LETTER ALEF WASLA] is used to indicate a joining hamza.

+
+

Alef madda

+

آ [U+0622 ARABIC LETTER ALEF WITH MADDA ABOVE] is used when either of the two following combinations of glottal stop and a vowel appear in a word:

+
    +
  • +

    ʔaʔ (hamza, short a, hamza) eg. آثار

    +
  • +
  • +

    ʔaː (hamza, long a) eg. +القرآن +

    +
  • +
+

Normal pronunciation in both cases is ʔaː.

+

The madda sign is still very often shown in print.

+
-
-

Superscript alef

+
+

Teh marbuta

+

ة [U+0629 ARABIC LETTER TEH MARBUTA] usually has no sound, eg. مَدْرَسَة

+

However, it is sometimes pronounced t in specific grammatical contexts.

+

It is used for historical reasons to indicate the feminine ending, a, and is only used in final position. The dots are borrowed from TEH (ت). If any suffix is added, the ending is spelled with ت [U+062A ARABIC LETTER TEH], eg. +مَدْرَسَتْنَا +

+

In modern Arabic it is not uncommon to find the two dots omitted, particularly on masculine proper names that have the feminine ending, eg. +طلبه +

+

Vowelled text may omit the short a diacritic before the TEH MARBUTA, because the sound is always the same.

+
-
ٰ
-

ٰ  [U+0670 ARABIC LETTER SUPERSCRIPT ALEF] is used in only a few Arabic words, however they tend to be commonly used words, eg. -هٰذَا -اللّٰه -

+ + +
+

Letters for foreign sounds

+

The following characters are not part of the standard Arabic language set but are occasionally used to represent foreign sounds.

+
ڤ␣پ␣چ
+

Two of the above are borrowed from Persian/Urdu.

+
+

Other letters

+

The following characters also have the general property of Letter, but are less commonly used for modern Arabic language text.

+
ڢ␣ڧ␣ࢲ␣ـ␣ﷲ␣ٱ
+

+ +ڢ +[U+06A2 ARABIC LETTER FEH WITH DOT MOVED BELOW] + -

-

Alef maksura

+and ڧ [U+06A7 ARABIC LETTER QAF WITH DOT ABOVE] are alternative forms that are used in Northwest Africa. [U+08B2 ARABIC LETTER ZAIN WITH INVERTED V ABOVE] is used for Berber.

+

ٱ [U+0671 ARABIC LETTER ALEF WASLA] is described in the section hamza. Whereas many of the above letters with diacritics decompose in Unicode Normalization Form D (NFD), this letter does not.

+

[U+FDF2 ARABIC LIGATURE ALLAH ISOLATED FORM] is a letter from the Arabic precomposed block used to write the name of Allah. The composition of this character differs from font to font in terms of glyph forms. With some fonts it is necessary to add diacritics, whereas with others it is not. 

+

ـ [U+0640 ARABIC TATWEEL] is used to stretch words for simple justification, or to make a word or phrase a particular width, or as a form of emphasis. For more information see justify.

+

ڢ [U+06A2 ARABIC LETTER FEH WITH DOT MOVED BELOW] and ڧ [U+06A7 ARABIC LETTER QAF WITH DOT ABOVE] are alternative forms that are used in Northwest Africa. [U+08B2 ARABIC LETTER ZAIN WITH INVERTED V ABOVE] is used for Berber.

+

ٱ [U+0671 ARABIC LETTER ALEF WASLA] is described in the section hamza. Whereas many of the above letters with diacritics decompose in Unicode Normalization Form D (NFD), this letter does not.

+

ـ [U+0640 ARABIC TATWEEL] is used to stretch words for simple justification, or to make a word or phrase a particular width, or as a form of emphasis. For more information see justify.

-
ى
+

Characters in the Arabic Presentation Forms blocks should not normally be used, but they contain a few characters that are not just for compatibility use, including the following. (Click on the characters for more details.)

-

ى [U+0649 ARABIC LETTER ALEF MAKSURA] represents the long a-vowel at the end of many words when it is written with YEH instead of an ALEF. In this case the YEH is typically printed without dots, to avoid confusion (although an ordinary YEH may also sometimes be written dotless). This spelling only occurs with certain words, and only when the final sound is long, eg. -معنى

-

If any suffix is added, the spelling reverts to the normal alef, eg. -معناهم mæʕnaː-hum -

+
ﷲ␣ﷺ␣ﷻ␣﷽
+ +

[U+FDF2 ARABIC LIGATURE ALLAH ISOLATED FORM] is a letter from the Arabic precomposed block used to write the name of Allah. The composition of this character differs from font to font in terms of glyph forms. With some fonts it is necessary to add diacritics, whereas with others it is not. 

+ -
-

Diphthongs

-

The 2 diphthongs aj and aw are written using a combination of short a with the semivowels ي [U+064A ARABIC LETTER YEH] and و [U+0648 ARABIC LETTER WAW],wp,#Vowels eg. عَيْن عَوْد

+ +
+

Arabic definite article

+

The pronunciation of ال (alif followed by lām) varies when it represents the Arabic definite article.

+

The lām is not pronounced if it precedes one of the following characters, but instead the following sound is doubled, eg. +السلام علیکم +

+ +
ت␣ث␣د␣ذ␣ر␣ز␣س␣ش␣ص␣ض␣ط␣ظ␣ل␣ن
+ +

These are called 'sun letters' in Arabic. The other letters are 'moon letters'.j,32

+

The alif is also not pronounced if the preceding word ends with a vowel or h. It is, however, written.j,32
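The sun-letter rule above lends itself to a simple lookup. The following is a rough sketch only: the function name is hypothetical, and it assumes unvowelled input in which a leading alif + lām really is the definite article, which is not true of every word that begins with those two letters.

```python
# The 14 sun letters listed above, written as escapes to avoid bidi confusion:
# teh, theh, dal, thal, reh, zain, seen, sheen, sad, dad, tah, zah, lam, noon.
SUN_LETTERS = set(
    "\u062A\u062B\u062F\u0630\u0631\u0632\u0633"
    "\u0634\u0635\u0636\u0637\u0638\u0644\u0646"
)
ARTICLE = "\u0627\u0644"  # ALEF + LAM

def lam_is_assimilated(word):
    """Return True if word starts with alif + lam followed by a sun letter.

    Assumes the leading alif + lam is the definite article; no
    morphological analysis is attempted.
    """
    if not word.startswith(ARTICLE) or len(word) < 3:
        return False
    return word[2] in SUN_LETTERS

# U+0633 SEEN is a sun letter, U+0642 QAF is a moon letter.
print(lam_is_assimilated("\u0627\u0644\u0633\u0644\u0627\u0645"))
print(lam_is_assimilated("\u0627\u0644\u0642\u0645\u0631"))
```

A text-to-speech or transliteration step could use a check like this to decide whether to double the following consonant instead of pronouncing the lām.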

-
-

Vowel absence

-
ْ
-

When text is vowelled, ْ   [U+0652 ARABIC SUKUN] can be used over a consonant to indicate that it is not followed by a vowel sound, eg. مَكْتَب

-
+
+

Consonant clusters & gemination

+

The diacritic ّ  [U+0651 ARABIC SHADDA] doubles the value of the consonant it is attached to, which is phonemically significant in Arabic, eg. +تجّار +

+

Like the short vowels, it, too, is not often used, although sometimes it appears when vowel signs don't.

+

When both shadda and kasra are attached to the same base consonant, a common, though not universal, practice is to display the kasra below the shadda, rather than below the base consonant, eg. مُمَثِّلْ Some fonts, such as Amiri, don't do this. (See also gpos.)

@@ -1084,11 +1312,16 @@

Vowel absence

+
+

Ligatures

+

The combination ل + ا [U+0644 ARABIC LETTER LAM + U+0627 ARABIC LETTER ALEF] is always written as a ligature. The underlying code points are, however, preserved. The form of this ligature that joins to the right is ‍لا and unjoined it is لا.
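That the ligature is a rendering effect rather than a separate character can be seen by counting code points. The sketch below uses Python's standard `unicodedata` module; the variable names are illustrative. It also shows, as an aside, that the compatibility character U+FEFB from the Arabic Presentation Forms-B block folds back to the same two letters under NFKC.

```python
import unicodedata

# LAM + ALEF: displayed as a single ligature glyph, stored as two letters.
lam_alef = "\u0644\u0627"
code_point_count = len(lam_alef)

# ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM is a compatibility character.
# NFKC replaces it with the ordinary LAM + ALEF sequence.
presentation_form = "\uFEFB"
folded = unicodedata.normalize("NFKC", presentation_form)

print(code_point_count)
print(folded == lam_alef)
```

Text that keeps the plain two-letter sequence stays searchable and lets the font decide how to draw the ligature.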

+

Observation: When diacritics are used with this ligature, they sometimes appear to be over the ALEF, rather than over the LAM, eg. قليلاً This would require a typing order that is different from the spoken sequence.

+

Other combinations of characters are likely to also ligate (see gsub). The number of ligatures in text typically depends on the font used, but ligation can also be used as a device to manage justification, in which case it needs some degree of manual control.

+
+ + - -
-

Consonants

@@ -1364,217 +1597,6 @@

Other

Sources: Wikipedia, and Google Translate.

- - - - - - -
- - - - - -
-

Basic consonant letters

- -

The main Unicode Arabic block contains 153 letters, with 77 more in the extended blocks. As shown in the previous section, only a small subset of those are used to write a given language. The others represent special characters added to the repertoire for one or other of the many languages for which the Arabic script is used.

- - - - -

The vast majority of letters represent consonants. A few represent long vowels.

-

The following letters are those generally recognised as constituting the alphabet for the Standard Arabic language.

- - -
ا
- -
ب␣ت␣د␣ط␣ض␣ك␣ق
- -
ف␣ث␣ذ␣س␣ص␣ز␣ظ␣ش␣ج␣خ␣غ␣ه␣ح␣ع
- -
م␣ن
- -
و␣ر␣ل␣ي
- - - - -

Of those, as mentioned earlier, some letters represent long vowel locations or combinations of consonant plus vowel.

-
- - - - - - -
-

Supplementary letters

- -

Other Unicode letters regularly used in Arabic include:

- -
ء␣آ␣أ␣إ␣ؤ␣ئ␣ى␣ة
- -

Most of the above letters with diacritics decompose in Unicode Normalization Form D (NFD), however ة [U+0629 ARABIC LETTER TEH MARBUTA] does not.

-
- - - - - -
-

Alef

- -

Formally speaking, ا [U+0627 ARABIC LETTER ALEF] has no sound of its own. It is really a vowel lengthener and carrier. Its main uses in arabic orthography are:

- -

That said, its presence usually indicates the location of a vowel.

-

It also has one or two minor functions such as in conjunction with tanwin (nunation) (see U+064B ARABIC FATHATAN ً ).

-

Certain parts of the Arabic verb end in a long u-vowel that is conventionally written with a following alef that has no effect on pronunciation, eg. كتبوا ktbwɑ kætæbuː The alef is omitted if a suffix is added, eg. كتبوها ktbwhɑ kætæbuː-haa

-
- - - - - -
-

Hamza

- -
ء␣أ␣إ␣ؤ␣ئ␣آ
-
ٔ␣ٕ
- -

ء [U+0621 ARABIC LETTER HAMZA] represents the glottal stop sound. For historical reasons, it is treated as an orthographic sign rather than as a letter of the alphabet. It sometimes stands alone, but usually appears with a 'carrier' letter - ALEF, WAW, or YEH for which separate precomposed characters are available in Unicode ( أ إ ؤ ئ ). Examples of use include أنكر نائم بناء

-

In modern printed arabic, the hamza is rarely shown when it occurs at the beginning of a word, but may appear in conjunction with another character. When the hamza is above another character you should typically use ٔ [U+0654 ARABIC HAMZA ABOVE] with the appropriate base character, although there are a number of exceptions, and for the Arabic language all the needed combinations are available as precomposed characters. For more details, see the character description.

-

Classical arabic distinguishes between 'cutting' and 'joining' hamza. 'Cutting' means always pronounced, 'joining' means frequently elided. The joining hamza is of little practical importance in modern arabic pronounced without the old case endings. When it does appear in modern Arabic, ٱ [U+0671 ARABIC LETTER ALEF WASLA] is used to indicate a joining hamza.

- - - - -
-

Alef madda

-

آ [U+0622 ARABIC LETTER ALEF WITH MADDA ABOVE] is used when either of the two following combinations of glottal stop and a vowel appear in a word:

-
    -
  • -

    ʔaʔ (hamza, short a, hamza) eg. آثار

    -
  • -
  • -

    ʔaː (hamza, long a) eg. -القرآن -

    -
  • -
-

Normal pronunciation in both cases is ʔaː.

-

The madda sign is still very often shown in print.

-
- - - - -
-

Teh marbuta

-

ة [U+0629 ARABIC LETTER TEH MARBUTA] usually has no sound, eg. مَدْرَسَة

-

However, it is sometimes pronounced t in specific grammatical contexts.

-

It is used for historical reasons to indicate the feminine ending, a, and is only used in final position. The dots are borrowed from TEH (ت). If any suffix is added, the ending is spelled with ت [U+062A ARABIC LETTER TEH], eg. مَدْرَسَتْنَا

-

In modern Arabic it is not uncommon to find the two dots omitted, particularly on masculine proper names that have the feminine ending, eg. طلبه

-

Vowelled text may omit the short a diacritic before the TEH MARBUTA, because the sound is always the same.
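The spelling change before a suffix can be sketched in Python as follows. This is only an illustration: the function name `add_suffix` is hypothetical, and the sketch assumes unvowelled text in which TEH MARBUTA is the very last code point of the word.

```python
TEH_MARBUTA = "\u0629"  # ARABIC LETTER TEH MARBUTA
TEH = "\u062A"          # ARABIC LETTER TEH

def add_suffix(word: str, suffix: str) -> str:
    """Attach a suffix, respelling a final TEH MARBUTA as TEH (illustrative)."""
    if word.endswith(TEH_MARBUTA):
        word = word[:-1] + TEH
    return word + suffix

# Unvowelled spelling of the word used in the example above, plus the suffix NOON + ALEF.
school = "\u0645\u062F\u0631\u0633\u0629"
print(add_suffix(school, "\u0646\u0627"))
```

Vowelled text would need extra handling, since a diacritic may follow the final letter.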

-
- - - - -
-

Letters for foreign sounds

-

The following characters are not part of the standard Arabic language set but are occasionally used to represent foreign sounds.

-
ڤ␣پ␣چ
-

Two of the above are borrowed from Persian/Urdu.

-
- - - - -
-

Other letters

-

The following characters also have the general property of Letter, but are less commonly used for modern Arabic language text.

-
ڢ␣ڧ␣ࢲ␣ـ␣ﷲ␣ٱ
-

ڢ [U+06A2 ARABIC LETTER FEH WITH DOT MOVED BELOW] and ڧ [U+06A7 ARABIC LETTER QAF WITH DOT ABOVE] are alternative forms that are used in Northwest Africa. ࢲ [U+08B2 ARABIC LETTER ZAIN WITH INVERTED V ABOVE] is used for Berber.

-

ٱ [U+0671 ARABIC LETTER ALEF WASLA] is described in the section hamza. Whereas many of the above letters with diacritics decompose in Unicode Normalization Form D (NFD), this letter does not.
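The difference in normalization behaviour can be checked with a short Python sketch. The helper `decomposes` is a name invented for this example. The two comparison characters are hamza and madda forms taken from earlier sections, not from the list above.

```python
import unicodedata

def decomposes(ch: str) -> bool:
    """True if NFD normalization changes the character (illustrative helper)."""
    return unicodedata.normalize("NFD", ch) != ch

# ALEF WITH HAMZA ABOVE, ALEF WITH MADDA ABOVE, ALEF WASLA
for ch in ("\u0623", "\u0622", "\u0671"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: decomposes={decomposes(ch)}")
```

The first two characters split into ALEF plus a combining mark, whereas ALEF WASLA is left unchanged.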

-

ـ [U+0640 ARABIC TATWEEL] is used to stretch words for simple justification, or to make a word or phrase a particular width, or as a form of emphasis. For more information see justify.

- -

Characters in the Arabic Presentation Forms blocks should not normally be used, but they contain just a few characters that are not just for compatibility use, including the following. (Click on the characters for more details.)

- -
ﷲ␣ﷺ␣ﷻ␣﷽
- -

ﷲ [U+FDF2 ARABIC LIGATURE ALLAH ISOLATED FORM] is a letter from the Arabic precomposed block used to write the name of Allah. The composition of this character differs from font to font in terms of glyph forms. With some fonts it is necessary to add diacritics, whereas with others it is not.

-
- - - - - - -
-

Arabic definite article

-

The pronunciation of ال (alif followed by lām) varies when it represents the Arabic definite article.

-

The lām is not pronounced if it precedes one of the following characters; instead, the following sound is doubled, eg. السلام علیکم

- -
ت␣ث␣د␣ذ␣ر␣ز␣س␣ش␣ص␣ض␣ط␣ظ␣ل␣ن
- -

These are called 'sun letters' in Arabic. The other letters are 'moon letters'. [j,32]

-

The alif is also not pronounced if the preceding word ends with a vowel or h. It is, however, written. [j,32]
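The sun-letter rule described above can be sketched as a simple lookup in Python. The names `SUN_LETTERS` and `lam_is_assimilated` are invented for this example, and the sketch assumes unvowelled input whose initial alif + lām really is the definite article, which a string test alone cannot confirm.

```python
# The 14 sun letters listed above, written as escapes to avoid bidi display issues.
SUN_LETTERS = set(
    "\u062A\u062B\u062F\u0630\u0631\u0632\u0633"
    "\u0634\u0635\u0636\u0637\u0638\u0644\u0646"
)
ARTICLE = "\u0627\u0644"  # ALEF + LAM

def lam_is_assimilated(word: str) -> bool:
    """True if the word starts with ALEF LAM followed by a sun letter (sketch only)."""
    return word.startswith(ARTICLE) and len(word) > 2 and word[2] in SUN_LETTERS

print(lam_is_assimilated("\u0627\u0644\u0633\u0644\u0627\u0645"))  # SEEN follows the article
print(lam_is_assimilated("\u0627\u0644\u0642\u0631\u0622\u0646"))  # QAF follows the article
```

The first word begins with a sun letter after the article, so the lām is assimilated; the second begins with a moon letter, so it is not.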

-
- - - - - - -
-

Consonant clusters & gemination

-

The diacritic ّ  [U+0651 ARABIC SHADDA] doubles the value of the consonant it is attached to, which is phonemically significant in Arabic, eg. تجّار

-

Like the short vowel diacritics, it is not often written, although it sometimes appears when vowel signs don't.

-

When both shadda and kasra are attached to the same base consonant, a common, though not universal, practice is to display the kasra below the shadda, rather than below the base consonant, eg. مُمَثِّلْ Some fonts, such as Amiri, don't do this. (See also gpos.)

-
- - - - - - -
-

Ligatures

-

The combination ل + ا [U+0644 ARABIC LETTER LAM + U+0627 ARABIC LETTER ALEF] is always written as a ligature. The underlying code points are, however, preserved. The form of this ligature that joins to the right is ‍لا and unjoined it is لا

-

Observation: When diacritics are used with this ligature, they sometimes appear to be over the ALEF, rather than over the LAM, eg. قليلاً. This would require a typing order that is different from the spoken sequence.

-

Other combinations of characters are likely to also ligate (see gsub). The number of ligatures in text typically depends on the font used, but ligation can also be used as a device to manage justification, in which case it needs some degree of manual control.
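To illustrate that the lam-alef ligature is a matter of rendering rather than encoding, the Python sketch below compares the ordinary two-letter sequence with one of the compatibility ligature characters from the Arabic Presentation Forms blocks. The helper name `to_underlying` is hypothetical. NFKC normalization is used here only as a convenient way to recover the underlying letters, and it also alters other compatibility characters, so it may not suit every workflow.

```python
import unicodedata

LAM_ALEF = "\u0644\u0627"       # LAM + ALEF, as normally stored in text
LAM_ALEF_ISOLATED = "\uFEFB"    # ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM

def to_underlying(text: str) -> str:
    """Map compatibility presentation forms to ordinary letters via NFKC (sketch)."""
    return unicodedata.normalize("NFKC", text)

print(len(LAM_ALEF))                                   # 2
print(to_underlying(LAM_ALEF_ISOLATED) == LAM_ALEF)    # True
```

The stored text keeps two code points; the font forms the ligature at display time.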

-
@@ -2447,7 +2469,7 @@

Bracketed text

-
+

خصائصها الفيزيائية (الإشعاعية والحرارية) له أهمية خاصة في أبحاث المناخ

translation

Its physical properties (radiative and thermal) are of particular interest in climate research.

@@ -2533,10 +2555,14 @@

Quotations & citations

Arabic language text uses 2 different sets of quotation marks. Sometimes they are mixed in the same text. The example in fig_quotation_marks uses both in a single sentence.

-
+ + +

 يعني اسم ”سيسيميوت“ «المستعمرة القريبة من أرض بها ثعالب».

A sentence containing 2 types of quotation mark.
+ +

When using bracketing quotation marks, « [U+00AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK] is used at the start, and » [U+00BB RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK] is used at the end. The shapes are typically rounded, as shown in fig_quotation_marks.

When an additional quote is embedded within the first, the quote marks are ‹ [U+2039 SINGLE LEFT-POINTING ANGLE QUOTATION MARK] and › [U+203A SINGLE RIGHT-POINTING ANGLE QUOTATION MARK].

The other quote marks are ” [U+201D RIGHT DOUBLE QUOTATION MARK] at the start, and “ [U+201C LEFT DOUBLE QUOTATION MARK] at the end.
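The pairings described above can be summarised in a small Python sketch. The names `OUTER`, `INNER`, `CURLY` and `quote` are invented for illustration; the strings are in logical (storage) order, so "start" means the character that comes first in memory.

```python
OUTER = ("\u00AB", "\u00BB")   # double angle marks: start, end
INNER = ("\u2039", "\u203A")   # single angle marks for an embedded quote
CURLY = ("\u201D", "\u201C")   # the other set: RIGHT DOUBLE at the start, LEFT DOUBLE at the end

def quote(text: str, marks: tuple[str, str] = OUTER) -> str:
    """Wrap text in the given (start, end) pair of quotation marks (sketch)."""
    start, end = marks
    return f"{start}{text}{end}"

nested = quote("a " + quote("b", INNER) + " c")
print([f"U+{ord(ch):04X}" for ch in nested if not ch.isalnum() and ch != " "])
```

Printing code points rather than the string itself avoids confusion caused by bidirectional display of the marks.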

@@ -2648,8 +2674,36 @@

Line breaking & hyphenation

They are not broken at the small gaps that appear where a character doesn't join on the left.

Like most writing systems, certain characters are expected not to start or end a line. For example, periods and commas shouldn't start a line, and opening parentheses shouldn't end a line.

+ + + + + +
+

Line-edge rules

+ +

As in almost all writing systems, certain punctuation characters should not appear at the end or the start of a line. The Unicode line-break properties help applications decide whether a character should appear at the start or end of a line.

+

Show default line-breaking properties for characters in the Modern Standard Arabic language.

+

The following list gives examples of typical behaviours for characters used in modern Arabic. Context may affect the behaviour of some of these and other characters.

+ +

Click on the characters to show what they are.

+ +
    +
  • “ ‘ (   should not be the last character on a line
  • +
  • ” ’ ) ? ! ។ ៕ ៚ %   should not begin a new line
  • +
  •   should be kept with any number, even if separated by a space or parenthesis.
  • +
+ +

The following characters should not produce a line break when they appear inside or alongside a word: .

+ + +
+ +