Update unicode mapping tables #440

Merged: 9 commits, Aug 28, 2021
File renamed without changes.
File renamed without changes.
File renamed without changes.
12 changes: 12 additions & 0 deletions unicode/eastasia/README.TXT
@@ -0,0 +1,12 @@
EASTASIA:
The CDROM came with CJK cross reference mappings for standards such as KSC5601,
GB2312, JIS0208, etc. to Unicode 2.0.
However, these particular mappings are now obsolete and have been removed as per
this note from Unicode.org:
The entire former contents of this directory are obsolete and have been
moved to the OBSOLETE directory. The latest information may be found
in the Unihan data files in the latest Unicode Character Database.
August 1, 2001.
The current set of mappings are available from
https://unicode.org/Public/UNIDATA/Unihan.zip
The format of these files is given in https://unicode.org/reports/tr38/
114 changes: 88 additions & 26 deletions unicode/iso8859/8859-1.txt
@@ -1,42 +1,71 @@
# 8859-1.TXT
# Date: 2015-12-02 20:19:00 GMT [KW]
# © 2015 Unicode®, Inc.
# For terms of use, see http://www.unicode.org/terms_of_use.html
#
# Name: ISO 8859-1 (1987) to Unicode
# Unicode version: 1.1
# Table version: 0.1
# Name: ISO/IEC 8859-1:1998 to Unicode
# Unicode version: 3.0
# Table version: 2.0
# Table format: Format A
# Date: 16 January 1995
# Authors: Tim Greenwood <[email protected]>
# John H. Jenkins <[email protected]>
#
# Copyright (c) 1991-1995 Unicode, Inc. All Rights reserved.
#
# This file is provided as-is by Unicode, Inc. (The Unicode Consortium).
# No claims are made as to fitness for any particular purpose. No
# warranties of any kind are expressed or implied. The recipient
# agrees to determine applicability of information provided. If this
# file has been provided on magnetic media by Unicode, Inc., the sole
# remedy for any claim will be exchange of defective media within 90
# days of receipt.
#
# Recipient is granted the right to make copies in any form for
# internal distribution and to freely use the information supplied
# in the creation of products supporting Unicode. Unicode, Inc.
# specifically excludes the right to re-distribute this file directly
# to third parties or other organizations whether for profit or not.
# Date: 1999 July 27 (header updated: 2015 December 02)
# Authors: Ken Whistler <[email protected]>
#
# General notes:
#
# This table contains the data the Unicode Consortium has on how
# ISO 8859-1 (1987) characters map into Unicode.
# ISO/IEC 8859-1:1998 characters map into Unicode.
#
# Format: Three tab-separated columns
# Column #1 is the ISO 8859-1 code (in hex as 0xXX)
# Column #1 is the ISO/IEC 8859-1 code (in hex as 0xXX)
# Column #2 is the Unicode (in hex as 0xXXXX)
# Column #3 the Unicode name (follows a comment sign, '#')
#
# The entries are in ISO 8859-1 order
# The entries are in ISO/IEC 8859-1 order.
#
# Version history
# 1.0 version: updates 0.1 version by adding mappings for all
# control characters.
# 2.0 version: updates to copyright notice and terms of use; no
# changes to character mappings
#
# Updated versions of this file may be found in:
# http://www.unicode.org/Public/MAPPINGS/
#
# Any comments or problems, contact <[email protected]>
# Any comments or problems, contact us at:
# http://www.unicode.org/reporting.html
#
0x00 0x0000 # NULL
0x01 0x0001 # START OF HEADING
0x02 0x0002 # START OF TEXT
0x03 0x0003 # END OF TEXT
0x04 0x0004 # END OF TRANSMISSION
0x05 0x0005 # ENQUIRY
0x06 0x0006 # ACKNOWLEDGE
0x07 0x0007 # BELL
0x08 0x0008 # BACKSPACE
0x09 0x0009 # HORIZONTAL TABULATION
0x0A 0x000A # LINE FEED
0x0B 0x000B # VERTICAL TABULATION
0x0C 0x000C # FORM FEED
0x0D 0x000D # CARRIAGE RETURN
0x0E 0x000E # SHIFT OUT
0x0F 0x000F # SHIFT IN
0x10 0x0010 # DATA LINK ESCAPE
0x11 0x0011 # DEVICE CONTROL ONE
0x12 0x0012 # DEVICE CONTROL TWO
0x13 0x0013 # DEVICE CONTROL THREE
0x14 0x0014 # DEVICE CONTROL FOUR
0x15 0x0015 # NEGATIVE ACKNOWLEDGE
0x16 0x0016 # SYNCHRONOUS IDLE
0x17 0x0017 # END OF TRANSMISSION BLOCK
0x18 0x0018 # CANCEL
0x19 0x0019 # END OF MEDIUM
0x1A 0x001A # SUBSTITUTE
0x1B 0x001B # ESCAPE
0x1C 0x001C # FILE SEPARATOR
0x1D 0x001D # GROUP SEPARATOR
0x1E 0x001E # RECORD SEPARATOR
0x1F 0x001F # UNIT SEPARATOR
0x20 0x0020 # SPACE
0x21 0x0021 # EXCLAMATION MARK
0x22 0x0022 # QUOTATION MARK
@@ -132,6 +161,39 @@
0x7C 0x007C # VERTICAL LINE
0x7D 0x007D # RIGHT CURLY BRACKET
0x7E 0x007E # TILDE
0x7F 0x007F # DELETE
0x80 0x0080 # <control>
0x81 0x0081 # <control>
0x82 0x0082 # <control>
0x83 0x0083 # <control>
0x84 0x0084 # <control>
0x85 0x0085 # <control>
0x86 0x0086 # <control>
0x87 0x0087 # <control>
0x88 0x0088 # <control>
0x89 0x0089 # <control>
0x8A 0x008A # <control>
0x8B 0x008B # <control>
0x8C 0x008C # <control>
0x8D 0x008D # <control>
0x8E 0x008E # <control>
0x8F 0x008F # <control>
0x90 0x0090 # <control>
0x91 0x0091 # <control>
0x92 0x0092 # <control>
0x93 0x0093 # <control>
0x94 0x0094 # <control>
0x95 0x0095 # <control>
0x96 0x0096 # <control>
0x97 0x0097 # <control>
0x98 0x0098 # <control>
0x99 0x0099 # <control>
0x9A 0x009A # <control>
0x9B 0x009B # <control>
0x9C 0x009C # <control>
0x9D 0x009D # <control>
0x9E 0x009E # <control>
0x9F 0x009F # <control>
0xA0 0x00A0 # NO-BREAK SPACE
0xA1 0x00A1 # INVERTED EXCLAMATION MARK
0xA2 0x00A2 # CENT SIGN
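
Reviewer note: the "Format" comment in the 8859-1.txt header describes three columns (the ISO/IEC 8859-1 code, the Unicode code point, and the Unicode name after a '#'). A minimal sketch of reading such a table might look like the following. It is not part of this PR; the function name `parse_mapping_table` and the inline sample are hypothetical, and the parser splits on any whitespace rather than strictly on tabs so that it also accepts copies of the file in which tabs were converted to spaces.

```python
def parse_mapping_table(text):
    """Parse mapping-table text into a dict {source_code: unicode_code_point}.

    Hypothetical helper: assumes each data line starts with two hex fields
    (for example "0xA0" and "0x00A0") followed by a '#' comment.
    """
    mapping = {}
    for line in text.splitlines():
        # Everything after '#' is a comment (header text or the character name).
        data = line.split("#", 1)[0].strip()
        if not data:
            continue  # header comment or blank line
        fields = data.split()
        if len(fields) < 2:
            continue  # not a complete mapping entry; skip it
        # int(..., 16) accepts the "0x" prefix used in the table.
        mapping[int(fields[0], 16)] = int(fields[1], 16)
    return mapping


# Sample lines copied from the table in this diff.
sample = (
    "# 8859-1.TXT\n"
    "0x20\t0x0020\t# SPACE\n"
    "0xA0\t0x00A0\t# NO-BREAK SPACE\n"
    "0xA1\t0x00A1\t# INVERTED EXCLAMATION MARK\n"
)
table = parse_mapping_table(sample)

# Every entry shown in this diff maps a byte to the code point of the same
# value, which matches what Python's built-in "latin-1" codec does.
for byte_value, code_point in table.items():
    assert bytes([byte_value]).decode("latin-1") == chr(code_point)
```

Comparing against the `latin-1` codec is only a sanity check on the sample lines; it says nothing about the other mapping files touched by this PR.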