Welcome to PyICU, a Python extension wrapping IBM's International Components for Unicode C++ library (ICU).
PyICU is a project maintained by the Open Source Applications Foundation.
The PyICU homepage is http://pyicu.osafoundation.org. Its source code is hosted on GitHub at https://github.com/ovalhub/pyicu.
The ICU homepage is http://site.icu-project.org/
Before building PyICU the ICU libraries must be built and installed. Refer to each system's instructions for more information.
PyICU is built with distutils or setuptools:
- verify that the INCLUDES, LFLAGS, CFLAGS and LIBRARIES dictionaries in setup.py contain correct values for your platform
- python setup.py build
- sudo python setup.py install
- Mac OS X
  Make sure that DYLD_LIBRARY_PATH contains paths to the directory(ies) containing the ICU libs.
- Linux & Solaris
  Make sure that LD_LIBRARY_PATH contains paths to the directory(ies) containing the ICU libs or that you added the corresponding -rpath argument to LFLAGS.
- Windows
  Make sure that PATH contains paths to the directory(ies) containing the ICU DLLs.
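As a quick check before importing PyICU, the variable named above for your platform can be inspected from Python. The following is a minimal sketch that uses only the standard library; the helper names library_path_variable and library_search_dirs are made up for this illustration and are not part of PyICU.

```python
import os
import sys


def library_path_variable(platform=None):
    """Return the name of the environment variable listed above for a platform.

    `platform` takes values in the style of sys.platform ('darwin', 'linux',
    'sunos5', 'win32'); it defaults to the running interpreter's platform.
    """
    platform = sys.platform if platform is None else platform
    if platform == 'darwin':
        return 'DYLD_LIBRARY_PATH'
    if platform.startswith('win'):
        return 'PATH'
    # Linux, Solaris and other Unix-like systems
    return 'LD_LIBRARY_PATH'


def library_search_dirs(platform=None, environ=None):
    """Return the non-empty directories currently listed in that variable."""
    environ = os.environ if environ is None else environ
    value = environ.get(library_path_variable(platform), '')
    return [d for d in value.split(os.pathsep) if d]


if __name__ == '__main__':
    print(library_path_variable(), library_search_dirs())
```

An empty list on Linux or Solaris is not necessarily a problem: the ICU libraries may be in a default system location or reachable through an -rpath entry.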
See the CHANGES file for an up-to-date log of changes and additions.
There is no API documentation for PyICU. The API for ICU is documented at http://icu-project.org/apiref/icu4c/ and the following patterns can be used to translate from the C++ APIs to the corresponding Python APIs.
- strings

  The ICU string type, UnicodeString, is a type pointing at a mutable array of UChar Unicode 16-bit wide characters. The Python unicode type is an immutable string of 16-bit or 32-bit wide Unicode characters.

  Because of these differences, UnicodeString and Python's unicode type are not merged into the same type when crossing the C++ boundary. ICU APIs taking UnicodeString arguments have been overloaded to also accept Python str or unicode type arguments. In the case of str objects, utf-8 encoding is assumed when converting them to UnicodeString objects.

  To convert a Python str encoded in an encoding other than utf-8 to an ICU UnicodeString, use the UnicodeString(str, encodingName) constructor.

  ICU's C++ APIs accept and return UnicodeString arguments in several ways: by value, by pointer or by reference. When an ICU C++ API is documented to accept a UnicodeString reference parameter, it is safe to assume that there are several corresponding PyICU Python APIs making it accessible in simpler ways.

  For example, the 'UnicodeString &Locale::getDisplayName(UnicodeString &)' API, documented at http://icu-project.org/apiref/icu4c/classLocale.html can be invoked from Python in several ways:
  - The ICU way

        >>> from icu import UnicodeString, Locale
        >>> locale = Locale('pt_BR')
        >>> string = UnicodeString()
        >>> name = locale.getDisplayName(string)
        >>> name
        <UnicodeString: Portuguese (Brazil)>
        >>> name is string
        True                  <-- string arg was returned, modified in place

  - The Python way

        >>> from icu import Locale
        >>> locale = Locale('pt_BR')
        >>> name = locale.getDisplayName()
        >>> name
        u'Portuguese (Brazil)'

    A UnicodeString object was allocated and converted to a Python unicode object.
  A UnicodeString can be coerced to a Python unicode string with Python's unicode() constructor. The usual len(), str(), comparison, [] and [:] operators are all available, with the additional twists that slicing is not read-only and that += is also available since a UnicodeString is mutable. For example:

      >>> from icu import Locale, UnicodeString
      >>> locale = Locale('pt_BR')
      >>> name = locale.getDisplayName()
      >>> name
      u'Portuguese (Brazil)'
      >>> name = UnicodeString(name)
      >>> name
      <UnicodeString: Portuguese (Brazil)>
      >>> unicode(name)
      u'Portuguese (Brazil)'
      >>> len(name)
      19
      >>> str(name)           <-- works when chars fit with default encoding
      'Portuguese (Brazil)'
      >>> name[3]
      u't'
      >>> name[12:18]
      <UnicodeString: Brazil>
      >>> name[12:18] = 'the country of Brasil'
      >>> name
      <UnicodeString: Portuguese (the country of Brasil)>
      >>> name += ' oh joy'
      >>> name
      <UnicodeString: Portuguese (the country of Brasil) oh joy>
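The difference in character width described above matters for characters outside the Basic Multilingual Plane. The sketch below makes no PyICU calls; it uses the Python 3 standard library to count UTF-16 code units, the 16-bit units that a UChar array holds. The helper name utf16_length is an assumption made for this illustration.

```python
def utf16_length(text):
    """Count the 16-bit code units needed to store `text` as UTF-16."""
    # 'utf-16-le' emits no byte order mark; every code unit is 2 bytes.
    return len(text.encode('utf-16-le')) // 2


name = 'Portuguese (Brazil)'
clef = '\U0001D11E'  # MUSICAL SYMBOL G CLEF, outside the BMP

print(len(name), utf16_length(name))   # identical counts for BMP-only text
print(len(clef), utf16_length(clef))   # one code point, two code units
```

Assuming a 16-bit string counts its length and indices in code units, as ICU's C++ API does, positions in such a string can differ from positions in a Python 3 str once a supplementary character appears.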
- error reporting

  The C++ ICU library does not use C++ exceptions to report errors. ICU C++ APIs return errors via a UErrorCode reference argument. All such APIs are wrapped by Python APIs that omit this argument and raise an ICUError Python exception instead. The same is true for ICU APIs taking both a ParseError and a UErrorCode: both arguments are to be omitted.

  For example, the 'UnicodeString &DateFormat::format(const Formattable &, UnicodeString &, UErrorCode &)' API, documented at http://icu-project.org/apiref/icu4c/classDateFormat.html is invoked from Python with:

      >>> from icu import DateFormat, Formattable
      >>> df = DateFormat.createInstance()
      >>> df
      <SimpleDateFormat: M/d/yy h:mm a>
      >>> f = Formattable(940284258.0, Formattable.kIsDate)
      >>> df.format(f)
      u'10/18/99 3:04 PM'

  Of course, the simpler 'UnicodeString &DateFormat::format(UDate, UnicodeString &)' documented here: http://icu-project.org/apiref/icu4c/classDateFormat.html can be used too:

      >>> from icu import DateFormat
      >>> df = DateFormat.createInstance()
      >>> df
      <SimpleDateFormat: M/d/yy h:mm a>
      >>> df.format(940284258.0)
      u'10/18/99 3:04 PM'
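On the Python side, the omitted UErrorCode turns into an ordinary try/except. The sketch below is one possible way to handle it, assuming that ICU reports an unknown transliterator ID as an error; the helper name try_create_transliterator is made up for this illustration, and the import is guarded only so that the snippet still loads where PyICU is not installed.

```python
try:
    from icu import ICUError, Transliterator
except ImportError:          # PyICU not installed: keep the sketch importable
    ICUError = Transliterator = None


def try_create_transliterator(transliterator_id):
    """Return (transliterator, None) on success or (None, error) on failure.

    Returns (None, None) when PyICU is not available.
    """
    if Transliterator is None:
        return None, None
    try:
        # The C++ signature also takes a UErrorCode &; it is omitted here.
        return Transliterator.createInstance(transliterator_id), None
    except ICUError as error:
        # The failure that C++ would report through UErrorCode arrives here.
        return None, error


if __name__ == '__main__':
    print(try_create_transliterator('Latin-Greek'))
    print(try_create_transliterator('No-Such-Transliterator'))
```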
- dates

  ICU uses a double floating point type called UDate that represents the number of milliseconds elapsed since 1970-jan-01 UTC for dates.

  In Python, the value returned by the time module's time() function is the number of seconds since 1970-jan-01 UTC. Because of this difference, floating point values are multiplied by 1000 when passed to APIs taking UDate and divided by 1000 when returned as UDate.

  Python's datetime objects, with or without timezone information, can also be used with APIs taking UDate arguments. The datetime objects get converted to UDate when crossing into the C++ layer.
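PyICU performs this scaling itself, so no manual conversion is needed when calling its APIs. Purely to illustrate the arithmetic, here is a standard-library sketch for Python 3; the helper names are assumptions made for this example and are not PyICU functions.

```python
from datetime import datetime, timezone


def seconds_to_udate(seconds):
    """Python epoch seconds -> UDate-style epoch milliseconds."""
    return seconds * 1000.0


def udate_to_seconds(udate):
    """UDate-style epoch milliseconds -> Python epoch seconds."""
    return udate / 1000.0


def datetime_to_udate(dt):
    """Convert a datetime to epoch milliseconds.

    A naive datetime is interpreted by timestamp() as local time.
    """
    return seconds_to_udate(dt.timestamp())


# The instant used in the DateFormat examples, expressed in UTC.
moment = datetime(1999, 10, 18, 22, 4, 18, tzinfo=timezone.utc)
print(moment.timestamp())          # seconds, as time.time() would report
print(datetime_to_udate(moment))   # milliseconds, as a UDate would hold
```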
- arrays

  Many ICU APIs take array arguments. From Python, pass a list whose elements are of the array's element type.
- StringEnumeration

  An ICU StringEnumeration has three next methods: next() which returns str objects, unext() which returns unicode objects and snext() which returns UnicodeString objects. Any of these methods can be used as an iterator, using the Python built-in iter function.

  For example, let e be a StringEnumeration instance::

      [s for s in e] is a list of 'str' objects
      [s for s in iter(e.unext, None)] is a list of 'unicode' objects
      [s for s in iter(e.snext, None)] is a list of 'UnicodeString' objects
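The iter(e.unext, None) form relies on Python's two-argument iter(callable, sentinel), which keeps calling the callable until it returns the sentinel. The sketch below shows that mechanism with a hypothetical stand-in class, FakeEnumeration, which is not a PyICU type; it assumes, as the idiom above implies, that the method returns None once the enumeration is exhausted.

```python
class FakeEnumeration:
    """Hypothetical stand-in for a StringEnumeration; not a PyICU class."""

    def __init__(self, items):
        self._items = list(items)
        self._index = 0

    def unext(self):
        """Return the next item, or None once the items are exhausted."""
        if self._index >= len(self._items):
            return None
        item = self._items[self._index]
        self._index += 1
        return item


e = FakeEnumeration(['Pacific/Fiji', 'US/Mountain'])

# iter(callable, sentinel) calls e.unext() until it returns the sentinel None.
zones = [s for s in iter(e.unext, None)]
print(zones)
```

Like any one-pass iterator, the stand-in is exhausted after the first traversal, so a second comprehension over it yields an empty list.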
- timezones

  The ICU TimeZone type may be wrapped with an ICUtzinfo type for usage with Python's datetime type. For example::

      from datetime import datetime
      from icu import ICUtzinfo, TimeZone
      tz = ICUtzinfo(TimeZone.createTimeZone('US/Mountain'))
      datetime.now(tz)

  or, even simpler::

      tz = ICUtzinfo.getInstance('Pacific/Fiji')
      datetime.now(tz)

  To get the default time zone use::

      defaultTZ = ICUtzinfo.getDefault()

  To get the time zone's id, use the tzid attribute or coerce the time zone to a string::

      ICUtzinfo.getInstance('Pacific/Fiji').tzid -> 'Pacific/Fiji'
      str(ICUtzinfo.getInstance('Pacific/Fiji')) -> 'Pacific/Fiji'
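The time zone fragments above can be combined into a small script. This is a minimal sketch for Python 3 that uses only the ICUtzinfo calls shown above; the helper name now_in_zone is made up for this illustration, and the import is guarded only so that the snippet still loads where PyICU is not installed.

```python
from datetime import datetime

try:
    from icu import ICUtzinfo
except ImportError:          # PyICU not installed: keep the sketch importable
    ICUtzinfo = None


def now_in_zone(tzid):
    """Return the current time in the zone `tzid`, or None without PyICU."""
    if ICUtzinfo is None:
        return None
    tz = ICUtzinfo.getInstance(tzid)
    # datetime.now(tz) returns an aware datetime expressed in that zone.
    return datetime.now(tz)


if __name__ == '__main__':
    now = now_in_zone('Pacific/Fiji')
    if now is not None:
        print(now.isoformat(), now.tzinfo.tzid)
```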