The Unicode® Consortium Releases CLDR, Version 1.8
Published: 22 Mar 2010 (Unicode Consortium)
CLDR 1.8 contains data for 186 languages and 159 territories: 501 locales in all. Version 1.8 of the repository contains over 22% more locale data than the previous release, with over 42,000 new or modified data items from over 300 different contributors.
For this release, the Unicode Consortium partnered with ANLoc, the African Network for Localization, a project sponsored by Canada's International Development Research Centre (IDRC), to help extend modern computing on the African continent. ANLoc's vision is to empower Africans to participate in the digital age by enabling their languages in computers. A sub-project of ANLoc, called Afrigen, focuses on creating African locales.
The Afrigen-ANLoc project's mission is to create viable locale data for at least 100 of the more than 2,000 languages spoken in Africa, and to incorporate the data into Unicode's CLDR project and OpenOffice.org. Implementing fundamental locale data within CLDR is a critical step toward providing computer applications that can be localized into these African languages, thus reaching populations that have never before been able to use their native languages on computers and mobile phones.
The Afrigen-ANLoc project selected approximately 200 candidate languages, including all official languages recognized by a national government and all languages with at least 500,000 native speakers. Additional languages were incorporated when volunteers stepped forward. Data was collected through the Afrigen-ANLoc project by native-speaking volunteers around the world, entered via a web-based utility designed specifically for this purpose, and then merged into the CLDR repository. In all, over 150 volunteers gathered locale data for 72 African languages, with data for 54 of those incorporated into the CLDR 1.8 release. Of these 54 languages, 41 are completely new to the Unicode CLDR project, while the other 13 existed in earlier versions of CLDR and were enhanced with additional data. These languages are spoken in 26 countries across the entire African continent.
"The partnership with Afrigen has been a huge benefit for us," says John Emmons, vice-chair of the Unicode CLDR technical committee and lead CLDR engineer for IBM. "The Afrigen effort has allowed us to bring many new languages on board that we wouldn't be able to do through our normal process, while still maintaining the level of quality and consistency that we require for every language."
For more information about Unicode CLDR 1.8, see cldr.unicode.org/index/downloads/cldr-1-8
The Afrigen-ANLoc data collection tool was developed by Louise Berthilson of IT46, and the project is managed by Martin Benjamin, director of Kamusi Project International. For more information about the African Network for Localization, see www.idrc.ca.
About the Unicode Consortium
The Unicode Consortium is a non-profit organization founded to develop, extend and promote use of the Unicode Standard and related globalization standards. The membership of the consortium represents a broad spectrum of corporations and organizations in the computer and information processing industry. Members are: Adobe Systems, Apple, DENIC eG, Google, Government of India, Government of Tamil Nadu, IBM, Microsoft, Monotype Imaging, Oracle, The Society for Natural Language Technology Research, SAP, Sybase, The University of California (Berkeley), The University of California (Santa Cruz), Yahoo!, plus well over a hundred Associate, Liaison, and Individual members.
For more information, please contact the Unicode Consortium.