Most of the following files are provided in simplified (".gb") format. A traditional version (".b5") may be obtained through the use of a conversion program such as iconv.
Name | Contents | Source | Download |
---|---|---|---|
cedict | A Chinese-English dictionary produced through a collaborative internet-based effort. | Here | HTTP |
compphrase | A list of Chinese phrases (mostly longer than 2 characters). | Here | HTTP |
compounds | 2-character compound data: characters, frequency, English definition (originally 'phrases.dat'). | Here | HTTP |
parts | Composition info: character, radical number, remainder. | Here | FTP |
radicals | Radical stroke counts, and "extra" strokes (normally counted as part of residue/remainder). | Here | FTP |
tsi | A list of Chinese characters, words, and phrases with frequency and pronunciation (in zhuyin fuhao/bopomofo format), obtained from the libtabe project 0.2.3 distribution. | Here | HTTP |
zidian | List of character, pinyin, English definition. | Here | FTP |
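
For readers without iconv at hand, the re-encoding step can be sketched in Python. This is a minimal illustration, not the procedure used to build these files: the helper name `reencode` and the sample string are assumptions, and it relies only on Python's built-in `gb2312`, `big5`, and `utf-8` codecs. Note that it changes the byte encoding only; characters that have no counterpart in the target encoding (for example, simplified-only forms when targeting Big5) will raise an error rather than be mapped to a traditional form.

```python
def reencode(data: bytes, src: str = "gb2312", dst: str = "utf-8") -> bytes:
    """Decode bytes from the source encoding and re-encode them in the target.

    Roughly what `iconv -f GB2312 -t UTF-8` does for a whole file.
    Raises UnicodeError if a character cannot be represented in `dst`.
    """
    return data.decode(src).encode(dst)


# Hypothetical sample: two characters that exist in both GB2312 and Big5.
sample = "中文".encode("gb2312")
as_utf8 = reencode(sample)               # GB2312 -> UTF-8
as_big5 = reencode(sample, dst="big5")   # GB2312 -> Big5
```

To convert a whole file, read it in binary mode, pass the bytes through `reencode`, and write the result back out in binary mode.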
Here is some minimal background on the encodings themselves.
Encoding | Purpose |
---|---|
Guobiao | Mainland China's official scheme for simplified character encoding |
Big5 | A widely used standard in Taiwan and Hong Kong for traditional character encoding |
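Unicode | A universal character set covering most of the world's major writing systems; its UCS-2/UTF-16 encodings use two bytes for the most common characters, including most hanzi |
UTF-8 | A Unix file-system-safe encoding of the Unicode character set that uses 1 to 4 bytes per character (common hanzi use 3 bytes; rarer ones outside the Basic Multilingual Plane use 4) |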
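
The byte counts in the table above can be checked directly. The sketch below is only an illustration: the helper name `encoded_lengths` and the sample characters are assumptions, and it uses Python's built-in codecs (`utf-16-be` stands in for a two-byte Unicode encoding without a byte-order mark).

```python
def encoded_lengths(ch: str) -> dict:
    """Return the number of bytes one character occupies in each encoding."""
    encodings = ("gb2312", "big5", "utf-16-be", "utf-8")
    return {enc: len(ch.encode(enc)) for enc in encodings}


# A common hanzi present in GB2312, Big5, and Unicode.
lengths = encoded_lengths("中")

# U+20000 lies outside the Basic Multilingual Plane and is not in
# GB2312 or Big5, so only its UTF-8 length is measured here.
rare_utf8_len = len("\U00020000".encode("utf-8"))
```

Here `lengths` shows two bytes for Guobiao, Big5, and UTF-16 and three bytes for UTF-8, while the rarer character takes four bytes in UTF-8.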