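Most of the following files are provided in simplified (".gb") format. A traditional version (".b5") may be obtained through the use of a conversion program. Note that a pure encoding converter such as iconv changes only the byte encoding. It does not map simplified characters to their traditional forms, so characters that exist only in simplified form may fail to convert.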
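As a rough illustration of what an iconv-style conversion does, here is a minimal Python sketch using the standard library's "gb2312", "big5" and "utf-8" codecs. The function name convert_encoding is hypothetical. It decodes the bytes with one codec and re-encodes them with another, and leaves the characters themselves unchanged.

```python
def convert_encoding(data: bytes, src: str = "gb2312", dst: str = "utf-8",
                     errors: str = "strict") -> bytes:
    """Re-encode bytes from codec `src` to codec `dst` (hypothetical helper).

    Like `iconv -f SRC -t DST`, only the byte encoding changes: simplified
    characters are NOT mapped to traditional ones. With errors="strict", a
    character missing from the target codec raises UnicodeEncodeError.
    """
    text = data.decode(src)                   # bytes -> Unicode string
    return text.encode(dst, errors=errors)    # Unicode string -> target bytes


if __name__ == "__main__":
    gb_bytes = "中文".encode("gb2312")          # two characters, 2 bytes each
    print(convert_encoding(gb_bytes))            # GB2312 -> UTF-8
    print(convert_encoding(gb_bytes, "gb2312", "big5"))  # GB2312 -> Big5
```

The example uses 中文 because both characters have the same form in simplified and traditional script. A simplified-only character would need a separate simplified-to-traditional mapping step before its Big5 encoding could be meaningful.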
|File||Description||Download||Protocol|
|cedict||A Chinese-English dictionary produced through a collaborative internet-based effort.||Here||HTTP|
|compphrase||A list of Chinese phrases (mostly longer than two characters).||Here||HTTP|
|compounds||2-character compound data: characters, frequency, English definition (originally 'phrases.dat').||Here||HTTP|
|parts||Composition info: character, radical number, remainder.||Here||FTP|
|radicals||Radical stroke counts and "extra" strokes (normally counted as part of the residue/remainder).||Here||FTP|
|tsi||A list of Chinese characters, words, and phrases with frequency and pronunciation in zhuyin fuhao (bopomofo) format, obtained from the libtabe project 0.2.3 distribution.||Here||HTTP|
|zidian||A list of characters with their pinyin and English definitions.||Here||FTP|
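The exact line layout of these files is not documented here. The following minimal sketch therefore assumes a CEDICT-style layout: an optional traditional headword, a simplified headword, bracketed pinyin, and slash-delimited English definitions. It is a hedged example of reading such a file, not a description of the files' actual format. The names parse_line, load_entries and Entry are hypothetical, and so is the path "cedict.gb" mentioned below.

```python
import re
from typing import Iterator, List, NamedTuple, Optional

# Assumed (CEDICT-style) line shape, with the traditional column optional:
#   中國 中国 [zhong1 guo2] /China/Middle Kingdom/
#   中文 [zhong1 wen2] /Chinese language/
LINE_RE = re.compile(r"^(\S+)(?:\s+(\S+))?\s+\[([^\]]*)\]\s+/(.*)/\s*$")


class Entry(NamedTuple):
    traditional: Optional[str]   # None when the line has a single headword
    simplified: str
    pinyin: str
    definitions: List[str]


def parse_line(line: str) -> Optional[Entry]:
    """Parse one dictionary line; return None for blanks, comments, or junk."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    m = LINE_RE.match(line)
    if m is None:
        return None
    first, second, pinyin, defs = m.groups()
    trad, simp = (None, first) if second is None else (first, second)
    return Entry(trad, simp, pinyin, [d for d in defs.split("/") if d])


def load_entries(path: str, encoding: str = "gb2312") -> Iterator[Entry]:
    """Yield entries from a file; ".gb" files are assumed to be GB2312."""
    with open(path, encoding=encoding, errors="replace") as f:
        for line in f:
            entry = parse_line(line)
            if entry is not None:
                yield entry
```

For example, `for e in load_entries("cedict.gb"): ...` would iterate over the dictionary, assuming that file name and layout. The regular expression keeps the traditional column optional, so the same parser handles both single- and double-headword variants.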
Here is some minimal background on the encodings themselves.
|Guobiao||Mainland China's official national standard encoding (e.g. GB2312) for simplified characters|
|Big5||A widely used standard in Taiwan and Hong Kong for traditional character encoding|
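|Unicode||A character-encoding standard for representing most of the world's major writing systems, originally designed around two-byte (16-bit) code units|
|UTF-8||A Unix file-system-safe encoding of the Unicode character set using 1 to 4 bytes per character (common hanzi in the Basic Multilingual Plane take 3 bytes; rarer CJK Extension B and later characters take 4)|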
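To make the byte counts above concrete, this minimal Python sketch encodes single characters with the standard library codecs and reports each result's length. The helper name byte_lengths is hypothetical. U+20000 is used as an example of a CJK Extension B character outside the Basic Multilingual Plane.

```python
def byte_lengths(ch: str) -> dict:
    """Return the number of bytes `ch` occupies in several encodings.

    A value of None means the codec cannot represent the character.
    """
    lengths = {}
    for codec in ("gb2312", "big5", "utf-16-le", "utf-8"):
        try:
            lengths[codec] = len(ch.encode(codec))
        except UnicodeEncodeError:
            lengths[codec] = None   # not in this codec's repertoire
    return lengths


for ch in ("A", "中", "\U00020000"):   # ASCII, common hanzi, Extension B
    print(hex(ord(ch)), byte_lengths(ch))
```

The common hanzi 中 takes 2 bytes in GB2312, Big5 and UTF-16, but 3 bytes in UTF-8. The Extension B character cannot be encoded in GB2312 at all, and takes 4 bytes in both UTF-16 (as a surrogate pair) and UTF-8.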