User's Guide For Simplified Chinese Input Methods
Preedit Area
Status Area
Lookup Choice Area
Auxiliary Window
III. Basic Functions For Simplified Chinese Input Methods
1. Opening and Closing Input Methods
2. Selecting An Input Method
3. Switching Input Methods Roundly
4. Switching Between Half_width Character Mode and Full_width Character Mode
5. Switching Between Chinese Punctuation Mode and English Punctuation Mode
IV. Utilities For Simplified Chinese Input Methods
1. Selecting the Utility Menu
2. Input Method Selection Tool
3. Input Method Options Setting Tool
4. Lookup Table
Lookup table with native encoding
Lookup table with UNICODE encoding
Lookup table for special characters
5. Virtual Keyboard
PC Keyboard
Greek Characters Lookup Keyboard
Russia Characters Lookup Keyboard
ZhuYin Characters Lookup Keyboard
Chinese Punctuation Characters Lookup Keyboard
Number Symbols Lookup Keyboard
Mathematic Symbols Lookup Keyboard
Table Symbols Lookup Keyboard
Special Symbols Lookup Keyboard
6. User Defined Character (UDC)
7. Input Method Help
V. Function Specification for Simplified Chinese Input Methods
1. ASCII Input Mode
2. New QuanPin and New ShuangPin Input Mode
3. QuanPin Input Mode
4. ShuangPin Input Mode
5. English_Chinese Input Mode
6. NeiMa Input Mode
7. Wubi Input Mode
VI. CodeTable Input Method Interface
1. Introduction
2. Creating a Codetable
3. Convert the codetable text file to binary format
4. Convert the binary codetable file to text format
5. Creating a new codetable input method
Preedit Area: Highlighted (such as inverse video or underlined) entry
display area.
Status Area: Indicating the current input/conversion mode.
Lookup Choice Area: Displaying multiple character choices.
Auxiliary Windows: Utilities for input method management.
Type [Control+Spacebar] again to close the input methods.
The auxiliary bar disappears.
And then select the input method you want to use.
The input method system is in Full_width Character Mode when the button
appears as below:
The input method system is in Half_width Character Mode when the button
appears as below:
In Full_width mode, the full-width form of each typed character is committed to the application.
For example, typing 'a' in Full_width mode commits the full-width character of 'a' to the application, as shown below:
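The half-width to full-width conversion described above can be sketched in a few lines. The guide does not say which code points the input method commits, so this sketch assumes the Unicode Halfwidth and Fullwidth Forms block, where printable ASCII maps to U+FF01..U+FF5E and the space maps to U+3000. The function name to_fullwidth is hypothetical.

```python
def to_fullwidth(text: str) -> str:
    """Map printable ASCII characters to their full-width Unicode forms.

    Assumption: full-width means the Unicode Halfwidth and Fullwidth
    Forms block; the guide does not name the exact code points.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if ch == " ":
            out.append("\u3000")            # ideographic (full-width) space
        elif 0x21 <= code <= 0x7E:
            out.append(chr(code + 0xFEE0))  # '!'..'~' -> U+FF01..U+FF5E
        else:
            out.append(ch)                  # leave everything else untouched
    return "".join(out)


print(to_fullwidth("a"))  # the full-width letter U+FF41
```

Characters outside printable ASCII pass through unchanged, which matches the idea that only the typed key is widened.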
The input method system is in Chinese Punctuation Mode when the button appears as below:
,
and the input method system is in English Punctuation Mode when the
button appears as below:
.
When you type a punctuation key in Chinese Punctuation mode, the corresponding Chinese punctuation character is committed to the application.
For example, when you type "$" in Chinese Punctuation mode, the Simplified Chinese currency symbol character is committed to the application, as shown below:
The punctuation keys include these characters: , . / <> :;'"\$!^&_-
And the mapping between English and Chinese punctuation is as follows:
The following tools are supported:
And select one of the input method tools from the menu.
Click the input method selection item from the utilities menu, and the input method selection panel appears as below:
After selecting some input methods and clicking "OK" or "Apply", the
setting will be activated. The first input method selected becomes the
default input method.
Press [Control+Spacebar] in the application window to
activate Chinese input. The default input method will be selected as the
current input method.
Press "F2" to switch to the first selected input method,
"F3" to switch to the second selected one, "F4"
to switch to the third selected one, and so on.
With the options setting tool, you can set input method options. After you set the options in this panel and click "OK" or "Apply", the settings are activated.
For input methods based on the code table structure, there are 4 options that can be set, as described below:
- If this option is selected: Each time a valid key is entered, the input method searches the dictionary table and displays the candidates in the Lookup window immediately.
- If this option is not selected: Each time a valid key is entered, the input method does not search the dictionary table. It only displays the key in the preedit area. After the "Space" key is entered, the input method engine searches the dictionary table and displays the candidates.
This option can help users learn the input method, for example, by viewing the external codes of a Chinese character in that input method.
- If this option is selected: In each Lookup window, the external codes for each candidate appear after the candidate.
- If this option is not selected: The external codes for each candidate do not appear after that candidate.
"Automatically commit if only one candidate"
- If this option is selected: If there is only one candidate for the external codes, the input method automatically commits it.
- If this option is not selected: The input method displays the candidate in the Lookup window.
- If this option is selected: When a valid key is entered, the corresponding keyboard mapping character of the key appears in the Preedit area.
- If this option is not selected: The keyboard mapping character is not displayed in the Preedit area, only the key.
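As a rough illustration of how the key-by-key search option and the automatic commit option interact, here is a minimal sketch. Everything in it is an assumption made for illustration: the TABLE contents, the function name on_key, the parameter names, and the use of exact-match lookup (a real engine would likely search by prefix).

```python
# Hypothetical code table: external codes -> candidate list.
TABLE = {"ni": ["你", "泥"], "nh": ["好"]}


def on_key(preedit, key, key_by_key=True, auto_commit=True):
    """Process one key and return (preedit, candidates, committed).

    key_by_key  -- search the table after every valid key (first option)
    auto_commit -- commit at once when exactly one candidate is found
    """
    if key == " ":
        # Space always triggers a search of the codes typed so far.
        candidates = TABLE.get(preedit, [])
    else:
        preedit += key
        # Without key-by-key search, the key is only echoed in the preedit.
        candidates = TABLE.get(preedit, []) if key_by_key else []
    if auto_commit and len(candidates) == 1:
        return "", [], candidates[0]
    return preedit, candidates, None


print(on_key("n", "h"))                    # single candidate, committed
print(on_key("n", "h", key_by_key=False))  # only echoed until Space
```

With key-by-key search off, the same candidate is found only once Space is passed as the key.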
Three kinds of lookup tables are provided:
In the zh/zh_CN/zh_CN.EUC locales, a lookup table with GB2312 encoding is provided; in the zh_CN.GBK locale, a lookup table with GBK encoding is provided; and in the zh_CN.GB18030 locale, a lookup table with GB18030 encoding is provided. The lookup table panel with GB18030 encoding appears as below:
The Simplified Chinese Environment supports the following virtual keyboards:
- PC Keyboard
- Greek Characters Lookup Keyboard
- Russia Characters Lookup Keyboard
- ZhuYin Characters Lookup Keyboard
- Chinese Punctuation Characters Lookup Keyboard
- Number Symbols Lookup Keyboard
- Mathematic Symbols Lookup Keyboard
- Table Symbols Lookup Keyboard
- Special Symbols Lookup Keyboard
The virtual keyboards let you input a character by clicking the corresponding button on the virtual keyboard. The PC Keyboard appears as below:
Click the user defined character item from the utilities menu to invoke the UDC tool, which appears as below:
In zh/zh_CN/zh_CN.EUC locales:
Press [Control+Spacebar] to toggle Chinese input conversion
on or off.
Press [Control+Escape] to toggle through Chinese input
modes.
A status symbol is displayed in the window's status area when in ASCII input mode. When
ASCII input mode is off, the current conversion mode symbol appears.
PinYin is a popular input method in the PRC, and there are various PinYin-based input methods. Two of them, New QuanPin and New ShuangPin, provide the following features:
(1). Defining Phrases for Later Use
The following example shows how to define the phrase "ke lin dun" and store it for later use.
Type the phrase "kelindun" without spaces. The New QuanPin and New ShuangPin input methods will insert spaces for you automatically:
Then type the number representing the first character you want to select. The following example shows the second candidate selected:
Input the second and third characters of the phrase in the same way, as below:
The new phrase is then defined and added to the user dictionary file. The next time you type "ke lin dun", the phrase you defined appears in the lookup choice area:
(2). Selecting Frequently-Used Candidates
The candidates that have been selected will be presented at the beginning of the candidate list so they can be found more readily.
The following example shows how it works:
Type "sh yi". Notice the order of the five available candidates:
Then select the fifth candidate and type "sh yi" again:
Notice that the fifth candidate has moved to the first position because
you previously selected it, which means that
frequently-used candidates are promoted for faster selection.
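The promotion behaviour in this example can be sketched as a move-to-front operation. The guide only states that the selected fifth candidate moves to the first position; how the input method orders the remaining candidates, or whether it keeps usage counts, is not described, so the simple move-to-front rule and the name promote are assumptions.

```python
def promote(candidates, selected_index):
    """Return a new list with the selected candidate moved to the front.

    Assumption: a plain move-to-front rule; the real input method may
    use frequency counts instead.
    """
    chosen = candidates[selected_index]
    rest = candidates[:selected_index] + candidates[selected_index + 1:]
    return [chosen] + rest


candidates = ["c1", "c2", "c3", "c4", "c5"]   # placeholders for five candidates
candidates = promote(candidates, 4)           # user picks the fifth one
print(candidates)  # ['c5', 'c1', 'c2', 'c3', 'c4']
```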
(3). Inputting Long PinYin Strings
The New QuanPin input method accepts PinYin strings up to 222 Chinese characters long.
The following example shows how to input a long Chinese phrase:
>>meiguoztongkelindunzhengzaitaolunhaiwanjushiwenti<<
The result is the following Chinese phrase:
(4). Inputting phrase with ShengMu
You can also type ShengMu only to input a Chinese phrase, as shown in the following example:
(5). GBK Support
In the zh.GBK/zh_CN.GBK locale, the New QuanPin and New ShuangPin input methods support GBK by default, as shown in the following example:
The second Chinese character in the phrase is defined only in the GBK standard.
Single GBK candidates are placed at the end of the list of GB2312 candidates. Press [Return] to scroll to the GBK area. For easier selection next time, you can define the GBK candidate as a phrase (for more information, see Defining Phrases for Later Use). Once a phrase is defined, you can input it easily.
Both New QuanPin and New ShuangPin support GBK Chinese characters by default in the zh.GBK/zh_CN.GBK locale. However, because many Chinese characters share the same ShengMu (the first part of Pinyin), New QuanPin and New ShuangPin do not display GBK candidates if you provide only the ShengMu.
For example, typing the string "rong" displays GBK
candidates because it is a complete Pinyin string. However,
typing "r" alone does not display any GBK candidates
because it is only a ShengMu.
(6). Keyboard Definition
Key | Definition |
[a-z] | PinYin character |
Home | Moves to the start of the preedit line |
End | Moves to the end of the preedit line |
Left | Moves the caret in the preedit line to the left. If the character to the left is a Chinese character, its original PinYin is recovered. |
Right | Moves the caret in the preedit line to the right. |
Delete | Deletes the PinYin character following the caret on the preedit line. |
Backspace | Deletes the PinYin character preceding the caret on the preedit line. |
G1 - Highest frequency Hanzi + Long (3 or more) Cizu + Double Chinese Cizu
G2 - GB Single Hanzi
G3 - GBK Single Hanzi (in the zh_CN.GBK locale)
Some Pinyin strings may have more candidates than can be displayed in the same window. In that case, use the keys described in the following table to scroll through the candidates.
Page Scroll Key Definitions
Key | Definition |
- = | Scrolls to previous/next candidate(s) |
[ ] | Scrolls to previous/next candidate(s) |
, . | Scrolls to previous/next candidate(s) |
Return | Quickly scrolls through all candidates |
For example, the Pinyin string [jiang] can be interpreted as [jiang]
or [ji][ang], and both are valid. In New QuanPin,
however, [jiang] is interpreted only as [jiang]. You must use the separator
and enter [ji'ang] for it to be interpreted as
[ji] and [ang]. New ShuangPin does not require the use of separators.
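The automatic spacing of "kelindun" and the role of the separator in [ji'ang] can be illustrated with a greedy longest-match split. This is only a sketch under stated assumptions: the syllable set is a tiny hand-picked sample, the function name segment is hypothetical, and the guide does not describe the algorithm New QuanPin actually uses.

```python
# Tiny illustrative syllable set; a real input method ships a full table.
SYLLABLES = {"ke", "lin", "dun", "ji", "ang", "jiang"}
MAX_LEN = max(len(s) for s in SYLLABLES)


def segment(pinyin: str):
    """Split a PinYin string by greedy longest match.

    An apostrophe forces a syllable boundary, as in ji'ang.
    """
    result = []
    for chunk in pinyin.split("'"):
        pos = 0
        while pos < len(chunk):
            # Try the longest possible syllable first.
            for size in range(min(MAX_LEN, len(chunk) - pos), 0, -1):
                piece = chunk[pos:pos + size]
                if piece in SYLLABLES:
                    result.append(piece)
                    pos += size
                    break
            else:
                raise ValueError(f"cannot segment {chunk[pos:]!r}")
    return result


print(segment("kelindun"))  # ['ke', 'lin', 'dun']
print(segment("jiang"))     # ['jiang']
print(segment("ji'ang"))    # ['ji', 'ang']
```

Because the longest syllable wins, "jiang" is never split on its own, which is why the separator is needed to obtain [ji] and [ang].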
(7). Dictionary Files
New QuanPin and New ShuangPin share two dictionary files, PyCiku.dat
and UdCiku.dat. The system copies are:
/usr/lib/im/locale/zh_CN/data/PyCiku.dat and
/usr/lib/im/locale/zh_CN/data/UdCiku.dat .
Users cannot normally write to these files. However, because users affect the way New QuanPin and New ShuangPin work through features such as frequency adjustment and user-defined phrases, the dictionary files must be updated frequently.
A user's dictionary files are normally located at ~/.Xlocale/PyCiku.dat
and ~/.Xlocale/UdCiku.dat (~ indicates the home
directory of the user who starts the htt command). When New QuanPin
and New ShuangPin are started, they locate and read the dictionary files
in the user's home directory. If the user dictionary files are not found,
the system default dictionary files are used (that is, /usr/lib/im/locale/zh_CN/data/...).
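The lookup order described above (user copy first, system default otherwise) can be sketched as follows. The function name resolve_dictionary and its parameters are hypothetical; only the two directory locations come from this guide, and the real server may perform additional checks.

```python
from pathlib import Path

# System default location named in this guide.
SYSTEM_DIR = Path("/usr/lib/im/locale/zh_CN/data")


def resolve_dictionary(name, home, system_dir=SYSTEM_DIR):
    """Return the dictionary path to read for one user.

    Prefers <home>/.Xlocale/<name>; falls back to the system copy
    when the user file does not exist.
    """
    user_copy = Path(home) / ".Xlocale" / name
    if user_copy.is_file():
        return user_copy
    return Path(system_dir) / name


# Hypothetical home directory, used only to show the fallback.
print(resolve_dictionary("PyCiku.dat", "/nonexistent-home"))
```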
(8). New ShuangPin Features
ShuangPin is an abbreviated form of QuanPin. It is faster but more difficult to use than QuanPin. New ShuangPin supports all of the features, keyboard definitions, and dictionary files of New QuanPin.
There are various ShuangPin keyboard mapping designs in the PRC. The three most popular are ZiRanMa, Chinese Star, and Intelligent_ABC. The New ShuangPin input method supports all three of these keyboard mappings.
The following tables contain keyboard mappings for the ZiRanMa keyboards.
Key | Definition |
i | ch |
u | sh |
v | zh |
a | a |
b | ou |
c | iao |
d | uang, iang |
e | e |
f | en |
g | eng |
h | ang |
i | i |
j | an |
k | ao |
l | ai |
m | ian |
n | in |
o | o, uo |
p | un |
q | iu |
r | uan, er |
s | iong, ong |
t | ue |
u | u |
v | v, ui |
w | ua, ia |
x | ie |
y | uai, ing |
z | ei |
A lookup area shows the characters that match the QuanPin keystrokes. If more than one character matches the keystroke sequence, you can type a period (.) or the [PageDown] key to display the next page of candidates, and type a comma (,) or the [PageUp] key to display the previous page of candidates. You can select the Chinese character you want by typing the corresponding number label key.
This section describes how to use the QuanPin input method to input Chinese characters.
(1). Open a new Terminal and type [Control+Spacebar] to turn on Chinese input conversion.
(2). Press F5 to turn on QuanPin input mode, or click the input method selection button on the auxiliary window and select the QuanPin input method. The status area shows that QuanPin input mode is on, as below:
(3). Type zhang.
The QuanPin input converter finds six matching characters, and a lookup choice area appears as below:
(4). Type a number key to select the appropriate character, such as '1' to select the first candidate. The application appears as below:
For example, for a Chinese character whose QuanPin representation is "zhang", the ShuangPin
is "vh".
The following tables define the keyboard mappings for the ShuangPin rule.
Key | Definition |
i | ch |
u | sh |
v | zh |
a | a |
b | ou |
c | iao |
d | uang, iang |
e | e |
f | en |
g | eng |
h | ang |
i | i |
j | an |
k | ao |
l | ai |
m | ian |
n | in |
o | o, uo |
p | un |
q | iu |
r | uan, er |
s | iong, ong |
t | ue |
u | u |
v | v, ui, ue |
w | ua, ia |
x | ie |
y | uai |
z | ei |
; | ing |
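To make the two-key rule concrete, the sketch below expands a ShuangPin code into its possible QuanPin spellings, using a hand-copied subset of the rows in the table above. The function name expand is hypothetical, and the sketch does not check whether an expanded spelling is a valid Pinyin syllable, which a real input method would need to do.

```python
# ShengMu keys that stand for a two-letter initial (from the table above).
SHENGMU = {"i": "ch", "u": "sh", "v": "zh"}

# Hand-copied subset of the YunMu rows of the table above.
YUNMU = {
    "h": ["ang"],
    "d": ["uang", "iang"],
    "j": ["an"],
    "k": ["ao"],
    "s": ["iong", "ong"],
    ";": ["ing"],
}


def expand(pair: str):
    """Expand a two-key ShuangPin code into candidate QuanPin spellings."""
    if len(pair) != 2:
        raise ValueError("a ShuangPin code is exactly two keys")
    # Any other first key is its own initial (for example 'l' -> 'l').
    initial = SHENGMU.get(pair[0], pair[0])
    return [initial + final for final in YUNMU[pair[1]]]


print(expand("vh"))  # ['zhang']
print(expand("ld"))  # ['luang', 'liang'] (unfiltered candidates)
```

Keys that carry two finals, such as d, yield two spellings, so the engine must keep only the one that forms a real syllable.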
You can use the ShuangPin input method to type individual Chinese characters in the zh_CN.EUC, zh_CN.GBK and zh_CN.GB18030 locales.
A lookup area shows the characters that match the ShuangPin keystrokes. If more than one character matches the keystroke sequence, you can type a period (.) or the [PageDown] key to display the next page of candidates, and type a comma (,) or the [PageUp] key to display the previous page of candidates. You can select the Chinese character you want by typing the corresponding number label key.
This section describes how to use the ShuangPin input method to input Chinese characters.
(1). Open a new Terminal and type [Control+Spacebar] to turn on Chinese input conversion.
(2). Press F6 to turn on ShuangPin input mode, or click the input method selection button on the auxiliary window and select the ShuangPin input method. The status area shows that ShuangPin input mode is on, as below:
(3). Type vh.
The ShuangPin input converter finds six matching characters, and a lookup choice area appears as below:
(4). Type a number key to select the appropriate character, such as '1' to select the first candidate. The application appears as below:
If more than one Chinese phrase matches the English word, you can type a period (.) or the [PageDown] key to display the next page of candidates, and type a comma (,) or the [PageUp] key to display the previous page of candidates. You can select the Chinese phrase you want by typing the corresponding number label key.
The following steps show how to use this input method to type the Chinese phrase representing the English word "hello". The word requires five keystrokes.
(1). Open a new Terminal and type [Control+Spacebar] to turn on Chinese input conversion.
(2). Press F7 to turn on English_Chinese input mode, or click the input method selection button on the auxiliary window and select the English_Chinese input method. The status area shows that English_Chinese input mode is on, as below:
(3). Type hello, as follows:
(4). Type a number key to select the appropriate phrase, such as '1' to select the first candidate. The application appears as below:
(5). Wildcard characters (* or ?) can be used to search the dictionary: '*' stands for one or several letters, and '?' stands for exactly one letter. For example, to search for all English words that end with 'lution', input '*lution', and the lookup choices appear as shown below:
Or, to search for all three-letter English words that begin with 'c', input 'c??', and the lookup choices appear as below:
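The wildcard rules in step (5) can be sketched with a regular expression: '*' becomes "one or more letters" and '?' becomes "exactly one letter". The word list and the function name wildcard_search are hypothetical; the real dictionary and matching code are not described in this guide.

```python
import re

# Hypothetical sample of dictionary head words.
WORDS = ["solution", "revolution", "cat", "cup", "code"]


def wildcard_search(pattern, words=WORDS):
    """Return the words matching a pattern with '*' and '?' wildcards."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append("[a-z]+")   # one or several letters
        elif ch == "?":
            parts.append("[a-z]")    # exactly one letter
        else:
            parts.append(re.escape(ch))
    regex = re.compile("".join(parts))
    return [w for w in words if regex.fullmatch(w)]


print(wildcard_search("*lution"))  # ['solution', 'revolution']
print(wildcard_search("c??"))      # ['cat', 'cup']
```

Note that '*' here requires at least one letter, following the wording above, which differs from shell-style globbing where '*' may match nothing.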
This section describes how to use the GB2312 internal codes to input Chinese characters and symbols in the zh and zh_CN.EUC locales.
(1). Open a new Terminal and type [Control+Spacebar] to turn on Chinese input conversion.
(2). Click the input method selection button on the auxiliary window and select the GB2312 NeiMa input method. The status area shows that GB2312 NeiMa input mode is on, as below:
(3). Press the first three of the four keys that represent a character. For example, for b0a1, press 'b', '0' and 'a', as below:
(4). Type the fourth key '1'. The character is automatically committed to the application, as below:
This section describes how to use the GBK internal codes to input Chinese characters and symbols in the zh.GBK/zh_CN.GBK locale.
(1). Open a new Terminal and type [Control+Spacebar] to turn on Chinese input conversion.
(2). Click the input method selection button on the auxiliary window and select the GBK NeiMa input method. The status area shows that GBK NeiMa input mode is on, as below:
(3). Press the first three of the four keys that represent a character. For example, for 8141, press '8', '1' and '4', as below:
(4). Type the fourth key '1'. The character is automatically committed to the application, as below:
This section describes how to use the GB18030 internal codes to input Chinese characters and symbols in the zh_CN.GB18030 and zh_CN.UTF-8 locales.
(1). Open a new Terminal and type [Control+Spacebar] to turn on Chinese input conversion.
(2). Click the input method selection button on the auxiliary window and select the GB18030 NeiMa input method. The status area shows that GB18030 NeiMa input mode is on, as below:
(3). To input a Chinese character with a 2-byte GB18030 internal code, for example 0x8141, press the first three keys, as below:
(4). Type the fourth key '1'. The character is automatically committed to the application, as below:
(5). To input a Chinese character with a 4-byte GB18030 internal code, for example 0x8139ef30, press the first seven keys, as below:
(6). Type the last key '0'. The character is then committed to the application, as below:
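The commit-on-last-key behaviour of the NeiMa modes can be approximated with Python's built-in codecs. This is a sketch, not the input method's implementation: the function name try_commit is hypothetical, and it relies on the GB18030 rule that a four-byte code has a second byte in the range 0x30-0x39.

```python
def try_commit(hex_keys: str, encoding: str):
    """Return the decoded character once enough hex keys are typed.

    Returns None while the internal code is still incomplete.
    encoding is a Python codec name: "gb2312", "gbk" or "gb18030".
    """
    if len(hex_keys) < 4 or len(hex_keys) % 2:
        return None
    raw = bytes.fromhex(hex_keys)
    # In GB18030, a second byte of 0x30-0x39 starts a four-byte code.
    if encoding == "gb18030" and len(raw) == 2 and 0x30 <= raw[1] <= 0x39:
        return None
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


print(try_commit("b0a", "gb2312"))        # None: fourth key still missing
print(try_commit("b0a1", "gb2312"))       # 啊
print(try_commit("8139", "gb18030"))      # None: four-byte code in progress
print(try_commit("8139ef30", "gb18030"))  # one CJK character
```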
Wubi's primary advantage is that users can input characters rapidly, since there is rarely more than one candidate to select for a given Wubi code. And because the Wubi input method is based on shape, almost every CJK character can be encoded by its encoding rule, which is very difficult for a phonetic-based input method.
For the Wubi encoding rule, refer to the document: 《Tutorial Book For Standard Wubi》.
The Solaris WangMa Wubi input method supports the following functions:
- Support GB18030 charset.
- Support Wubi simplified code.
- Support Wubi mistake compatible code.
- Support three levels of identified code.
- Support "z/Z" as help key.
- Support phrase input and optional professional phrase libraries.
- Support character/phrase association.
- Support input method properties setting.
(1). Support GB18030 charset.
The GB18030 standard is a Chinese character encoding standard issued in 2000. It is mandatory: products that do not conform to this standard cannot legally be sold in China.
GB 18030 has the following significant properties:
In addition, the Wubi input method also supports the GB2312 and GBK charsets.
The Solaris WangMa Wubi input method divides the GB18030 charset into three levels: GB2312, GBK and GB18030. The GB2312 level includes 6763 frequently-used Chinese characters, the GBK level includes 21003 Chinese characters, and the GB18030 level includes 27533.
While inputting, the user can switch between these three levels, much like extending or shortening an antenna.
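The nesting of the three charset levels can be illustrated by asking which of Python's codecs is the first one able to encode a character. This is only an approximation: the function name charset_level is hypothetical, and the level boundaries used by the WangMa Wubi input method may not coincide exactly with Python's gb2312, gbk and gb18030 codecs.

```python
def charset_level(ch: str) -> str:
    """Return the smallest charset level whose codec can encode ch."""
    levels = (("GB2312", "gb2312"), ("GBK", "gbk"), ("GB18030", "gb18030"))
    for name, codec in levels:
        try:
            ch.encode(codec)
            return name
        except UnicodeEncodeError:
            continue  # not in this level, try the next larger one
    raise ValueError(f"{ch!r} cannot be encoded in any of the three levels")


print(charset_level("啊"))      # GB2312
print(charset_level("镕"))      # GBK
print(charset_level("\u3400"))  # GB18030
```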
(2). Support Wubi simplified code.
Some frequently used Chinese characters can be input by pressing the first one, two, or three radical keys and then the space key.
Wubi simplified codes are divided into 3 levels:
+ Level 1: The first level includes the 25 most frequently used characters. They are:
我人有的和主产不为这工要在地一上是中国经以发了民同
The user only needs to press the corresponding radical key and the space key to input these Chinese characters.
+ Level 2: These second level Chinese characters are frequently used. The user inputs them by pressing the first two radical keys and the space key.
For example: type "di" and then the spacebar, and the character "耗" is input, because its level_2 simplified code is "di".
+ Level 3: These third level Chinese characters are frequently used. The user inputs them by pressing the first three radical keys and the space key.
(3). Support Wubi mistake compatible code.
Each Chinese character has only one Wubi code according to the Wubi rule, but because of users' handwriting habits, some Chinese characters can be encoded with alternative Wubi codes, which are called mistake compatible codes.
For example, for the character "长", "tayi" is the correct Wubi code, but "atyi" is also accepted as a Wubi code for this character, so the user can input this character with either code.
(4). Support three levels of identified code.
Under the Wubi encoding rule, some Chinese characters have an identified code to distinguish them from other characters with a similar shape.
For example, according to the Wubi coding rule, "吧" and "邑" have the same code "KC". An identified code can be assigned to each of them to distinguish them. Identified codes are assigned according to the shape or the last radical of the character.
The Solaris WangMa Wubi input method supports three levels of identified code:
+ "A" mode: Every character with a Wubi code of no more than 4 bytes must be input with an identified code. "A" mode is the default mode.
+ "B" mode: Only characters whose shape is left_to_right must be input with an identified code.
+ "C" mode: All characters are input with no identified code.
For example, when the identified code mode is set to "C" mode, type
"tkg" and the space key, and two Chinese characters, "和" and "程", are listed,
while in "A" mode, only "和" is selected and committed to the application.
(5). Support "z/Z" as help key.
When the user does not know the Wubi code of a Chinese character, he can use
"z/Z" as a help key to search for this character.
For example: the user can use "azzd" to search for all characters/phrases
whose Wubi code begins with "a" and ends with "d", as below:
(6). Support phrase input and optional professional phrase libraries.
The Solaris WangMa Wubi input method supports inputting phrases with Wubi codes. Besides the 90000 basic phrases, the Wubi input method also provides 11 professional phrase libraries for selection. The user can activate one of them according to his professional domain.
The professional phrase libraries are as follows (each one has about 3000 - 20000 phrases):
- Transportation
- Computer
- Economics and Finance
- Agriculture
- Medicine
- Mineralogy
- Trade
- Martial
- Law
- Gazetteer
- Idioms
For example: when the user chooses the "Medicine" phrase library and types "mino", some medicine phrases are listed for selection, as below:
(7). Support character/phrase association.
When the user inputs a Wubi code that represents a character and commits
it to the application, the phrases that begin with this character are
listed in the candidate area for selection.
For example: type "iuxx", and the Chinese character "滋" is automatically
committed to the application. After the character appears in the application
window, a new candidate window pops up, and the phrases that begin
with this Chinese character are listed in it,
as below:
(8). Support input method properties setting.
The Solaris WangMa Wubi input method lets you set the following properties:
For example: Switch between the three levels of Chinese charset, as below:
For example: Switch between professional phrase libraries, as below:
For example: Switch between the three levels of identified code, as below:
Here is an example to specify the format of a codetable text file:
A codetable text file contains the following function-specific sections:
- [ Description ]
- [ Comment ]
- [ Key_Prompt ]
- [ Function_Key ]
- [ Phrase ]
- [ Single ]
- [ Options ]
Each section is briefly described below:
The [ Description ] section contains the following entry items:
(1). "Name:", specifies the name of this codetable.
(2). "Encode:", specifies the encoding of this codetable.
It can be UTF-8, GB, GB2312, GBK, GB18030, EUC_TW, BIG5, or BIG5HK.
(3). "WildChar:", specifies the wildcard characters for input codes.
The default values are '*' and '?'.
(4). "UsedCodes:", specifies the valid characters to input.
(5). "MaxCodes:", specifies the maximum number of input codes
for one item.
The [ Function_Key ] section contains the following entry items:
(1). "PageUp:"
(2). "PageDown:"
(3). "BackSpace:"
(4). "ClearAll:"
Note: '^' means the [Control] key. For example, '^N' means the [Control+N] key.
The [ Options ] section contains the following entry items:
(1). "HelpInfo_Mode:" Values: "ON" or "OFF"
(2). "KeyByKey_Mode:" Values: "ON" or "OFF"
(3). "KeyPrompt_Mode:" Values: "ON" or "OFF"
(4). "AutoSelect_Mode:" Values: "ON" or "OFF"
(5). "SelectKey_Mode:" Values: "Number", "Lower" or "Upper"
In the [ Single ] section, the format of every line is as follows:
keystroke_sequence CharacterList
Note: "CharacterList" means a list of Chinese characters not separated by spaces.
In the [ Phrase ] section, the format of every line is as follows:
keystroke_sequence word1 word2 word3 ...
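To show how the sections and line formats above fit together, here is a minimal parser sketch. The SAMPLE text is a hypothetical codetable invented for illustration (it is not the example from the original guide), the function name parse_codetable is hypothetical, and the assignment of the two line formats to [ Single ] and [ Phrase ] is inferred from their shapes. The real txt2bin tool may accept or require syntax that this sketch does not handle.

```python
# Hypothetical codetable text, invented for illustration only.
SAMPLE = """\
[ Description ]
Name: demo_table
Encode: UTF-8
MaxCodes: 4

[ Options ]
KeyByKey_Mode: ON
AutoSelect_Mode: OFF

[ Single ]
ab 甲乙

[ Phrase ]
abcd 甲乙 丙丁
"""


def parse_codetable(text):
    """Parse codetable text into {section: {key: value}}."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip()      # "[ Single ]" -> "Single"
            sections[current] = {}
        elif current == "Single":
            keys, *items = line.split()
            # CharacterList: characters with no separating spaces.
            sections[current][keys] = list("".join(items))
        elif current == "Phrase":
            keys, *words = line.split()
            sections[current][keys] = words
        elif current is not None and ":" in line:
            name, value = line.split(":", 1)
            sections[current][name.strip()] = value.strip()
    return sections


table = parse_codetable(SAMPLE)
print(table["Description"]["Name"])  # demo_table
print(table["Single"]["ab"])         # ['甲', '乙']
```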
The tool "txt2bin" is under the directory "/usr/lib/im/locale/zh_CN/common/".
The command syntax is:
# /usr/lib/im/locale/zh_CN/common/txt2bin source_codetable_file binary_codetable_file
The tool "bin2txt" is under the directory "/usr/lib/im/locale/zh_CN/common/".
The command syntax is:
# /usr/lib/im/locale/zh_CN/common/bin2txt binary_codetable_file source_codetable_file
(1). Prepare the codetable source file for the new input method according to the format specified above.
(2). Convert the source codetable file to binary format:
Use the utility tool "txt2bin" to convert the prepared text codetable file to a binary file.
The command syntax is:
# /usr/lib/im/locale/zh_CN/common/txt2bin source_codetable_file binary_codetable_file
(3). Copy the binary codetable file to the path "/usr/lib/im/locale/zh_CN/common/data".
(4). Add the codetable information to the input method specification file "/usr/lib/im/locale/zh_CN/sysime.cfg".
(5). Restart the input method server (htt) and log in to the system again
to enable the new input method.
To restart the input method server (htt), run the following
commands as root:
# /etc/init.d/IIim stop
# /etc/init.d/IIim start
Then your new input method is ready to use.
For example: To add a new codetable input method named "new_codetable_im":
(1). First create a codetable format file named "new_codetable_im.txt".
(2). Use the tool "txt2bin" to convert it to the binary file "new_codetable_im.data".
(3). Then copy it to the path "/usr/lib/im/locale/zh_CN/common/data".
(4). Add the codetable name "new_codetable_im" to the input method
configuration file "/usr/lib/im/locale/zh_CN/sysime.cfg".
(5). Restart the input method server (htt).
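The command-line parts of this procedure can be collected into a small helper that only builds the commands without running them. The function name install_steps is hypothetical, and using cp for the copy step is an assumption (the guide only says to copy the file). Step (4) is left out because the format of sysime.cfg is not described here.

```python
IM_ROOT = "/usr/lib/im/locale/zh_CN"


def install_steps(name: str):
    """Return the commands for steps (2), (3) and (5) as argv lists.

    The commands are returned, not executed; run them as root.
    """
    source = f"{name}.txt"
    binary = f"{name}.data"
    return [
        [f"{IM_ROOT}/common/txt2bin", source, binary],  # step (2)
        ["cp", binary, f"{IM_ROOT}/common/data"],       # step (3), cp assumed
        ["/etc/init.d/IIim", "stop"],                   # step (5)
        ["/etc/init.d/IIim", "start"],
    ]


for argv in install_steps("new_codetable_im"):
    print("# " + " ".join(argv))
```

Printing the commands first makes it easy to review them before running anything with root privileges.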