
Sorting Japanese Words in Indexes (3)

Posted at 2014-05-29

This article follows "Sorting Japanese Words in Indexes (2)".

Space

| Order | Char. | Unicode point (Name) |
|---|---|---|
| 1 | | U+0020 (SPACE) |

JIS X 4061 lists only the single entry above in the Space class. With Unicode in mind, however, we should also define a normal form for IDEOGRAPHIC SPACE, the so-called ZENKAKU (fullwidth) space. The translation table to the normal form is as follows.

| Normal form | Order | Char. | Unicode point (Name) |
|---|---|---|---|
| U+0020 | 1 | | U+0020 (SPACE) |
| U+0020 | 2 | | U+3000 (IDEOGRAPHIC SPACE) |
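To make the translation table concrete, here is a minimal Python sketch. It assumes that "Order" is used as a tie-breaker among characters that share the same normal form; the names `SPACE_TABLE`, `normalize_space` and `space_key` are hypothetical and not part of JIS X 4061.

```python
from typing import Tuple

# Hypothetical translation table for the Space class:
# character -> (normal form, order among characters with that normal form)
SPACE_TABLE = {
    "\u0020": ("\u0020", 1),  # SPACE
    "\u3000": ("\u0020", 2),  # IDEOGRAPHIC SPACE (ZENKAKU SPACE)
}

def normalize_space(ch: str) -> str:
    """Return the normal form of a Space-class character."""
    return SPACE_TABLE[ch][0]

def space_key(ch: str) -> Tuple[str, int]:
    """Sort key (assumption): normal form first, then the Order column."""
    return SPACE_TABLE[ch]

print(normalize_space("\u3000") == " ")         # True
print(sorted(["\u3000", " "], key=space_key))   # SPACE sorts before IDEOGRAPHIC SPACE
```

Both characters compare equal at the first level (same normal form), and only the second component separates them.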

Punctuation marks (except brackets)

The Punctuation class contains the following characters, in the order shown.

| Order | Char. | Unicode point (Name) |
|---|---|---|
| 1 | 、 | U+3001 (IDEOGRAPHIC COMMA) |
| 2 | 。 | U+3002 (IDEOGRAPHIC FULL STOP) |
| 3 | , | U+002C (COMMA) |
| 4 | . | U+002E (FULL STOP) |
| 5 | · | U+00B7 (MIDDLE DOT) |
| 6 | : | U+003A (COLON) |
| 7 | ; | U+003B (SEMICOLON) |
| 8 | ? | U+003F (QUESTION MARK) |
| 9 | ! | U+0021 (EXCLAMATION MARK) |
| 10 | ¯ | U+00AF (MACRON; overline) |
| 11 | _ | U+005F (LOW LINE) |
| 12 | — | U+2014 (EM DASH) |
| 13 | ‐ | U+2010 (HYPHEN) |
| 14 | / | U+002F (SOLIDUS) |
| 15 | \ | U+005C (REVERSE SOLIDUS) |
| 16 | 〜 | U+301C (WAVE DASH) |
| 17 | ‖ | U+2016 (DOUBLE VERTICAL LINE) |
| 18 | \| | U+007C (VERTICAL LINE) |
| 19 | … | U+2026 (HORIZONTAL ELLIPSIS) |
| 20 | ‥ | U+2025 (TWO DOT LEADER) |
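The class order above can be encoded as a simple rank lookup. This is a minimal sketch under the assumption that a character's rank is its 1-based position in the table; `PUNCTUATION_ORDER`, `PUNCT_RANK` and `punct_rank` are hypothetical names, and characters outside the class raise `KeyError`.

```python
# Hypothetical encoding of the Punctuation class order (brackets excluded).
# A character's rank is its 1-based position in this list.
PUNCTUATION_ORDER = [
    "\u3001",  # 1  IDEOGRAPHIC COMMA
    "\u3002",  # 2  IDEOGRAPHIC FULL STOP
    "\u002c",  # 3  COMMA
    "\u002e",  # 4  FULL STOP
    "\u00b7",  # 5  MIDDLE DOT
    "\u003a",  # 6  COLON
    "\u003b",  # 7  SEMICOLON
    "\u003f",  # 8  QUESTION MARK
    "\u0021",  # 9  EXCLAMATION MARK
    "\u00af",  # 10 MACRON (overline)
    "\u005f",  # 11 LOW LINE
    "\u2014",  # 12 EM DASH
    "\u2010",  # 13 HYPHEN
    "\u002f",  # 14 SOLIDUS
    "\u005c",  # 15 REVERSE SOLIDUS
    "\u301c",  # 16 WAVE DASH
    "\u2016",  # 17 DOUBLE VERTICAL LINE
    "\u007c",  # 18 VERTICAL LINE
    "\u2026",  # 19 HORIZONTAL ELLIPSIS
    "\u2025",  # 20 TWO DOT LEADER
]

PUNCT_RANK = {ch: rank for rank, ch in enumerate(PUNCTUATION_ORDER, start=1)}

def punct_rank(ch: str) -> int:
    """Return the order of ch within the Punctuation class (KeyError if absent)."""
    return PUNCT_RANK[ch]

print(sorted(["!", "\u3001", "/", "?", "\u3002"], key=punct_rank))
```

Here the ideographic comma and full stop come first, followed by `?`, `!` and `/`, matching orders 1, 2, 8, 9 and 14.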
Several of these marks also have fullwidth or other variant forms. Their translation tables to the normal form are as follows.

| Normal form | Order | Char. | Unicode point (Name) |
|---|---|---|---|
| U+002C (COMMA) | 1 | , | U+002C (COMMA) |
| U+002C (COMMA) | 2 | ， | U+FF0C (FULLWIDTH COMMA) |
| U+002E (FULL STOP) | 1 | . | U+002E (FULL STOP) |
| U+002E (FULL STOP) | 2 | ． | U+FF0E (FULLWIDTH FULL STOP) |
| U+00B7 (MIDDLE DOT) | 1 | · | U+00B7 (MIDDLE DOT) |
| U+00B7 (MIDDLE DOT) | 2 | ・ | U+30FB (KATAKANA MIDDLE DOT) |
| U+003A (COLON) | 1 | : | U+003A (COLON) |
| U+003A (COLON) | 2 | ： | U+FF1A (FULLWIDTH COLON) |
| U+003B (SEMICOLON) | 1 | ; | U+003B (SEMICOLON) |
| U+003B (SEMICOLON) | 2 | ； | U+FF1B (FULLWIDTH SEMICOLON) |
| U+003F (QUESTION MARK) | 1 | ? | U+003F (QUESTION MARK) |
| U+003F (QUESTION MARK) | 2 | ？ | U+FF1F (FULLWIDTH QUESTION MARK) |
| U+0021 (EXCLAMATION MARK) | 1 | ! | U+0021 (EXCLAMATION MARK) |
| U+0021 (EXCLAMATION MARK) | 2 | ！ | U+FF01 (FULLWIDTH EXCLAMATION MARK) |
| U+00AF (MACRON) | 1 | ¯ | U+00AF (MACRON; overline) |
| U+00AF (MACRON) | 2 | ￣ | U+FFE3 (FULLWIDTH MACRON) |
| U+005F (LOW LINE) | 1 | _ | U+005F (LOW LINE) |
| U+005F (LOW LINE) | 2 | ＿ | U+FF3F (FULLWIDTH LOW LINE) |
| U+2014 (EM DASH) | 1 | — | U+2014 (EM DASH) |
| U+2014 (EM DASH) | 2 | ― | U+2015 (HORIZONTAL BAR) |
| U+2010 (HYPHEN) | 1 | ‐ | U+2010 (HYPHEN) |
| U+2010 (HYPHEN) | 2 | - | U+002D (HYPHEN-MINUS) |
| U+002F (SOLIDUS) | 1 | / | U+002F (SOLIDUS) |
| U+002F (SOLIDUS) | 2 | ／ | U+FF0F (FULLWIDTH SOLIDUS) |
| U+007C (VERTICAL LINE) | 1 | \| | U+007C (VERTICAL LINE) |
| U+007C (VERTICAL LINE) | 2 | ｜ | U+FF5C (FULLWIDTH VERTICAL LINE) |
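The class order and the translation tables can be combined into one sort key. The following is a minimal sketch assuming a two-level comparison: first the class order of the character's normal form, then the "Order" column among variants of that normal form. Characters not listed as variants are treated as their own normal form with order 1. All names (`CLASS_ORDER`, `VARIANTS`, `normalize`, `punct_key`) are hypothetical.

```python
# Punctuation class order (1-based position), same as the class table.
CLASS_ORDER = (
    "\u3001\u3002,.\u00b7:;?!\u00af_"             # orders 1-11
    "\u2014\u2010/\\\u301c\u2016|\u2026\u2025"    # orders 12-20
)
CLASS_RANK = {ch: rank for rank, ch in enumerate(CLASS_ORDER, start=1)}

# Variant -> (normal form, order), transcribed from the translation tables.
VARIANTS = {
    "\uff0c": (",", 2),       # FULLWIDTH COMMA
    "\uff0e": (".", 2),       # FULLWIDTH FULL STOP
    "\u30fb": ("\u00b7", 2),  # KATAKANA MIDDLE DOT
    "\uff1a": (":", 2),       # FULLWIDTH COLON
    "\uff1b": (";", 2),       # FULLWIDTH SEMICOLON
    "\uff1f": ("?", 2),       # FULLWIDTH QUESTION MARK
    "\uff01": ("!", 2),       # FULLWIDTH EXCLAMATION MARK
    "\uffe3": ("\u00af", 2),  # FULLWIDTH MACRON
    "\uff3f": ("_", 2),       # FULLWIDTH LOW LINE
    "\u2015": ("\u2014", 2),  # HORIZONTAL BAR -> EM DASH
    "-": ("\u2010", 2),       # HYPHEN-MINUS -> HYPHEN
    "\uff0f": ("/", 2),       # FULLWIDTH SOLIDUS
    "\uff5c": ("|", 2),       # FULLWIDTH VERTICAL LINE
}

def normalize(ch: str) -> str:
    """Map a punctuation character to its normal form."""
    return VARIANTS.get(ch, (ch, 1))[0]

def punct_key(ch: str) -> tuple:
    """(class order of the normal form, order among its variants)."""
    norm, order = VARIANTS.get(ch, (ch, 1))
    return (CLASS_RANK[norm], order)

marks = ["\uff5c", "\uff0c", ",", "-", "\u2010", "\u3001"]
print(sorted(marks, key=punct_key))
```

With this key, COMMA and FULLWIDTH COMMA sort next to each other (orders 3/1 and 3/2), and HYPHEN-MINUS lands right after HYPHEN rather than in its ASCII position.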