Characters have names.
I happen to have some experience with normalizing Western texts crawled from the World Wide Web, and I have seen the better part of the zodiac of esoteric symbols. As a traumatic carryover from my relationship with LaTeX, I also have a by-product level of interest in typesetting. One of the recurring topics is the hyphen.
Since people have a tendency to assign any word to any meaning (hello arbitrariness of the sign, hello Saussure!), I don't expect anyone to be aware of the nomenclature concerning the horizontal bar-like symbols. Say, when German people read text aloud to others, they say "minus" for the hyphen. But anyone who has done LaTeX (and math) knows that the minus and hyphen symbols are not interchangeable. You are butchering it. Well, historically, the hyphen served as a replacement for the minus in environments with a limited character set, such as ASCII. Like your grandmom's typewriter. Hence the official name HYPHEN-MINUS.
Without aiming to be comprehensive, here is a list of hyphen-like symbols.
symbol | unicode | name | notes |
---|---|---|---|
- | 0x002d | plain hyphen | the only one in ASCII plane |
‐ | 0x2010 | unicode hyphen | |
‑ | 0x2011 | non-breaking hyphen | |
‒ | 0x2012 | figure dash | |
– | 0x2013 | en-dash | as wide as 'n' |
— | 0x2014 | em-dash | as wide as 'm' |
― | 0x2015 | horizontal bar (quotation dash) | as wide as 'm' |
− | 0x2212 | minus | minus is not hyphen |
― | 0x2e12 | CJK minus | minus in Japanese texts |
⸺ | 0x2e3a | double em-dash | twice 'm' |
― | 0x2e15 | horizontal bar | |
﹘ | 0xfe58 | small em dash | |
﹣ | 0xfe63 | small hyphen minus | |
－ | 0xff0d | CJK full width hyphen-minus | |
ｰ | 0xff70 | CJK length mark | half-width |
ー | 0x30fc | CJK full width length mark | 'the' kana length mark |
一 | 0x4e00 | numerical one | |
─ | 0x2500 | box drawings light horizontal | may look different |
━ | 0x2501 | box drawings heavy horizontal | may look different |
Feel free to check the code point for any of these symbols in Python:
hex(ord("―"))
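Going the other way works too. Since characters have names, Python's standard `unicodedata` module can look them up. Here is a minimal sketch that prints the official names for a handful of the code points from the table; the helper name `describe` is just something I made up for illustration.

```python
import unicodedata

# A few of the code points from the table above.
code_points = [0x002D, 0x2010, 0x2013, 0x2014, 0x2212]

def describe(cp):
    """Return the code point in U+XXXX notation with its official Unicode name."""
    return f"U+{cp:04X} {unicodedata.name(chr(cp))}"

for cp in code_points:
    print(describe(cp))
# U+002D HYPHEN-MINUS
# U+2010 HYPHEN
# U+2013 EN DASH
# U+2014 EM DASH
# U+2212 MINUS SIGN
```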
So, who cares?
Normally no one cares. I don't care. But when a site explicitly asks for a dash in the input, most probably that is not what they want. They want an ASCII hyphen, not a dash. They just don't know. Perhaps dash sounds cooler than hyphen, or is easier to spell. I have never come across any site that accepted a dash (code point 0x2013) in the input.
Yeah, the failed attempt captured in the screenshot was due to the requested dash.
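If you happen to be on the receiving end of such a form, one possible way out is to fold the dash-like characters into the ASCII hyphen before validating. The sketch below is only an assumption about how one might do it, not what any particular site does: it treats everything in the Unicode general category `Pd` (dash punctuation) plus the minus sign as a hyphen, and `to_ascii_hyphen` is a hypothetical helper name. Look-alikes outside that category, such as the kana length mark or the box drawing characters, are deliberately left alone.

```python
import unicodedata

MINUS_SIGN = "\u2212"  # category Sm (math symbol), so it needs its own check

def to_ascii_hyphen(text):
    """Replace dash punctuation (category Pd) and MINUS SIGN with ASCII '-'."""
    out = []
    for ch in text:
        if unicodedata.category(ch) == "Pd" or ch == MINUS_SIGN:
            out.append("-")
        else:
            out.append(ch)
    return "".join(out)

# An en dash (0x2013) typed into a field that really wants a hyphen.
print(to_ascii_hyphen("2020\u20132021"))
```

Whether the minus sign should be folded as well is a judgment call; for a math-heavy input you probably want to keep it.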