Characters have names.
I happen to have some experience with normalizing Western-language text crawled from the World Wide Web, and have seen the better part of the zodiac of esoteric symbols. As a traumatic carryover from my relationship with LaTeX, I also have a by-product interest in typesetting. One of the recurring topics there is the hyphen.
Since people tend to attach any word to any meaning (hello, arbitrariness of the sign; hello, Saussure!), I don't expect anyone to know the nomenclature for the horizontal bar-like symbols. For instance, when Germans read text aloud to others, they say "minus" for the hyphen. But anyone who has done LaTeX (and math) knows that the minus and the hyphen are not interchangeable. You are butchering it. Well, historically, the hyphen stood in for the minus wherever only a limited character set such as ASCII was available, like on your grandma's typewriter. Hence the official name of the ASCII character: HYPHEN-MINUS.
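A quick way to see that the two really are different characters, each with an official name of its own, is Python's standard `unicodedata` module. A minimal sketch; the variable names are only illustrative:

```python
import unicodedata

hyphen = "-"               # the ASCII character on every keyboard
minus = "\N{MINUS SIGN}"   # the real minus, looked up by its official name

print(unicodedata.name(hyphen))  # HYPHEN-MINUS
print(unicodedata.name(minus))   # MINUS SIGN
print(hex(ord(minus)))           # 0x2212
print(hyphen == minus)           # False: two different characters
```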
Without aiming to be comprehensive, here is a list of hyphen-like symbols.
symbol | code point | name | notes |
---|---|---|---|
`-` | 0x002d | hyphen-minus | the only one in ASCII |
`‑` | 0x2011 | non-breaking hyphen | |
`−` | 0x2212 | minus | minus is not hyphen |
`–` | 0x2013 | en-dash | as wide as 'n' |
`—` | 0x2014 | dash or em-dash | as wide as 'm' |
`⸺` | 0x2e3a | double em-dash | twice 'm' (Unicode: two-em dash) |
`―` | 0x2015 | horizontal bar | also known as quotation dash |
`﹘` | 0xfe58 | small em dash | small form variant, for typographers |
`―` | 0x2015 | CJK dash | same code point as the horizontal bar; used as a dash in Japanese texts |
`－` | 0xff0d | CJK hyphen | fullwidth hyphen-minus |
ー | 0x30fc | CJK length mark | Japanese length mark |
Feel free to check the code point for any of these symbols in Python:
`hex(ord("―"))` returns `'0x2015'`.
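Checking symbols one at a time gets tedious. The following sketch, using only the standard `unicodedata` module, prints the code point, general category and official Unicode name of each hyphen-like character discussed here. The list `SYMBOLS` and the helper `describe` are illustrative names; the characters are written as `\u` escapes so that look-alikes cannot be mixed up when copying.

```python
import unicodedata

# Hyphen-like characters discussed in this post, as escapes so that
# look-alikes cannot be mixed up when copying the code.
SYMBOLS = [
    "\u002d",  # hyphen-minus
    "\u2011",  # non-breaking hyphen
    "\u2212",  # minus
    "\u2013",  # en dash
    "\u2014",  # em dash
    "\u2e3a",  # two-em dash
    "\u2015",  # horizontal bar
    "\ufe58",  # small em dash
    "\uff0d",  # fullwidth hyphen-minus
    "\u30fc",  # Japanese length mark
]


def describe(ch: str) -> str:
    """Return the code point, general category and official name of ch."""
    name = unicodedata.name(ch, "<no name>")
    return f"{hex(ord(ch)):>8}  {unicodedata.category(ch)}  {name}"


for ch in SYMBOLS:
    print(describe(ch))
```

The category column is telling: most of these are `Pd` (dash punctuation), but the minus is `Sm` (math symbol) and the Japanese length mark is `Lm` (modifier letter), so Unicode does not treat either of them as a dash at all.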
So, who cares?
Normally no one cares. I don't care. But when a site explicitly asks for a dash in the input, it most probably does not mean one. It wants an ASCII hyphen, not a dash; the people behind it just don't know the difference. Perhaps "dash" sounds cooler than "hyphen", or is easier to spell. I have never come across a site that accepted an actual dash (the em dash, code point 0x2014) in the input.
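If you run such a site, or just need to clean up crawled text, one option is to fold the look-alikes into the ASCII hyphen before validating. This is a sketch under a hypothetical policy: the set `HYPHEN_LIKE` and the function `normalize_hyphens` are illustrative names, the set covers the dashes from the table plus U+2010 HYPHEN and U+2012 FIGURE DASH, and which characters to fold is a judgment call for your own input.

```python
# Hypothetical policy: fold these dash-like characters to ASCII HYPHEN-MINUS.
# U+30FC (Japanese length mark) is deliberately excluded: it lives inside
# Japanese words, and folding it would corrupt them.
HYPHEN_LIKE = "\u2010\u2011\u2012\u2013\u2014\u2015\u2212\u2e3a\ufe58\uff0d"

# Translation table mapping every character above to "-".
_FOLD = str.maketrans({ch: "-" for ch in HYPHEN_LIKE})


def normalize_hyphens(text: str) -> str:
    """Replace dash-like characters with the ASCII hyphen-minus (U+002D)."""
    return text.translate(_FOLD)


print(normalize_hyphens("2024\u201301\u201402"))  # 2024-01-02
```

Unicode normalization alone does not do this job: NFKC maps the fullwidth hyphen-minus (0xff0d) to ASCII, but leaves en and em dashes untouched.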
Yeah, the failed attempt in the screenshot was due to the dash.