Why does sort say that ɛ = e?
ɛ ("Latin epsilon") is a letter used in certain African languages, usually to represent the vowel sound in English "bed". In Unicode it's encoded as U+025B, very distinct from everyday e.

However, if I sort the following:

eb
ed
ɛa
ɛc

it seems that sort considers ɛ and e equivalent:

ɛa
eb
ɛc
ed

What's going on here? And is there a way to make ɛ and e distinct for sorting purposes?
sort locale unicode
edited 19 mins ago
jimmij
asked 1 hour ago
Draconis
related What does "LC_ALL=C" do?
– devWeek
3 mins ago
2 Answers
man sort:
*** WARNING *** The locale specified by the environment affects sort
order. Set LC_ALL=C to get the traditional sort order that uses native
byte values.
So, try: LC_ALL=C sort file.txt
answered 1 hour ago
Ipor Sircer
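A minimal sketch of why the C locale keeps the two letters apart: in UTF-8, e is the single byte 0x65, while ɛ (U+025B) is the two bytes 0xC9 0x9B. With LC_ALL=C, sort simply compares those bytes, so every word starting with ɛ lands after every word starting with e. The helper name show_bytes is made up for illustration; od and LC_ALL=C sort are the standard tools.

```shell
# Hypothetical helper: print the bytes of a string in hex.
show_bytes() {
    printf '%s' "$1" | od -An -tx1
}

show_bytes e    # one byte:  65
show_bytes ɛ    # two bytes: c9 9b (the UTF-8 encoding of U+025B)

# Byte-wise sort of the sample from the question: 0x65 < 0xC9,
# so both e-words come before both ɛ-words.
printf 'eb\ned\nɛa\nɛc\n' | LC_ALL=C sort
```

Because UTF-8 byte order matches code point order, this is also plain Unicode code point order.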
That works! But why does the default locale consider these completely separate codepoints to be the same? I'm curious why this happens.
– Draconis
1 hour ago
@Draconis What is "the default locale"?
– Kamil Maciorowski
1 hour ago
@KamilMaciorowski An empty value of the environment variable; I'm not sure what locale that corresponds to.
– Draconis
1 hour ago
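On the "default locale" question: under POSIX, the locale used for collation comes from the first non-empty value among LC_ALL, LC_COLLATE and LANG, and an empty variable counts as unset. Only when all three are empty does the implementation-defined default apply (the C/POSIX locale with glibc). So an empty LC_ALL on its own does not imply the default locale; on many systems LANG is set at login. The sketch below, with the made-up name effective_collation, only mirrors that precedence. It does not check that the named locale is installed, so running locale, which reports what the C library actually resolved, remains the authoritative check.

```shell
# Hypothetical helper: report which variable decides LC_COLLATE,
# following the POSIX precedence LC_ALL > LC_COLLATE > LANG.
# Empty values are treated the same as unset ones.
effective_collation() {
    if [ -n "${LC_ALL-}" ]; then
        printf 'LC_ALL=%s\n' "$LC_ALL"
    elif [ -n "${LC_COLLATE-}" ]; then
        printf 'LC_COLLATE=%s\n' "$LC_COLLATE"
    elif [ -n "${LANG-}" ]; then
        printf 'LANG=%s\n' "$LANG"
    else
        # Nothing set: the implementation-defined default applies
        # (the C/POSIX locale on glibc-based systems).
        echo default
    fi
}

effective_collation
```

If this prints something like LANG=en_US.UTF-8, that locale's collation rules are what sort uses by default.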
The character ɛ is not equal to e, but some locales place these characters close together when collating. The reasons are partly language-specific and partly historical or even political. For example, most people would probably expect €uro to sort close to Europe in a dictionary.

Anyway, to see which collation you are currently using, run locale. locale -a lists the locales available on the system. To change the collation, say to C, for just one sorting run, use LC_COLLATE=C sort file. Finally, to see how different locales sort your file, try:

for loc in $(locale -a)
do
    echo "____${loc}____"
    LC_COLLATE="$loc" sort file
done

Pipe the result to grep or a similar tool to pick the locale that fits your needs.
answered 21 mins ago
jimmij
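Building on the loop in the answer above, a sketch that does the "grep" step automatically: it prints only those locales that order the file exactly as the byte-wise C locale does, i.e. locales that keep ɛ and e apart for this input. Two assumptions to note. It sets LC_ALL rather than LC_COLLATE, because a non-empty LC_ALL in the environment would otherwise silently override LC_COLLATE. And a locale name that is not installed typically makes sort fall back to C, so such a name would be reported as a match; that is why the usage passes only names from locale -a. The function name same_as_c is hypothetical.

```shell
# Hypothetical helper: print each given locale whose sort order for
# FILE is identical to plain byte order (the C locale).
# Usage: same_as_c FILE LOCALE...
same_as_c() {
    file=$1
    shift
    reference=$(LC_ALL=C sort "$file")
    for loc in "$@"; do
        # LC_ALL overrides LC_COLLATE, so set it to be sure this
        # locale really is the one sort uses.
        if [ "$(LC_ALL="$loc" sort "$file" 2>/dev/null)" = "$reference" ]; then
            printf '%s\n' "$loc"
        fi
    done
}
```

Usage: same_as_c file $(locale -a)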
This is a wonderful explanation, but the symbols seem to be considered identical, not just close together.
– Draconis
7 mins ago
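On "identical, not just close together": locale collation (glibc's tables are based on ISO 14651, the same multi-level scheme as the Unicode Collation Algorithm) compares strings level by level. Differences in base letters are checked across the whole string first, and finer differences such as accents or letter variants only break ties. The observed output is consistent with ɛ getting the same first-level weight as e, so the second letter decides, and ɛ and e are only told apart when everything else is equal. The toy model below reproduces that with two explicit keys: the word with ɛ folded onto e, then the original bytes. It is an illustration with a hypothetical function name and one hard-coded folding rule, not how glibc implements collation.

```shell
# Hypothetical toy model of two-level collation.
# Level 1: compare the words with every ɛ folded onto e.
# Level 2: if level 1 ties, compare the original words byte by byte.
two_level_sort() {
    tab=$(printf '\t')
    LC_ALL=C awk '{ key = $0; gsub(/ɛ/, "e", key); print key "\t" $0 }' "$@" |
        LC_ALL=C sort -t "$tab" -k1,1 -k2,2 |
        cut -f2-
}

# Reproduces the order from the question: ɛa eb ɛc ed
printf 'eb\ned\nɛa\nɛc\n' | two_level_sort
# A true tie at level 1 is broken at level 2: ea before ɛa
printf 'ɛa\nea\n' | two_level_sort
```

LC_ALL=C sort, by contrast, has a single level (raw bytes), which is why it never interleaves the two letters.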