Removing characters with sed

I am working on AIX and trying to remove non-printable characters from a file. The data looks like Caucasian male lives in Arizona w/ fiancÃÂÃÂÃÂÃÂÃÂ when I view the file in Notepad++ using UTF-8 encoding. When I view the file on the AIX system, I get ^▒▒^▒▒^▒▒^▒▒^▒▒^▒▒ instead of the special characters.



I want to replace all of those special characters with a space.



I tried sed 's/[^[:print:]]/ /g' file, but it does not remove those characters. My available locales, as listed by locale -a, are:



C
POSIX
en_US.8859-15
en_US.ISO8859-1
en_US


I even tried sed -e 's/[^ -~]/ /g' file and it did not remove the characters.



I see that other Stack Overflow answers used a UTF-8 locale with GNU sed and that this worked, but I do not have that locale.



Also I am using ksh.










Tags: text-processing, sed, ksh, aix






asked 5 hours ago by Auguster (edited 5 hours ago)

  • Ã and ▒ look pretty printable to me. A UTF-8 Ã is encoded as 0xc3 0x83. As it happens, 0xc3 is also Ã in ISO 8859-1 and ISO 8859-15, which is printable; 0x83, however, is a control character in both.
    – Stéphane Chazelas
    5 hours ago

  • Possible duplicate: unix.stackexchange.com/questions/201751/…
    – Goro
    4 hours ago

  • @Goro Yes, at this point it is possibly a duplicate, now that I understand that I should use the C locale.
    – Auguster
    4 hours ago

  • To show what the characters actually are, it is useful to display their hex values. Something like: echo "fiancÃÂÃÂÃÂÃÂÃÂ" | od -tx1, or, if your sed supports it: echo "fiancÃÂÃÂÃÂÃÂÃÂ" | sed -n l.
    – Isaac
    3 hours ago

  • Possible duplicate of Match language range in shell, sed or awk
    – Isaac
    3 hours ago
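The byte values discussed in these comments can be checked directly. The following is a minimal sketch, assuming a POSIX printf and od; the helper name dump_hex is hypothetical, and the sample bytes (0xc3 0x83, written in octal as \303\203) only illustrate the UTF-8 encoding of Ã mentioned above, not the actual contents of the asker's file.

```shell
# Dump standard input as hex bytes.
# -An suppresses the offset column, -tx1 prints one byte per hex value.
dump_hex() {
    od -An -tx1
}

# "fianc" followed by the two bytes 0xc3 0x83 (octal 303 203) and a newline.
printf 'fianc\303\203\n' | dump_hex
```

The last three values in the dump should be c3 83 0a: the two bytes of the UTF-8 Ã, followed by the newline. Running the same helper on the real file (dump_hex < file) would show which bytes actually need to be removed.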
















2 Answers
    accepted






You can use the tr command as follows:

tr -cd '[:print:]\t\r\n'

The -c option complements the set and -d deletes, so every character that is not listed is removed.

Explanation:

[:print:] -- any printable character: everything in the [:graph:] class, plus the space character
\t -- horizontal tab
\r -- carriage return
\n -- newline

Examples on CentOS 7, where tr is GNU tr and the locale uses UTF-8 encoding:

$ echo "fiancÃÂÃÂÃÂÃÂÃÂ" | tr -cd '[:print:]\t\r\n'
fianc

$ echo "get ^▒▒^▒▒^▒▒^▒▒^▒▒^▒▒ " | tr -cd '[:print:]\t\r\n'
get ^^^^^^

$ echo " Caucasian male lives in Arizona w/ fianc▒^▒▒^▒▒^▒▒^▒▒^▒▒^▒^▒▒^▒▒^▒▒^▒▒^▒▒^▒" | tr -cd '[:print:]\t\r\n'
Caucasian male lives in Arizona w/ fianc^^^^^^^^^^^^
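To see which bytes the [:print:] class keeps, one can feed tr every possible byte value and count the survivors. This is only a sketch: it forces the C locale with LC_ALL=C (an assumption added here, not part of the answer as written), and the function names all_bytes and count_printable are hypothetical.

```shell
# Emit every byte value from 0 to 255 once.
all_bytes() {
    i=0
    while [ "$i" -le 255 ]; do
        # The inner printf builds a three-digit octal escape such as \101,
        # and the outer printf turns that escape into the actual byte.
        printf "\\$(printf '%03o' "$i")"
        i=$((i + 1))
    done
}

# Count the bytes that survive the [:print:] filter in the C locale.
# The final tr strips the padding some wc implementations add.
count_printable() {
    all_bytes | LC_ALL=C tr -cd '[:print:]' | wc -c | tr -d ' '
}

count_printable
```

On a system whose C locale charset is ASCII, this should print 95, which corresponds to the bytes 0x20 (space) through 0x7e (tilde). Every byte with the 8th bit set is dropped.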














answered 5 hours ago by Goro (edited 2 hours ago)











  • That did not work for me. I tried echo " Caucasian male lives in Arizona w/ fianc▒^▒▒^▒▒^▒▒^▒▒^▒▒^▒^▒▒^▒▒^▒▒^▒▒^▒▒^▒" | tr -d '[:print:]' and got some unreadable text as output.
    – Auguster
    5 hours ago

  • LC_ALL=C tr ...
    – Jeff Schaller
    5 hours ago

  • LC_ALL=C tr -cd '[:print:]' < input works here
    – Jeff Schaller
    5 hours ago

  • echo "fiancÃÂÃÂÃÂÃÂÃÂ" | tr -cd '[:print:]\t\r\n' should return fiancÃÂÃÂÃÂÃÂÃÂ, as Ã is a printable character. GNU tr doesn't in UTF-8, as it doesn't support multi-byte characters yet, but it does in iso8859-1. In the C locale, on systems where the C locale charset is ASCII, that does remove Ã (or whatever bytes those are made of), as ASCII has no such character in the first place.
    – Stéphane Chazelas
    2 hours ago

  • Because CentOS tr is GNU tr and you probably tried it in a UTF-8 locale, where Ã is made of 2 bytes, and GNU tr doesn't support multibyte characters. If you use LC_ALL=C as suggested by Auguster, it will work (at removing those Ã, however they're encoded) regardless of whether tr supports multibyte characters or not. In the C locale, all characters are single bytes, and on most systems, including AIX, the C locale charset is ASCII, which has no character with the 8th bit set (which each byte of the UTF-8 encoding of Ã has, as does its single-byte iso8859-1 encoding).
    – Stéphane Chazelas
    2 hours ago
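The LC_ALL=C approach from these comments can be sketched as two small filters. The function names strip_nonprint and blank_nonprint are hypothetical, and the sample input bytes are an assumption chosen to mimic the two encodings discussed (UTF-8 Ã as 0xc3 0x83, ISO 8859-1 Ã as the single byte 0xc3). The second filter replaces rather than deletes, which is what the question originally asked for.

```shell
# Delete every byte that is not printable ASCII, tab, CR or newline.
# LC_ALL=C makes tr work on single bytes, whatever the file encoding is.
strip_nonprint() {
    LC_ALL=C tr -cd '[:print:]\t\r\n'
}

# Replace each such byte with a space instead of deleting it.
# '[ *]' tells tr to repeat the space as many times as needed.
blank_nonprint() {
    LC_ALL=C tr -c '[:print:]\t\r\n' '[ *]'
}

# UTF-8 style input: Ã stored as the two bytes 0xc3 0x83.
printf 'fianc\303\203 x\n' | strip_nonprint
printf 'fianc\303\203 x\n' | blank_nonprint

# ISO 8859-1 style input: Ã stored as the single byte 0xc3.
printf 'fianc\303 x\n' | strip_nonprint
```

Because the filters act on bytes, a two-byte UTF-8 character becomes two spaces with blank_nonprint, not one. Typical use on a file would be strip_nonprint < file > file.clean.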

















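If the current locale already uses UTF-8 as the charset (and the file is written using that charset):

<file LC_ALL=C sed 's/[^ -~]//g'

Or, to also keep tab and carriage-return characters with AIX sed (printf expands \t and \r into literal characters first, since POSIX sed treats a backslash inside a bracket expression literally):

<file LC_ALL=C sed "$(printf 's/[^[:print:]\t\r]//g')"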
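The commands above delete the unwanted bytes, whereas the question asked for them to be replaced with a space. A sketch of that variant follows; the function names are hypothetical, the sample input is an assumption, and only the replacement part of the s command differs from the answer.

```shell
# Replace every byte outside printable ASCII (space through tilde)
# with a space. LC_ALL=C makes sed treat the input as single bytes.
blank_with_sed() {
    LC_ALL=C sed 's/[^ -~]/ /g'
}

# Same idea, but leave tab characters alone. printf converts \t into a
# literal tab before sed sees the script.
blank_with_sed_keep_tab() {
    LC_ALL=C sed "$(printf 's/[^[:print:]\t]/ /g')"
}

# Sample line: text, the bytes 0xc3 0x83, a tab, more text.
printf 'w/ fianc\303\203\tend\n' | blank_with_sed
printf 'w/ fianc\303\203\tend\n' | blank_with_sed_keep_tab
```

The first filter also turns the tab into a space, since a tab lies outside the space-to-tilde range; the second keeps it. Either can be applied to a file with input redirection, for example blank_with_sed < file.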














answered 3 hours ago by Isaac (edited 2 hours ago by Stéphane Chazelas)



















