Length of a UTF-8 byte sequence

Determine the length of a UTF-8 byte sequence given its first byte. The following table shows which ranges map to each possible length:



  Range    Length
---------  ------
0x00-0x7F  1
0xC2-0xDF  2
0xE0-0xEF  3
0xF0-0xF4  4


Notes on gaps in the table: 0x80-0xBF are continuation bytes; 0xC0-0xC1 would start an overlong, invalid sequence; 0xF5-0xFF would result in a code point beyond the Unicode maximum.
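
The table and the notes on its gaps can be written out as a plain range check. The following is an ungolfed sketch, not part of the challenge: the name `utf8Length` and the choice to return 0 for the excluded bytes are assumptions made for illustration (the challenge guarantees that such bytes never occur).

```javascript
// Ungolfed reference sketch (hypothetical helper, not from the challenge).
// Returns the sequence length for a valid first byte, or 0 for a byte that
// falls into one of the gaps in the table (an assumption of this sketch).
function utf8Length(firstByte) {
  if (firstByte >= 0x00 && firstByte <= 0x7f) return 1; // ASCII
  if (firstByte >= 0xc2 && firstByte <= 0xdf) return 2;
  if (firstByte >= 0xe0 && firstByte <= 0xef) return 3;
  if (firstByte >= 0xf0 && firstByte <= 0xf4) return 4;
  // 0x80-0xBF: continuation bytes; 0xC0-0xC1: overlong lead bytes;
  // 0xF5-0xFF: code points beyond the Unicode maximum.
  return 0;
}

console.log(utf8Length(0xc2)); // 2
```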



Write a program or function that takes the first byte of a UTF-8 byte sequence as input and outputs or returns the length of the sequence. I/O is flexible. For example, the input can be a number, an 8-bit character or a one-character string. You can assume that the first byte is part of a valid sequence and falls into one of the ranges above.



This is code golf. The shortest answer in bytes wins.



Test cases



0x00 => 1
0x41 => 1
0x7F => 1
0xC2 => 2
0xDF => 2
0xE0 => 3
0xEF => 3
0xF0 => 4
0xF4 => 4
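
The test cases translate directly into a table that a candidate solution can be checked against. In this sketch, `leadingOnes` and `failures` are hypothetical names; `leadingOnes` is just one possible candidate, which counts the leading 1 bits of the byte and treats "no leading 1 bit" as length 1.

```javascript
// The challenge's test cases as [firstByte, expectedLength] pairs.
const testCases = [
  [0x00, 1], [0x41, 1], [0x7f, 1],
  [0xc2, 2], [0xdf, 2],
  [0xe0, 3], [0xef, 3],
  [0xf0, 4], [0xf4, 4],
];

// Hypothetical candidate: count the leading 1 bits of the byte.
// A byte with no leading 1 bit (0x00-0x7F) is a one-byte sequence.
function leadingOnes(firstByte) {
  let count = 0;
  for (let mask = 0x80; mask & firstByte; mask >>= 1) count++;
  return count || 1;
}

// Returns the test cases the candidate gets wrong (empty when all pass).
function failures(candidate) {
  return testCases.filter(([byte, expected]) => candidate(byte) !== expected);
}

console.log(failures(leadingOnes).length === 0 ? "all pass" : "some fail");
```

Any other candidate function that maps a number to a number can be passed to `failures` in the same way.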









      code-golf integer






      asked 3 hours ago









      nwellnhof

          6 Answers
                        Jelly, 8 7 bytes



                        »Ø⁷Ba\S
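
No explanation accompanies the Jelly program. One plausible reading, offered as an assumption rather than the author's own description: clamp the byte to at least 128 (so that an ASCII byte shows exactly one leading 1 bit), take the binary digits, replace them with their running AND so that only the leading run of ones survives, and sum the result. A JavaScript sketch of that reading, with the hypothetical name `lengthByRunningAnd`:

```javascript
// Sketch of one possible reading of the Jelly answer (an assumption).
function lengthByRunningAnd(firstByte) {
  const clamped = Math.max(firstByte, 128);          // 128..255, so always 8 binary digits
  const bits = [...clamped.toString(2)].map(Number); // most significant bit first
  let running = 1; // stays 1 until the first 0 bit is seen
  let sum = 0;
  for (const bit of bits) {
    running &= bit;
    sum += running; // counts only the leading run of 1 bits
  }
  return sum;
}

console.log(lengthByRunningAnd(0xe0)); // 3
```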


                        edited 58 mins ago

























                        answered 1 hour ago









                        Dennis♦

                                C (gcc), 39 bytes





                                t(char x)x=(__builtin_clz(~x)-24)%7u;
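
The C answer appears to depend on two details: `char` being signed (as it is with gcc on x86), so that bytes from 0x80 upward are sign-extended before `~` is applied, and the unsigned `%7u`, which folds the ASCII case to 1. The same arithmetic can be traced in JavaScript with `Math.clz32`. This is a sketch under those assumptions, and `lengthByClz` is a hypothetical name:

```javascript
// JavaScript trace of (__builtin_clz(~x)-24)%7u, assuming a signed 8-bit char.
function lengthByClz(firstByte) {
  const signed = (firstByte << 24) >> 24; // sign-extend like a signed 8-bit char
  const zeros = Math.clz32(~signed);      // leading zeros of the 32-bit complement
  // For 0x80 and above, zeros - 24 is the number of leading 1 bits in the byte.
  // For ASCII, zeros is 0; -24 reinterpreted as unsigned is 1 modulo 7.
  return ((zeros - 24) >>> 0) % 7;
}

console.log(lengthByClz(0xf0)); // 4
```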


                                answered 2 hours ago









                                user202729

                                        JavaScript (Node.js), 24 bytes





                                        x=>7^Math.log2(255^x)||1
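
The one-liner can be unpacked step by step. In this sketch (the name `lengthByLog2` is hypothetical), `Math.floor` makes explicit the truncation that the golfed version gets implicitly from the `^` operator:

```javascript
// Expanded form of x=>7^Math.log2(255^x)||1, one step per line.
function lengthByLog2(firstByte) {
  const inverted = 255 ^ firstByte;                   // leading 1 bits become leading 0 bits
  const highestBit = Math.floor(Math.log2(inverted)); // index (0-7) of the top set bit
  const length = 7 ^ highestBit;                      // same as 7 - highestBit for 0..7
  return length || 1;                                 // ASCII yields 0 here, so fall back to 1
}

console.log(lengthByLog2(0xc2)); // 2
```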


                                        answered 2 hours ago









                                        user202729

                                            Forth, 6 bytes



                                            x-size


                                            see https://forth-standard.org/standard/xchar/X-SIZE



                                            Input and output follow a standard Forth model:



                                            Input



                                            Memory address + length (i.e. 1) of a single-byte UTF-8 "string".



                                            Output



                                            UTF-8 sequence length in bytes.







                                            edited 27 secs ago

























                                            answered 22 mins ago









                                            zeppelin

                                            • I count 5 bytes there.
                                              – Erik the Outgolfer
                                              12 mins ago
















                                                Z80Golf, 19 bytes



                                                00000000: eeff 6f3e 1037 ed6a 3d30 fbee 07b7 2002 ..o>.7.j=0.... .
                                                00000010: 3e01 c9 >..


                                                Port of user202729's JavaScript answer.



                                                Example with input 0x41-Try it online!



                                                Example with input 0xC2-Try it online!



                                                Example with input 0xE0-Try it online!



                                                Example with input 0xF4-Try it online!



                                                Assembly:



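;input: register a
;output: register a
byte_count:             ;calculate 7^(log2(255^a))||1
    xor 0xFF            ;invert the bits: leading ones become leading zeros
    ld l,a
log2:
    ld a,16
    scf
log2loop:
    adc hl,hl           ;shift hl left until the top set bit falls into carry
    dec a
    jr nc,log2loop
    xor 7
    or a
    jr nz, return
    ld a,1              ;ASCII case: 7^7 is 0, so return 1
return:
    ret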
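
For readers who do not read Z80 assembly, the loop can be traced in JavaScript. This is a sketch, not a cycle-accurate emulation: it assumes register H starts at 0, and `lengthByShiftLoop` is a hypothetical name.

```javascript
// JavaScript trace of the assembly listing (assumes H = 0 on entry).
function lengthByShiftLoop(firstByte) {
  let hl = 0xff ^ firstByte; // xor 0xFF ; ld l,a
  let a = 16;                // ld a,16
  let carry = 1;             // scf
  do {
    const sum = hl + hl + carry;  // adc hl,hl
    carry = sum > 0xffff ? 1 : 0; // carry out of bit 15
    hl = sum & 0xffff;
    a -= 1;                       // dec a (does not change the carry flag)
  } while (carry === 0);          // jr nc,log2loop
  // a now holds the index of the highest set bit of the inverted byte.
  a ^= 7;                         // xor 7
  return a === 0 ? 1 : a;         // or a ; jr nz,return ; ld a,1
}

console.log(lengthByShiftLoop(0xe0)); // 3
```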






                                                answered 1 hour ago









                                                Logern

                                                        Haskell, 28 bytes





                                                        f x=sum[1|y<-"Áßï",x>y]+1
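
The Haskell answer counts how many of the characters `Á` (0xC1), `ß` (0xDF) and `ï` (0xEF) the input exceeds, then adds one. The same idea as a JavaScript sketch, taking a number rather than a character (`lengthByThresholds` is a hypothetical name):

```javascript
// Count the thresholds the first byte exceeds, then add one.
function lengthByThresholds(firstByte) {
  const thresholds = [0xc1, 0xdf, 0xef]; // code points of "Á", "ß", "ï"
  return thresholds.filter((t) => firstByte > t).length + 1;
}

console.log(lengthByThresholds(0xdf)); // 2
```

Each threshold is the last byte value before the next length begins, which is why a strict comparison is enough.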


                                                        edited 1 hour ago

























                                                        answered 2 hours ago









                                                        BMO
