Length of a UTF-8 byte sequence
Determine the length of a UTF-8 byte sequence given its first byte. The following table shows which ranges map to each possible length:
Range Length
--------- ------
0x00-0x7F 1
0xC2-0xDF 2
0xE0-0xEF 3
0xF0-0xF4 4
Notes on gaps in the table: 0x80-0xBF are continuation bytes; 0xC0-0xC1 would start an overlong, invalid sequence; 0xF5-0xFF would result in a code point beyond the Unicode maximum.
Write a program or function that takes the first byte of a UTF-8 byte sequence as input and outputs or returns the length of the sequence. I/O is flexible. For example, the input can be a number, an 8-bit character or a one-character string. You can assume that the first byte is part of a valid sequence and falls into one of the ranges above.
This is code golf. The shortest answer in bytes wins.
Test cases
0x00 => 1
0x41 => 1
0x7F => 1
0xC2 => 2
0xDF => 2
0xE0 => 3
0xEF => 3
0xF0 => 4
0xF4 => 4
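For reference, the table can be transcribed directly into an ungolfed lookup. This is only a sketch: the function name `utf8SequenceLength` and the choice to return 0 for bytes in the gaps are assumptions made here, since the challenge itself guarantees valid input.

```javascript
// Ungolfed reference lookup, transcribed from the range table.
// Assumption: bytes that fall into a gap (continuation bytes, overlong
// lead bytes, or bytes past 0xF4) yield 0; the challenge never passes them.
function utf8SequenceLength(firstByte) {
  if (firstByte >= 0x00 && firstByte <= 0x7f) return 1;
  if (firstByte >= 0xc2 && firstByte <= 0xdf) return 2;
  if (firstByte >= 0xe0 && firstByte <= 0xef) return 3;
  if (firstByte >= 0xf0 && firstByte <= 0xf4) return 4;
  return 0;
}

console.log(utf8SequenceLength(0x41)); // 1
console.log(utf8SequenceLength(0xe2)); // 3
```

Any golfed answer can be checked against this function over the four valid ranges.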
code-golf integer
asked 3 hours ago
nwellnhof
6 Answers
Jelly, 7 bytes (previously 8)
ûÃÂâ·BaS
Try it online!
edited 58 mins ago
answered 1 hour ago
Dennis♦
C (gcc), 39 bytes
t(char x)x=(__builtin_clz(~x)-24)%7u;
Try it online!
answered 2 hours ago
user202729
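The idea behind the C answer appears to be that the sequence length equals the number of leading 1 bits in the first byte, with the ASCII case (no leading 1 bits) mapped to 1. Below is an ungolfed sketch of that idea using the standard `Math.clz32`. The function name `lengthByLeadingOnes` is made up for illustration, and the explicit shift into the top byte replaces the `-24` in the C version.

```javascript
// Count the leading 1 bits of the first byte via count-leading-zeros.
function lengthByLeadingOnes(firstByte) {
  // Invert the low 8 bits so leading ones become leading zeros, then move
  // the byte to the top of a 32-bit word so clz32 sees only those 8 bits.
  const inverted = (~firstByte & 0xff) << 24;
  const leadingOnes = Math.clz32(inverted);
  // ASCII bytes (0xxxxxxx) have no leading ones but still count as length 1.
  return leadingOnes === 0 ? 1 : leadingOnes;
}

console.log(lengthByLeadingOnes(0xc2)); // 2
console.log(lengthByLeadingOnes(0xf4)); // 4
```

In the C version, the ASCII case seems to be handled by the unsigned `%7u` instead of a branch: `-24` converted to a 32-bit unsigned value leaves remainder 1 when divided by 7, while 2, 3 and 4 pass through unchanged. The C code also seems to depend on `char` being signed, so that bytes at or above 0x80 are sign-extended before `~` is applied.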
JavaScript (Node.js), 24 bytes
x=>7^Math.log2(255^x)||1
Try it online!
answered 2 hours ago
user202729
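To make the one-liner easier to follow, here is an expanded sketch of the same arithmetic. The names `lengthViaLog2` and `golfed` are introduced only for this illustration. The bitwise XOR in the golfed version truncates the fractional logarithm implicitly, which is written out here as `Math.floor`.

```javascript
// The golfed expression from the answer, kept for comparison.
const golfed = x => 7 ^ Math.log2(255 ^ x) || 1;

// Step-by-step version of the same computation.
function lengthViaLog2(firstByte) {
  // Flip all eight bits: the highest 0 bit of the input becomes the highest 1 bit.
  const inverted = 255 ^ firstByte;
  // Bit position (0..7) of that highest 1 bit.
  const position = Math.floor(Math.log2(inverted));
  // For values 0..7, XOR with 7 equals 7 - position, i.e. the number of leading ones.
  const leadingOnes = 7 ^ position;
  // ASCII bytes give 0 here, which is falsy, so fall back to 1.
  return leadingOnes || 1;
}

console.log(lengthViaLog2(0xe0)); // 3
console.log(golfed(0xe0));        // 3
```

For example, 0xE0 inverts to 0x1F = 31, whose base-2 logarithm lies between 4 and 5, and 7 XOR 4 is 3.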
Forth, 6 bytes
x-size
see https://forth-standard.org/standard/xchar/X-SIZE
Input and output follow the standard Forth model:
Input: memory address and length (i.e. 1) of a single-byte UTF-8 "string".
Output: UTF-8 sequence length in bytes.
edited 27 secs ago
answered 22 mins ago
zeppelin
I count 5 bytes there.
– Erik the Outgolfer
12 mins ago
Z80Golf, 19 bytes
00000000: eeff 6f3e 1037 ed6a 3d30 fbee 07b7 2002 ..o>.7.j=0.... .
00000010: 3e01 c9 >..
Port of user202729's JavaScript answer.
Example with input 0x41-Try it online!
Example with input 0xC2-Try it online!
Example with input 0xE0-Try it online!
Example with input 0xF4-Try it online!
Assembly:
;input: register a
;output: register a
byte_count: ;calculate 7^(log2(255^a))||1
xor 0xFF
ld l,a
log2:
ld a,16
scf
log2loop:
adc hl,hl
dec a
jr nc,log2loop
xor 7
or a
jr nz, return
ld a,1
return:
ret
answered 1 hour ago
Logern
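The shift loop in the assembly can be modelled in JavaScript to see why it yields the integer logarithm. This is a rough simulation rather than an emulator: the function name `z80ByteCount` is hypothetical, and it assumes register `h` is 0 on entry, which the listing does not state.

```javascript
// Simulates the register-level steps of the listing above.
// Assumption: h = 0 on entry, so hl initially holds just the inverted byte.
function z80ByteCount(firstByte) {
  let hl = (firstByte ^ 0xff) & 0xff; // xor 0xFF ; ld l,a
  let a = 16;                         // ld a,16
  let carry = 1;                      // scf
  do {
    const shifted = hl * 2 + carry;   // adc hl,hl: shift left, carry goes into bit 0
    carry = shifted > 0xffff ? 1 : 0; // bit shifted out of bit 15 becomes the carry
    hl = shifted & 0xffff;
    a -= 1;                           // dec a (does not change the carry flag)
  } while (carry === 0);              // jr nc,log2loop
  a ^= 7;                             // xor 7
  return a !== 0 ? a : 1;             // or a ; jr nz,return ; ld a,1
}

console.log(z80ByteCount(0x41)); // 1
console.log(z80ByteCount(0xc2)); // 2
```

A set bit at position p of `l` needs 16 - p shifts to fall out of bit 15, so the counter ends at p, the integer base-2 logarithm of the inverted byte.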
Haskell, 28 bytes
f x=sum[1|y<-"Áßï",x>y]+1
Try it online!
edited 1 hour ago
answered 2 hours ago
BMO
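The Haskell answer counts how many threshold characters the input exceeds and adds 1. A sketch of the same approach in JavaScript follows. The name `lengthByThresholds` is hypothetical, and the thresholds 0x7F, 0xDF and 0xEF are assumptions picked from the range boundaries in the table, not necessarily the exact characters used in the answer.

```javascript
// Length = 1 + number of thresholds strictly below the first byte.
// Assumption: thresholds sit at the upper end of each shorter range.
function lengthByThresholds(firstByte) {
  const thresholds = [0x7f, 0xdf, 0xef];
  return 1 + thresholds.filter(t => firstByte > t).length;
}

console.log(lengthByThresholds(0x7f)); // 1
console.log(lengthByThresholds(0xe0)); // 3
```

The first threshold could be any value from 0x7F to 0xC1, because valid input never falls into that gap. The other two are fixed at 0xDF and 0xEF, since the neighbouring ranges are adjacent.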