C Language: union type

When I read ISO/IEC 9899:1999 (see 6.5.2.3), I saw an example like this:

[Image: the example code fragment from 6.5.2.3]

I got no errors or warnings when I tested it.

My question is: Why is this fragment invalid?

edited 1 hour ago

StoryTeller

83.7k12168234

asked 2 hours ago

StrayedKing

645

f can assume that p1 != p2 because they point to different types. With optimization, it can read the value of p1->m into a register and return that register, assuming that p2->m = -p2->m does not modify p1->m, which is wrong here. The union is the only way to make p1 == p2.
– RbMm
1 hour ago

2 Answers

The example attempts to illustrate the paragraph preceding it¹:

6.5.2.3 ¶6

One special guarantee is made in order to simplify the use of unions:
if a union contains several structures that share a common initial
sequence (see below), and if the union object currently contains one
of these structures, it is permitted to inspect the common initial
part of any of them anywhere that a declaration of the completed type
of the union is visible. Two structures share a common initial
sequence if corresponding members have compatible types (and, for
bit-fields, the same widths) for a sequence of one or more initial
members.

Since f is declared before g, and furthermore the unnamed union type is local to g, the union type is unquestionably not visible in f.
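For contrast, here is a minimal sketch of the situation ¶6 does permit, assuming a typical tagged-union layout: the completed union type is declared at file scope, so it is visible in the function that reads the common initial member. The names `shape`, `circle`, `rect`, `kind` and `shape_kind` are hypothetical and not taken from the standard.

```c
/* Hypothetical structures sharing a common initial sequence:
   their first member, `kind`, has the same type in both. */
struct circle { int kind; double radius; };
struct rect   { int kind; double width, height; };

/* The completed union type is declared at file scope, so it is
   visible inside shape_kind() below. */
union shape {
    struct circle c;
    struct rect   r;
};

/* Reads the common initial member through `c`, whichever structure
   the union currently contains. 6.5.2.3 ¶6 permits this because the
   union type is visible here. */
int shape_kind(const union shape *s)
{
    return s->c.kind;
}
```

If the union currently holds a `rect` whose `kind` is 2, `shape_kind` returns 2 even though it reads through the `circle` member.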

The example doesn't show how u is initialized, but assuming the member last written to is u.s2.m, the function has undefined behavior because it inspects p1->m without the common initial sequence guarantee being in effect.

The same goes the other way: if u.s1.m was the member last written to before the function call, then accessing p2->m is undefined behavior.

Note that f itself is not invalid; it's a perfectly reasonable function definition. The undefined behavior stems from passing &u.s1 and &u.s2 into it as arguments.
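To illustrate that f on its own is fine, here is a minimal sketch that calls it with two distinct objects, so p1 and p2 cannot alias and no union is involved. The caller name `call_with_distinct_objects` and the values -1 and -5 are arbitrary choices for illustration.

```c
struct t1 { int m; };
struct t2 { int m; };

/* The function from the example. */
int f(struct t1 *p1, struct t2 *p2)
{
    if (p1->m < 0)
        p2->m = -p2->m;
    return p1->m;
}

/* Hypothetical caller: a and b are separate objects, so there is no
   aliasing. f negates b.m and returns a.m unchanged. */
int call_with_distinct_objects(int *b_after)
{
    struct t1 a = { -1 };
    struct t2 b = { -5 };
    int result = f(&a, &b);
    *b_after = b.m;
    return result;
}
```

Here the result is -1 and b.m ends up as 5 at every optimization level, because the compiler's assumption that p1 and p2 point to different objects is actually true.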

¹ I'm quoting n1570, the C11 standard draft, but the specification should be the same, subject only to a paragraph or two moving up or down.

edited 1 hour ago

answered 2 hours ago

StoryTeller

83.7k12168234

So would changing f to take int* and passing in &u.s1.m and &u.s2.m make it valid? Because then it's g doing the struct accessing.
– Zastai
1 hour ago

@Zastai - You know, I'm not sure. I think that's a good question that deserves to stand on its own. Post it with the language-lawyer tag. It should be interesting.
– StoryTeller
1 hour ago

@Zastai - Wait, actually you are right. Since those pointers are of the same type, they may alias the same object. It's the whole strict aliasing spiel.
– StoryTeller
1 hour ago
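A minimal sketch of the variant Zastai proposes, on the assumption (as StoryTeller's last comment argues) that two `int *` parameters may legitimately alias, so the compiler has to re-read `*p1` after the store through `p2`. The name `f_int` is hypothetical; whether the union wording fully blesses this is the language-lawyer question raised above.

```c
struct t1 { int m; };
struct t2 { int m; };

/* Hypothetical variant of f: both parameters have type int *, so the
   compiler must assume they may point to the same int. */
int f_int(int *p1, int *p2)
{
    if (*p1 < 0)
        *p2 = -*p2;
    return *p1;   /* cannot be cached: the store to *p2 may have changed it */
}

int g(void)
{
    union {
        struct t1 s1;
        struct t2 s2;
    } u;
    u.s1.m = -1;
    /* g does the member selection; f_int only sees two int pointers. */
    return f_int(&u.s1.m, &u.s2.m);
}
```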

Here is the strict aliasing rule in action: one assumption made by C (and C++) compilers is that dereferencing pointers to objects of different types will never refer to the same memory location (i.e. the pointers never alias each other).

This function

int f(struct t1* p1, struct t2* p2);

can be compiled on the assumption that p1 != p2, because the two pointers formally point to different types. As a result, the optimizer may assume that p2->m = -p2->m; has no effect on p1->m: it can first read the value of p1->m into a register, compare it with 0, do p2->m = -p2->m; if it compares less than 0, and finally return the register value unchanged!

The union here is the only way to make p1 == p2 at the binary level, because all union members have the same address.
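The address property this relies on can be checked directly. A minimal sketch, reusing the struct t1 and struct t2 definitions from the example; the helper name `members_share_address` is hypothetical.

```c
struct t1 { int m; };
struct t2 { int m; };

/* Hypothetical helper: returns 1 if both union members start at the
   same address. Every member of a union begins at the address of the
   union itself, so this is expected to return 1. */
int members_share_address(void)
{
    union {
        struct t1 s1;
        struct t2 s2;
    } u;
    /* Only addresses are taken; u's contents are never read. */
    return (void *)&u.s1 == (void *)&u.s2;
}
```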

Another example:

struct t1 { int m; };
struct t2 { int m; };

int f(struct t1* p1, struct t2* p2)
{
    if (p1->m < 0) p2->m = -p2->m;
    return p1->m;
}

int g()
{
    union {
        struct t1 s1;
        struct t2 s2;
    } u;
    u.s1.m = -1;
    return f(&u.s1, &u.s2);
}

What should g return? Common sense says +1 (f changes -1 to +1). Let's look at the assembly GCC generates with -O1 optimization:

f:
 cmp DWORD PTR [rdi], 0
 js .L3
.L2:
 mov eax, DWORD PTR [rdi]
 ret
.L3:
 neg DWORD PTR [rsi]
 jmp .L2
g:
 mov eax, 1
 ret

So far, everything is as expected. But when we try it with -O2:

f:
 mov eax, DWORD PTR [rdi]
 test eax, eax
 js .L4
 ret
.L4:
 neg DWORD PTR [rsi]
 ret
g:
 mov eax, -1
 ret

The return value is now a hardcoded -1.

This is because f caches the value of p1->m in the eax register at the beginning (mov eax, DWORD PTR [rdi]) and does not re-read it after p2->m = -p2->m; (neg DWORD PTR [rsi]); it returns eax unchanged.

The union is used here only because all non-static data members of a union object have the same address; as a result, &u.s1 == &u.s2.

For anyone who doesn't read assembly, here is how strict aliasing affects the code of f, expressed in C:

int f(struct t1* p1, struct t2* p2)
{
    int a = p1->m;
    if (a < 0) p2->m = -p2->m;
    return a;
}

The compiler caches the value of p1->m in the local variable a (in practice, in a register) and returns it, even though p2->m = -p2->m; changes p1->m. The compiler assumes that the memory p1 points to is not affected, because it assumes that p2 points to other memory that does not overlap with p1.

So with different compilers and different optimization levels, the same source code can return different values (-1 or +1), which is undefined behavior in action.
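One way to get a well-defined result from the same logic, sketched under the assumption that f may be changed to take a pointer to the union: both accesses then go through the union type, which is visible in f, so the compiler cannot treat the two members as unrelated memory. The type name `union u12` is hypothetical.

```c
struct t1 { int m; };
struct t2 { int m; };

/* Named union type declared at file scope, so it is visible in f. */
union u12 {
    struct t1 s1;
    struct t2 s2;
};

/* Both accesses are made through the union, and m is part of the
   common initial sequence of struct t1 and struct t2. */
int f(union u12 *pu)
{
    if (pu->s1.m < 0)
        pu->s2.m = -pu->s2.m;
    return pu->s1.m;
}

int g(void)
{
    union u12 u;
    u.s1.m = -1;
    return f(&u);
}
```

With this version g is expected to return 1 at any optimization level, since nothing relies on the two pointers being distinct.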

edited 27 mins ago

answered 1 hour ago

RbMm

15.7k1923

An example of generated assembly with GCC 8.2.
– StoryTeller
33 mins ago

@StoryTeller - Yes, the same (-1) with Clang. But with MSVC and ICC, +1 is always returned (no strict aliasing optimization here).
– RbMm
25 mins ago
