Why does "vectorizing" this simple R loop give the wrong result?

Perhaps a very dumb question.

I am trying to "vectorize" the following loop:

set.seed(0)
x <- round(runif(10), 2)
# [1] 0.90 0.27 0.37 0.57 0.91 0.20 0.90 0.94 0.66 0.63
sig <- sample.int(10)
# [1] 1 2 9 5 3 4 8 6 7 10
for (i in seq_along(sig)) x[i] <- x[sig[i]]
x
# [1] 0.90 0.27 0.66 0.91 0.66 0.91 0.94 0.91 0.94 0.63

I think it is simply x[sig] but the result does not match.

set.seed(0)
x <- round(runif(10), 2)
x[sig]
# [1] 0.90 0.27 0.66 0.91 0.37 0.57 0.94 0.20 0.90 0.63

What's wrong?

edited 53 mins ago

asked 1 hour ago

李哲源


r loops for-loop vectorization


2 Answers

There is a trap here: address aliasing. The loop reads from x and writes to x, so the memory block it reads overlaps the memory block it writes. This self-reference introduces a loop-carried dependency (a later iteration may read an element that an earlier iteration has already overwritten), which is a hazard for "vectorization".

By contrast, x[sig] creates a new memory block for writing, eliminating address aliasing.
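The dependency can be made visible with a small sketch. It hard-codes the x and sig printed in the question (so it does not rely on a particular R version's random number generator); the names x0, gathered and stale are introduced here for illustration only.

```r
# Hard-coded copies of the question's x and sig.
x0  <- c(0.90, 0.27, 0.37, 0.57, 0.91, 0.20, 0.90, 0.94, 0.66, 0.63)
sig <- c(1, 2, 9, 5, 3, 4, 8, 6, 7, 10)

gathered <- x0[sig]                           # reads only the untouched x0

x <- x0
for (i in seq_along(sig)) x[i] <- x[sig[i]]   # reads and writes the same x

# Iteration i reads x[sig[i]]. If sig[i] < i, that slot was already
# overwritten by an earlier iteration, so the value read may be stale.
stale <- which(sig < seq_along(sig))
stale
# [1] 5 6 8 9

# For this data, these are exactly the positions where the two results differ.
which(x != gathered)
# [1] 5 6 8 9
```

Note that sig[i] < i only says the source slot was rewritten earlier; whether the result actually changes depends on whether the rewritten value differs from the original one.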

Which piece of code is correct depends on what we want to do.

If we want to shuffle / permute x, then x[sig] is the right one. The loop hopes to do an "in-place" permutation without using extra memory, but an "in-place" permutation is in fact a more complicated operation: not only do entries of x need to be swapped, entries of sig also need to be updated along the iteration.
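As a rough sketch of what such a permutation involves, the function below follows the cycles of sig and uses a single temporary value per cycle. It is one possible variant, not necessarily the scheme the answer has in mind: instead of swapping entries of sig, it overwrites each visited entry with its own position to mark it as done. The function name permute_cycles is hypothetical, and sig is assumed to be a valid permutation of seq_along(x). The sketch shows the algorithm's logic only; it makes no claim about how R manages memory inside the function.

```r
permute_cycles <- function(x, sig) {
  for (i in seq_along(sig)) {
    tmp <- x[i]            # the only extra storage: one element per cycle
    j <- i
    repeat {
      k <- sig[j]          # position whose value belongs at j
      sig[j] <- j          # mark j as done (it becomes a fixed point)
      if (k == i) {        # cycle closed: place the saved value
        x[j] <- tmp
        break
      }
      x[j] <- x[k]         # x[k] has not been overwritten yet
      j <- k
    }
  }
  x
}

x0  <- c(0.90, 0.27, 0.37, 0.57, 0.91, 0.20, 0.90, 0.94, 0.66, 0.63)
sig <- c(1, 2, 9, 5, 3, 4, 8, 6, 7, 10)
identical(permute_cycles(x0, sig), x0[sig])
# [1] TRUE
```

Positions that were already handled as part of an earlier cycle are revisited as fixed points, which leaves them unchanged.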

If we take the loop's result as the intended one, then there is no way to "vectorize" it. If implementing the loop in Rcpp counts as "vectorization", then so be it, but there is no chance to further "vectorize" the resulting C / C++ loop with SIMD, because the dependency between iterations remains.

Remark:

This Q & A is motivated by another Q & A. The OP there originally presented a loop

for (i in 1:num)
  for (j in 1:num)
    mat[i, j] <- mat[i, mat[j, "rm"]]

It is tempting to "vectorize" it as

mat[1:num, 1:num] <- mat[1:num, mat[1:num, "rm"]]

but it is actually wrong. Later OP changed the loop to

for (i in 1:num)
  for (j in 1:num)
    mat[i, j] <- mat[i, 1 + num + mat[j, "rm"]]

which eliminates the address aliasing issue, because the columns to be replaced are the first num columns, while the columns to be looked up are after the first num columns.
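The difference between the two loops can be checked on a toy matrix. The layout below is an assumption made for illustration (the original matrix is not shown): num target columns, then a column named "rm" holding indices in 1:num, then num lookup columns. The column names a1..a3 and b1..b3 are hypothetical.

```r
num <- 3
mat <- matrix(0, num, 2 * num + 1)
colnames(mat) <- c(paste0("a", 1:num), "rm", paste0("b", 1:num))
mat[, 1:num] <- 1:9                 # target columns
mat[, "rm"] <- c(2, 3, 1)           # lookup indices
mat[, num + 1 + 1:num] <- 11:19     # separate lookup columns

# Aliased version: reads and writes the same first num columns.
m_loop <- mat
for (i in 1:num)
  for (j in 1:num)
    m_loop[i, j] <- m_loop[i, m_loop[j, "rm"]]
m_vec <- mat
m_vec[1:num, 1:num] <- mat[1:num, mat[1:num, "rm"]]
identical(m_loop, m_vec)
# [1] FALSE

# Non-aliased version: reads only from columns after the first num + 1.
n_loop <- mat
for (i in 1:num)
  for (j in 1:num)
    n_loop[i, j] <- n_loop[i, 1 + num + n_loop[j, "rm"]]
n_vec <- mat
n_vec[1:num, 1:num] <- mat[1:num, 1 + num + mat[1:num, "rm"]]
identical(n_loop, n_vec)
# [1] TRUE
```

In the aliased case the first row ends up as 4 7 4 in the loop but 4 7 1 in the vectorized assignment, because the loop's last step reads column 1 after it has been overwritten.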

answered 1 hour ago

李哲源


I agree somewhat with the first paragraph. It is the assignment to x from x sequentially versus "en bloc" that causes the discrepancy, but there is never over-writing of a "memory block". R does not make assignments "in place". Rather, it makes a temporary copy of the original and renames it. And I would also not say it is a danger of "vectorization", since you were not really using what is called vectorization when using a for-loop. I would have considered the vectorized result correct and the for-loop method incorrect.
– 42-
26 mins ago

@42- I think as long as the data to be modified are of the same mode as the original data, replacement or update of vector / matrix elements is indeed "in-place". You can try adding a tracemem(x) before the loop, and you will see no memory allocation message during the loop.
– 李哲源
22 mins ago

I will be very surprised if this turns out to be the case. I'll try to track down more authoritative documentation.
– 42-
22 mins ago

@42- Thank you. If you ever find anything, feel free to post it as an answer. I agree that my way of thinking of a loop is "C"-fashioned.
– 李哲源
16 mins ago
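The tracemem() check suggested in the comments above can be tried with a sketch like the following. It assumes an R build with memory profiling enabled, which is why the call is guarded by capabilities("profmem"); the variable can_trace is introduced here for illustration. No particular output is claimed, since it may depend on the R version and on how the code is run.

```r
x   <- c(0.90, 0.27, 0.37, 0.57, 0.91, 0.20, 0.90, 0.94, 0.66, 0.63)
sig <- c(1, 2, 9, 5, 3, 4, 8, 6, 7, 10)

# tracemem() is only available if R was built with memory profiling.
can_trace <- capabilities("profmem")
if (can_trace) tracemem(x)

# Any duplication of x during the loop would be reported as a line
# starting with "tracemem[".
for (i in seq_along(sig)) x[i] <- x[sig[i]]

if (can_trace) untracemem(x)
x
```

If no "tracemem[" line appears while the loop runs, x was not duplicated by the element-wise assignments.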


There is a simpler explanation. With your loop, you are overwriting one element of x at every step, replacing its former value by one of the other elements of x. So you get what you asked for. Essentially, it is a complicated form of sampling with replacement (sample(x, replace=TRUE)) -- whether you need such a complication, depends on what you want to achieve.
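A quick way to see the "with replacement" flavour is to check which original values survive the loop. The sketch below hard-codes the x and sig printed in the question; the names x0 and lost are introduced here for illustration.

```r
x0  <- c(0.90, 0.27, 0.37, 0.57, 0.91, 0.20, 0.90, 0.94, 0.66, 0.63)
sig <- c(1, 2, 9, 5, 3, 4, 8, 6, 7, 10)

x <- x0
for (i in seq_along(sig)) x[i] <- x[sig[i]]

# Values of the original vector that no longer appear after the loop.
lost <- setdiff(x0, x)
lost
# [1] 0.37 0.57 0.20

# x0[sig] keeps every value exactly once per original occurrence; the loop does not.
identical(sort(x0[sig]), sort(x0))
# [1] TRUE
identical(sort(x), sort(x0))
# [1] FALSE
```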

With your vectorized code, you are just asking for a certain permutation of x (without replacement), and that is what you get. The vectorized code is not doing the same thing as your loop. If you want to achieve the same result with a loop, you would first need to make a copy of x:

set.seed(0)
x <- x2 <- round(runif(10), 2)
# [1] 0.90 0.27 0.37 0.57 0.91 0.20 0.90 0.94 0.66 0.63
sig <- sample.int(10)
# [1] 1 2 9 5 3 4 8 6 7 10
for (i in seq_along(sig)) x2[i] <- x[sig[i]]
identical(x2, x[sig])
# [1] TRUE

No danger of aliasing here: x and x2 initially refer to the same memory location, but this changes as soon as you modify the first element of x2, because R copies an object when it is modified.

answered 5 mins ago

lebatsnok


Yes, I know that the loop and x[sig] are different. Maybe I did not make this clear in my elaboration... But your interpretation of the former as sampling with replacement and the latter as sampling without replacement is interesting. While it may not be precise (given sig, the results of the loop and of x[sig] are both deterministic), it is indeed a different view of the issue.
– 李哲源
21 secs ago


