Why does “vectorizing” this simple R loop give a wrong result?

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP











up vote
11
down vote

favorite
3












Perhaps a very dumb question.



I am trying to "vectorize" the following loop:



set.seed(0)
x <- round(runif(10), 2)
# [1] 0.90 0.27 0.37 0.57 0.91 0.20 0.90 0.94 0.66 0.63
sig <- sample.int(10)
# [1] 1 2 9 5 3 4 8 6 7 10
for (i in seq_along(sig)) x[i] <- x[sig[i]]
x
# [1] 0.90 0.27 0.66 0.91 0.66 0.91 0.94 0.91 0.94 0.63


I think it is simply x[sig] but the result does not match.



set.seed(0)
x <- round(runif(10), 2)
x <- x[sig]
x
# [1] 0.90 0.27 0.66 0.91 0.37 0.57 0.94 0.20 0.90 0.63


What's wrong?




Remark



Obviously from the output we see that the for loop and x[sig] are different. The meaning of the latter is clear: permutation, hence many people tend to believe that the loop is just doing some wrong stuff. But never be so sure; it can be some well-defined dynamic process. The purpose of this Q & A is not to judge which is correct, but to explain why they are not equivalent. Hopefully it provides a solid case study for understanding "vectorization".










share|improve this question



























    up vote
    11
    down vote

    favorite
    3












    Perhaps a very dumb question.



    I am trying to "vectorize" the following loop:



    set.seed(0)
    x <- round(runif(10), 2)
    # [1] 0.90 0.27 0.37 0.57 0.91 0.20 0.90 0.94 0.66 0.63
    sig <- sample.int(10)
    # [1] 1 2 9 5 3 4 8 6 7 10
    for (i in seq_along(sig)) x[i] <- x[sig[i]]
    x
    # [1] 0.90 0.27 0.66 0.91 0.66 0.91 0.94 0.91 0.94 0.63


    I think it is simply x[sig] but the result does not match.



    set.seed(0)
    x <- round(runif(10), 2)
    x <- x[sig]
    x
    # [1] 0.90 0.27 0.66 0.91 0.37 0.57 0.94 0.20 0.90 0.63


    What's wrong?




    Remark



    Obviously from the output we see that the for loop and x[sig] are different. The meaning of the latter is clear: permutation, hence many people tend to believe that the loop is just doing some wrong stuff. But never be so sure; it can be some well-defined dynamic process. The purpose of this Q & A is not to judge which is correct, but to explain why they are not equivalent. Hopefully it provides a solid case study for understanding "vectorization".










    share|improve this question

























      up vote
      11
      down vote

      favorite
      3









      up vote
      11
      down vote

      favorite
      3






      3





      Perhaps a very dumb question.



      I am trying to "vectorize" the following loop:



      set.seed(0)
      x <- round(runif(10), 2)
      # [1] 0.90 0.27 0.37 0.57 0.91 0.20 0.90 0.94 0.66 0.63
      sig <- sample.int(10)
      # [1] 1 2 9 5 3 4 8 6 7 10
      for (i in seq_along(sig)) x[i] <- x[sig[i]]
      x
      # [1] 0.90 0.27 0.66 0.91 0.66 0.91 0.94 0.91 0.94 0.63


      I think it is simply x[sig] but the result does not match.



      set.seed(0)
      x <- round(runif(10), 2)
      x <- x[sig]
      x
      # [1] 0.90 0.27 0.66 0.91 0.37 0.57 0.94 0.20 0.90 0.63


      What's wrong?




      Remark



      Obviously from the output we see that the for loop and x[sig] are different. The meaning of the latter is clear: permutation, hence many people tend to believe that the loop is just doing some wrong stuff. But never be so sure; it can be some well-defined dynamic process. The purpose of this Q & A is not to judge which is correct, but to explain why they are not equivalent. Hopefully it provides a solid case study for understanding "vectorization".










      share|improve this question















      Perhaps a very dumb question.



      I am trying to "vectorize" the following loop:



      set.seed(0)
      x <- round(runif(10), 2)
      # [1] 0.90 0.27 0.37 0.57 0.91 0.20 0.90 0.94 0.66 0.63
      sig <- sample.int(10)
      # [1] 1 2 9 5 3 4 8 6 7 10
      for (i in seq_along(sig)) x[i] <- x[sig[i]]
      x
      # [1] 0.90 0.27 0.66 0.91 0.66 0.91 0.94 0.91 0.94 0.63


      I think it is simply x[sig] but the result does not match.



      set.seed(0)
      x <- round(runif(10), 2)
      x <- x[sig]
      x
      # [1] 0.90 0.27 0.66 0.91 0.37 0.57 0.94 0.20 0.90 0.63


      What's wrong?




      Remark



      Obviously from the output we see that the for loop and x[sig] are different. The meaning of the latter is clear: permutation, hence many people tend to believe that the loop is just doing some wrong stuff. But never be so sure; it can be some well-defined dynamic process. The purpose of this Q & A is not to judge which is correct, but to explain why they are not equivalent. Hopefully it provides a solid case study for understanding "vectorization".







      r loops for-loop vectorization






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited 10 mins ago

























      asked 8 hours ago









      李哲源

      45.4k1490136




      45.4k1490136






















          3 Answers
          3






          active

          oldest

          votes

















          up vote
          12
          down vote













          warm-up



          As a warm-up, consider two simpler examples.



          ## example 1
          x <- 1:11
          for (i in 1:10) x[i] <- x[i + 1]
          x
          # [1] 2 3 4 5 6 7 8 9 10 11 11

          x <- 1:11
          x[1:10] <- x[2:11]
          x
          # [1] 2 3 4 5 6 7 8 9 10 11 11

          ## example 2
          x <- 1:11
          for (i in 1:10) x[i + 1] <- x[i]
          x
          # [1] 1 1 1 1 1 1 1 1 1 1 1

          x <- 1:11
          x[2:11] <- x[1:10]
          x
          # [1] 1 1 2 3 4 5 6 7 8 9 10


          "Vectorization" is successful in the 1st example but not the 2nd. Why?



          Here is prudent analysis. "Vectorization" starts by loop unrolling, then executes several instructions in parallel. Whether a loop can be "vectorized" depends on the data dependency carried by the loop.



          Unrolling the loop in example 1 gives



          x[1] <- x[2]
          x[2] <- x[3]
          x[3] <- x[4]
          x[4] <- x[5]
          x[5] <- x[6]
          x[6] <- x[7]
          x[7] <- x[8]
          x[8] <- x[9]
          x[9] <- x[10]
          x[10] <- x[11]


          Executing these instructions one by one and executing them simultaneously give identical result. So this loop can be "vectorized".



          The loop in example 2 is



          x[2] <- x[1]
          x[3] <- x[2]
          x[4] <- x[3]
          x[5] <- x[4]
          x[6] <- x[5]
          x[7] <- x[6]
          x[8] <- x[7]
          x[9] <- x[8]
          x[10] <- x[9]
          x[11] <- x[10]


          Unfortunately, executing these instructions one by one and executing them simultaneously would not give identical result. For example, when executing them one by one, x[2] is modified in the 1st instruction, then this modified value is passed to x[3] in the 2nd instruction. So x[3] would have the same value as x[1]. However, in parallel execution, x[3] equals x[2]. As the result, this loop can not be "vectorized".



          In "vectorization" theory,



          • Example 1 has a "write-after-read" dependency in data: x[i] is modified after it is read;

          • Example 2 has a "read-after-write" dependency in data: x[i] is read after it is modified.

          A loop with "write-after-read" data dependency can be "vectorized", while a loop with "read-after-write" data dependency can not.




          in depth



          Perhaps many people have been confused by now. "Vectorization" is a "parallel-processing"?



          Yes. In 1960's when people wondered what kind of parallel processing computer be designed for high performance computing, Flynn classified the design ideas into 4 types. The category "SIMD" (single instruction, multiple data) is corned "vectorization", and a computer with "SIMD" cabability is called a "vector processor" or "array processor".



          In 1960's there were not many programming languages. People wrote assembly (then FORTRAN when a compiler was invented) to program CPU registers directly. A "SIMD" computer is able to load multiple data into a vector register with a single instruction and do the same arithmetic on those data at the same time. So data processing is indeed parallel. Consider our example 1 again. Suppose a vector register can hold two vector elements, then the loop can be executed with 5 iterations using vector processing rather than 10 iterations as in scalar processing.



          reg <- x[2:3] ## load vector register
          x[1:2] <- reg ## store vector register
          -------------
          reg <- x[4:5] ## load vector register
          x[3:4] <- reg ## store vector register
          -------------
          reg <- x[6:7] ## load vector register
          x[5:6] <- reg ## store vector register
          -------------
          reg <- x[8:9] ## load vector register
          x[7:8] <- reg ## store vector register
          -------------
          reg <- x[10:11] ## load vector register
          x[9:10] <- reg ## store vector register


          Today there are many programming languages, like R. "Vectorization" no longer unambiguously refers to "SIMD". R is not a language where we can program CPU registers. The "vectorization" in R is just an analogy to "SIMD". In a previous Q & A: Does the term "vectorization" mean different things in different contexts? I have tried to explain this. The following map illustrates how this analogy is made:



          single (assembly) instruction -> single R instruction
          CPU vector registers -> temporary vectors
          parallel processing in registers -> C/C++/FORTRAN loops with temporary vectors


          So, the R "vectorization" of the loop in example 1 is something like



          ## the C-level loop is implemented by function "["
          tmp <- x[2:11] ## load data into a temporary vector
          x[1:10] <- tmp ## fill temporary vector into x


          Most of the time we just do



          x[1:10] <- x[2:10]


          without explicitly assigning the temporary vector to a variable. The temporary memory block created is not pointed to by any R variable, and is therefore subject to garbage collection.




          a complete picture



          In the above, "vectorization" is not introduced with the simplest example. Very often, "vectorization" is introduced with something like



          a[1] <- b[1] + c[1]
          a[2] <- b[2] + c[2]
          a[3] <- b[3] + c[3]
          a[4] <- b[4] + c[4]


          where a, b and c are not aliased in memory, that is, the moemory blocks storing vectors a, b and c do not overlap. This is the idealist case, as no memory aliasing implies no data dependency.



          Apart from "data dependency", there is also "control dependency", that is, dealing with "if ... else ..." in "vectorization". However, for time and space reason I will not elaborate on this issue.




          back to the example in the question



          Now it is time to investigate the loop in the question.



          set.seed(0)
          x <- round(runif(10), 2)
          sig <- sample.int(10)
          # [1] 1 2 9 5 3 4 8 6 7 10
          for (i in seq_along(sig)) x[i] <- x[sig[i]]


          Unrolling the loop gives



          x[1] <- x[1]
          x[2] <- x[2]
          x[3] <- x[9] ## 3rd instruction
          x[4] <- x[5]
          x[5] <- x[3] ## 5th instruction
          x[6] <- x[4]
          x[7] <- x[8]
          x[8] <- x[6]
          x[9] <- x[7]
          x[10] <- x[10]


          There is "read-after-write" data dependency between the 3rd and the 5th instruction, so the loop can not be "vectorized" (see Remark 1).



          Well then, what does x <- x[sig] do? Let's first explicitly write out the temporary vector:



          tmp <- x[sig]
          x <- tmp


          Since "[" is called twice, there are actually two C-level loops behind this "vectorized" code:



          tmp[1] <- x[1]
          tmp[2] <- x[2]
          tmp[3] <- x[9]
          tmp[4] <- x[5]
          tmp[5] <- x[3]
          tmp[6] <- x[4]
          tmp[7] <- x[8]
          tmp[8] <- x[6]
          tmp[9] <- x[7]
          tmp[10] <- x[10]

          x[1] <- tmp[1]
          x[2] <- tmp[2]
          x[3] <- tmp[3]
          x[4] <- tmp[4]
          x[5] <- tmp[5]
          x[6] <- tmp[6]
          x[7] <- tmp[7]
          x[8] <- tmp[8]
          x[9] <- tmp[9]
          x[10] <- tmp[10]


          So x <- x[sig] is equivalent to



          for (i in 1:10) tmp[i] <- x[sig[i]]
          for (i in 1:10) x[i] <- tmp[i]
          rm(tmp); gc()


          which is not at all the original loop given in the question.




          Remark 1



          If implementing the loop in Rcpp is seen as a "vectorization" then let it be. But there is no chance to further "vectorize" the C / C++ loop with "SIMD".




          Remark 2



          This Q & A is motivated by this Q & A. OP originally presented a loop



          for (i in 1:num) 
          for (j in 1:num)
          mat[i, j] <- mat[i, mat[j, "rm"]]




          It is tempting to "vectorize" it as



          mat[1:num, 1:num] <- mat[1:num, mat[1:num, "rm"]]


          but it is potentially wrong. Later OP changed the loop to



          for (i in 1:num) 
          for (j in 1:num)
          mat[i, j] <- mat[i, 1 + num + mat[j, "rm"]]




          which eliminates the memory aliasing issue, because the columns to be replaced are the first num columns, while the columns to be looked up are after the first num columns.




          Remark 3



          I received some comments regarding whether the loop in the question is making in-place modification of x. Yes, it is. We can use tracemem:



          set.seed(0)
          x <- round(runif(10), 2)
          sig <- sample.int(10)
          tracemem(x)
          #[1] "<0x28f7340>"
          for (i in seq_along(sig)) x[i] <- x[sig[i]]
          tracemem(x)
          #[1] "<0x28f7340>"


          My R session has allocated a memory block pointed by address <0x28f7340> for x and you may see a different value when you run the code. However, the output of tracemem will not change after the loop, which means that no copy of x is made. So The loop is indeed doing "in-place" modification without using extra memory.



          However, the loop is not doing "in-place" permutation. "in-place" permutation is a more complicated operation. Not only elements of x need be swapped along the loop, elements of sig also need be swapped (in the end, sig would be 1:10).






          share|improve this answer


















          • 1




            I agree sommewhat with the first paragraph. it is the assignment to x from x sequentially versus "en bloc" that causes the discrepancy, but there is never over-writing of a "memory block". R does not make assignments "in place". Rather it makes a temporary copy of the original and renames it. And I would also not say it is a danger of "vectorization" since you were not really using what is called vectorization when using a for-loop. I would have considered the vectorized result correct and the for-loop method as incorrect.
            – 42-
            8 hours ago







          • 2




            I will be very surprised if this turns out to be the case. I'll try to track down more authoritative documentation.
            – 42-
            8 hours ago











          • I think you are not understanding how tracemem works. Try a <- 1:10; tracemem(a); a <- a[10:1]; tracemem(a). The location of a changes.
            – 42-
            5 hours ago











          • @李哲源 @42- Also keep in mind that R3.4.0 introduced just-in-time compilation for top-level loops. I'd guess the end result might depend on how sophisticated that compiler is. (compiler::enableJIT(0) can turn it off for testing)
            – juod
            5 hours ago

















          up vote
          3
          down vote













          There is a simpler explanation. With your loop, you are overwriting one element of x at every step, replacing its former value by one of the other elements of x. So you get what you asked for. Essentially, it is a complicated form of sampling with replacement (sample(x, replace=TRUE)) -- whether you need such a complication, depends on what you want to achieve.



          With your vectorized code, you are just asking for a certain permutation of x (without replacement), and that is what you get. The vectorized code is not doing the same thing as your loop. If you want to achieve the same result with a loop, you would first need to make a copy of x:



          set.seed(0)
          x <- x2 <- round(runif(10), 2)
          # [1] 0.90 0.27 0.37 0.57 0.91 0.20 0.90 0.94 0.66 0.63
          sig <- sample.int(10)
          # [1] 1 2 9 5 3 4 8 6 7 10
          for (i in seq_along(sig)) x2[i] <- x[sig[i]]
          identical(x2, x[sig])
          #TRUE


          No danger of aliasing here: x and x2 refer initially to the same memory location but his will change as soon as you change the first element of x2.






          share|improve this answer




















          • Yes, I know that the loop and x[sig] are different. Maybe I did not make this clear in my elaboration... But your interpreting the former as sample with replacement and the latter as sample without replacement is interesting. While it may not be precise (as given the sig, the result of the loop and x[sig] are both deterministic), it is indeed a different view of the issue.
            – æŽå“²æº
            7 hours ago










          • The results of the for loop depend on the number of "collisions", i.e the number of time a later index references a location that already had a "new" number place in it. Some values get moved more than once in the for-loop version.
            – 42-
            5 hours ago











          • Actually, if sig <- 1:10 ... you can't interpret the loop as sample with replacement.
            – æŽå“²æº
            4 hours ago


















          up vote
          1
          down vote













          This has nothing to do with memory block aliasing (a term I have never encountered before). Take a particular permutation example and walk through the assignments that would occur regardless of the implementation at the C or assembly (or whatever) language level; It intrinsic to how any sequential for-loop would behave versus how any "true" permutation (what one gets with x[sig]) would occur:



          sample(10)
          [1] 3 7 1 5 6 9 10 8 4 2

          value at 1 goes to 3, and now there are two of those values
          value at 2 goes to 7, and now there are two of those values
          value at 3 (which was at 1) now goes back to 1 but the values remain unchanged


          ... can continue but this illustrates how this will usually not be a "true" permutation and very uncommonly would result in a complete redistribution of values. I'm guessing that only a completely ordered permutation (of which I think there is only one, i.e. 10:1) could result in a new set of x's that were unique.



          replicate( 100, x <- round(runif(10), 2); 
          sig <- sample.int(10);
          for (i in seq_along(sig)) x[i] <- x[sig[i]];
          sum(duplicated(x)) )
          #[1] 4 4 4 5 5 5 4 5 6 5 5 5 4 5 5 6 3 4 2 5 4 4 4 4 3 5 3 5 4 5 5 5 5 5 5 5 4 5 5 5 5 4
          #[43] 5 3 4 6 6 6 3 4 5 3 5 4 6 4 5 5 6 4 4 4 5 3 4 3 4 4 3 6 4 7 6 5 6 6 5 4 7 5 6 3 6 4
          #[85] 8 4 5 5 4 5 5 5 4 5 5 4 4 5 4 5


          I started wondering what the distribution of duplication counts might be in a large series. Looks pretty symmetric:



          table( replicate( 1000000, x <- round(runif(10), 5); 
          sig <- sample.int(10);
          for (i in seq_along(sig)) x[i] <- x[sig[i]];
          sum(duplicated(x)) ) )

          0 1 2 3 4 5 6 7 8
          1 269 13113 126104 360416 360827 125707 13269 294





          share|improve this answer






















          • Thanks for this. I realized that I was not being accurate. A simple example like x <- 1:11; x[1:10] <- x[2:11] would cause no "vectorization" hazard, even though memory blocks for read and write overlaps. The real issue is "read-after-write" or "write-after-read".
            – æŽå“²æº
            4 hours ago










          Your Answer





          StackExchange.ifUsing("editor", function ()
          StackExchange.using("externalEditor", function ()
          StackExchange.using("snippets", function ()
          StackExchange.snippets.init();
          );
          );
          , "code-snippets");

          StackExchange.ready(function()
          var channelOptions =
          tags: "".split(" "),
          id: "1"
          ;
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function()
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled)
          StackExchange.using("snippets", function()
          createEditor();
          );

          else
          createEditor();

          );

          function createEditor()
          StackExchange.prepareEditor(
          heartbeatType: 'answer',
          convertImagesToLinks: true,
          noModals: false,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          );



          );













           

          draft saved


          draft discarded


















          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f52597296%2fwhy-does-vectorizing-this-simple-r-loop-give-a-wrong-result%23new-answer', 'question_page');

          );

          Post as a guest






























          3 Answers
          3






          active

          oldest

          votes








          3 Answers
          3






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes








          up vote
          12
          down vote













          warm-up



          As a warm-up, consider two simpler examples.



          ## example 1
          x <- 1:11
          for (i in 1:10) x[i] <- x[i + 1]
          x
          # [1] 2 3 4 5 6 7 8 9 10 11 11

          x <- 1:11
          x[1:10] <- x[2:11]
          x
          # [1] 2 3 4 5 6 7 8 9 10 11 11

          ## example 2
          x <- 1:11
          for (i in 1:10) x[i + 1] <- x[i]
          x
          # [1] 1 1 1 1 1 1 1 1 1 1 1

          x <- 1:11
          x[2:11] <- x[1:10]
          x
          # [1] 1 1 2 3 4 5 6 7 8 9 10


          "Vectorization" is successful in the 1st example but not the 2nd. Why?



          Here is prudent analysis. "Vectorization" starts by loop unrolling, then executes several instructions in parallel. Whether a loop can be "vectorized" depends on the data dependency carried by the loop.



          Unrolling the loop in example 1 gives



          x[1] <- x[2]
          x[2] <- x[3]
          x[3] <- x[4]
          x[4] <- x[5]
          x[5] <- x[6]
          x[6] <- x[7]
          x[7] <- x[8]
          x[8] <- x[9]
          x[9] <- x[10]
          x[10] <- x[11]


          Executing these instructions one by one and executing them simultaneously give identical result. So this loop can be "vectorized".



          The loop in example 2 is



          x[2] <- x[1]
          x[3] <- x[2]
          x[4] <- x[3]
          x[5] <- x[4]
          x[6] <- x[5]
          x[7] <- x[6]
          x[8] <- x[7]
          x[9] <- x[8]
          x[10] <- x[9]
          x[11] <- x[10]


          Unfortunately, executing these instructions one by one and executing them simultaneously would not give identical result. For example, when executing them one by one, x[2] is modified in the 1st instruction, then this modified value is passed to x[3] in the 2nd instruction. So x[3] would have the same value as x[1]. However, in parallel execution, x[3] equals x[2]. As the result, this loop can not be "vectorized".



          In "vectorization" theory,



          • Example 1 has a "write-after-read" dependency in data: x[i] is modified after it is read;

          • Example 2 has a "read-after-write" dependency in data: x[i] is read after it is modified.

          A loop with "write-after-read" data dependency can be "vectorized", while a loop with "read-after-write" data dependency can not.




          in depth



          Perhaps many people have been confused by now. "Vectorization" is a "parallel-processing"?



          Yes. In 1960's when people wondered what kind of parallel processing computer be designed for high performance computing, Flynn classified the design ideas into 4 types. The category "SIMD" (single instruction, multiple data) is corned "vectorization", and a computer with "SIMD" cabability is called a "vector processor" or "array processor".



          In 1960's there were not many programming languages. People wrote assembly (then FORTRAN when a compiler was invented) to program CPU registers directly. A "SIMD" computer is able to load multiple data into a vector register with a single instruction and do the same arithmetic on those data at the same time. So data processing is indeed parallel. Consider our example 1 again. Suppose a vector register can hold two vector elements, then the loop can be executed with 5 iterations using vector processing rather than 10 iterations as in scalar processing.



          reg <- x[2:3] ## load vector register
          x[1:2] <- reg ## store vector register
          -------------
          reg <- x[4:5] ## load vector register
          x[3:4] <- reg ## store vector register
          -------------
          reg <- x[6:7] ## load vector register
          x[5:6] <- reg ## store vector register
          -------------
          reg <- x[8:9] ## load vector register
          x[7:8] <- reg ## store vector register
          -------------
          reg <- x[10:11] ## load vector register
          x[9:10] <- reg ## store vector register


          Today there are many programming languages, like R. "Vectorization" no longer unambiguously refers to "SIMD". R is not a language where we can program CPU registers. The "vectorization" in R is just an analogy to "SIMD". In a previous Q & A: Does the term "vectorization" mean different things in different contexts? I have tried to explain this. The following map illustrates how this analogy is made:



          single (assembly) instruction -> single R instruction
          CPU vector registers -> temporary vectors
          parallel processing in registers -> C/C++/FORTRAN loops with temporary vectors


          So, the R "vectorization" of the loop in example 1 is something like



          ## the C-level loop is implemented by function "["
          tmp <- x[2:11] ## load data into a temporary vector
          x[1:10] <- tmp ## fill temporary vector into x


          Most of the time we just do



          x[1:10] <- x[2:10]


          without explicitly assigning the temporary vector to a variable. The temporary memory block created is not pointed to by any R variable, and is therefore subject to garbage collection.




          a complete picture



          In the above, "vectorization" is not introduced with the simplest example. Very often, "vectorization" is introduced with something like



          a[1] <- b[1] + c[1]
          a[2] <- b[2] + c[2]
          a[3] <- b[3] + c[3]
          a[4] <- b[4] + c[4]


          where a, b and c are not aliased in memory, that is, the moemory blocks storing vectors a, b and c do not overlap. This is the idealist case, as no memory aliasing implies no data dependency.



          Apart from "data dependency", there is also "control dependency", that is, dealing with "if ... else ..." in "vectorization". However, for time and space reason I will not elaborate on this issue.




          back to the example in the question



          Now it is time to investigate the loop in the question.



          set.seed(0)
          x <- round(runif(10), 2)
          sig <- sample.int(10)
          # [1] 1 2 9 5 3 4 8 6 7 10
          for (i in seq_along(sig)) x[i] <- x[sig[i]]


          Unrolling the loop gives



          x[1] <- x[1]
          x[2] <- x[2]
          x[3] <- x[9] ## 3rd instruction
          x[4] <- x[5]
          x[5] <- x[3] ## 5th instruction
          x[6] <- x[4]
          x[7] <- x[8]
          x[8] <- x[6]
          x[9] <- x[7]
          x[10] <- x[10]


          There is "read-after-write" data dependency between the 3rd and the 5th instruction, so the loop can not be "vectorized" (see Remark 1).



          Well then, what does x <- x[sig] do? Let's first explicitly write out the temporary vector:



          tmp <- x[sig]
          x <- tmp


          Since "[" is called twice, there are actually two C-level loops behind this "vectorized" code:



          tmp[1] <- x[1]
          tmp[2] <- x[2]
          tmp[3] <- x[9]
          tmp[4] <- x[5]
          tmp[5] <- x[3]
          tmp[6] <- x[4]
          tmp[7] <- x[8]
          tmp[8] <- x[6]
          tmp[9] <- x[7]
          tmp[10] <- x[10]

          x[1] <- tmp[1]
          x[2] <- tmp[2]
          x[3] <- tmp[3]
          x[4] <- tmp[4]
          x[5] <- tmp[5]
          x[6] <- tmp[6]
          x[7] <- tmp[7]
          x[8] <- tmp[8]
          x[9] <- tmp[9]
          x[10] <- tmp[10]


          So x <- x[sig] is equivalent to



          for (i in 1:10) tmp[i] <- x[sig[i]]
          for (i in 1:10) x[i] <- tmp[i]
          rm(tmp); gc()


          which is not at all the original loop given in the question.




          Remark 1



          If implementing the loop in Rcpp is seen as a "vectorization" then let it be. But there is no chance to further "vectorize" the C / C++ loop with "SIMD".




          Remark 2



          This Q & A is motivated by this Q & A. OP originally presented a loop



          for (i in 1:num) 
          for (j in 1:num)
          mat[i, j] <- mat[i, mat[j, "rm"]]




          It is tempting to "vectorize" it as



          mat[1:num, 1:num] <- mat[1:num, mat[1:num, "rm"]]


          but it is potentially wrong. Later OP changed the loop to



          for (i in 1:num) 
          for (j in 1:num)
          mat[i, j] <- mat[i, 1 + num + mat[j, "rm"]]




          which eliminates the memory aliasing issue, because the columns to be replaced are the first num columns, while the columns to be looked up are after the first num columns.




          Remark 3



          I received some comments regarding whether the loop in the question is making in-place modification of x. Yes, it is. We can use tracemem:



          set.seed(0)
          x <- round(runif(10), 2)
          sig <- sample.int(10)
          tracemem(x)
          #[1] "<0x28f7340>"
          for (i in seq_along(sig)) x[i] <- x[sig[i]]
          tracemem(x)
          #[1] "<0x28f7340>"


          My R session has allocated a memory block pointed by address <0x28f7340> for x and you may see a different value when you run the code. However, the output of tracemem will not change after the loop, which means that no copy of x is made. So The loop is indeed doing "in-place" modification without using extra memory.



          However, the loop is not doing "in-place" permutation. "in-place" permutation is a more complicated operation. Not only elements of x need be swapped along the loop, elements of sig also need be swapped (in the end, sig would be 1:10).






          share|improve this answer


















          • 1




            I agree sommewhat with the first paragraph. it is the assignment to x from x sequentially versus "en bloc" that causes the discrepancy, but there is never over-writing of a "memory block". R does not make assignments "in place". Rather it makes a temporary copy of the original and renames it. And I would also not say it is a danger of "vectorization" since you were not really using what is called vectorization when using a for-loop. I would have considered the vectorized result correct and the for-loop method as incorrect.
            – 42-
            8 hours ago







          • 2




            I will be very surprised if this turns out to be the case. I'll try to track down more authoritative documentation.
            – 42-
            8 hours ago











          • I think you are not understanding how tracemem works. Try a <- 1:10; tracemem(a); a <- a[10:1]; tracemem(a). The location of a changes.
            – 42-
            5 hours ago











          • @李哲源 @42- Also keep in mind that R3.4.0 introduced just-in-time compilation for top-level loops. I'd guess the end result might depend on how sophisticated that compiler is. (compiler::enableJIT(0) can turn it off for testing)
            – juod
            5 hours ago














          up vote
          12
          down vote













          warm-up



          As a warm-up, consider two simpler examples.



          ## example 1
          x <- 1:11
          for (i in 1:10) x[i] <- x[i + 1]
          x
          # [1] 2 3 4 5 6 7 8 9 10 11 11

          x <- 1:11
          x[1:10] <- x[2:11]
          x
          # [1] 2 3 4 5 6 7 8 9 10 11 11

          ## example 2
          x <- 1:11
          for (i in 1:10) x[i + 1] <- x[i]
          x
          # [1] 1 1 1 1 1 1 1 1 1 1 1

          x <- 1:11
          x[2:11] <- x[1:10]
          x
          # [1] 1 1 2 3 4 5 6 7 8 9 10


          "Vectorization" is successful in the 1st example but not the 2nd. Why?



          Here is prudent analysis. "Vectorization" starts by loop unrolling, then executes several instructions in parallel. Whether a loop can be "vectorized" depends on the data dependency carried by the loop.



          Unrolling the loop in example 1 gives



          x[1] <- x[2]
          x[2] <- x[3]
          x[3] <- x[4]
          x[4] <- x[5]
          x[5] <- x[6]
          x[6] <- x[7]
          x[7] <- x[8]
          x[8] <- x[9]
          x[9] <- x[10]
          x[10] <- x[11]


          Executing these instructions one by one and executing them simultaneously give identical result. So this loop can be "vectorized".



          The loop in example 2 is



          x[2] <- x[1]
          x[3] <- x[2]
          x[4] <- x[3]
          x[5] <- x[4]
          x[6] <- x[5]
          x[7] <- x[6]
          x[8] <- x[7]
          x[9] <- x[8]
          x[10] <- x[9]
          x[11] <- x[10]


          Unfortunately, executing these instructions one by one and executing them simultaneously would not give identical result. For example, when executing them one by one, x[2] is modified in the 1st instruction, then this modified value is passed to x[3] in the 2nd instruction. So x[3] would have the same value as x[1]. However, in parallel execution, x[3] equals x[2]. As the result, this loop can not be "vectorized".



          In "vectorization" theory,



          • Example 1 has a "write-after-read" dependency in data: x[i] is modified after it is read;

          • Example 2 has a "read-after-write" dependency in data: x[i] is read after it is modified.

          A loop with "write-after-read" data dependency can be "vectorized", while a loop with "read-after-write" data dependency can not.




          in depth



          Perhaps many people have been confused by now. "Vectorization" is a "parallel-processing"?



          Yes. In 1960's when people wondered what kind of parallel processing computer be designed for high performance computing, Flynn classified the design ideas into 4 types. The category "SIMD" (single instruction, multiple data) is corned "vectorization", and a computer with "SIMD" cabability is called a "vector processor" or "array processor".



          In 1960's there were not many programming languages. People wrote assembly (then FORTRAN when a compiler was invented) to program CPU registers directly. A "SIMD" computer is able to load multiple data into a vector register with a single instruction and do the same arithmetic on those data at the same time. So data processing is indeed parallel. Consider our example 1 again. Suppose a vector register can hold two vector elements, then the loop can be executed with 5 iterations using vector processing rather than 10 iterations as in scalar processing.



          reg <- x[2:3] ## load vector register
          x[1:2] <- reg ## store vector register
          -------------
          reg <- x[4:5] ## load vector register
          x[3:4] <- reg ## store vector register
          -------------
          reg <- x[6:7] ## load vector register
          x[5:6] <- reg ## store vector register
          -------------
          reg <- x[8:9] ## load vector register
          x[7:8] <- reg ## store vector register
          -------------
          reg <- x[10:11] ## load vector register
          x[9:10] <- reg ## store vector register


          Today there are many programming languages, like R. "Vectorization" no longer unambiguously refers to "SIMD". R is not a language where we can program CPU registers. The "vectorization" in R is just an analogy to "SIMD". In a previous Q & A: Does the term "vectorization" mean different things in different contexts? I have tried to explain this. The following map illustrates how this analogy is made:



          single (assembly) instruction -> single R instruction
          CPU vector registers -> temporary vectors
          parallel processing in registers -> C/C++/FORTRAN loops with temporary vectors


          So, the R "vectorization" of the loop in example 1 is something like



          ## the C-level loop is implemented by function "["
          tmp <- x[2:11] ## load data into a temporary vector
          x[1:10] <- tmp ## fill temporary vector into x


          Most of the time we just do



          x[1:10] <- x[2:10]


          without explicitly assigning the temporary vector to a variable. The temporary memory block created is not pointed to by any R variable, and is therefore subject to garbage collection.




          a complete picture



          In the above, "vectorization" is not introduced with the simplest example. Very often, "vectorization" is introduced with something like



          a[1] <- b[1] + c[1]
          a[2] <- b[2] + c[2]
          a[3] <- b[3] + c[3]
          a[4] <- b[4] + c[4]


          where a, b and c are not aliased in memory, that is, the moemory blocks storing vectors a, b and c do not overlap. This is the idealist case, as no memory aliasing implies no data dependency.



          Apart from "data dependency", there is also "control dependency", that is, dealing with "if ... else ..." in "vectorization". However, for time and space reason I will not elaborate on this issue.




          back to the example in the question



          Now it is time to investigate the loop in the question.



          set.seed(0)
          x <- round(runif(10), 2)
          sig <- sample.int(10)
          # [1] 1 2 9 5 3 4 8 6 7 10
          for (i in seq_along(sig)) x[i] <- x[sig[i]]


          Unrolling the loop gives



          x[1] <- x[1]
          x[2] <- x[2]
          x[3] <- x[9] ## 3rd instruction
          x[4] <- x[5]
          x[5] <- x[3] ## 5th instruction
          x[6] <- x[4]
          x[7] <- x[8]
          x[8] <- x[6]
          x[9] <- x[7]
          x[10] <- x[10]


          There is "read-after-write" data dependency between the 3rd and the 5th instruction, so the loop can not be "vectorized" (see Remark 1).



          Well then, what does x <- x[sig] do? Let's first explicitly write out the temporary vector:



          tmp <- x[sig]
          x <- tmp


          Since "[" is called twice, there are actually two C-level loops behind this "vectorized" code:



          tmp[1] <- x[1]
          tmp[2] <- x[2]
          tmp[3] <- x[9]
          tmp[4] <- x[5]
          tmp[5] <- x[3]
          tmp[6] <- x[4]
          tmp[7] <- x[8]
          tmp[8] <- x[6]
          tmp[9] <- x[7]
          tmp[10] <- x[10]

          x[1] <- tmp[1]
          x[2] <- tmp[2]
          x[3] <- tmp[3]
          x[4] <- tmp[4]
          x[5] <- tmp[5]
          x[6] <- tmp[6]
          x[7] <- tmp[7]
          x[8] <- tmp[8]
          x[9] <- tmp[9]
          x[10] <- tmp[10]


          So x <- x[sig] is equivalent to



          for (i in 1:10) tmp[i] <- x[sig[i]]
          for (i in 1:10) x[i] <- tmp[i]
          rm(tmp); gc()


          which is not at all the original loop given in the question.




          Remark 1



          If implementing the loop in Rcpp is seen as a "vectorization" then let it be. But there is no chance to further "vectorize" the C / C++ loop with "SIMD".




          Remark 2



          This Q & A is motivated by this Q & A. OP originally presented a loop



          for (i in 1:num) 
          for (j in 1:num)
          mat[i, j] <- mat[i, mat[j, "rm"]]




          It is tempting to "vectorize" it as



          mat[1:num, 1:num] <- mat[1:num, mat[1:num, "rm"]]


          but it is potentially wrong. Later OP changed the loop to



          for (i in 1:num) 
          for (j in 1:num)
          mat[i, j] <- mat[i, 1 + num + mat[j, "rm"]]




          which eliminates the memory aliasing issue, because the columns to be replaced are the first num columns, while the columns to be looked up are after the first num columns.




          Remark 3



          I received some comments regarding whether the loop in the question is making in-place modification of x. Yes, it is. We can use tracemem:



          set.seed(0)
          x <- round(runif(10), 2)
          sig <- sample.int(10)
          tracemem(x)
          #[1] "<0x28f7340>"
          for (i in seq_along(sig)) x[i] <- x[sig[i]]
          tracemem(x)
          #[1] "<0x28f7340>"


          My R session has allocated a memory block pointed by address <0x28f7340> for x and you may see a different value when you run the code. However, the output of tracemem will not change after the loop, which means that no copy of x is made. So The loop is indeed doing "in-place" modification without using extra memory.



          However, the loop is not doing "in-place" permutation. "in-place" permutation is a more complicated operation. Not only elements of x need be swapped along the loop, elements of sig also need be swapped (in the end, sig would be 1:10).






          share|improve this answer


















          • 1




            I agree sommewhat with the first paragraph. it is the assignment to x from x sequentially versus "en bloc" that causes the discrepancy, but there is never over-writing of a "memory block". R does not make assignments "in place". Rather it makes a temporary copy of the original and renames it. And I would also not say it is a danger of "vectorization" since you were not really using what is called vectorization when using a for-loop. I would have considered the vectorized result correct and the for-loop method as incorrect.
            – 42-
            8 hours ago







          • 2




            I will be very surprised if this turns out to be the case. I'll try to track down more authoritative documentation.
            – 42-
            8 hours ago











          • I think you are not understanding how tracemem works. Try a <- 1:10; tracemem(a); a <- a[10:1]; tracemem(a). The location of a changes.
            – 42-
            5 hours ago











          • @李哲源 @42- Also keep in mind that R3.4.0 introduced just-in-time compilation for top-level loops. I'd guess the end result might depend on how sophisticated that compiler is. (compiler::enableJIT(0) can turn it off for testing)
            – juod
            5 hours ago












          up vote
          12
          down vote










          up vote
          12
          down vote









          warm-up



          As a warm-up, consider two simpler examples.



          ## example 1
          x <- 1:11
          for (i in 1:10) x[i] <- x[i + 1]
          x
          # [1] 2 3 4 5 6 7 8 9 10 11 11

          x <- 1:11
          x[1:10] <- x[2:11]
          x
          # [1] 2 3 4 5 6 7 8 9 10 11 11

          ## example 2
          x <- 1:11
          for (i in 1:10) x[i + 1] <- x[i]
          x
          # [1] 1 1 1 1 1 1 1 1 1 1 1

          x <- 1:11
          x[2:11] <- x[1:10]
          x
          # [1] 1 1 2 3 4 5 6 7 8 9 10


          "Vectorization" is successful in the 1st example but not the 2nd. Why?



          Here is prudent analysis. "Vectorization" starts by loop unrolling, then executes several instructions in parallel. Whether a loop can be "vectorized" depends on the data dependency carried by the loop.



          Unrolling the loop in example 1 gives



          x[1] <- x[2]
          x[2] <- x[3]
          x[3] <- x[4]
          x[4] <- x[5]
          x[5] <- x[6]
          x[6] <- x[7]
          x[7] <- x[8]
          x[8] <- x[9]
          x[9] <- x[10]
          x[10] <- x[11]


          Executing these instructions one by one and executing them simultaneously give identical result. So this loop can be "vectorized".



          The loop in example 2 is



          x[2] <- x[1]
          x[3] <- x[2]
          x[4] <- x[3]
          x[5] <- x[4]
          x[6] <- x[5]
          x[7] <- x[6]
          x[8] <- x[7]
          x[9] <- x[8]
          x[10] <- x[9]
          x[11] <- x[10]


          Unfortunately, executing these instructions one by one and executing them simultaneously would not give identical result. For example, when executing them one by one, x[2] is modified in the 1st instruction, then this modified value is passed to x[3] in the 2nd instruction. So x[3] would have the same value as x[1]. However, in parallel execution, x[3] equals x[2]. As the result, this loop can not be "vectorized".



          In "vectorization" theory,



          • Example 1 has a "write-after-read" dependency in data: x[i] is modified after it is read;

          • Example 2 has a "read-after-write" dependency in data: x[i] is read after it is modified.

          A loop with "write-after-read" data dependency can be "vectorized", while a loop with "read-after-write" data dependency can not.




          in depth



          Perhaps many people have been confused by now. "Vectorization" is a "parallel-processing"?



          Yes. In 1960's when people wondered what kind of parallel processing computer be designed for high performance computing, Flynn classified the design ideas into 4 types. The category "SIMD" (single instruction, multiple data) is corned "vectorization", and a computer with "SIMD" cabability is called a "vector processor" or "array processor".



          In 1960's there were not many programming languages. People wrote assembly (then FORTRAN when a compiler was invented) to program CPU registers directly. A "SIMD" computer is able to load multiple data into a vector register with a single instruction and do the same arithmetic on those data at the same time. So data processing is indeed parallel. Consider our example 1 again. Suppose a vector register can hold two vector elements, then the loop can be executed with 5 iterations using vector processing rather than 10 iterations as in scalar processing.



          reg <- x[2:3] ## load vector register
          x[1:2] <- reg ## store vector register
          -------------
          reg <- x[4:5] ## load vector register
          x[3:4] <- reg ## store vector register
          -------------
          reg <- x[6:7] ## load vector register
          x[5:6] <- reg ## store vector register
          -------------
          reg <- x[8:9] ## load vector register
          x[7:8] <- reg ## store vector register
          -------------
          reg <- x[10:11] ## load vector register
          x[9:10] <- reg ## store vector register


          Today there are many programming languages, like R. "Vectorization" no longer unambiguously refers to "SIMD". R is not a language where we can program CPU registers. The "vectorization" in R is just an analogy to "SIMD". In a previous Q & A: Does the term "vectorization" mean different things in different contexts? I have tried to explain this. The following map illustrates how this analogy is made:



          single (assembly) instruction -> single R instruction
          CPU vector registers -> temporary vectors
          parallel processing in registers -> C/C++/FORTRAN loops with temporary vectors


          So, the R "vectorization" of the loop in example 1 is something like



          ## the C-level loop is implemented by function "["
          tmp <- x[2:11] ## load data into a temporary vector
          x[1:10] <- tmp ## fill temporary vector into x


          Most of the time we just do



          x[1:10] <- x[2:10]


          without explicitly assigning the temporary vector to a variable. The temporary memory block created is not pointed to by any R variable, and is therefore subject to garbage collection.




          a complete picture



          In the above, "vectorization" is not introduced with the simplest example. Very often, "vectorization" is introduced with something like



          a[1] <- b[1] + c[1]
          a[2] <- b[2] + c[2]
          a[3] <- b[3] + c[3]
          a[4] <- b[4] + c[4]


          where a, b and c are not aliased in memory, that is, the moemory blocks storing vectors a, b and c do not overlap. This is the idealist case, as no memory aliasing implies no data dependency.



          Apart from "data dependency", there is also "control dependency", that is, dealing with "if ... else ..." in "vectorization". However, for time and space reason I will not elaborate on this issue.




          back to the example in the question



          Now it is time to investigate the loop in the question.



          set.seed(0)
          x <- round(runif(10), 2)
          sig <- sample.int(10)
          # [1] 1 2 9 5 3 4 8 6 7 10
          for (i in seq_along(sig)) x[i] <- x[sig[i]]


          Unrolling the loop gives



          x[1] <- x[1]
          x[2] <- x[2]
          x[3] <- x[9] ## 3rd instruction
          x[4] <- x[5]
          x[5] <- x[3] ## 5th instruction
          x[6] <- x[4]
          x[7] <- x[8]
          x[8] <- x[6]
          x[9] <- x[7]
          x[10] <- x[10]


          There is "read-after-write" data dependency between the 3rd and the 5th instruction, so the loop can not be "vectorized" (see Remark 1).



          Well then, what does x <- x[sig] do? Let's first explicitly write out the temporary vector:



          tmp <- x[sig]
          x <- tmp


          Since "[" is called twice, there are actually two C-level loops behind this "vectorized" code:



          tmp[1] <- x[1]
          tmp[2] <- x[2]
          tmp[3] <- x[9]
          tmp[4] <- x[5]
          tmp[5] <- x[3]
          tmp[6] <- x[4]
          tmp[7] <- x[8]
          tmp[8] <- x[6]
          tmp[9] <- x[7]
          tmp[10] <- x[10]

          x[1] <- tmp[1]
          x[2] <- tmp[2]
          x[3] <- tmp[3]
          x[4] <- tmp[4]
          x[5] <- tmp[5]
          x[6] <- tmp[6]
          x[7] <- tmp[7]
          x[8] <- tmp[8]
          x[9] <- tmp[9]
          x[10] <- tmp[10]


          So x <- x[sig] is equivalent to



          for (i in 1:10) tmp[i] <- x[sig[i]]
          for (i in 1:10) x[i] <- tmp[i]
          rm(tmp); gc()


          which is not at all the original loop given in the question.




          Remark 1



          If implementing the loop in Rcpp is seen as a "vectorization" then let it be. But there is no chance to further "vectorize" the C / C++ loop with "SIMD".




          Remark 2



          This Q & A is motivated by this Q & A. OP originally presented a loop



          for (i in 1:num) 
          for (j in 1:num)
          mat[i, j] <- mat[i, mat[j, "rm"]]




          It is tempting to "vectorize" it as



          mat[1:num, 1:num] <- mat[1:num, mat[1:num, "rm"]]


          but it is potentially wrong. Later OP changed the loop to



          for (i in 1:num) 
          for (j in 1:num)
          mat[i, j] <- mat[i, 1 + num + mat[j, "rm"]]




          which eliminates the memory aliasing issue, because the columns to be replaced are the first num columns, while the columns to be looked up are after the first num columns.




          Remark 3



          I received some comments regarding whether the loop in the question is making in-place modification of x. Yes, it is. We can use tracemem:



          set.seed(0)
          x <- round(runif(10), 2)
          sig <- sample.int(10)
          tracemem(x)
          #[1] "<0x28f7340>"
          for (i in seq_along(sig)) x[i] <- x[sig[i]]
          tracemem(x)
          #[1] "<0x28f7340>"


          My R session has allocated a memory block pointed by address <0x28f7340> for x and you may see a different value when you run the code. However, the output of tracemem will not change after the loop, which means that no copy of x is made. So The loop is indeed doing "in-place" modification without using extra memory.



          However, the loop is not doing "in-place" permutation. "in-place" permutation is a more complicated operation. Not only elements of x need be swapped along the loop, elements of sig also need be swapped (in the end, sig would be 1:10).






          share|improve this answer














          warm-up



          As a warm-up, consider two simpler examples.



          ## example 1
          x <- 1:11
          for (i in 1:10) x[i] <- x[i + 1]
          x
          # [1] 2 3 4 5 6 7 8 9 10 11 11

          x <- 1:11
          x[1:10] <- x[2:11]
          x
          # [1] 2 3 4 5 6 7 8 9 10 11 11

          ## example 2
          x <- 1:11
          for (i in 1:10) x[i + 1] <- x[i]
          x
          # [1] 1 1 1 1 1 1 1 1 1 1 1

          x <- 1:11
          x[2:11] <- x[1:10]
          x
          # [1] 1 1 2 3 4 5 6 7 8 9 10


          "Vectorization" is successful in the 1st example but not the 2nd. Why?



          Here is prudent analysis. "Vectorization" starts by loop unrolling, then executes several instructions in parallel. Whether a loop can be "vectorized" depends on the data dependency carried by the loop.



          Unrolling the loop in example 1 gives



          x[1] <- x[2]
          x[2] <- x[3]
          x[3] <- x[4]
          x[4] <- x[5]
          x[5] <- x[6]
          x[6] <- x[7]
          x[7] <- x[8]
          x[8] <- x[9]
          x[9] <- x[10]
          x[10] <- x[11]


          Executing these instructions one by one and executing them simultaneously give identical result. So this loop can be "vectorized".



          The loop in example 2 is



          x[2] <- x[1]
          x[3] <- x[2]
          x[4] <- x[3]
          x[5] <- x[4]
          x[6] <- x[5]
          x[7] <- x[6]
          x[8] <- x[7]
          x[9] <- x[8]
          x[10] <- x[9]
          x[11] <- x[10]


          Unfortunately, executing these instructions one by one and executing them simultaneously would not give identical result. For example, when executing them one by one, x[2] is modified in the 1st instruction, then this modified value is passed to x[3] in the 2nd instruction. So x[3] would have the same value as x[1]. However, in parallel execution, x[3] equals x[2]. As the result, this loop can not be "vectorized".



          In "vectorization" theory,



          • Example 1 has a "write-after-read" dependency in data: x[i] is modified after it is read;

          • Example 2 has a "read-after-write" dependency in data: x[i] is read after it is modified.

          A loop with "write-after-read" data dependency can be "vectorized", while a loop with "read-after-write" data dependency can not.




          in depth



          Perhaps many people have been confused by now. "Vectorization" is a "parallel-processing"?



          Yes. In 1960's when people wondered what kind of parallel processing computer be designed for high performance computing, Flynn classified the design ideas into 4 types. The category "SIMD" (single instruction, multiple data) is corned "vectorization", and a computer with "SIMD" cabability is called a "vector processor" or "array processor".



          In 1960's there were not many programming languages. People wrote assembly (then FORTRAN when a compiler was invented) to program CPU registers directly. A "SIMD" computer is able to load multiple data into a vector register with a single instruction and do the same arithmetic on those data at the same time. So data processing is indeed parallel. Consider our example 1 again. Suppose a vector register can hold two vector elements, then the loop can be executed with 5 iterations using vector processing rather than 10 iterations as in scalar processing.



          reg <- x[2:3] ## load vector register
          x[1:2] <- reg ## store vector register
          -------------
          reg <- x[4:5] ## load vector register
          x[3:4] <- reg ## store vector register
          -------------
          reg <- x[6:7] ## load vector register
          x[5:6] <- reg ## store vector register
          -------------
          reg <- x[8:9] ## load vector register
          x[7:8] <- reg ## store vector register
          -------------
          reg <- x[10:11] ## load vector register
          x[9:10] <- reg ## store vector register


          Today there are many programming languages, like R. "Vectorization" no longer unambiguously refers to "SIMD". R is not a language where we can program CPU registers. The "vectorization" in R is just an analogy to "SIMD". In a previous Q & A: Does the term "vectorization" mean different things in different contexts? I have tried to explain this. The following map illustrates how this analogy is made:



          single (assembly) instruction -> single R instruction
          CPU vector registers -> temporary vectors
          parallel processing in registers -> C/C++/FORTRAN loops with temporary vectors


          So, the R "vectorization" of the loop in example 1 is something like



          ## the C-level loop is implemented by function "["
          tmp <- x[2:11] ## load data into a temporary vector
          x[1:10] <- tmp ## fill temporary vector into x


          Most of the time we just do



          x[1:10] <- x[2:10]


          without explicitly assigning the temporary vector to a variable. The temporary memory block created is not pointed to by any R variable, and is therefore subject to garbage collection.




          a complete picture



          In the above, "vectorization" is not introduced with the simplest example. Very often, "vectorization" is introduced with something like



          a[1] <- b[1] + c[1]
          a[2] <- b[2] + c[2]
          a[3] <- b[3] + c[3]
          a[4] <- b[4] + c[4]


          where a, b and c are not aliased in memory, that is, the moemory blocks storing vectors a, b and c do not overlap. This is the idealist case, as no memory aliasing implies no data dependency.



          Apart from "data dependency", there is also "control dependency", that is, dealing with "if ... else ..." in "vectorization". However, for time and space reason I will not elaborate on this issue.




          back to the example in the question



          Now it is time to investigate the loop in the question.



          set.seed(0)
          x <- round(runif(10), 2)
          sig <- sample.int(10)
          # [1] 1 2 9 5 3 4 8 6 7 10
          for (i in seq_along(sig)) x[i] <- x[sig[i]]


          Unrolling the loop gives



          x[1] <- x[1]
          x[2] <- x[2]
          x[3] <- x[9] ## 3rd instruction
          x[4] <- x[5]
          x[5] <- x[3] ## 5th instruction
          x[6] <- x[4]
          x[7] <- x[8]
          x[8] <- x[6]
          x[9] <- x[7]
          x[10] <- x[10]


          There is "read-after-write" data dependency between the 3rd and the 5th instruction, so the loop can not be "vectorized" (see Remark 1).



          Well then, what does x <- x[sig] do? Let's first explicitly write out the temporary vector:



          tmp <- x[sig]
          x <- tmp


          Since "[" is called twice, there are actually two C-level loops behind this "vectorized" code:



          tmp[1] <- x[1]
          tmp[2] <- x[2]
          tmp[3] <- x[9]
          tmp[4] <- x[5]
          tmp[5] <- x[3]
          tmp[6] <- x[4]
          tmp[7] <- x[8]
          tmp[8] <- x[6]
          tmp[9] <- x[7]
          tmp[10] <- x[10]

          x[1] <- tmp[1]
          x[2] <- tmp[2]
          x[3] <- tmp[3]
          x[4] <- tmp[4]
          x[5] <- tmp[5]
          x[6] <- tmp[6]
          x[7] <- tmp[7]
          x[8] <- tmp[8]
          x[9] <- tmp[9]
          x[10] <- tmp[10]


          So x <- x[sig] is equivalent to



          for (i in 1:10) tmp[i] <- x[sig[i]]
          for (i in 1:10) x[i] <- tmp[i]
          rm(tmp); gc()


          which is not at all the original loop given in the question.




          Remark 1



          If implementing the loop in Rcpp is seen as a "vectorization" then let it be. But there is no chance to further "vectorize" the C / C++ loop with "SIMD".




          Remark 2



          This Q & A is motivated by this Q & A. OP originally presented a loop



          for (i in 1:num) 
          for (j in 1:num)
          mat[i, j] <- mat[i, mat[j, "rm"]]




          It is tempting to "vectorize" it as



          mat[1:num, 1:num] <- mat[1:num, mat[1:num, "rm"]]


          but it is potentially wrong. Later OP changed the loop to



          for (i in 1:num) 
          for (j in 1:num)
          mat[i, j] <- mat[i, 1 + num + mat[j, "rm"]]




          which eliminates the memory aliasing issue, because the columns to be replaced are the first num columns, while the columns to be looked up are after the first num columns.




          Remark 3



          I received some comments regarding whether the loop in the question is making in-place modification of x. Yes, it is. We can use tracemem:



          set.seed(0)
          x <- round(runif(10), 2)
          sig <- sample.int(10)
          tracemem(x)
          #[1] "<0x28f7340>"
          for (i in seq_along(sig)) x[i] <- x[sig[i]]
          tracemem(x)
          #[1] "<0x28f7340>"


          My R session has allocated a memory block pointed by address <0x28f7340> for x and you may see a different value when you run the code. However, the output of tracemem will not change after the loop, which means that no copy of x is made. So The loop is indeed doing "in-place" modification without using extra memory.



          However, the loop is not doing "in-place" permutation. "in-place" permutation is a more complicated operation. Not only elements of x need be swapped along the loop, elements of sig also need be swapped (in the end, sig would be 1:10).







          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited 10 mins ago

























          answered 8 hours ago









          李哲源

          45.4k1490136




          45.4k1490136







          • 1




            I agree sommewhat with the first paragraph. it is the assignment to x from x sequentially versus "en bloc" that causes the discrepancy, but there is never over-writing of a "memory block". R does not make assignments "in place". Rather it makes a temporary copy of the original and renames it. And I would also not say it is a danger of "vectorization" since you were not really using what is called vectorization when using a for-loop. I would have considered the vectorized result correct and the for-loop method as incorrect.
            – 42-
            8 hours ago







          • 2




            I will be very surprised if this turns out to be the case. I'll try to track down more authoritative documentation.
            – 42-
            8 hours ago











          • I think you are not understanding how tracemem works. Try a <- 1:10; tracemem(a); a <- a[10:1]; tracemem(a). The location of a changes.
            – 42-
            5 hours ago











          • @李哲源 @42- Also keep in mind that R3.4.0 introduced just-in-time compilation for top-level loops. I'd guess the end result might depend on how sophisticated that compiler is. (compiler::enableJIT(0) can turn it off for testing)
            – juod
            5 hours ago












          • 1




            I agree sommewhat with the first paragraph. it is the assignment to x from x sequentially versus "en bloc" that causes the discrepancy, but there is never over-writing of a "memory block". R does not make assignments "in place". Rather it makes a temporary copy of the original and renames it. And I would also not say it is a danger of "vectorization" since you were not really using what is called vectorization when using a for-loop. I would have considered the vectorized result correct and the for-loop method as incorrect.
            – 42-
            8 hours ago







          • 2




            I will be very surprised if this turns out to be the case. I'll try to track down more authoritative documentation.
            – 42-
            8 hours ago











          • I think you are not understanding how tracemem works. Try a <- 1:10; tracemem(a); a <- a[10:1]; tracemem(a). The location of a changes.
            – 42-
            5 hours ago











          • @李哲源 @42- Also keep in mind that R3.4.0 introduced just-in-time compilation for top-level loops. I'd guess the end result might depend on how sophisticated that compiler is. (compiler::enableJIT(0) can turn it off for testing)
            – juod
            5 hours ago







          1




          1




          I agree sommewhat with the first paragraph. it is the assignment to x from x sequentially versus "en bloc" that causes the discrepancy, but there is never over-writing of a "memory block". R does not make assignments "in place". Rather it makes a temporary copy of the original and renames it. And I would also not say it is a danger of "vectorization" since you were not really using what is called vectorization when using a for-loop. I would have considered the vectorized result correct and the for-loop method as incorrect.
          – 42-
          8 hours ago





          I agree sommewhat with the first paragraph. it is the assignment to x from x sequentially versus "en bloc" that causes the discrepancy, but there is never over-writing of a "memory block". R does not make assignments "in place". Rather it makes a temporary copy of the original and renames it. And I would also not say it is a danger of "vectorization" since you were not really using what is called vectorization when using a for-loop. I would have considered the vectorized result correct and the for-loop method as incorrect.
          – 42-
          8 hours ago





          2




          2




          I will be very surprised if this turns out to be the case. I'll try to track down more authoritative documentation.
          – 42-
          8 hours ago





          I will be very surprised if this turns out to be the case. I'll try to track down more authoritative documentation.
          – 42-
          8 hours ago













          I think you are not understanding how tracemem works. Try a <- 1:10; tracemem(a); a <- a[10:1]; tracemem(a). The location of a changes.
          – 42-
          5 hours ago





          I think you are not understanding how tracemem works. Try a <- 1:10; tracemem(a); a <- a[10:1]; tracemem(a). The location of a changes.
          – 42-
          5 hours ago













          @李哲源 @42- Also keep in mind that R3.4.0 introduced just-in-time compilation for top-level loops. I'd guess the end result might depend on how sophisticated that compiler is. (compiler::enableJIT(0) can turn it off for testing)
          – juod
          5 hours ago




          @李哲源 @42- Also keep in mind that R3.4.0 introduced just-in-time compilation for top-level loops. I'd guess the end result might depend on how sophisticated that compiler is. (compiler::enableJIT(0) can turn it off for testing)
          – juod
          5 hours ago












          up vote
          3
          down vote













          There is a simpler explanation. With your loop, you are overwriting one element of x at every step, replacing its former value by one of the other elements of x. So you get what you asked for. Essentially, it is a complicated form of sampling with replacement (sample(x, replace=TRUE)) -- whether you need such a complication, depends on what you want to achieve.



          With your vectorized code, you are just asking for a certain permutation of x (without replacement), and that is what you get. The vectorized code is not doing the same thing as your loop. If you want to achieve the same result with a loop, you would first need to make a copy of x:



          set.seed(0)
          x <- x2 <- round(runif(10), 2)
          # [1] 0.90 0.27 0.37 0.57 0.91 0.20 0.90 0.94 0.66 0.63
          sig <- sample.int(10)
          # [1] 1 2 9 5 3 4 8 6 7 10
          for (i in seq_along(sig)) x2[i] <- x[sig[i]]
          identical(x2, x[sig])
          #TRUE


          No danger of aliasing here: x and x2 refer initially to the same memory location but his will change as soon as you change the first element of x2.






          share|improve this answer




















          • Yes, I know that the loop and x[sig] are different. Maybe I did not make this clear in my elaboration... But your interpreting the former as sample with replacement and the latter as sample without replacement is interesting. While it may not be precise (as given the sig, the result of the loop and x[sig] are both deterministic), it is indeed a different view of the issue.
            – æŽå“²æº
            7 hours ago










          • The results of the for loop depend on the number of "collisions", i.e the number of time a later index references a location that already had a "new" number place in it. Some values get moved more than once in the for-loop version.
            – 42-
            5 hours ago











          • Actually, if sig <- 1:10 ... you can't interpret the loop as sample with replacement.
            – æŽå“²æº
            4 hours ago















          up vote
          3
          down vote













          There is a simpler explanation. With your loop, you are overwriting one element of x at every step, replacing its former value by one of the other elements of x. So you get what you asked for. Essentially, it is a complicated form of sampling with replacement (sample(x, replace=TRUE)) -- whether you need such a complication, depends on what you want to achieve.



          With your vectorized code, you are just asking for a certain permutation of x (without replacement), and that is what you get. The vectorized code is not doing the same thing as your loop. If you want to achieve the same result with a loop, you would first need to make a copy of x:



          set.seed(0)
          x <- x2 <- round(runif(10), 2)
          # [1] 0.90 0.27 0.37 0.57 0.91 0.20 0.90 0.94 0.66 0.63
          sig <- sample.int(10)
          # [1] 1 2 9 5 3 4 8 6 7 10
          for (i in seq_along(sig)) x2[i] <- x[sig[i]]
          identical(x2, x[sig])
          #TRUE


          No danger of aliasing here: x and x2 refer initially to the same memory location but his will change as soon as you change the first element of x2.






          share|improve this answer




















          • Yes, I know that the loop and x[sig] are different. Maybe I did not make this clear in my elaboration... But your interpreting the former as sample with replacement and the latter as sample without replacement is interesting. While it may not be precise (as given the sig, the result of the loop and x[sig] are both deterministic), it is indeed a different view of the issue.
            – æŽå“²æº
            7 hours ago










          • The results of the for loop depend on the number of "collisions", i.e the number of time a later index references a location that already had a "new" number place in it. Some values get moved more than once in the for-loop version.
            – 42-
            5 hours ago











          • Actually, if sig <- 1:10 ... you can't interpret the loop as sample with replacement.
            – æŽå“²æº
            4 hours ago













          up vote
          3
          down vote










          up vote
          3
          down vote









          There is a simpler explanation. With your loop, you are overwriting one element of x at every step, replacing its former value by one of the other elements of x. So you get what you asked for. Essentially, it is a complicated form of sampling with replacement (sample(x, replace=TRUE)) -- whether you need such a complication, depends on what you want to achieve.



          With your vectorized code, you are just asking for a certain permutation of x (without replacement), and that is what you get. The vectorized code is not doing the same thing as your loop. If you want to achieve the same result with a loop, you would first need to make a copy of x:



          set.seed(0)
          x <- x2 <- round(runif(10), 2)
          # [1] 0.90 0.27 0.37 0.57 0.91 0.20 0.90 0.94 0.66 0.63
          sig <- sample.int(10)
          # [1] 1 2 9 5 3 4 8 6 7 10
          for (i in seq_along(sig)) x2[i] <- x[sig[i]]
          identical(x2, x[sig])
          #TRUE


          No danger of aliasing here: x and x2 refer initially to the same memory location but his will change as soon as you change the first element of x2.






          share|improve this answer












          There is a simpler explanation. With your loop, you are overwriting one element of x at every step, replacing its former value by one of the other elements of x. So you get what you asked for. Essentially, it is a complicated form of sampling with replacement (sample(x, replace=TRUE)) -- whether you need such a complication, depends on what you want to achieve.



          With your vectorized code, you are just asking for a certain permutation of x (without replacement), and that is what you get. The vectorized code is not doing the same thing as your loop. If you want to achieve the same result with a loop, you would first need to make a copy of x:



          set.seed(0)
          x <- x2 <- round(runif(10), 2)
          # [1] 0.90 0.27 0.37 0.57 0.91 0.20 0.90 0.94 0.66 0.63
          sig <- sample.int(10)
          # [1] 1 2 9 5 3 4 8 6 7 10
          for (i in seq_along(sig)) x2[i] <- x[sig[i]]
          identical(x2, x[sig])
          #TRUE


          No danger of aliasing here: x and x2 refer initially to the same memory location but his will change as soon as you change the first element of x2.







          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered 7 hours ago









          lebatsnok

          3,58811118




          3,58811118











          • Yes, I know that the loop and x[sig] are different. Maybe I did not make this clear in my elaboration... But your interpreting the former as sample with replacement and the latter as sample without replacement is interesting. While it may not be precise (as given the sig, the result of the loop and x[sig] are both deterministic), it is indeed a different view of the issue.
            – æŽå“²æº
            7 hours ago










          • The results of the for loop depend on the number of "collisions", i.e the number of time a later index references a location that already had a "new" number place in it. Some values get moved more than once in the for-loop version.
            – 42-
            5 hours ago











          • Actually, if sig <- 1:10 ... you can't interpret the loop as sample with replacement.
            – æŽå“²æº
            4 hours ago

















          • Yes, I know that the loop and x[sig] are different. Maybe I did not make this clear in my elaboration... But your interpreting the former as sample with replacement and the latter as sample without replacement is interesting. While it may not be precise (as given the sig, the result of the loop and x[sig] are both deterministic), it is indeed a different view of the issue.
            – æŽå“²æº
            7 hours ago










          • The results of the for loop depend on the number of "collisions", i.e the number of time a later index references a location that already had a "new" number place in it. Some values get moved more than once in the for-loop version.
            – 42-
            5 hours ago











          • Actually, if sig <- 1:10 ... you can't interpret the loop as sample with replacement.
            – æŽå“²æº
            4 hours ago
















          Yes, I know that the loop and x[sig] are different. Maybe I did not make this clear in my elaboration... But your interpreting the former as sample with replacement and the latter as sample without replacement is interesting. While it may not be precise (as given the sig, the result of the loop and x[sig] are both deterministic), it is indeed a different view of the issue.
          – æŽå“²æº
          7 hours ago




          Yes, I know that the loop and x[sig] are different. Maybe I did not make this clear in my elaboration... But your interpreting the former as sample with replacement and the latter as sample without replacement is interesting. While it may not be precise (as given the sig, the result of the loop and x[sig] are both deterministic), it is indeed a different view of the issue.
          – æŽå“²æº
          7 hours ago












          The results of the for loop depend on the number of "collisions", i.e the number of time a later index references a location that already had a "new" number place in it. Some values get moved more than once in the for-loop version.
          – 42-
          5 hours ago





          The results of the for loop depend on the number of "collisions", i.e the number of time a later index references a location that already had a "new" number place in it. Some values get moved more than once in the for-loop version.
          – 42-
          5 hours ago













          Actually, if sig <- 1:10 ... you can't interpret the loop as sample with replacement.
          – æŽå“²æº
          4 hours ago





          Actually, if sig <- 1:10 ... you can't interpret the loop as sample with replacement.
          – æŽå“²æº
          4 hours ago











          up vote
          1
          down vote













          This has nothing to do with memory block aliasing (a term I have never encountered before). Take a particular permutation example and walk through the assignments that would occur regardless of the implementation at the C or assembly (or whatever) language level; It intrinsic to how any sequential for-loop would behave versus how any "true" permutation (what one gets with x[sig]) would occur:



          sample(10)
          [1] 3 7 1 5 6 9 10 8 4 2

          value at 1 goes to 3, and now there are two of those values
          value at 2 goes to 7, and now there are two of those values
          value at 3 (which was at 1) now goes back to 1 but the values remain unchanged


          ... can continue but this illustrates how this will usually not be a "true" permutation and very uncommonly would result in a complete redistribution of values. I'm guessing that only a completely ordered permutation (of which I think there is only one, i.e. 10:1) could result in a new set of x's that were unique.



          replicate( 100, x <- round(runif(10), 2); 
          sig <- sample.int(10);
          for (i in seq_along(sig)) x[i] <- x[sig[i]];
          sum(duplicated(x)) )
          #[1] 4 4 4 5 5 5 4 5 6 5 5 5 4 5 5 6 3 4 2 5 4 4 4 4 3 5 3 5 4 5 5 5 5 5 5 5 4 5 5 5 5 4
          #[43] 5 3 4 6 6 6 3 4 5 3 5 4 6 4 5 5 6 4 4 4 5 3 4 3 4 4 3 6 4 7 6 5 6 6 5 4 7 5 6 3 6 4
          #[85] 8 4 5 5 4 5 5 5 4 5 5 4 4 5 4 5


          I started wondering what the distribution of duplication counts might be in a large series. Looks pretty symmetric:



          table( replicate( 1000000, x <- round(runif(10), 5); 
          sig <- sample.int(10);
          for (i in seq_along(sig)) x[i] <- x[sig[i]];
          sum(duplicated(x)) ) )

          0 1 2 3 4 5 6 7 8
          1 269 13113 126104 360416 360827 125707 13269 294





          share|improve this answer






















          • Thanks for this. I realized that I was not being accurate. A simple example like x <- 1:11; x[1:10] <- x[2:11] would cause no "vectorization" hazard, even though memory blocks for read and write overlaps. The real issue is "read-after-write" or "write-after-read".
            – æŽå“²æº
            4 hours ago














          up vote
          1
          down vote













          This has nothing to do with memory block aliasing (a term I have never encountered before). Take a particular permutation example and walk through the assignments that would occur regardless of the implementation at the C or assembly (or whatever) language level; It intrinsic to how any sequential for-loop would behave versus how any "true" permutation (what one gets with x[sig]) would occur:



          sample(10)
          [1] 3 7 1 5 6 9 10 8 4 2

          value at 1 goes to 3, and now there are two of those values
          value at 2 goes to 7, and now there are two of those values
          value at 3 (which was at 1) now goes back to 1 but the values remain unchanged


          ... can continue but this illustrates how this will usually not be a "true" permutation and very uncommonly would result in a complete redistribution of values. I'm guessing that only a completely ordered permutation (of which I think there is only one, i.e. 10:1) could result in a new set of x's that were unique.



          replicate( 100, x <- round(runif(10), 2); 
          sig <- sample.int(10);
          for (i in seq_along(sig)) x[i] <- x[sig[i]];
          sum(duplicated(x)) )
          #[1] 4 4 4 5 5 5 4 5 6 5 5 5 4 5 5 6 3 4 2 5 4 4 4 4 3 5 3 5 4 5 5 5 5 5 5 5 4 5 5 5 5 4
          #[43] 5 3 4 6 6 6 3 4 5 3 5 4 6 4 5 5 6 4 4 4 5 3 4 3 4 4 3 6 4 7 6 5 6 6 5 4 7 5 6 3 6 4
          #[85] 8 4 5 5 4 5 5 5 4 5 5 4 4 5 4 5


          I started wondering what the distribution of duplication counts might be in a large series. Looks pretty symmetric:



          table( replicate( 1000000, x <- round(runif(10), 5); 
          sig <- sample.int(10);
          for (i in seq_along(sig)) x[i] <- x[sig[i]];
          sum(duplicated(x)) ) )

          0 1 2 3 4 5 6 7 8
          1 269 13113 126104 360416 360827 125707 13269 294





          share|improve this answer






















          • Thanks for this. I realized that I was not being accurate. A simple example like x <- 1:11; x[1:10] <- x[2:11] would cause no "vectorization" hazard, even though memory blocks for read and write overlaps. The real issue is "read-after-write" or "write-after-read".
            – æŽå“²æº
            4 hours ago












          up vote
          1
          down vote










          up vote
          1
          down vote









          This has nothing to do with memory block aliasing (a term I have never encountered before). Take a particular permutation example and walk through the assignments that would occur regardless of the implementation at the C or assembly (or whatever) language level; It intrinsic to how any sequential for-loop would behave versus how any "true" permutation (what one gets with x[sig]) would occur:



          sample(10)
          [1] 3 7 1 5 6 9 10 8 4 2

          value at 1 goes to 3, and now there are two of those values
          value at 2 goes to 7, and now there are two of those values
          value at 3 (which was at 1) now goes back to 1 but the values remain unchanged


          ... can continue but this illustrates how this will usually not be a "true" permutation and very uncommonly would result in a complete redistribution of values. I'm guessing that only a completely ordered permutation (of which I think there is only one, i.e. 10:1) could result in a new set of x's that were unique.



          replicate( 100, x <- round(runif(10), 2); 
          sig <- sample.int(10);
          for (i in seq_along(sig)) x[i] <- x[sig[i]];
          sum(duplicated(x)) )
          #[1] 4 4 4 5 5 5 4 5 6 5 5 5 4 5 5 6 3 4 2 5 4 4 4 4 3 5 3 5 4 5 5 5 5 5 5 5 4 5 5 5 5 4
          #[43] 5 3 4 6 6 6 3 4 5 3 5 4 6 4 5 5 6 4 4 4 5 3 4 3 4 4 3 6 4 7 6 5 6 6 5 4 7 5 6 3 6 4
          #[85] 8 4 5 5 4 5 5 5 4 5 5 4 4 5 4 5


          I started wondering what the distribution of duplication counts might be in a large series. Looks pretty symmetric:



          table( replicate( 1000000, x <- round(runif(10), 5); 
          sig <- sample.int(10);
          for (i in seq_along(sig)) x[i] <- x[sig[i]];
          sum(duplicated(x)) ) )

          0 1 2 3 4 5 6 7 8
          1 269 13113 126104 360416 360827 125707 13269 294





          share|improve this answer














          This has nothing to do with memory block aliasing (a term I have never encountered before). Take a particular permutation example and walk through the assignments that would occur regardless of the implementation at the C or assembly (or whatever) language level; It intrinsic to how any sequential for-loop would behave versus how any "true" permutation (what one gets with x[sig]) would occur:



          sample(10)
          [1] 3 7 1 5 6 9 10 8 4 2

          value at 1 goes to 3, and now there are two of those values
          value at 2 goes to 7, and now there are two of those values
          value at 3 (which was at 1) now goes back to 1 but the values remain unchanged


          ... can continue but this illustrates how this will usually not be a "true" permutation and very uncommonly would result in a complete redistribution of values. I'm guessing that only a completely ordered permutation (of which I think there is only one, i.e. 10:1) could result in a new set of x's that were unique.



          replicate( 100, x <- round(runif(10), 2); 
          sig <- sample.int(10);
          for (i in seq_along(sig)) x[i] <- x[sig[i]];
          sum(duplicated(x)) )
          #[1] 4 4 4 5 5 5 4 5 6 5 5 5 4 5 5 6 3 4 2 5 4 4 4 4 3 5 3 5 4 5 5 5 5 5 5 5 4 5 5 5 5 4
          #[43] 5 3 4 6 6 6 3 4 5 3 5 4 6 4 5 5 6 4 4 4 5 3 4 3 4 4 3 6 4 7 6 5 6 6 5 4 7 5 6 3 6 4
          #[85] 8 4 5 5 4 5 5 5 4 5 5 4 4 5 4 5


          I started wondering what the distribution of duplication counts might be in a large series. Looks pretty symmetric:



          table( replicate( 1000000, x <- round(runif(10), 5); 
          sig <- sample.int(10);
          for (i in seq_along(sig)) x[i] <- x[sig[i]];
          sum(duplicated(x)) ) )

          0 1 2 3 4 5 6 7 8
          1 269 13113 126104 360416 360827 125707 13269 294






          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited 4 hours ago

























          answered 4 hours ago









          42-

          206k14246388




          206k14246388











          • Thanks for this. I realized that I was not being accurate. A simple example like x <- 1:11; x[1:10] <- x[2:11] would cause no "vectorization" hazard, even though memory blocks for read and write overlaps. The real issue is "read-after-write" or "write-after-read".
            – æŽå“²æº
            4 hours ago
















          • Thanks for this. I realized that I was not being accurate. A simple example like x <- 1:11; x[1:10] <- x[2:11] would cause no "vectorization" hazard, even though memory blocks for read and write overlaps. The real issue is "read-after-write" or "write-after-read".
            – æŽå“²æº
            4 hours ago















          Thanks for this. I realized that I was not being accurate. A simple example like x <- 1:11; x[1:10] <- x[2:11] would cause no "vectorization" hazard, even though memory blocks for read and write overlaps. The real issue is "read-after-write" or "write-after-read".
          – æŽå“²æº
          4 hours ago




          Thanks for this. I realized that I was not being accurate. A simple example like x <- 1:11; x[1:10] <- x[2:11] would cause no "vectorization" hazard, even though memory blocks for read and write overlaps. The real issue is "read-after-write" or "write-after-read".
          – æŽå“²æº
          4 hours ago

















           

          draft saved


          draft discarded















































           


          draft saved


          draft discarded














          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f52597296%2fwhy-does-vectorizing-this-simple-r-loop-give-a-wrong-result%23new-answer', 'question_page');

          );

          Post as a guest













































































          Comments

          Popular posts from this blog

          Long meetings (6-7 hours a day): Being “babysat” by supervisor

          Is the Concept of Multiple Fantasy Races Scientifically Flawed? [closed]

          Confectionery