Replace words in an unstructured text file using a for loop

I have a very unstructured text file that I read with readLines. I want to replace certain strings with other strings that are stored in a variable (called "new" below).

In the example below, I want the manipulated text to contain each of the terms "one", "two", "three" and "four" exactly once, in place of the "change" strings. However, as you can see, sub replaces the first match in every element on each pass of the loop, whereas I need the code to ignore the fact that the text is split into separate quoted strings (elements).

See the example code and data below.

#text to be changed
text <- c("TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT change",
          "TEXT TEXT TEXT change TEXT TEXT TEXT TEXT TEXT change",
          "TEXT TEXT TEXT change TEXT TEXT TEXT TEXT")

#Variable containing input for text
new <- c("one", "two", "three", "four")

#For loop that I want to include
for (i in 1:length(new)) {
  text <- sub(pattern = "change", replacement = new[i], x = text)
}

text
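
To make the problem concrete, here is a minimal sketch of what the loop above produces on the example data. It uses only base R; the name `result` is introduced here for illustration and is not part of the original post.

```r
# Example data from the question
text <- c("TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT change",
          "TEXT TEXT TEXT change TEXT TEXT TEXT TEXT TEXT change",
          "TEXT TEXT TEXT change TEXT TEXT TEXT TEXT")
new <- c("one", "two", "three", "four")

# sub() is vectorised over x: every call replaces the first "change"
# in each element, so one replacement is reused across all elements
result <- text
for (i in seq_along(new)) {
  result <- sub(pattern = "change", replacement = new[i], x = result)
}

print(result)
#> [1] "TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT one"
#> [2] "TEXT TEXT TEXT one TEXT TEXT TEXT TEXT TEXT two"
#> [3] "TEXT TEXT TEXT one TEXT TEXT TEXT TEXT"
```

So "one" ends up in all three elements, "two" is used once, and "three" and "four" are never used, which is the behaviour the question wants to avoid.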









  • Do you need text <- c("TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT one", "TEXT TEXT TEXT two TEXT TEXT TEXT TEXT TEXT three", "TEXT TEXT TEXT four TEXT TEXT TEXT TEXT") as the result?
    – Vladimir Volokhonsky
    1 hour ago


Tags: r


asked 1 hour ago by Gorp; edited 22 mins ago by Jaap


3 Answers

How about this? The logic is to hammer away at each string until it has no more "change" left. On every hit (where "change" is found), move along the new vector.

text <- c("TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT change",
          "TEXT TEXT TEXT change TEXT TEXT TEXT TEXT TEXT change",
          "TEXT TEXT TEXT change TEXT TEXT TEXT TEXT")

#Variable containing input for text
new <- c("one", "two", "three", "four")
new.i <- 1

for (i in 1:length(text)) {
  # keep replacing the first remaining "change" in this element
  while (grepl(pattern = "change", text[i])) {
    text[i] <- sub(pattern = "change", replacement = new[new.i], x = text[i])
    new.i <- new.i + 1  # move on to the next replacement
  }
}

text

[1] "TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT one"
[2] "TEXT TEXT TEXT two TEXT TEXT TEXT TEXT TEXT three"
[3] "TEXT TEXT TEXT four TEXT TEXT TEXT TEXT"
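
The same while-loop idea can be wrapped in a function. The following is only a sketch: `replace_in_order` and its argument names are hypothetical and not part of the answer. It adds two assumptions of its own: the pattern is treated as a literal string (`fixed = TRUE`), and the loop stops once the replacements are used up, leaving any remaining matches untouched.

```r
# Hypothetical helper (not from the answer): replace successive matches of
# `pattern` with successive elements of `replacements`, element by element.
replace_in_order <- function(x, pattern, replacements) {
  k <- 1  # index of the next unused replacement
  for (i in seq_along(x)) {
    # Stop as soon as the replacements run out; leftover matches stay as is
    while (k <= length(replacements) &&
           grepl(pattern, x[i], fixed = TRUE)) {
      x[i] <- sub(pattern, replacements[k], x[i], fixed = TRUE)
      k <- k + 1
    }
  }
  x
}

text <- c("TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT change",
          "TEXT TEXT TEXT change TEXT TEXT TEXT TEXT TEXT change",
          "TEXT TEXT TEXT change TEXT TEXT TEXT TEXT")
new <- c("one", "two", "three", "four")

result <- replace_in_order(text, "change", new)
print(result)
```

One limitation to keep in mind: if a replacement itself contains the pattern, the loop will match it again on the next pass, so this sketch is only suitable when the replacements do not contain the pattern.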






Here is another solution using gregexpr() and regmatches():

#text to be changed
text <- c("TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT change",
          "TEXT TEXT TEXT change TEXT TEXT TEXT TEXT TEXT change",
          "TEXT TEXT TEXT change TEXT TEXT TEXT TEXT")

#Variable containing input for text
new <- c("one", "two", "three", "four")

# Alter the structure of text: collapse it into one string,
# with the original elements separated by newlines
altered_text <- paste(text, collapse = "\n")

# So we can use gregexpr and regmatches to get what you want
matches <- gregexpr("change", altered_text)
regmatches(altered_text, matches) <- list(new)

# And here's the result
cat(altered_text)
#> TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT one
#> TEXT TEXT TEXT two TEXT TEXT TEXT TEXT TEXT three
#> TEXT TEXT TEXT four TEXT TEXT TEXT TEXT

# Or, putting the text back to its old structure
# (one element for each line)
unlist(strsplit(altered_text, "\n"))
#> [1] "TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT one"
#> [2] "TEXT TEXT TEXT two TEXT TEXT TEXT TEXT TEXT three"
#> [3] "TEXT TEXT TEXT four TEXT TEXT TEXT TEXT"

Created on 2018-10-16 by the reprex package (v0.2.1)



We can do this since gregexpr() can find all the matches in the text for "change"; from help("gregexpr"):

    regexpr returns an integer vector of the same length as text giving
    the starting position of the first match....

    gregexpr returns a list of the same length as text each element of
    which is of the same form as the return value for regexpr, except that
    the starting positions of every (disjoint) match are given.

(emphasis added).
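
As a small illustration of the difference described in that help text, the sketch below runs both functions on one short string. The string `x` and the variable names are made up for this example.

```r
x <- "TEXT change TEXT change"

# regexpr(): start position of the first match only
first <- regexpr("change", x, fixed = TRUE)
as.integer(first)
#> [1] 6

# gregexpr(): a list with one entry per input string,
# each holding the start positions of every match
all_matches <- gregexpr("change", x, fixed = TRUE)
as.integer(all_matches[[1]])
#> [1]  6 18
```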



Then regmatches() can be used to either extract the matches found by gregexpr() or replace them; from help("regmatches"):

    Usage

    regmatches(x, m, invert = FALSE)
    regmatches(x, m, invert = FALSE) <- value

    ...

    value
    an object with suitable replacement values for the matched or
    non-matched substrings (see Details).

    ...

    Details

    The replacement function can be used for replacing the matched or
    non-matched substrings. For vector match data, if invert is FALSE,
    value should be a character vector with length the number of matched
    elements in m. Otherwise, it should be a list of character vectors
    with the same length as m, each as long as the number of replacements
    needed.
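
Following the "list of character vectors with the same length as m" part of that description, the replacement can also be done without collapsing the text first. This is only a sketch, not the answer's method: the names `m`, `counts`, `groups` and `value` are introduced here, and it assumes the number of matches equals the number of replacements.

```r
text <- c("TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT change",
          "TEXT TEXT TEXT change TEXT TEXT TEXT TEXT TEXT change",
          "TEXT TEXT TEXT change TEXT TEXT TEXT TEXT")
new <- c("one", "two", "three", "four")

# Match data for every element of text (a list of length 3)
m <- gregexpr("change", text, fixed = TRUE)

# Number of matches per element (here 1, 2, 1)
counts <- lengths(regmatches(text, m))
stopifnot(sum(counts) == length(new))  # assumption of this sketch

# Split `new` into one character vector per element of `text`
groups <- factor(rep(seq_along(text), counts), levels = seq_along(text))
value <- unname(split(new, groups))

result <- text
regmatches(result, m) <- value
print(result)
```

This keeps `text` as a vector throughout, so it does not depend on choosing a separator that never occurs in the text.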








Another approach using strsplit:

# split every element into words and stack them into one long data frame
# (values = the word, ind = the element it came from)
tl <- lapply(text, function(s) strsplit(s, split = " ")[[1]])
df <- stack(setNames(tl, seq_along(tl)))

# replace the "change" words, in order, with the elements of new
ix <- df$values == "change"
df[ix, "values"] <- new
# paste the words of each element back together
tapply(df$values, df$ind, paste, collapse = " ")

which gives:

1
"TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT one"
2
"TEXT TEXT TEXT two TEXT TEXT TEXT TEXT TEXT three"
3
"TEXT TEXT TEXT four TEXT TEXT TEXT TEXT"

Additionally you could wrap the tapply call in unname:

unname(tapply(df$values, df$ind, paste, collapse = " "))

which gives:

[1] "TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT one"
[2] "TEXT TEXT TEXT two TEXT TEXT TEXT TEXT TEXT three"
[3] "TEXT TEXT TEXT four TEXT TEXT TEXT TEXT"




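If new has fewer elements than there are matches and you want to use the elements of new only once, you could update the code to:

newnew <- new[1:3]

# rebuild df from the original text, so that "change" is still present
tl <- lapply(text, function(s) strsplit(s, split = " ")[[1]])
df <- stack(setNames(tl, seq_along(tl)))

ix <- df$values == "change"
df[ix, "values"][1:length(newnew)] <- newnew
unname(tapply(df$values, df$ind, paste, collapse = " "))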



You could alter this further to also take into account the situation where there are more replacements than positions (occurrences of the pattern, "change" in the example) that need to be replaced:

newnew2 <- c(new, "five")

tl <- lapply(text, function(s) strsplit(s, split = " ")[[1]])
df <- stack(setNames(tl, seq_along(tl)))

ix <- df$values == "change"
# use only as many replacements as there are positions (and vice versa)
n <- min(sum(ix), length(newnew2))
df[ix, "values"][seq_len(n)] <- newnew2[seq_len(n)]
unname(tapply(df$values, df$ind, paste, collapse = " "))
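
The same word-splitting idea can also be written without the intermediate data frame. This is a sketch under a few assumptions that are not in the answer: words are separated by single spaces, "change" only needs to be matched as a whole word, and the names `tokens`, `flat`, `hits` and `element` are made up for illustration.

```r
text <- c("TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT change",
          "TEXT TEXT TEXT change TEXT TEXT TEXT TEXT TEXT change",
          "TEXT TEXT TEXT change TEXT TEXT TEXT TEXT")
new <- c("one", "two", "three", "four")

# One character vector of words per element, then one flat vector of all words
tokens <- strsplit(text, " ", fixed = TRUE)
flat <- unlist(tokens)

# Positions of the words to replace; use no more replacements than matches
hits <- which(flat == "change")
n <- min(length(hits), length(new))
flat[hits[seq_len(n)]] <- new[seq_len(n)]

# Regroup the words by the element they came from and paste them back
element <- rep(seq_along(tokens), lengths(tokens))
result <- unname(vapply(split(flat, element), paste, character(1),
                        collapse = " "))
print(result)
```

Because the comparison is done on whole words, a "change" that is attached to punctuation or embedded in a longer word would not be replaced by this variant.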






Answer using sub() in a while loop: answered 1 hour ago by Roman Luštrik


Answer using gregexpr() and regmatches(): answered 59 mins ago by duckmayr (edited 50 mins ago)

                        up vote
                        1
                        down vote













                        Another approach using strsplit:



                        tl <- lapply(text, function(s) strsplit(s, split = " ")[[1]])
                        df <- stack(setNames(tl, seq_along(tl)))

                        ix <- df$values == "change"
                        df[ix, "values"] <- new
                        tapply(df$values, df$ind, paste, collapse = " ")


                        which gives:




                         1 
                        "TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT one"
                        2
                        "TEXT TEXT TEXT two TEXT TEXT TEXT TEXT TEXT three"
                        3
                        "TEXT TEXT TEXT four TEXT TEXT TEXT TEXT"



                        Additionally you could wrap the tapply call in unname:



                         unname(tapply(df$values, df$ind, paste, collapse = " "))


                        which gives:




                        [1] "TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT one" 
                        [2] "TEXT TEXT TEXT two TEXT TEXT TEXT TEXT TEXT three"
                        [3] "TEXT TEXT TEXT four TEXT TEXT TEXT TEXT"




                        If you want to use the elements of new only once, you could update the code to:



                        newnew <- new[1:3]

                        ix <- df$values == "change"
                        df[ix, "values"][1:length(newnew)] <- newnew
                        unname(tapply(df$values, df$ind, paste, collapse = " "))



                        You could alter this further to also handle the case where there are more replacements than occurrences of the pattern (change in the example), so that the surplus replacements are simply ignored:



                        newnew2 <- c(new, "five")

                        tl <- lapply(text, function(s) strsplit(s, split = " ")[[1]])
                        df <- stack(setNames(tl, seq_along(tl)))

                        ix <- df$values == "change"
                        # replace only as many occurrences as both sides allow
                        n <- min(sum(ix), length(newnew2))
                        df[ix, "values"][seq_len(n)] <- newnew2[seq_len(n)]
                        unname(tapply(df$values, df$ind, paste, collapse = " "))
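
The variants above can be folded into one helper. This is only a sketch under a few assumptions: the function name replace_in_order and its argument names are made up for illustration, tokens are separated by single spaces as in the example, and the target must match a whole token exactly (so "change," with trailing punctuation would not match). It uses split and vapply instead of stack and tapply, but follows the same split, replace, paste-back idea.

```r
# Hypothetical helper (name and arguments are illustrative, not from the answer).
replace_in_order <- function(text, target, replacements) {
  # Split every line into tokens on single spaces.
  tokens <- strsplit(text, split = " ", fixed = TRUE)
  flat <- unlist(tokens)

  # Positions of the target token, in reading order across all lines.
  hits <- which(flat == target)

  # Use each replacement at most once; ignore surplus on either side.
  n <- min(length(hits), length(replacements))
  flat[hits[seq_len(n)]] <- replacements[seq_len(n)]

  # Regroup the tokens by originating line and paste them back together.
  line_id <- factor(rep(seq_along(tokens), lengths(tokens)),
                    levels = seq_along(tokens))
  out <- vapply(split(flat, line_id), paste, character(1), collapse = " ")
  unname(out)
}

text <- c("TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT change",
          "TEXT TEXT TEXT change TEXT TEXT TEXT TEXT TEXT change",
          "TEXT TEXT TEXT change TEXT TEXT TEXT TEXT")
new <- c("one", "two", "three", "four")

res <- replace_in_order(text, "change", new)
res
```

Calling it with new[1:3] leaves the last change untouched, and calling it with c(new, "five") ignores the extra element, which mirrors the two variants shown above.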













                            edited 22 mins ago

























                            answered 53 mins ago









                            Jaap




























                                 
