Replace words in an unstructured text file using a for loop

I have a VERY unstructured text file that I read with readLines. I want to replace certain strings with other strings that are stored in a variable (called "new" below).

In the example below, I want every "change" to be replaced, in order, by "one", "two", "three" and "four", so that each of these terms appears exactly once. However, as you can see, sub() replaces the first match in each element of the vector, so the same term ends up in several elements. I need the code to ignore the boundaries between the quoted strings and treat the text as one continuous sequence.

See example code and data below.

 # text to be changed
 text <- c("TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT change",
           "TEXT TEXT TEXT change TEXT TEXT TEXT TEXT TEXT change",
           "TEXT TEXT TEXT change TEXT TEXT TEXT TEXT")

 # variable containing the replacement strings
 new <- c("one", "two", "three", "four")

 # for loop that I want to include
 for (i in seq_along(new)) {
   text <- sub(pattern = "change", replacement = new[i], x = text)
 }
 
 text
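For reference, here is a quick check of why the loop falls short with the example data. sub() is vectorised over its x argument, so a single call already replaces the first "change" in every element, and the same replacement lands in several places. The variable names first_pass and per_element are only for illustration.

```r
text <- c("TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT change",
          "TEXT TEXT TEXT change TEXT TEXT TEXT TEXT TEXT change",
          "TEXT TEXT TEXT change TEXT TEXT TEXT TEXT")

# First pass of the loop: "one" goes into all three elements at once
first_pass <- sub("change", "one", text, fixed = TRUE)
print(first_pass)

# Occurrences of "change" per element: 1, 2 and 1, i.e. 4 in total,
# exactly one per element of new <- c("one", "two", "three", "four")
per_element <- lengths(regmatches(text, gregexpr("change", text, fixed = TRUE)))
print(per_element)
```

So the counter has to run over all occurrences in the whole vector, rather than applying each element of new to every line at once.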

edited 22 mins ago by Jaap
asked 1 hour ago by Gorp

Do you need text <- c("TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT one", "TEXT TEXT TEXT two TEXT TEXT TEXT TEXT TEXT three", "TEXT TEXT TEXT four TEXT TEXT TEXT TEXT") as a result?
- Vladimir Volokhonsky, 1 hour ago

3 Answers

How about this? The logic is: keep hammering away at a string until it contains no more "change". On every "hit" (where "change" is found), move one step along the new vector.

text <- c("TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT change",
          "TEXT TEXT TEXT change TEXT TEXT TEXT TEXT TEXT change",
          "TEXT TEXT TEXT change TEXT TEXT TEXT TEXT")

#Variable containing input for text
new <- c("one", "two", "three", "four")
new.i <- 1  # index of the next element of new to use

for (i in 1:length(text)) {
  # replace the first "change" in this element until none is left
  while (grepl(pattern = "change", text[i])) {
    text[i] <- sub(pattern = "change", replacement = new[new.i], x = text[i])
    new.i <- new.i + 1
  }
}

text

[1] "TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT one" 
[2] "TEXT TEXT TEXT two TEXT TEXT TEXT TEXT TEXT three"
[3] "TEXT TEXT TEXT four TEXT TEXT TEXT TEXT"
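The same idea can be wrapped in a small reusable function. This is a sketch, not part of the answer above: the name replace_in_order() is hypothetical, it uses fixed = TRUE on the assumption that the pattern is a literal string rather than a regular expression, and it adds a guard, because once new.i runs past the end of new the loop above silently writes NA into the text (sub() returns NA when the replacement is NA).

```r
text <- c("TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT change",
          "TEXT TEXT TEXT change TEXT TEXT TEXT TEXT TEXT change",
          "TEXT TEXT TEXT change TEXT TEXT TEXT TEXT")
new <- c("one", "two", "three", "four")

# Hypothetical helper: replaces the k-th occurrence of `pattern` (counting
# across all elements, in order) with replacements[k].
replace_in_order <- function(text, pattern, replacements) {
  k <- 1
  for (i in seq_along(text)) {
    # keep hammering at this element until the pattern is gone
    while (grepl(pattern, text[i], fixed = TRUE)) {
      # stop instead of silently writing NA when replacements run out;
      # a replacement that itself contains `pattern` is matched again
      # and uses up the next replacement
      if (k > length(replacements)) {
        stop("more occurrences of '", pattern, "' than replacements")
      }
      text[i] <- sub(pattern, replacements[k], text[i], fixed = TRUE)
      k <- k + 1
    }
  }
  text
}

replace_in_order(text, "change", new)
```

Calling it with too few replacements, e.g. replace_in_order(text, "change", new[1:3]), raises an error rather than returning NA.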

answered 1 hour ago by Roman Luštrik

Here is another solution using gregexpr() and regmatches():

#text to be changed
text <- c("TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT change",
 "TEXT TEXT TEXT change TEXT TEXT TEXT TEXT TEXT change",
 "TEXT TEXT TEXT change TEXT TEXT TEXT TEXT")

#Variable containing input for text
new <- c("one", "two", "three", "four")

# Collapse text into a single string, one line per element
altered_text <- paste(text, collapse = "\n")

# So we can use gregexpr and regmatches to get what you want
matches <- gregexpr("change", altered_text)
regmatches(altered_text, matches) <- list(new)

# And here's the result
cat(altered_text)
#> TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT one
#> TEXT TEXT TEXT two TEXT TEXT TEXT TEXT TEXT three
#> TEXT TEXT TEXT four TEXT TEXT TEXT TEXT

# Or, putting the text back to its old structure
# (one element for each line)
unlist(strsplit(altered_text, "\n"))
#> [1] "TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT one" 
#> [2] "TEXT TEXT TEXT two TEXT TEXT TEXT TEXT TEXT three"
#> [3] "TEXT TEXT TEXT four TEXT TEXT TEXT TEXT"

Created on 2018-10-16 by the reprex package (v0.2.1)

We can do this since gregexpr() can find all the matches in the text for "change"; from help("gregexpr"):

regexpr returns an integer vector of the same length as text giving
the starting position of the first match....

gregexpr returns a list of the same length as text each element of
which is of the same form as the return value for regexpr, except that
the starting positions of every (disjoint) match are given.

(emphasis added).
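To make the quoted description concrete, the sketch below looks at the match data gregexpr() returns for the collapsed string from this answer: a list with one element per input string, holding the starting position of every match and a "match.length" attribute. The positions assume the lines are joined with "\n", which counts as one character.

```r
text <- c("TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT change",
          "TEXT TEXT TEXT change TEXT TEXT TEXT TEXT TEXT change",
          "TEXT TEXT TEXT change TEXT TEXT TEXT TEXT")
altered_text <- paste(text, collapse = "\n")

matches <- gregexpr("change", altered_text)

# One list element per input string; here there is only one string
length(matches)                       # 1
# Starting positions of every disjoint match, in order of appearance
as.integer(matches[[1]])              # 46 68 100 122
# Each match is 6 characters long ("change")
attr(matches[[1]], "match.length")    # 6 6 6 6
```

Because all four positions live in a single list element, list(new) supplies exactly one replacement per match, in reading order.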

Then regmatches() can be used to either extract the matches found by gregexpr() or replace them; from help("regmatches"):

Usage

regmatches(x, m, invert = FALSE)

regmatches(x, m, invert = FALSE) <- value

...

value

an object with suitable replacement values for the matched or
non-matched substrings (see Details).

...

Details

The replacement function can be used for replacing the matched or
non-matched substrings. For vector match data, if invert is FALSE,
value should be a character vector with length the number of matched
elements in m. Otherwise, it should be a list of character vectors
with the same length as m, each as long as the number of replacements
needed.
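The list form described in that last paragraph also allows skipping the paste()/strsplit() round trip: call gregexpr() on the original vector and give regmatches<- one chunk of new per element. A minimal sketch, assuming the pattern is a literal string (hence fixed = TRUE); the names n, line_id and chunks are only for illustration.

```r
text <- c("TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT change",
          "TEXT TEXT TEXT change TEXT TEXT TEXT TEXT TEXT change",
          "TEXT TEXT TEXT change TEXT TEXT TEXT TEXT")
new <- c("one", "two", "three", "four")

# Match data for each element of `text` separately (a list of length 3)
m <- gregexpr("change", text, fixed = TRUE)

# Matches per element; gregexpr() stores -1 when an element has no match
n <- vapply(m, function(pos) sum(pos > 0), integer(1))
stopifnot(length(new) >= sum(n))

# Cut `new` into consecutive chunks sized by `n`, one chunk per element
line_id <- factor(rep(seq_along(text), n), levels = seq_along(text))
chunks <- unname(split(new[seq_len(sum(n))], line_id))

regmatches(text, m) <- chunks
text
```

Here n is 1, 2, 1, so the chunks are "one", c("two", "three") and "four", matching the second form of value described in the help text.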

answered 59 mins ago by duckmayr (edited 50 mins ago)

Another approach using strsplit:

tl <- lapply(text, function(s) strsplit(s, split = " ")[[1]])
df <- stack(setNames(tl, seq_along(tl)))

ix <- df$values == "change"
df[ix, "values"] <- new
tapply(df$values, df$ind, paste, collapse = " ")

which gives:

 1 
 "TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT one" 
 2 
"TEXT TEXT TEXT two TEXT TEXT TEXT TEXT TEXT three" 
 3 
 "TEXT TEXT TEXT four TEXT TEXT TEXT TEXT"

Additionally you could wrap the tapply call in unname:

 unname(tapply(df$values, df$ind, paste, collapse = " "))

which gives:

[1] "TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT one" 
[2] "TEXT TEXT TEXT two TEXT TEXT TEXT TEXT TEXT three"
[3] "TEXT TEXT TEXT four TEXT TEXT TEXT TEXT"
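For reference, the intermediate data frame df built by stack() above has one row per word: the word in values and the number of the line it came from in ind, which is the grouping tapply() uses to paste each line back together. A quick way to inspect it:

```r
text <- c("TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT change",
          "TEXT TEXT TEXT change TEXT TEXT TEXT TEXT TEXT change",
          "TEXT TEXT TEXT change TEXT TEXT TEXT TEXT")

tl <- lapply(text, function(s) strsplit(s, split = " ")[[1]])
df <- stack(setNames(tl, seq_along(tl)))

nrow(df)                        # 28: one row per word (10 + 10 + 8)
levels(df$ind)                  # "1" "2" "3": the line each word came from
which(df$values == "change")    # 10 14 20 24: rows that receive new[1:4]
```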

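If there are fewer replacement values than occurrences of "change" and each element of new should be used only once (rather than recycled), you could replace only the first occurrences and leave the rest unchanged: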

newnew <- new[1:3]

ix <- df$values == "change"
df[ix, "values"][1:length(newnew)] <- newnew
unname(tapply(df$values, df$ind, paste, collapse = " "))

You could alter this further to also handle the case where there are more replacement values than positions that need to be replaced (occurrences of the pattern, "change" in the example):

newnew2 <- c(new, "five")

tl <- lapply(text, function(s) strsplit(s, split = " ")[[1]])
df <- stack(setNames(tl, seq_along(tl)))

ix <- df$values == "change"
n <- min(sum(ix), length(newnew2))
df[ix, "values"][seq_len(n)] <- newnew2[seq_len(n)]
unname(tapply(df$values, df$ind, paste, collapse = " "))
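The same whole-word idea can be sketched without the data frame, working on a flat vector of words and regrouping it by line at the end. This is an illustrative variant, not part of the answer above; like the strsplit approach it only replaces "change" when it forms a whole space-delimited word, and it uses each replacement at most once. The names flat, k and line_id are only for illustration.

```r
text <- c("TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT change",
          "TEXT TEXT TEXT change TEXT TEXT TEXT TEXT TEXT change",
          "TEXT TEXT TEXT change TEXT TEXT TEXT TEXT")
new <- c("one", "two", "three", "four")

# Split every line into words and flatten them into one vector
tokens <- strsplit(text, " ", fixed = TRUE)
flat <- unlist(tokens)

# Positions of whole-word matches, across all lines, in reading order
ix <- which(flat == "change")

# Use each replacement at most once; leftover occurrences stay unchanged
k <- min(length(ix), length(new))
flat[ix[seq_len(k)]] <- new[seq_len(k)]

# Regroup the words by their original line and paste each line back together
line_id <- factor(rep(seq_along(tokens), lengths(tokens)), levels = seq_along(tokens))
result <- vapply(split(flat, line_id), paste, character(1),
                 collapse = " ", USE.NAMES = FALSE)
result
```

Unlike the regex-based answers, a word such as "changed" or "change," is left untouched, which may or may not be what you want.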

answered 53 mins ago by Jaap (edited 22 mins ago)
