Delete identical files saved as output in a log
The question title may not be quite right, but I couldn't phrase it any better.

I have three files, a.txt, b.txt and c.txt, in each of two folders, A and B. I used an app called Full File Mini Comparer, which compares the two folders and saves a log to the A folder.

The log contains lines such as these:
Different: A=/sdcard/A/a.txt B=/sdcard/B/a.txt
Same: A=/sdcard/A/b.txt B=/sdcard/B/b.txt
Different: A=/sdcard/A/c.txt B=/sdcard/B/c.txt
Now, how can I use sed and rm (or perhaps some other command) to permanently remove/delete the files marked "Same"?
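For the log format shown above, such a sed rewrite might look like the following sketch (the log file name compare.log and its sample contents are illustrative; review the printed commands before actually deleting anything):

```shell
# Illustrative log in the format produced by the comparison app
cat > compare.log <<'EOF'
Different: A=/sdcard/A/a.txt B=/sdcard/B/a.txt
Same: A=/sdcard/A/b.txt B=/sdcard/B/b.txt
Different: A=/sdcard/A/c.txt B=/sdcard/B/c.txt
EOF

# Turn each "Same:" line into an rm command and print it for review.
# Assumes no spaces or newlines in the file names.
sed -n 's/^Same: A=\([^ ]*\) B=\([^ ]*\)$/rm -- \1 \2/p' compare.log
# prints: rm -- /sdcard/A/b.txt /sdcard/B/b.txt

# Once the printed commands look right, append "| sh" to execute them.
```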
Tags: sed, rm
asked 2 hours ago by PJ547, edited 1 hour ago by Kusalananda
3 Answers
You have
$ tree
.
|-- A
| |-- a.txt
| |-- b.txt
| `-- c.txt
`-- B
|-- a.txt
|-- b.txt
`-- c.txt
2 directories, 6 files
Using fdupes:

$ fdupes -1 A B
A/b.txt B/b.txt

fdupes detects duplicates based on file contents. The -1 flag makes it output the filenames of each set of duplicates on a single line. Here, it detects that the b.txt files are identical.

You may use fdupes to delete duplicates:
$ fdupes --delete A B
[1] A/b.txt
[2] B/b.txt
Set 1 of 1, preserve files [1 - 2, all]: 1
[+] A/b.txt
[-] B/b.txt
It interactively asks which file to keep (or whether to keep both). I wrote 1, so the A/b.txt file was kept while B/b.txt was deleted.

See the manual for fdupes (man fdupes). If it's not installed on your system, use a package manager to install it. It can also be made to delete files automatically, without interactive prompting, but care must be taken when running it that way. Always make a backup of your data before running a command that may delete files.

The reason I suggest using fdupes rather than parsing the log file that you have is that filenames embedded in a text document are difficult to parse correctly. It may not always be difficult (and in this particular example it would be easy), but note that Unix allows both spaces and newlines in the names of files and directories. It is technically possible to have a directory called

a.txt
Same: A=

with a newline embedded in the name.
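As a sketch of the non-interactive mode mentioned above: -d deletes duplicates and -N keeps the first file of each set without prompting (double-check both flags in man fdupes on your system; the demo paths are made up, and the guard line simply skips the example when fdupes is absent):

```shell
# Skip gracefully if fdupes is not installed on this system
command -v fdupes >/dev/null 2>&1 || { echo "fdupes not installed"; exit 0; }

# Create a small sandbox so nothing important is at risk (paths are examples)
mkdir -p demo/A demo/B
echo same > demo/A/b.txt
echo same > demo/B/b.txt

# -d deletes duplicates, -N keeps the first file of each set, no prompting
fdupes -dN demo/A demo/B
```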
answered 1 hour ago by Kusalananda (edited 1 hour ago)
You can use this one-liner:

awk -F'[:]' '/Same:/{print $0}' log | xargs -n1 | awk -F'=' '{print $2}' | xargs rm -rf

The first awk picks out the lines in the log file that contain the keyword "Same:"; xargs -n1 then puts each variable=path pair (i.e. A=...) on its own line; the second awk captures the absolute path after the =; and in the final step, xargs calls rm to delete those paths.
Note that when xargs calls rm to delete the paths, the files are gone for good. The -I flag can be added to rm to ask the user to confirm before deleting files.

From man rm:

-I     prompt once before removing more than three files, or
       when removing recursively; less intrusive than -i,
       while still giving protection against most mistakes
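Before wiring rm -rf onto the end, the extraction stages can be dry-run on their own; this sketch (with an illustrative log file) prints the paths that would be deleted:

```shell
# Illustrative log in the same format as above
cat > compare.log <<'EOF'
Different: A=/sdcard/A/a.txt B=/sdcard/B/a.txt
Same: A=/sdcard/A/b.txt B=/sdcard/B/b.txt
EOF

# Same pipeline without the destructive final stage; NF==2 skips the
# bare "Same:" token, which contains no "=".
awk '/^Same:/{print $0}' compare.log | xargs -n1 | awk -F'=' 'NF==2{print $2}'
# prints:
# /sdcard/A/b.txt
# /sdcard/B/b.txt
```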
This would fail for files containing : or = in their names. – Kusalananda, 43 mins ago

I think it's perfectly fine to answer a question with those caveats, as long as they are actually mentioned. Having : and = in filenames is unusual (: is often part of individual messages in maildir mailboxes, though), and it's even more unusual to have newlines in filenames. Spaces are fairly common, though, for example on standard macOS systems, and I've never worked out what xargs does with space-delimited data (I tend not to use xargs). – Kusalananda, 36 mins ago

Calling rm -rf on the output of a script that essentially reformats an input file seems dangerous. In fact, this script is vulnerable to path traversal attacks: if a string given as a filename looks like /home/username, for example, this could delete your home directory without any safety check or confirmation. I wouldn't feel safe running this command, even if I had written the input file myself. You can always make mistakes. – Malte Skoruppa, 29 mins ago
answered 51 mins ago by Goro (edited 27 mins ago)
Do you REALLY want to delete all identical files, or just n-1 of them, keeping one copy? Then why not

awk '/Same:/ {for (i=2; i<=NF; i++) {split($i, T, "="); print "rm", T[2]}}' log
rm /sdcard/A/b.txt
rm /sdcard/B/b.txt

and pipe into sh when happy with the result.

If you want to keep one copy, start the loop from i=3.

Or, a different approach without awk:

echo rm $(md5sum path/to/files* | sort | uniq -Dw33 | cut -d" " -f3-)
rm file2 file4

Should files have spaces in their names, additional steps need to be taken.
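The closing caveat about spaces can be addressed with a variation on the md5sum idea above (a sketch, not part of the original answer): sort by hash so duplicates are adjacent, emit every file after the first of each set NUL-terminated, and let xargs -0 carry names with spaces safely. echo keeps it a dry run; the demo2 paths are made up.

```shell
# Sandbox with two identical files whose names contain spaces (examples)
mkdir -p demo2
printf same > 'demo2/f 1'
printf same > 'demo2/f 2'
printf diff > demo2/g

# Sort by hash so duplicates are adjacent; print every file after the
# first of each set, NUL-terminated, so spaces in names survive xargs.
md5sum demo2/* | sort | awk '
  { hash = $1
    sub(/^[0-9a-f]+  /, "")            # strip the hash, keep the full name
    if (hash == prev) printf "%s\0", $0
    prev = hash }
' | xargs -0 echo rm --
# prints: rm -- demo2/f 2
```

Dropping the echo would perform the deletion, keeping the first file of each duplicate set.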
answered 29 mins ago by RudiC (edited 17 mins ago)