Find files by pattern and copy to target location
Clash Royale CLAN TAG#URR8PPP
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;
up vote
7
down vote
favorite
This is my solution to the Chapter 9 exercise in Automate the Boring Stuff:
Selective Copy
Write a program that walks through a folder tree and searches for files with a certain file extension (such as
.jpg
). Copy these files from whatever location they are in to a new folder.
#!/usr/local/bin/python3
#selectiveCopy.py - Walks through a folder tree with the extension .PDF and copies them to a new folder.
import re
import os
import shutil
filePdf = re.compile(r"""^(.*?) #text before extension
[.][p][d][f] #extension .pdf
""", re.VERBOSE)
def selectiveCopy(fileExtension):
for pdfFiles in os.listdir("/Users//Desktop/newPDFs"):
locatedFiles = filePdf.search(pdfFiles)
if locatedFiles != None:
print(locatedFiles)
files_to_copy = os.path.join("/Users/my_user_name/Desktop/newPDFs", pdfFiles) `
shutil.copy(files_to_copy, "/Users/my_user_name/Desktop/oldPDFs")
print("The following file has been copied " + files_to_copy)
selectiveCopy(filePdf)
I appreciate for probably all of you on here that it's not a very difficult exercise, but I've been working through this book slowly and have just decided it's probably a good idea to start putting them on a Github page. I wanted to know if I could improve the code to make it more efficient or if there are some things I've included that aren't necessarily considered "standard practice".
python beginner python-3.x regex file-system
add a comment |Â
up vote
7
down vote
favorite
This is my solution to the Chapter 9 exercise in Automate the Boring Stuff:
Selective Copy
Write a program that walks through a folder tree and searches for files with a certain file extension (such as
.jpg
). Copy these files from whatever location they are in to a new folder.
#!/usr/local/bin/python3
#selectiveCopy.py - Walks through a folder tree with the extension .PDF and copies them to a new folder.
import re
import os
import shutil
filePdf = re.compile(r"""^(.*?) #text before extension
[.][p][d][f] #extension .pdf
""", re.VERBOSE)
def selectiveCopy(fileExtension):
for pdfFiles in os.listdir("/Users//Desktop/newPDFs"):
locatedFiles = filePdf.search(pdfFiles)
if locatedFiles != None:
print(locatedFiles)
files_to_copy = os.path.join("/Users/my_user_name/Desktop/newPDFs", pdfFiles) `
shutil.copy(files_to_copy, "/Users/my_user_name/Desktop/oldPDFs")
print("The following file has been copied " + files_to_copy)
selectiveCopy(filePdf)
I appreciate for probably all of you on here that it's not a very difficult exercise, but I've been working through this book slowly and have just decided it's probably a good idea to start putting them on a Github page. I wanted to know if I could improve the code to make it more efficient or if there are some things I've included that aren't necessarily considered "standard practice".
python beginner python-3.x regex file-system
1
Hey, welcome to Code Review! The problem description says to traverse a directory tree. This sounds to me like you actually need to descend down into sub-directories.
– Graipher
Aug 17 at 12:50
add a comment |Â
up vote
7
down vote
favorite
up vote
7
down vote
favorite
This is my solution to the Chapter 9 exercise in Automate the Boring Stuff:
Selective Copy
Write a program that walks through a folder tree and searches for files with a certain file extension (such as
.jpg
). Copy these files from whatever location they are in to a new folder.
#!/usr/local/bin/python3
#selectiveCopy.py - Walks through a folder tree with the extension .PDF and copies them to a new folder.
import re
import os
import shutil
filePdf = re.compile(r"""^(.*?) #text before extension
[.][p][d][f] #extension .pdf
""", re.VERBOSE)
def selectiveCopy(fileExtension):
for pdfFiles in os.listdir("/Users//Desktop/newPDFs"):
locatedFiles = filePdf.search(pdfFiles)
if locatedFiles != None:
print(locatedFiles)
files_to_copy = os.path.join("/Users/my_user_name/Desktop/newPDFs", pdfFiles) `
shutil.copy(files_to_copy, "/Users/my_user_name/Desktop/oldPDFs")
print("The following file has been copied " + files_to_copy)
selectiveCopy(filePdf)
I appreciate for probably all of you on here that it's not a very difficult exercise, but I've been working through this book slowly and have just decided it's probably a good idea to start putting them on a Github page. I wanted to know if I could improve the code to make it more efficient or if there are some things I've included that aren't necessarily considered "standard practice".
python beginner python-3.x regex file-system
This is my solution to the Chapter 9 exercise in Automate the Boring Stuff:
Selective Copy
Write a program that walks through a folder tree and searches for files with a certain file extension (such as
.jpg
). Copy these files from whatever location they are in to a new folder.
#!/usr/local/bin/python3
#selectiveCopy.py - Walks through a folder tree with the extension .PDF and copies them to a new folder.
import re
import os
import shutil
filePdf = re.compile(r"""^(.*?) #text before extension
[.][p][d][f] #extension .pdf
""", re.VERBOSE)
def selectiveCopy(fileExtension):
for pdfFiles in os.listdir("/Users//Desktop/newPDFs"):
locatedFiles = filePdf.search(pdfFiles)
if locatedFiles != None:
print(locatedFiles)
files_to_copy = os.path.join("/Users/my_user_name/Desktop/newPDFs", pdfFiles) `
shutil.copy(files_to_copy, "/Users/my_user_name/Desktop/oldPDFs")
print("The following file has been copied " + files_to_copy)
selectiveCopy(filePdf)
I appreciate for probably all of you on here that it's not a very difficult exercise, but I've been working through this book slowly and have just decided it's probably a good idea to start putting them on a Github page. I wanted to know if I could improve the code to make it more efficient or if there are some things I've included that aren't necessarily considered "standard practice".
python beginner python-3.x regex file-system
edited Aug 17 at 16:20


200_success
124k14143401
124k14143401
asked Aug 17 at 12:35
LRBrady
384
384
1
Hey, welcome to Code Review! The problem description says to traverse a directory tree. This sounds to me like you actually need to descend down into sub-directories.
– Graipher
Aug 17 at 12:50
add a comment |Â
1
Hey, welcome to Code Review! The problem description says to traverse a directory tree. This sounds to me like you actually need to descend down into sub-directories.
– Graipher
Aug 17 at 12:50
1
1
Hey, welcome to Code Review! The problem description says to traverse a directory tree. This sounds to me like you actually need to descend down into sub-directories.
– Graipher
Aug 17 at 12:50
Hey, welcome to Code Review! The problem description says to traverse a directory tree. This sounds to me like you actually need to descend down into sub-directories.
– Graipher
Aug 17 at 12:50
add a comment |Â
2 Answers
2
active
oldest
votes
up vote
6
down vote
accepted
If you want to recurse also into sub-folders, you should use os.walk
:
import os
import shutil
def get_files_recursively(start_directory, filter_extension=None):
for root, _, files in os.walk(start_directory):
for file in files:
if filter_extension is None or file.lower().endswith(filter_extension):
yield os.path.join(root, file)
def selective_copy(source, target, file_extension=None):
for file in get_files_recursively(source, file_extension):
print(file)
shutil.copy(file, target)
print("The following file has been copied", file)
if __name__ == "__main__":
selective_copy("/Users/my_user_name/Desktop/newPDFs",
"/Users/my_user_name/Desktop/oldPDFs",
".pdf")
As stated in the documentation, this is actually faster than os.listdir
since Python 3.5:
This function now calls
os.scandir()
instead ofos.listdir()
, making
it faster by reducing the number of calls toos.stat()
.
I also
- added a
if __name__ == "__main__":
guard to allow importing this module from another script without running the function - pulled the constants out into the calling code to make it re-usable
- used
is
instead of!=
to compare toNone
- used
str.endswith
instead of a regular expression to avoid some overhead - changed the names to adhere to Python's official style-guide, PEP8.
I don't understand everything you did to be honest, but I'll look up some of the code, which will hopefully help me write a bit better. Thanks.
– LRBrady
Aug 17 at 13:25
@LRBrady The one thing I did not link to in my answer:get_files_recursively
is a generator
– Graipher
Aug 17 at 13:53
2
A typo here:if filter is None
– hjpotter92
Aug 17 at 18:58
@hjpotter92 fixed
– Graipher
Aug 17 at 18:59
add a comment |Â
up vote
5
down vote
Review
Your regex can be simplified:
There is no need for the multiple
brackets - this would suffice:
r"^(.*?).pdf$"
.Note that I escape the
.
char (which will match anything) with a backslash.
to only match the specific.
char, and the$
to be certain that .pdf is at the end of the string.There is no need for Regex at all!
You can either use the
glob
module to directly find all files with an extension Python3.5+def get_files(source, extension):
for filename in glob.iglob(f'source/**/*extension', recursive=True):
yield filenameuse something like
.split(".")
orfilename.endswith('.extension')
to find if the file uses that extension. As @Graipher showed in his answer.
Currently your solution works with hardcoded directories.
To change the to directory to read or to write in, you'd need to change a lot in your code. Your solution would be much better if you could send these path locations as parameters.
I tried using the regex r"^(.*?).pdf" initially and it was matching a lot of other files that weren't PDFs, but I could have made an error or might have changed some code after that. Your last point what do you mean? sorry I'm still new so I don't understand.
– LRBrady
Aug 17 at 13:23
2
I added a$
to the regex so it wont match files like "afile.pdfsonething.txt"
– Ludisposed
Aug 17 at 13:25
add a comment |Â
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
6
down vote
accepted
If you want to recurse also into sub-folders, you should use os.walk
:
import os
import shutil
def get_files_recursively(start_directory, filter_extension=None):
for root, _, files in os.walk(start_directory):
for file in files:
if filter_extension is None or file.lower().endswith(filter_extension):
yield os.path.join(root, file)
def selective_copy(source, target, file_extension=None):
for file in get_files_recursively(source, file_extension):
print(file)
shutil.copy(file, target)
print("The following file has been copied", file)
if __name__ == "__main__":
selective_copy("/Users/my_user_name/Desktop/newPDFs",
"/Users/my_user_name/Desktop/oldPDFs",
".pdf")
As stated in the documentation, this is actually faster than os.listdir
since Python 3.5:
This function now calls
os.scandir()
instead ofos.listdir()
, making
it faster by reducing the number of calls toos.stat()
.
I also
- added a
if __name__ == "__main__":
guard to allow importing this module from another script without running the function - pulled the constants out into the calling code to make it re-usable
- used
is
instead of!=
to compare toNone
- used
str.endswith
instead of a regular expression to avoid some overhead - changed the names to adhere to Python's official style-guide, PEP8.
I don't understand everything you did to be honest, but I'll look up some of the code, which will hopefully help me write a bit better. Thanks.
– LRBrady
Aug 17 at 13:25
@LRBrady The one thing I did not link to in my answer:get_files_recursively
is a generator
– Graipher
Aug 17 at 13:53
2
A typo here:if filter is None
– hjpotter92
Aug 17 at 18:58
@hjpotter92 fixed
– Graipher
Aug 17 at 18:59
add a comment |Â
up vote
6
down vote
accepted
If you want to recurse also into sub-folders, you should use os.walk
:
import os
import shutil
def get_files_recursively(start_directory, filter_extension=None):
for root, _, files in os.walk(start_directory):
for file in files:
if filter_extension is None or file.lower().endswith(filter_extension):
yield os.path.join(root, file)
def selective_copy(source, target, file_extension=None):
for file in get_files_recursively(source, file_extension):
print(file)
shutil.copy(file, target)
print("The following file has been copied", file)
if __name__ == "__main__":
selective_copy("/Users/my_user_name/Desktop/newPDFs",
"/Users/my_user_name/Desktop/oldPDFs",
".pdf")
As stated in the documentation, this is actually faster than os.listdir
since Python 3.5:
This function now calls
os.scandir()
instead ofos.listdir()
, making
it faster by reducing the number of calls toos.stat()
.
I also
- added a
if __name__ == "__main__":
guard to allow importing this module from another script without running the function - pulled the constants out into the calling code to make it re-usable
- used
is
instead of!=
to compare toNone
- used
str.endswith
instead of a regular expression to avoid some overhead - changed the names to adhere to Python's official style-guide, PEP8.
I don't understand everything you did to be honest, but I'll look up some of the code, which will hopefully help me write a bit better. Thanks.
– LRBrady
Aug 17 at 13:25
@LRBrady The one thing I did not link to in my answer:get_files_recursively
is a generator
– Graipher
Aug 17 at 13:53
2
A typo here:if filter is None
– hjpotter92
Aug 17 at 18:58
@hjpotter92 fixed
– Graipher
Aug 17 at 18:59
add a comment |Â
up vote
6
down vote
accepted
up vote
6
down vote
accepted
If you want to recurse also into sub-folders, you should use os.walk
:
import os
import shutil
def get_files_recursively(start_directory, filter_extension=None):
for root, _, files in os.walk(start_directory):
for file in files:
if filter_extension is None or file.lower().endswith(filter_extension):
yield os.path.join(root, file)
def selective_copy(source, target, file_extension=None):
for file in get_files_recursively(source, file_extension):
print(file)
shutil.copy(file, target)
print("The following file has been copied", file)
if __name__ == "__main__":
selective_copy("/Users/my_user_name/Desktop/newPDFs",
"/Users/my_user_name/Desktop/oldPDFs",
".pdf")
As stated in the documentation, this is actually faster than os.listdir
since Python 3.5:
This function now calls
os.scandir()
instead ofos.listdir()
, making
it faster by reducing the number of calls toos.stat()
.
I also
- added a
if __name__ == "__main__":
guard to allow importing this module from another script without running the function - pulled the constants out into the calling code to make it re-usable
- used
is
instead of!=
to compare toNone
- used
str.endswith
instead of a regular expression to avoid some overhead - changed the names to adhere to Python's official style-guide, PEP8.
If you want to recurse also into sub-folders, you should use os.walk
:
import os
import shutil
def get_files_recursively(start_directory, filter_extension=None):
for root, _, files in os.walk(start_directory):
for file in files:
if filter_extension is None or file.lower().endswith(filter_extension):
yield os.path.join(root, file)
def selective_copy(source, target, file_extension=None):
for file in get_files_recursively(source, file_extension):
print(file)
shutil.copy(file, target)
print("The following file has been copied", file)
if __name__ == "__main__":
selective_copy("/Users/my_user_name/Desktop/newPDFs",
"/Users/my_user_name/Desktop/oldPDFs",
".pdf")
As stated in the documentation, this is actually faster than os.listdir
since Python 3.5:
This function now calls
os.scandir()
instead ofos.listdir()
, making
it faster by reducing the number of calls toos.stat()
.
I also
- added a
if __name__ == "__main__":
guard to allow importing this module from another script without running the function - pulled the constants out into the calling code to make it re-usable
- used
is
instead of!=
to compare toNone
- used
str.endswith
instead of a regular expression to avoid some overhead - changed the names to adhere to Python's official style-guide, PEP8.
edited Aug 17 at 18:59
answered Aug 17 at 13:09
Graipher
21.1k43182
21.1k43182
I don't understand everything you did to be honest, but I'll look up some of the code, which will hopefully help me write a bit better. Thanks.
– LRBrady
Aug 17 at 13:25
@LRBrady The one thing I did not link to in my answer:get_files_recursively
is a generator
– Graipher
Aug 17 at 13:53
2
A typo here:if filter is None
– hjpotter92
Aug 17 at 18:58
@hjpotter92 fixed
– Graipher
Aug 17 at 18:59
add a comment |Â
I don't understand everything you did to be honest, but I'll look up some of the code, which will hopefully help me write a bit better. Thanks.
– LRBrady
Aug 17 at 13:25
@LRBrady The one thing I did not link to in my answer:get_files_recursively
is a generator
– Graipher
Aug 17 at 13:53
2
A typo here:if filter is None
– hjpotter92
Aug 17 at 18:58
@hjpotter92 fixed
– Graipher
Aug 17 at 18:59
I don't understand everything you did to be honest, but I'll look up some of the code, which will hopefully help me write a bit better. Thanks.
– LRBrady
Aug 17 at 13:25
I don't understand everything you did to be honest, but I'll look up some of the code, which will hopefully help me write a bit better. Thanks.
– LRBrady
Aug 17 at 13:25
@LRBrady The one thing I did not link to in my answer:
get_files_recursively
is a generator– Graipher
Aug 17 at 13:53
@LRBrady The one thing I did not link to in my answer:
get_files_recursively
is a generator– Graipher
Aug 17 at 13:53
2
2
A typo here:
if filter is None
– hjpotter92
Aug 17 at 18:58
A typo here:
if filter is None
– hjpotter92
Aug 17 at 18:58
@hjpotter92 fixed
– Graipher
Aug 17 at 18:59
@hjpotter92 fixed
– Graipher
Aug 17 at 18:59
add a comment |Â
up vote
5
down vote
Review
Your regex can be simplified:
There is no need for the multiple
brackets - this would suffice:
r"^(.*?).pdf$"
.Note that I escape the
.
char (which will match anything) with a backslash.
to only match the specific.
char, and the$
to be certain that .pdf is at the end of the string.There is no need for Regex at all!
You can either use the
glob
module to directly find all files with an extension Python3.5+def get_files(source, extension):
for filename in glob.iglob(f'source/**/*extension', recursive=True):
yield filenameuse something like
.split(".")
orfilename.endswith('.extension')
to find if the file uses that extension. As @Graipher showed in his answer.
Currently your solution works with hardcoded directories.
To change the to directory to read or to write in, you'd need to change a lot in your code. Your solution would be much better if you could send these path locations as parameters.
I tried using the regex r"^(.*?).pdf" initially and it was matching a lot of other files that weren't PDFs, but I could have made an error or might have changed some code after that. Your last point what do you mean? sorry I'm still new so I don't understand.
– LRBrady
Aug 17 at 13:23
2
I added a$
to the regex so it wont match files like "afile.pdfsonething.txt"
– Ludisposed
Aug 17 at 13:25
add a comment |Â
up vote
5
down vote
Review
Your regex can be simplified:
There is no need for the multiple
brackets - this would suffice:
r"^(.*?).pdf$"
.Note that I escape the
.
char (which will match anything) with a backslash.
to only match the specific.
char, and the$
to be certain that .pdf is at the end of the string.There is no need for Regex at all!
You can either use the
glob
module to directly find all files with an extension Python3.5+def get_files(source, extension):
for filename in glob.iglob(f'source/**/*extension', recursive=True):
yield filenameuse something like
.split(".")
orfilename.endswith('.extension')
to find if the file uses that extension. As @Graipher showed in his answer.
Currently your solution works with hardcoded directories.
To change the to directory to read or to write in, you'd need to change a lot in your code. Your solution would be much better if you could send these path locations as parameters.
I tried using the regex r"^(.*?).pdf" initially and it was matching a lot of other files that weren't PDFs, but I could have made an error or might have changed some code after that. Your last point what do you mean? sorry I'm still new so I don't understand.
– LRBrady
Aug 17 at 13:23
2
I added a$
to the regex so it wont match files like "afile.pdfsonething.txt"
– Ludisposed
Aug 17 at 13:25
add a comment |Â
up vote
5
down vote
up vote
5
down vote
Review
Your regex can be simplified:
There is no need for the multiple
brackets - this would suffice:
r"^(.*?).pdf$"
.Note that I escape the
.
char (which will match anything) with a backslash.
to only match the specific.
char, and the$
to be certain that .pdf is at the end of the string.There is no need for Regex at all!
You can either use the
glob
module to directly find all files with an extension Python3.5+def get_files(source, extension):
for filename in glob.iglob(f'source/**/*extension', recursive=True):
yield filenameuse something like
.split(".")
orfilename.endswith('.extension')
to find if the file uses that extension. As @Graipher showed in his answer.
Currently your solution works with hardcoded directories.
To change the to directory to read or to write in, you'd need to change a lot in your code. Your solution would be much better if you could send these path locations as parameters.
Review
Your regex can be simplified:
There is no need for the multiple
brackets - this would suffice:
r"^(.*?).pdf$"
.Note that I escape the
.
char (which will match anything) with a backslash.
to only match the specific.
char, and the$
to be certain that .pdf is at the end of the string.There is no need for Regex at all!
You can either use the
glob
module to directly find all files with an extension Python3.5+def get_files(source, extension):
for filename in glob.iglob(f'source/**/*extension', recursive=True):
yield filenameuse something like
.split(".")
orfilename.endswith('.extension')
to find if the file uses that extension. As @Graipher showed in his answer.
Currently your solution works with hardcoded directories.
To change the to directory to read or to write in, you'd need to change a lot in your code. Your solution would be much better if you could send these path locations as parameters.
edited Aug 21 at 8:44
answered Aug 17 at 13:12


Ludisposed
6,10621758
6,10621758
I tried using the regex r"^(.*?).pdf" initially and it was matching a lot of other files that weren't PDFs, but I could have made an error or might have changed some code after that. Your last point what do you mean? sorry I'm still new so I don't understand.
– LRBrady
Aug 17 at 13:23
2
I added a$
to the regex so it wont match files like "afile.pdfsonething.txt"
– Ludisposed
Aug 17 at 13:25
add a comment |Â
I tried using the regex r"^(.*?).pdf" initially and it was matching a lot of other files that weren't PDFs, but I could have made an error or might have changed some code after that. Your last point what do you mean? sorry I'm still new so I don't understand.
– LRBrady
Aug 17 at 13:23
2
I added a$
to the regex so it wont match files like "afile.pdfsonething.txt"
– Ludisposed
Aug 17 at 13:25
I tried using the regex r"^(.*?).pdf" initially and it was matching a lot of other files that weren't PDFs, but I could have made an error or might have changed some code after that. Your last point what do you mean? sorry I'm still new so I don't understand.
– LRBrady
Aug 17 at 13:23
I tried using the regex r"^(.*?).pdf" initially and it was matching a lot of other files that weren't PDFs, but I could have made an error or might have changed some code after that. Your last point what do you mean? sorry I'm still new so I don't understand.
– LRBrady
Aug 17 at 13:23
2
2
I added a
$
to the regex so it wont match files like "afile.pdfsonething.txt"– Ludisposed
Aug 17 at 13:25
I added a
$
to the regex so it wont match files like "afile.pdfsonething.txt"– Ludisposed
Aug 17 at 13:25
add a comment |Â
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f201881%2ffind-files-by-pattern-and-copy-to-target-location%23new-answer', 'question_page');
);
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
1
Hey, welcome to Code Review! The problem description says to traverse a directory tree. This sounds to me like you actually need to descend down into sub-directories.
– Graipher
Aug 17 at 12:50