Xargs to extract filename

Question by Nikhil (3 votes; asked Sep 3 at 11:14, edited Sep 5 at 2:18 by Rui F Ribeiro)

I would like to find all the .html files in a folder and append [file](./file.html) to another file called index.md. I tried the following command:



ls | awk "/\.html$/" | xargs -0 -I @@ -L 1 sh -c 'echo "[${@@%.*}](./@@)" >> index.md'


But it can't substitute @@ inside the command? What am I doing wrong?



Note: file names can contain any valid character, such as spaces.

Clarification:

index.md should contain one line of the form [file](./file.html) per HTML file, where file is the actual file name in the folder.







  • xargs -0 implies null-terminated strings on the xargs stdin, but awk does not print them. ${...} needs a variable name. Both points are addressed in @RoVo's answer.
    – weirdan
    Sep 3 at 11:24






  • Would you please clarify what the content of "index.md" should look like?
    – Goro
    Sep 3 at 11:32











  • @Goro I had appended the clarification at the end of question, but unfortunately, it has been edited out!
    – Nikhil
    Sep 3 at 11:39










  • @Nikhil. Would you please include it again. Thanks!
    – Goro
    Sep 3 at 11:40











  • @Goro Isn't it appropriate to justify accepted answer?
    – Nikhil
    Sep 3 at 14:37














3 Answers

Answer by RoVo (accepted, 11 votes; answered Sep 3 at 11:23, edited Sep 3 at 19:24)

Do not parse ls.

You don't need xargs for this, you can use find -exec.



Try this:

find . -maxdepth 1 -type f -name "*.html" -exec \
  sh -c 'f=$(basename "$1"); echo "[${f%.*}]($1)" >> index.md' sh {} \;

If you want to use xargs, use this very similar version:

find . -maxdepth 1 -type f -name "*.html" -print0 |
  xargs -0 -I{} sh -c 'f=$(basename "$1"); echo "[${f%.*}]($1)" >> index.md' sh {}

Another way without running xargs or -exec (-printf is a GNU find extension):

find . -maxdepth 1 -type f -name "*.html" -printf '[%f](./%f)\n' |
  sed 's/\.html]/]/' > index.md
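
The trailing sh in the commands above is not a typo. As a small illustration (a sketch with a made-up file name, not part of the original answer), the word that follows the code string of sh -c becomes $0, and only the words after it become $1, $2, and so on:

```shell
# Hypothetical file name, chosen only to show that spaces survive.
name='my file.html'

# With a placeholder word, the file name lands in $1.
with_placeholder=$(sh -c 'printf "%s\n" "zero=$0 one=$1"' sh "$name")

# Without it, the file name is taken as $0 and $1 stays empty.
without_placeholder=$(sh -c 'printf "%s\n" "zero=$0 one=$1"' "$name")

printf '%s\n' "$with_placeholder" "$without_placeholder"
```

The placeholder does not have to be sh; any word works, but using the shell's name keeps error messages readable.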





  • Is that an extra sh argument in the first command, or is that intentional?
    – Toby Speight
    Sep 3 at 15:26






  • This is taken from this answer. See the comments there and man sh -> -c for documentation of why this is needed.
    – RoVo
    Sep 3 at 15:27

  • Ah, thanks - I had missed that: "If there are arguments after the command_string, the first argument is assigned to $0 and any remaining arguments are assigned to the positional parameters."
    – Toby Speight
    Sep 3 at 15:40

  • Add '-type f' to avoid strangeness with directories matching "*.html"
    – abligh
    Sep 3 at 17:29










  • thanks, edited.
    – RoVo
    Sep 3 at 19:24

















Answer (16 votes)

Just do:



for f in *.html; do printf '%s\n' "[${f%.*}](./$f)"; done > index.md


Use set -o nullglob (zsh, yash) or shopt -s nullglob (bash) for *.html to expand to nothing instead of *.html (or report an error in zsh) when there's no html file. With zsh, you can also use *.html(N) or in ksh93 ~(N)*.html.
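
For shells without a nullglob option (plain POSIX sh, for example), a common workaround is to test inside the loop whether the name actually exists. A minimal sketch, where make_index and the scratch directory are illustrative names only:

```shell
# Print one markdown link per .html file in the current directory.
make_index() {
  for f in *.html; do
    # If nothing matched, f holds the literal pattern "*.html", which does
    # not exist as a file, so it is skipped (dangling symlinks are skipped too).
    [ -e "$f" ] || continue
    printf '%s\n' "[${f%.*}](./$f)"
  done
}

# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

make_index > index.md      # no .html files yet: index.md stays empty
touch 'a b.html'
make_index >> index.md     # now appends a single line for "a b.html"
cat index.md
```

This trades the shell option for one extra test per iteration, which is usually negligible.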



Or with one printf call with zsh:



files=(*.html)
rootnames=($files:r)
printf '[%s](./%s)\n' ${rootnames:^files} > index.md


Note that, depending on which markdown syntax you're using, you may have to HTML-encode the title part and URI-encode the URI part if the file names contain some problematic characters. Not doing so could even end up introducing a form of XSS vulnerability depending on context. With ksh93, you can do it with:



for file in *.html; do
  title=${ printf %H "${file%.*}"; }
  title=${title//$'\n'/"<br/>"}
  uri=${ printf '%#H' "$file"; }
  uri=${uri//$'\n'/%0A}
  printf '%s\n' "[$title]($uri)"
done > index.md


Where %H¹ does the HTML encoding and %#H the URI encoding, but we still need to address newline characters separately.
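
Outside ksh93 there is no %H. As a rough stand-in, assuming file names without newline characters and covering only the four characters &, <, > and ", the title could be escaped with sed. The html_escape name is made up for this sketch, and it does not attempt URI encoding:

```shell
# Escape the characters that are special in HTML text and attribute values.
html_escape() {
  # Replace & first so the entities produced afterwards are not escaped again.
  printf '%s\n' "$1" |
    sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g' -e 's/"/\&quot;/g'
}

# Hypothetical file name used only for the demonstration.
title=$(html_escape 'Tom & <Jerry>.html')
printf '%s\n' "$title"
```

Depending on the markdown flavour, characters such as ] or a backslash may need separate treatment in the title.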



Or with perl:



perl -MURI::Encode=uri_encode -MHTML::Entities -CLSA -le '
  for (<*.html>) {
    $uri = uri_encode("./$_");
    s/\.html\z//;
    $_ = encode_entities $_;
    s:\n:<br/>:g;
    print "[$_]($uri)"
  }'


Using <br/> for newline characters. You may want to use ␤ instead or more generally decide on some form of alternative representation for non-printable characters.



There are a few things wrong in your code:



  • parsing the output of ls

  • using a $ meant to be literal inside double quotes

  • using awk for something that grep can do (not wrong per se, but overkill)

  • using xargs -0 when the input is not NUL-delimited

  • -I conflicts with -L 1. -L 1 runs one command per line of input, but with each word in the line passed as a separate argument, while -I @@ runs one command for each line of input with the full line (minus the trailing blanks, and with quoting still processed) used to replace @@.

  • using the replacement string (@@) inside the code argument of sh (command injection vulnerability)

  • In sh, the var in ${var%.*} is a variable name; it won't work with arbitrary text.

  • using echo for arbitrary data.
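
Two of the points above are easy to check by hand. The sketch below (file names invented for the demonstration) counts the arguments that sh receives under -L 1 and under -I, then shows that the suffix-stripping expansion works once the name is held in a variable:

```shell
# -L 1: one command per line, but the line is still split into words.
args_with_L=$(printf '%s\n' 'a b.html' | xargs -L 1 sh -c 'printf "%s\n" "$#"' sh)

# -I @@: one command per line, and the whole line replaces @@ as one argument.
args_with_I=$(printf '%s\n' 'a b.html' | xargs -I @@ sh -c 'printf "%s\n" "$#"' sh @@)

# ${var%.*} needs a variable name; it removes the shortest suffix matching ".*".
file='notes.v2.html'
title=${file%.*}

printf '%s\n' "-L 1: $args_with_L" "-I:   $args_with_I" "title: $title"
```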

If you wanted to use xargs -0, you'd need something like:



printf '%s\0' * | grep -z '\.html$' | xargs -r0 sh -c '
  for file do
    printf "%s\n" "[${file%.*}](./$file)"
  done' sh > index.md


  • replacing ls with printf '%s\0' * to get NUL-delimited output

  • replacing awk with grep -z (GNU extension) to process that NUL-delimited output

  • using xargs -r0 (GNU extensions) without any -n/-L/-I: since we're spawning a sh anyway, we might as well have it process as many files as possible

  • having xargs pass the words as extra arguments to sh (which become the positional parameters inside the inline code), not inside the code argument

  • which means we can more easily store them in variables (here with for file do, which loops over the positional parameters by default), so we can use the ${param%pattern} parameter expansion operator

  • using printf instead of echo.
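
To illustrate why the names are passed as arguments rather than pasted into the code, here is a small demonstration with a harmless but hostile-looking file name (invented for this sketch). The first pipeline embeds the name in the code string and ends up running it; the second only ever treats it as data:

```shell
# A legal file name that contains a command substitution.
name='$(echo INJECTED).html'

# Unsafe: @@ is substituted into the code string, so sh evaluates the name.
unsafe=$(printf '%s\n' "$name" | xargs -I @@ sh -c 'printf "%s\n" "@@"')

# Safer: the name arrives as $1 and is never parsed as shell code.
safe=$(printf '%s\n' "$name" | xargs -I @@ sh -c 'printf "%s\n" "$1"' sh @@)

printf 'unsafe: %s\nsafe:   %s\n' "$unsafe" "$safe"
```

With echo INJECTED replaced by something destructive, the unsafe variant would run it with the privileges of whoever generates the index.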

It goes without saying that it makes little sense to use that instead of doing that for loop directly over the *.html files like in the top example.




¹ It doesn't seem to work properly for multibyte characters in my version of ksh93 though (ksh93u+ on a GNU system)






  • That overwrites index.md though, which OP's code did not.
    – weirdan
    Sep 3 at 11:27






  • I think this is still what OP wants. OP uses >> because the redirection is inside the loop, while this answer redirects after the loop, and appending on a second run of the same script doesn't make much sense to me.
    – RoVo
    Sep 3 at 11:28











  • @StéphaneChazelas Thanks for the answer. But for f in *.html; do printf '%s\n' "[${f%.*}](./$f)"; done >> index.md appends [*](./*.html) when no html file exists.
    – Nikhil
    Sep 3 at 12:48







  • @Nikhil, see edit.
    – Stéphane Chazelas
    Sep 3 at 13:02

















Answer (3 votes)

Do you really need xargs?



ls *.html | perl -pe 's/\.html\n//;$_="[$_](./$_.html)\n"'

(If you have more than 100000 files):

printf "%s\n" *.html | perl -pe 's/\.html\n//;$_="[$_](./$_.html)\n"'

or (slower, but shorter):

for f in *.html; do echo "[${f%.*}](./$f)"; done





  • Note that with ls *.html, if any of those html files are of type directory, ls will list their content. More generally, when you use ls with a shell wildcard, you want to use ls -d -- *.html (which also addresses the issues with file names starting with -).
    – Stéphane Chazelas
    Sep 4 at 7:18










  • The first two approaches assume file names don't contain newline characters (anyway, I suppose those would have to be encoded somehow in the markdown syntax). The third one assumes file names don't contain backslash characters. More generally, echo can't be used for arbitrary data.
    – Stéphane Chazelas
    Sep 4 at 7:20










Your Answer







StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "106"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);













 

draft saved


draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f466550%2fxargs-to-extract-filename%23new-answer', 'question_page');

);

Post as a guest






























3 Answers
3






active

oldest

votes








3 Answers
3






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
11
down vote



accepted










Do not parse ls.

You don't need xargs for this, you can use find -exec.



try this,



find . -maxdepth 1 -type f -name "*.html" -exec 
sh -c 'f=$(basename "$1"); echo "[$f%.*]($1)" >> index.md' sh ;


If you want to use xargs, use this very similar version:



find . -maxdepth 1 -type f -name "*.html" -print0 | 
xargs -0 -I sh -c 'f=$(basename "$1"); echo "[$f%.*]($1)" >> index.md' sh ;


Another way without running xargs or -exec:



find . -maxdepth 1 -type f -name "*.html" -printf '[%f](./%f)n' 
| sed 's/.html]/]/'
> index.md





share|improve this answer






















  • Is that an extra sh argument in the first command, or is that intentional?
    – Toby Speight
    Sep 3 at 15:26






  • 2




    This is taken from this answer. See comments there and man sh -> -c for a documentation why this is needed.
    – RoVo
    Sep 3 at 15:27







  • 1




    Ah, thanks - I had missed that If there are arguments after the command_string, the first argument is assigned to $0 and any remaining arguments are assigned to the positional parameters.
    – Toby Speight
    Sep 3 at 15:40






  • 1




    Add '-type f' to avoid strangeness with directories matching "*.html"
    – abligh
    Sep 3 at 17:29










  • thanks, edited.
    – RoVo
    Sep 3 at 19:24














up vote
11
down vote



accepted










Do not parse ls.

You don't need xargs for this, you can use find -exec.



try this,



find . -maxdepth 1 -type f -name "*.html" -exec 
sh -c 'f=$(basename "$1"); echo "[$f%.*]($1)" >> index.md' sh ;


If you want to use xargs, use this very similar version:



find . -maxdepth 1 -type f -name "*.html" -print0 | 
xargs -0 -I sh -c 'f=$(basename "$1"); echo "[$f%.*]($1)" >> index.md' sh ;


Another way without running xargs or -exec:



find . -maxdepth 1 -type f -name "*.html" -printf '[%f](./%f)n' 
| sed 's/.html]/]/'
> index.md





share|improve this answer






















  • Is that an extra sh argument in the first command, or is that intentional?
    – Toby Speight
    Sep 3 at 15:26






  • 2




    This is taken from this answer. See comments there and man sh -> -c for a documentation why this is needed.
    – RoVo
    Sep 3 at 15:27







  • 1




    Ah, thanks - I had missed that If there are arguments after the command_string, the first argument is assigned to $0 and any remaining arguments are assigned to the positional parameters.
    – Toby Speight
    Sep 3 at 15:40






  • 1




    Add '-type f' to avoid strangeness with directories matching "*.html"
    – abligh
    Sep 3 at 17:29










  • thanks, edited.
    – RoVo
    Sep 3 at 19:24












up vote
11
down vote



accepted







up vote
11
down vote



accepted






Do not parse ls.

You don't need xargs for this, you can use find -exec.



try this,



find . -maxdepth 1 -type f -name "*.html" -exec 
sh -c 'f=$(basename "$1"); echo "[$f%.*]($1)" >> index.md' sh ;


If you want to use xargs, use this very similar version:



find . -maxdepth 1 -type f -name "*.html" -print0 | 
xargs -0 -I sh -c 'f=$(basename "$1"); echo "[$f%.*]($1)" >> index.md' sh ;


Another way without running xargs or -exec:



find . -maxdepth 1 -type f -name "*.html" -printf '[%f](./%f)n' 
| sed 's/.html]/]/'
> index.md





share|improve this answer














Do not parse ls.

You don't need xargs for this, you can use find -exec.



try this,



find . -maxdepth 1 -type f -name "*.html" -exec 
sh -c 'f=$(basename "$1"); echo "[$f%.*]($1)" >> index.md' sh ;


If you want to use xargs, use this very similar version:



find . -maxdepth 1 -type f -name "*.html" -print0 | 
xargs -0 -I sh -c 'f=$(basename "$1"); echo "[$f%.*]($1)" >> index.md' sh ;


Another way without running xargs or -exec:



find . -maxdepth 1 -type f -name "*.html" -printf '[%f](./%f)n' 
| sed 's/.html]/]/'
> index.md






share|improve this answer














share|improve this answer



share|improve this answer








edited Sep 3 at 19:24

























answered Sep 3 at 11:23









RoVo

1,266211




1,266211











  • Is that an extra sh argument in the first command, or is that intentional?
    – Toby Speight
    Sep 3 at 15:26






  • 2




    This is taken from this answer. See comments there and man sh -> -c for a documentation why this is needed.
    – RoVo
    Sep 3 at 15:27







  • 1




    Ah, thanks - I had missed that If there are arguments after the command_string, the first argument is assigned to $0 and any remaining arguments are assigned to the positional parameters.
    – Toby Speight
    Sep 3 at 15:40






  • 1




    Add '-type f' to avoid strangeness with directories matching "*.html"
    – abligh
    Sep 3 at 17:29










  • thanks, edited.
    – RoVo
    Sep 3 at 19:24
















  • Is that an extra sh argument in the first command, or is that intentional?
    – Toby Speight
    Sep 3 at 15:26






  • 2




    This is taken from this answer. See comments there and man sh -> -c for a documentation why this is needed.
    – RoVo
    Sep 3 at 15:27







  • 1




    Ah, thanks - I had missed that If there are arguments after the command_string, the first argument is assigned to $0 and any remaining arguments are assigned to the positional parameters.
    – Toby Speight
    Sep 3 at 15:40






  • 1




    Add '-type f' to avoid strangeness with directories matching "*.html"
    – abligh
    Sep 3 at 17:29










  • thanks, edited.
    – RoVo
    Sep 3 at 19:24















Is that an extra sh argument in the first command, or is that intentional?
– Toby Speight
Sep 3 at 15:26




Is that an extra sh argument in the first command, or is that intentional?
– Toby Speight
Sep 3 at 15:26




2




2




This is taken from this answer. See comments there and man sh -> -c for a documentation why this is needed.
– RoVo
Sep 3 at 15:27





This is taken from this answer. See comments there and man sh -> -c for a documentation why this is needed.
– RoVo
Sep 3 at 15:27





1




1




Ah, thanks - I had missed that If there are arguments after the command_string, the first argument is assigned to $0 and any remaining arguments are assigned to the positional parameters.
– Toby Speight
Sep 3 at 15:40




Ah, thanks - I had missed that If there are arguments after the command_string, the first argument is assigned to $0 and any remaining arguments are assigned to the positional parameters.
– Toby Speight
Sep 3 at 15:40




1




1




Add '-type f' to avoid strangeness with directories matching "*.html"
– abligh
Sep 3 at 17:29




Add '-type f' to avoid strangeness with directories matching "*.html"
– abligh
Sep 3 at 17:29












thanks, edited.
– RoVo
Sep 3 at 19:24




thanks, edited.
– RoVo
Sep 3 at 19:24












up vote
16
down vote













Just do:



for f in *.html; do printf '%sn' "[$f%.*](./$f)"; done > index.md


Use set -o nullglob (zsh, yash) or shopt -s nullglob (bash) for *.html to expand to nothing instead of *.html (or report an error in zsh) when there's no html file. With zsh, you can also use *.html(N) or in ksh93 ~(N)*.html.



Or with one printf call with zsh:



files=(*.html)
rootnames=($files:r)
printf '[%s](./%s)n' $basenames:^files > index.md


Note that, depending on which markdown syntax you're using, you may have to HTML-encode the title part and URI-encode the URI part if the file names contain some problematic characters. Not doing so could even end up introducing a form of XSS vulnerability depending on context. With ksh93, you can do it with:



for f in *.html; do
title=$ printf %H "$file%.*";
title=$title//$'n'/"<br/>"
uri=$ printf '%#H' "$file";
uri=$uri//$'n'/%0A
printf '%sn' "[$title]($uri)"
done > index.md


Where %H¹ does the HTML encoding and %#H the URI encoding, but we still need to address newline characters separately.



Or with perl:



perl -MURI::Encode=uri_encode -MHTML::Entities -CLSA -le '
for (<*.html>)
$uri = uri_encode("./$_");
s/.htmlz//;
$_ = encode_entities $_;
s:n:<br/>:g;
print "[$_]($uri)"
'


Using <br/> for newline characters. You may want to use ␤ instead or more generally decide on some form of alternative representation for non-printable characters.



There are a few things wrong in your code:



  • parsing the output of ls

  • use a $ meant to be literal inside double quotes

  • Using awk for something that grep can do (not wrong per se, but overkill)

  • use xargs -0 when the input is not NUL-delimited


  • -I conflicts with -L 1. -L 1 is to run one command per line of input but with each word in the line passed as separate arguments, while -I @@ runs one command for each line of input with the full line (minus the trailing blanks, and quoting still processed) used to replace @@.

  • using inside the code argument of sh (command injection vulnerability)

  • In sh, the var in $var%.* is a variable name, it won't work with arbitrary text.

  • use echo for arbitrary data.

If you wanted to use xargs -0, you'd need something like:



printf '%s' * | grep -z '.html$' | xargs -r0 sh -c '
for file do
printf "%sn" "[$file%.*](./$file)"
done' sh > file.md


  • Replacing ls with printf '%s' * to get a NUL-delimited output


  • awk with grep -z (GNU extension) to process that NUL-delimited output


  • xargs -r0 (GNU extensions) without any -n/-L/-I, because while we're at spawning a sh, we might as well have it process as many files as possible

  • have xargs pass the words as extra arguments to sh (which become the positional parameters inside the inline code), not inside the code argument.

  • which means we can more easily store them in variables (here with for file do which loops over the positional parameters by default) so we can use the $param%pattern parameter expansion operator.

  • use printf instead of echo.

It goes without saying that it makes little sense to use that instead of doing that for loop directly over the *.html files like in the top example.




¹ It doesn't seem to work properly for multibyte characters in my version of ksh93 though (ksh93u+ on a GNU system)






share|improve this answer






















  • That overwrites index.md though, which OP's code did not.
    – weirdan
    Sep 3 at 11:27






  • 2




    I think this is still what OP wants. OP uses >> because he uses it inside the loop, while this answer after the loop and a second run of the same script doesn't make too much sense to me.
    – RoVo
    Sep 3 at 11:28











  • @StéphaneChazelas Thanks for the answer. But for f in *.html; do printf '%sn' "[$f%.*](./$f)"; done >> index.md appends [*](./*.html) when no html file exists.
    – Nikhil
    Sep 3 at 12:48







  • 1




    @Nikhil, see edit.
    – Stéphane Chazelas
    Sep 3 at 13:02














up vote
16
down vote













Just do:



for f in *.html; do printf '%sn' "[$f%.*](./$f)"; done > index.md


Use set -o nullglob (zsh, yash) or shopt -s nullglob (bash) for *.html to expand to nothing instead of *.html (or report an error in zsh) when there's no html file. With zsh, you can also use *.html(N) or in ksh93 ~(N)*.html.



Or with one printf call with zsh:



files=(*.html)
rootnames=($files:r)
printf '[%s](./%s)n' $basenames:^files > index.md


Note that, depending on which markdown syntax you're using, you may have to HTML-encode the title part and URI-encode the URI part if the file names contain some problematic characters. Not doing so could even end up introducing a form of XSS vulnerability depending on context. With ksh93, you can do it with:



for f in *.html; do
title=$ printf %H "$file%.*";
title=$title//$'n'/"<br/>"
uri=$ printf '%#H' "$file";
uri=$uri//$'n'/%0A
printf '%sn' "[$title]($uri)"
done > index.md


Where %H¹ does the HTML encoding and %#H the URI encoding, but we still need to address newline characters separately.



Or with perl:



perl -MURI::Encode=uri_encode -MHTML::Entities -CLSA -le '
for (<*.html>)
$uri = uri_encode("./$_");
s/.htmlz//;
$_ = encode_entities $_;
s:n:<br/>:g;
print "[$_]($uri)"
'


Using <br/> for newline characters. You may want to use ␤ instead or more generally decide on some form of alternative representation for non-printable characters.



There are a few things wrong in your code:



  • parsing the output of ls

  • use a $ meant to be literal inside double quotes

  • Using awk for something that grep can do (not wrong per se, but overkill)

  • use xargs -0 when the input is not NUL-delimited


  • -I conflicts with -L 1. -L 1 is to run one command per line of input but with each word in the line passed as separate arguments, while -I @@ runs one command for each line of input with the full line (minus the trailing blanks, and quoting still processed) used to replace @@.

  • using inside the code argument of sh (command injection vulnerability)

  • In sh, the var in $var%.* is a variable name, it won't work with arbitrary text.

  • use echo for arbitrary data.

If you wanted to use xargs -0, you'd need something like:



printf '%s' * | grep -z '.html$' | xargs -r0 sh -c '
for file do
printf "%sn" "[$file%.*](./$file)"
done' sh > file.md


  • Replacing ls with printf '%s' * to get a NUL-delimited output


  • awk with grep -z (GNU extension) to process that NUL-delimited output


  • xargs -r0 (GNU extensions) without any -n/-L/-I, because while we're at spawning a sh, we might as well have it process as many files as possible

  • have xargs pass the words as extra arguments to sh (which become the positional parameters inside the inline code), not inside the code argument.

  • which means we can more easily store them in variables (here with for file do which loops over the positional parameters by default) so we can use the $param%pattern parameter expansion operator.

  • use printf instead of echo.

It goes without saying that it makes little sense to use that instead of doing that for loop directly over the *.html files like in the top example.




¹ It doesn't seem to work properly for multibyte characters in my version of ksh93 though (ksh93u+ on a GNU system)






share|improve this answer






















  • That overwrites index.md though, which OP's code did not.
    – weirdan
    Sep 3 at 11:27






  • 2




    I think this is still what OP wants. OP uses >> because he uses it inside the loop, while this answer after the loop and a second run of the same script doesn't make too much sense to me.
    – RoVo
    Sep 3 at 11:28











  • @StéphaneChazelas Thanks for the answer. But for f in *.html; do printf '%sn' "[$f%.*](./$f)"; done >> index.md appends [*](./*.html) when no html file exists.
    – Nikhil
    Sep 3 at 12:48







  • 1




    @Nikhil, see edit.
    – Stéphane Chazelas
    Sep 3 at 13:02












up vote
16
down vote










up vote
16
down vote









Just do:



for f in *.html; do printf '%sn' "[$f%.*](./$f)"; done > index.md


Use set -o nullglob (zsh, yash) or shopt -s nullglob (bash) for *.html to expand to nothing instead of *.html (or report an error in zsh) when there's no html file. With zsh, you can also use *.html(N) or in ksh93 ~(N)*.html.



Or with one printf call with zsh:



files=(*.html)
rootnames=($files:r)
printf '[%s](./%s)n' $basenames:^files > index.md


Note that, depending on which markdown syntax you're using, you may have to HTML-encode the title part and URI-encode the URI part if the file names contain some problematic characters. Not doing so could even end up introducing a form of XSS vulnerability depending on context. With ksh93, you can do it with:



for f in *.html; do
title=$ printf %H "$file%.*";
title=$title//$'n'/"<br/>"
uri=$ printf '%#H' "$file";
uri=$uri//$'n'/%0A
printf '%sn' "[$title]($uri)"
done > index.md


Where %H¹ does the HTML encoding and %#H the URI encoding, but we still need to address newline characters separately.



Or with perl:



perl -MURI::Encode=uri_encode -MHTML::Entities -CLSA -le '
for (<*.html>)
$uri = uri_encode("./$_");
s/.htmlz//;
$_ = encode_entities $_;
s:n:<br/>:g;
print "[$_]($uri)"
'


Using <br/> for newline characters. You may want to use ␤ instead or more generally decide on some form of alternative representation for non-printable characters.



There are a few things wrong in your code:



  • parsing the output of ls

  • use a $ meant to be literal inside double quotes

  • Using awk for something that grep can do (not wrong per se, but overkill)

  • use xargs -0 when the input is not NUL-delimited


  • -I conflicts with -L 1. -L 1 is to run one command per line of input but with each word in the line passed as separate arguments, while -I @@ runs one command for each line of input with the full line (minus the trailing blanks, and quoting still processed) used to replace @@.

  • using inside the code argument of sh (command injection vulnerability)

  • In sh, the var in $var%.* is a variable name, it won't work with arbitrary text.

  • use echo for arbitrary data.

If you wanted to use xargs -0, you'd need something like:



printf '%s' * | grep -z '.html$' | xargs -r0 sh -c '
for file do
printf "%sn" "[$file%.*](./$file)"
done' sh > file.md


  • replacing ls with printf '%s\0' * to get a NUL-delimited output

  • replacing awk with grep -z (GNU extension) to process that NUL-delimited output

  • using xargs -r0 (GNU extensions) without any -n/-L/-I, because as long as we're spawning a sh, we might as well have it process as many files as possible

  • having xargs pass the words as extra arguments to sh (which become the positional parameters inside the inline code), not inside the code argument.

  • which means we can more easily store them in variables (here with for file do, which loops over the positional parameters by default) so we can use the ${param%pattern} parameter expansion operator.

  • using printf instead of echo.

It goes without saying that it makes little sense to use that instead of running the for loop directly over the *.html files, as in the top example.
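For comparison, here is a rough sketch of the direct loop in plain sh, run in a scratch directory. The [ -e "$f" ] guard is an addition for illustration: it stands in for nullglob in shells that lack it, skipping the literal *.html pattern when nothing matches. The directory and file names are hypothetical.

```shell
# Work in a throwaway directory so the sketch does not touch real files.
dir=$(mktemp -d) && cd "$dir" || exit 1
: > 'a b.html'   # hypothetical file with a space in its name
: > notes.txt    # not an .html file, so the glob ignores it

for f in *.html; do
  # Without nullglob, an unmatched pattern stays as the literal "*.html";
  # skip it, since no such file exists.
  [ -e "$f" ] || continue
  printf '%s\n' "[${f%.*}](./$f)"
done > index.md

cat index.md
```

The redirection sits after done, so index.md is opened once and rewritten on each run; use >> there instead if entries should accumulate across runs.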




¹ It doesn't seem to work properly for multibyte characters in my version of ksh93 though (ksh93u+ on a GNU system)






edited Sep 4 at 9:40

answered Sep 3 at 11:24

Stéphane Chazelas











  • That overwrites index.md though, which OP's code did not.
    – weirdan
    Sep 3 at 11:27

  • I think this is still what the OP wants. The OP uses >> because the redirection is inside the loop, whereas this answer redirects once after the loop, and running the same script a second time doesn't make much sense to me.
    – RoVo
    Sep 3 at 11:28

  • @StéphaneChazelas Thanks for the answer. But for f in *.html; do printf '%s\n' "[${f%.*}](./$f)"; done >> index.md appends [*](./*.html) when no html file exists.
    – Nikhil
    Sep 3 at 12:48

  • @Nikhil, see edit.
    – Stéphane Chazelas
    Sep 3 at 13:02


























up vote
3
down vote













Do you really need xargs?



ls *.html | perl -pe 's/\.html\n//;$_="[$_](./$_.html)\n"'

(If you have more than 100000 files):

printf "%s\n" *.html | perl -pe 's/\.html\n//;$_="[$_](./$_.html)\n"'

or (slower, but shorter):

for f in *.html; do echo "[${f%.*}](./$f)"; done



























  • Note that with ls *.html, if any of those html files are of type directory, ls will list their content. More generally, when you use ls with a shell wildcard, you want to use ls -d -- *.html (which also addresses the issues with file names starting with -).
    – Stéphane Chazelas
    Sep 4 at 7:18










  • The first two approaches assume file names don't contain newline characters (anyway, I suppose those would have to be encoded somehow in the markdown syntax). The third one assumes file names don't contain backslash characters. More generally, echo can't be used for arbitrary data.
    – Stéphane Chazelas
    Sep 4 at 7:20






















edited Sep 4 at 7:10

answered Sep 3 at 20:46

Ole Tange




























 
