awk: extract string from a field [on hold]
In the input, fields are separated by a pipe sign (|):
CCCC|Sess C1|s1 DA=yy07:@##;/u/t/we
DDDDD|Sess C2|s4 DB=yy8:@##;/u/ba
I want output in which the last field is changed so that only the text between the first = and the following : in that field is kept.
The expected output is:
CCCC|Sess C1|yy07
DDDDD|Sess C2|yy8
shell-script awk gawk
edited 2 days ago; asked 2 days ago by Chris
put on hold as unclear what you're asking by Sparhawk, Rui F Ribeiro, andcoz, RalfFriedl, DarkHeart yesterday
Please clarify your specific problem or add additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.
What do you mean by "I want to get output where last field is changed"? What exactly defines the expected output? Is it the part before the second |, plus the part between = and :? Please edit your question to add this information. – Sparhawk, 2 days ago
Output columns are also separated with | (pipe); only in the last column do I need to print just what is between the first = and the first : of the original last column. – Chris, 2 days ago
3 Answers
Standard awk is not very good at extracting data out of fields based on patterns. Some options include:
- split(), to split the text into an array based on specified delimiters;
- match(), which sets the RSTART and RLENGTH variables to indicate where the match occurred, and then substr() to extract the matched portion.
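A minimal sketch of what those two functions report for the question's first third field, assuming a POSIX-compatible awk; the helper name show_split_and_match is hypothetical:

```shell
# Hypothetical helper: print what split() and match() report for one field value.
show_split_and_match() {
  awk -v f="$1" 'BEGIN {
    # split() returns the number of pieces and stores them in a[1], a[2], ...
    n = split(f, a, /[=:]/)
    print "split: " n " pieces, a[2] = " a[2]
    # match() returns 0 when nothing matches; otherwise it sets RSTART and RLENGTH
    if (match(f, /=[^:]*/))
      print "match: RSTART = " RSTART ", RLENGTH = " RLENGTH
  }'
}

show_split_and_match 's1 DA=yy07:@##;/u/t/we'
```

For that field, split() yields 3 pieces with a[2] = yy07, and match() finds "=yy07" at RSTART 6 with RLENGTH 5, which is why substr() is called with RSTART+1 and RLENGTH-1 to skip the =.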
So here:
awk -F'|' -v OFS='|' '
  split($3, a, /[=:]/) >= 2 {print $1, $2, a[2]}' < file.txt
That returns the portion between the first and second occurrence of = or : in $3.
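One difference worth knowing: because split() breaks on either character, a[2] is not necessarily the text after the first = when a : appears earlier in the field, whereas match() anchors on =. A sketch comparing the two on a made-up line (the helper names via_split and via_match are hypothetical):

```shell
# Hypothetical helpers wrapping the split() and match() awk commands.
via_split() {
  # a[2] is whatever lies between the 1st and 2nd "=" or ":" in field 3
  awk -F'|' -v OFS='|' 'split($3, a, /[=:]/) >= 2 { print $1, $2, a[2] }'
}

via_match() {
  # anchored on "=": the text after it, up to the next ":" or the end of the field
  awk -F'|' -v OFS='|' 'match($3, /=[^:]*/) { print $1, $2, substr($3, RSTART + 1, RLENGTH - 1) }'
}

# Made-up input line in which a ":" precedes the first "=" in field 3.
printf '%s\n' 'A|B|x:y=z:w' | via_split   # A|B|y
printf '%s\n' 'A|B|x:y=z:w' | via_match   # A|B|z
```

On the question's own sample lines both give the same result; they only diverge on input shaped like the made-up line.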
Or:
awk -F'|' -v OFS='|' '
  match($3, /=[^:]*/) {
    print $1, $2, substr($3, RSTART+1, RLENGTH-1)
  }' < file.txt
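As written, these commands drop lines whose third field has no =. If you would rather pass such lines through unchanged (an assumption; the question does not say what should happen), one sketch is to assign to $3 only on a successful match() and print every line; the helper name extract_or_pass is hypothetical:

```shell
# Hypothetical helper: rewrite field 3 only when it contains "=",
# otherwise leave the line as it was (an assumed policy).
extract_or_pass() {
  awk -F'|' -v OFS='|' '
    match($3, /=[^:]*/) { $3 = substr($3, RSTART + 1, RLENGTH - 1) }  # rebuilds $0 with OFS
    { print }                                                          # print every line
  '
}

printf '%s\n' 'CCCC|Sess C1|s1 DA=yy07:@##;/u/t/we' 'EEE|Sess C3|no equals here' | extract_or_pass
```

Unlike print $1, $2, ..., assigning to $3 also keeps any fields beyond the third.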
GNU awk has a gensub() extension which brings the functionality of sed's s command into awk:
gawk -F'|' -v OFS='|' '
  $3 ~ /=/ {
    print $1, $2, gensub(/^[^=]*=([^:]*).*/, "\\1", 1, $3)
  }' < file.txt
That looks for = followed by any number of non-: characters and extracts the part after =. The problem with gensub() is that you can't easily tell whether the substitution was successful, hence the check that $3 contains = first.
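By contrast, the standard sub() function returns the number of substitutions it made, so a portable awk can test for success directly. A minimal sketch of the same extraction, assuming a POSIX awk; the helper name extract_with_sub is hypothetical:

```shell
# Hypothetical helper: same extraction as the gensub() version, using POSIX sub().
extract_with_sub() {
  awk -F'|' -v OFS='|' '{
    f = $3                          # work on a copy of the third field
    if (sub(/^[^=]*=/, "", f)) {    # 1 if an "=" (and the text before it) was removed, else 0
      sub(/:.*/, "", f)             # drop the first ":" and everything after it
      print $1, $2, f
    }
  }'
}

printf '%s\n' 'CCCC|Sess C1|s1 DA=yy07:@##;/u/t/we' 'DDDDD|Sess C2|s4 DB=yy8:@##;/u/ba' | extract_with_sub
```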
With sed:
sed -n 's/^\([^|]*|[^|]*|\)[^=|]*=\([^:|]*\).*/\1\2/p' < file.txt
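The same substitution can also be written as an extended regular expression with sed -E (accepted by GNU and BSD sed; assumed available here). In an ERE an unescaped | means alternation, so this sketch writes the literal separators as [|], and the groups need no backslashes; the helper name extract_sed_ere is hypothetical:

```shell
# Hypothetical helper: ERE form of the sed command (requires sed -E).
extract_sed_ere() {
  # [|] is a literal pipe; ( ) are groups; \1 and \2 refer back to them
  sed -nE 's/^([^|]*[|][^|]*[|])[^=|]*=([^:|]*).*/\1\2/p'
}

printf '%s\n' 'CCCC|Sess C1|s1 DA=yy07:@##;/u/t/we' 'DDDDD|Sess C2|s4 DB=yy8:@##;/u/ba' | extract_sed_ere
```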
With perl:
perl -F'[|]' -lane 'print "$F[0]|$F[1]|$1" if $F[2] =~ /=([^:]*)/' < file.txt
edited 2 days ago; answered 2 days ago by Stéphane Chazelas
Damn, you were faster. I tried with gawk:
awk -F '|' -v OFS='|' '{print $1,$2,gensub(/^[^=]*=([^:]*).*$/, "\\1", "1", $3)}' < file.txt
which is pretty much the same as your suggestion. – rexkogitans, 2 days ago
@rexkogitans, thanks. That made me realise that my use of $3 = gensub(... as the condition was wrong. – Stéphane Chazelas, 2 days ago
The OP does not mention a condition for the 3rd column at all. I assume the lines are all formatted like this, so I suggest dropping the condition for the main block altogether. – rexkogitans, 2 days ago
@rexkogitans, I've made them all print only those 3 fields if the 3rd field of the input is in the expected format. I'll leave it as is unless the OP clarifies what to do when the input is not in the expected format. – Stéphane Chazelas, 2 days ago
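I would try
awk -F\| 'BEGIN { OFS="|" }
{
  col=index($3,":");
  equ=index($3,"=");
  $3=substr($3,equ+1,col-equ-1);
  print ;
}' se
where
- -F\| tells awk to use | as the input separator (the backslash stops the shell from treating | as a pipe);
- col=index($3,":"); gets the position of the first : in the third field;
- equ=index($3,"="); gets the position of the first = in the third field;
- $3=substr($3,equ+1,col-equ-1); does the actual substitution, keeping only the characters between those two positions.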
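index() returns 0 when the character is not found, and the command above also assumes the first : comes after the first =; otherwise substr() gets a wrong start or a non-positive length and the field comes out wrong or empty. A guarded sketch that looks for the : only after the =; the helper name extract_with_index is hypothetical, and leaving non-matching lines unchanged is an assumed policy:

```shell
# Hypothetical helper: index()/substr() extraction with guards for missing "=" or ":".
extract_with_index() {
  awk -F'|' -v OFS='|' '{
    equ = index($3, "=")                   # position of the first "=" (0 if absent)
    col = index(substr($3, equ + 1), ":")  # position of the next ":", counted from equ + 1 (0 if absent)
    if (equ > 0 && col > 0)
      $3 = substr($3, equ + 1, col - 1)    # text strictly between "=" and that ":"
    print                                  # lines that fail the guard are printed unchanged
  }'
}

printf '%s\n' 'CCCC|Sess C1|s1 DA=yy07:@##;/u/t/we' 'A|B|x:y=z:w' 'E|F|plain' | extract_with_index
```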
answered 2 days ago by Archemar
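The first sub removes the first six characters of field 3 (this assumes the text before = is always six characters long, as in the sample), and the second sub removes everything from the first colon onward, colon included. The second sub has no target, so it applies to the whole line, which awk has rebuilt with OFS after $3 changed; that is fine as long as the first two fields contain no colon.
awk -F\| '{sub(/.{6}/,"",$3); sub(/:.*/,"")} 1' OFS=\| file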
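A variant that anchors on = instead of counting characters, and passes $3 to both sub() calls so colons in other fields are left alone, might look like this sketch (the helper name strip_by_delims and the test lines are made up):

```shell
# Hypothetical helper: like the six-character version, but anchored on "=".
strip_by_delims() {
  awk -F'|' -v OFS='|' '{
    sub(/^[^=]*=/, "", $3)   # drop everything up to and including the first "="
    sub(/:.*/, "", $3)       # drop the first ":" in field 3 and what follows
  } 1'
}

printf '%s\n' 'X|Y|s10 DA=yy07:@##' 'a:b|Y|s1 DA=yy07:z' | strip_by_delims
```

Both made-up lines come out with yy07 as the last field, even though the first has a seven-character prefix and the second has a colon in field 1.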
answered 2 days ago by Claes Wikner