awk: extract string from a field [on hold]


In the input, fields are separated by a pipe sign:

CCCC|Sess C1|s1 DA=yy07:@##;/u/t/we
DDDDD|Sess C2|s4 DB=yy8:@##;/u/ba

I want to get output where the last field is changed (only what is between the first = and the : in this field is extracted).

The expected output is:

CCCC|Sess C1|yy07
DDDDD|Sess C2|yy8
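To try the answers below, the sample input can first be saved to a file. The name file.txt is an assumption taken from the first answer; the other answers read files named se and file, so adjust the name to match.

```shell
# Write the two sample lines from the question to file.txt.
# Single quotes stop the shell from interpreting |, # and ;.
printf '%s\n' \
  'CCCC|Sess C1|s1 DA=yy07:@##;/u/t/we' \
  'DDDDD|Sess C2|s4 DB=yy8:@##;/u/ba' > file.txt
```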

edited 2 days ago

asked 2 days ago

Chris


put on hold as unclear what you're asking by Sparhawk, Rui F Ribeiro, andcoz, RalfFriedl, DarkHeart yesterday


What do you mean by "I want to get output where last field is changed"? What exactly defines the expected output? Is it the part before the second |, plus the part between = and :? Please edit your question to add this information.
- Sparhawk
2 days ago

Output columns are also separated with | (pipe). Only in the last column do I need to print just what is between the first = and the first : of the original last column.
- Chris
2 days ago

shell-script awk gawk

3 Answers


Standard awk is not very good at extracting data out of fields based on patterns. Some options include:

split() to split the text into an array based on specified delimiters.

match(), which sets the RSTART and RLENGTH variables to indicate where the match occurred; you then use substr() to extract the matched portion.

So here:

awk -F'|' -v OFS='|' '
  split($3, a, /[=:]/) >= 2 {print $1, $2, a[2]}' < file.txt

That returns the portion between the first and second occurrences of = or : in $3.
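As a quick check of that split() call on the first sample line's third field (the shell variable pieces is only for illustration), the separators = and : cut it into three pieces and the wanted value lands in a[2]:

```shell
# split() on either "=" or ":" and report how many pieces it made,
# plus the second piece, i.e. the value between "=" and ":".
pieces=$(awk 'BEGIN {
  n = split("s1 DA=yy07:@##;/u/t/we", a, /[=:]/)
  print n, a[2]
}')
printf '%s\n' "$pieces"    # prints: 3 yy07
```

This is also why the answer tests split(...) >= 2: when the field contains neither = nor :, split() returns 1 and a[2] would just be empty.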

Or:

awk -F'|' -v OFS='|' '
  match($3, /=[^:]*/) {
    print $1, $2, substr($3, RSTART+1, RLENGTH-1)
  }' < file.txt
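Working through the match() variant by hand on the first sample field (held in an illustrative awk variable s) shows where the +1 and -1 come from: the regexp match includes the leading =, so the start is moved one character forward and the length shortened by one.

```shell
# match() sets RSTART (1-based start of the match) and RLENGTH
# (its length); here the text matched is "=yy07".
res=$(awk 'BEGIN {
  s = "s1 DA=yy07:@##;/u/t/we"
  if (match(s, /=[^:]*/))
    print RSTART, RLENGTH, substr(s, RSTART + 1, RLENGTH - 1)
}')
printf '%s\n' "$res"       # prints: 6 5 yy07
```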

GNU awk has a gensub() extension which brings the functionality of sed's s command into awk:

gawk -F'|' -v OFS='|' '
  $3 ~ /=/ {
    print $1, $2, gensub(/^[^=]*=([^:]*).*/, "\\1", 1, $3)
  }' < file.txt

It looks for = followed by any number of non-: characters and extracts the part after the =. The problem with gensub() is that you can't easily tell whether the substitution was successful, hence the check that $3 contains = first.

With sed:

sed -n 's/^\([^|]*|[^|]*|\)[^=|]*=\([^:|]*\).*/\1\2/p' < file.txt
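The sed command above is a basic regular expression (BRE), where groups are written \( \) and a bare | is an ordinary character. If you prefer extended regexps, a sketch of the same substitution with sed -E (supported by GNU and BSD sed; the function name sed_extract is hypothetical) has to escape the literal pipes instead, because | means alternation in an ERE:

```shell
# Hypothetical helper: the same extraction written as an ERE.
# Inside [...] the | is literal; outside brackets it must be \|.
sed_extract() {
  sed -nE 's/^([^|]*\|[^|]*\|)[^=|]*=([^:|]*).*/\1\2/p'
}

printf '%s\n' 'CCCC|Sess C1|s1 DA=yy07:@##;/u/t/we' | sed_extract
# prints: CCCC|Sess C1|yy07
```

As with the -n ... /p form above, lines that do not match are not printed at all.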

With perl:

perl -F'[|]' -lane 'print "$F[0]|$F[1]|$1" if $F[2] =~ /=([^:]*)/' < file.txt

edited 2 days ago

answered 2 days ago

Stéphane Chazelas

Damn, you were faster. I tried with gawk: awk -F '|' -v OFS='|' '{print $1,$2,gensub(/^[^=]*=([^:]*).*$/, "\\1", "1", $3)}' < file.txt which is pretty much the same as your suggestion.
- rexkogitans
2 days ago

@rexkogitans, thanks, that made me realise that my use of $3 = gensub(... as the condition was wrong.
- Stéphane Chazelas
2 days ago

The OP does not mention a condition for the 3rd column at all. I assume they are all formatted like this, so I suggest dropping the condition for the main block altogether.
- rexkogitans
2 days ago

@rexkogitans, I've made them all print those 3 fields only if the 3rd field of the input is in the expected format. I'll leave it as is unless the OP clarifies what to do when the input is not in the expected format.
- Stéphane Chazelas
2 days ago


I would try

awk -F'|' 'BEGIN { OFS="|" }
 { col=index($3,":");
   equ=index($3,"=");
   $3=substr($3,equ+1,col-equ-1);
   print ; }' se

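where

-F'|' tells awk to use | as the input separator (quoted, because an unquoted | would be taken by the shell as a pipe), and OFS="|" makes it the output separator as well

col=index($3,":"); gets the index of : in the third field

equ=index($3,"="); gets the index of = in the third field

$3=substr($3,equ+1,col-equ-1); does the actual substitution, keeping the col-equ-1 characters that follow the =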
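One caveat with this index()/substr() arithmetic: when a field has no : (col is 0) or no = (equ is 0), the computed range is wrong and the result is silently wrong (an empty field, or everything before the :). A guarded sketch, assuming such lines should simply be skipped and that the : to look for is the first one after the = (the function name extract_with_index is hypothetical):

```shell
# Hypothetical helper: index()/substr() extraction that skips lines
# whose third field lacks an "=" followed later by a ":".
extract_with_index() {
  awk -F'|' -v OFS='|' '
    {
      equ = index($3, "=")            # position of the first "=", 0 if none
      rest = substr($3, equ + 1)      # everything after that "="
      col = index(rest, ":")          # first ":" after the "=", 0 if none
      if (equ && col) {
        $3 = substr(rest, 1, col - 1) # keep only the text between them
        print                         # $0 is rebuilt using OFS="|"
      }
    }'
}

printf '%s\n' 'CCCC|Sess C1|s1 DA=yy07:@##;/u/t/we' | extract_with_index
# prints: CCCC|Sess C1|yy07
```

Searching for the : only in the text after the = is a design choice; it also copes with a : that appears before the =.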

answered 2 days ago

Archemar



The first sub() removes the first six characters of field 3, and the second sub() removes everything from the colon onwards (colon included) in the rebuilt record.

awk -F'|' '{sub(/.{6}/,"",$3); sub(/:.*/,"")} 1' OFS='|' file
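The first sub() relies on every third field starting with exactly six characters before the value, as "s1 DA=" and "s4 DB=" do in the sample. A sketch of the same two-sub() idea that instead strips everything up to the first =, and works on $3 only so that a : in the first two fields cannot interfere, might look like this (extract_with_sub is a hypothetical name). It also avoids an interval expression such as {6}, which some older awk implementations do not support.

```shell
# Hypothetical helper: generalises the two-sub() approach so the
# prefix before "=" may have any length.
extract_with_sub() {
  awk -F'|' -v OFS='|' '
    {
      sub(/^[^=]*=/, "", $3)  # drop everything up to and including the first "="
      sub(/:.*/, "", $3)      # drop the first ":" and everything after it
    }
    1                         # always-true pattern: print the (rebuilt) record
  '
}

printf '%s\n' 'CCCC|Sess C1|s1 DA=yy07:@##;/u/t/we' 'DDDDD|Sess C2|s4 DB=yy8:@##;/u/ba' |
  extract_with_sub
# prints:
# CCCC|Sess C1|yy07
# DDDDD|Sess C2|yy8
```

Like the one-liner it generalises, it prints every line, including ones whose third field has no =.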

answered 2 days ago

Claes Wikner

