Find how many times a certain DNA base sequence occurs in a file
The assignment is to write a bash script named "countmatches" that will display the number of times a certain sequence, such as aac, appears in a specified file. The script should expect at least two arguments, the first of which has to be the pathname of a file containing a valid DNA string, which we are given. The remaining argument(s) are strings containing only the bases a, c, g, and t in any order.
For each valid argument string, it will search the DNA string in the file and count how many non-overlapping occurrences of that argument string are in the DNA string (i.e., the file).
For example, if the string aaccgtttgtaaccggaac is in a file named dnafile, then the script should work as follows:
$ countmatches dnafile ttt
ttt 1
Here the command is countmatches dnafile ttt and the output is ttt 1, showing that ttt appears once.
This is my script:
#!/bin/bash
for /data/biocs/b/student.accounts/cs132/data/dna_textfiles
do
count=$grep -o '[acgt][acgt][acgt]' /data/biocs/b/student.accounts/cs132/data/dna_textfiles | wc -w
echo $/data/biocs/b/student.accounts/cs132/data/dna_textfiles $count
done
and this is the error I get
[Osama.Chaudry07@cslab5 assignment3]$ ./countmatches /data/biocs/b/student.accounts/cs132/data/dna_textfiles aac
./countmatches: line 6: '/data/biocs/b/student.accounts/cs132/data/dna_textfiles': not a valid identifier
text-processing scripting bioinformatics
We're not a script-writing service, but people here will be happy to help you when you hit specific issues with a script you've written.
– roaima, 4 hours ago
Oh. That screenshot is a script, is it? Please don't post pictures of text. They're harder to read, impossible for people who need screenreaders, and not good for search engines.
– roaima, 3 hours ago
dna_textfiles is a file with nothing but a sequence of the letters a, c, g, and t. This is the file for which we have to write a script that shows how many times a certain sequence, such as aac, comes up.
– Chaudry Osama, 3 hours ago
@Goro yes, that is the goal: to enter any sequence that is present in the large dna_textfiles and have the output be how many times that sequence appears. I wrote a script for it, but it isn't achieving what I want and I don't understand where I went wrong.
– Chaudry Osama, 2 hours ago
@Goro all of the repeats in a sequence. For example, in the sequence aaccgtttgtaaccggaac, if I were to input the sequence aac, it would show that it comes up 3 times.
– Chaudry Osama, 2 hours ago
asked 4 hours ago
Chaudry Osama
1 Answer
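$ cat dna_textfile
aaccgtttgtaaccggaac

#!/bin/bash
dna_file=/autofs/cluster/atassigp/garbage/dna_textfiles
# Print the prompt in red (\e[31m is the ANSI colour escape)
printf '\e[31mnucleotide sequence?:'
# Read at most 3 characters, with readline editing (-e)
read -en 3 userInput
# Keep asking until something is entered
while [[ -z "$userInput" ]]
do
    read -en 3 userInput
done
# grep -o prints every match on its own line; wc -l counts those lines
count=$(grep -o "$userInput" "$dna_file" | wc -l)
echo "$userInput, $count"

output:
ttt, 1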
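
As a hedged aside on why the `grep -o ... | wc -l` pipeline gives the count the assignment asks for: `grep -o` prints each match on a separate line and resumes scanning after the end of the previous match, so the line count is the number of non-overlapping occurrences. The sketch below checks this against the example string from the question. The function name `count_seq` and the temporary file are assumptions made for illustration, and `-F` (fixed-string matching) is an addition that is not in the scripts above.

```shell
#!/bin/bash
# count_seq SEQUENCE FILE: hypothetical helper, not part of the original answer.
count_seq() {
    # -o: one match per output line; -F: treat the sequence as a literal string.
    grep -oF -- "$1" "$2" | wc -l
}

tmp=$(mktemp)
printf 'aaccgtttgtaaccggaac\n' > "$tmp"

count_seq aac "$tmp"    # aac occurs three times in the example string
count_seq ttt "$tmp"    # ttt occurs once

# Non-overlapping: "aaaa" contains two separate "aa" matches, not three.
printf 'aaaa\n' > "$tmp"
count_seq aa "$tmp"

rm -f "$tmp"
```

Note that `grep` works line by line, so a sequence that straddles a line break in the file would not be counted; this sketch assumes the DNA string sits on a single line.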
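
#!/bin/bash
# First argument: the DNA file; second argument: the sequence to count
file=$1
base=$2
# Quote both expansions so that unusual file names or empty arguments do not break grep
count=$(grep -o "$base" "$file" | wc -l)
echo "$base, $count"

output:
$ ./countmatches dnafile ttt
ttt, 1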
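
The assignment also asks for more than one sequence argument and for output in the form `ttt 1`. One possible way to extend the script above is sketched here; it is not the answer author's code. The function name `countmatches`, the usage and warning messages, and the decision to skip invalid arguments with a warning are all assumptions, since the question does not say how invalid arguments should be handled.

```shell
#!/bin/bash
# countmatches (sketch): print "<sequence> <count>" for each sequence argument.

countmatches() {
    if [ "$#" -lt 2 ]; then
        echo "usage: countmatches dnafile sequence..." >&2
        return 1
    fi

    local file=$1 seq count
    shift                       # the remaining arguments are the sequences

    if [ ! -r "$file" ]; then
        echo "countmatches: cannot read '$file'" >&2
        return 1
    fi

    for seq in "$@"; do
        # Assumption: arguments with characters other than a, c, g, t are skipped.
        if [[ ! $seq =~ ^[acgt]+$ ]]; then
            echo "countmatches: skipping invalid sequence '$seq'" >&2
            continue
        fi
        # grep -o prints one line per non-overlapping match; wc -l counts them.
        count=$(grep -o -- "$seq" "$file" | wc -l)
        # $((count)) drops the padding that some wc implementations add.
        echo "$seq $((count))"
    done
}

# Call the function only when arguments were supplied, so the file can also
# be sourced (for example from a test) without side effects.
if [ "$#" -gt 0 ]; then
    countmatches "$@"
fi
```

Saved as `countmatches` and made executable, it would be run as `./countmatches dnafile ttt aac`, printing one line per valid sequence. As with the scripts above, matches that span a line break in the file are not counted.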
edited 30 mins ago
answered 2 hours ago
Goro