Extracting a specific substring in C
Clash Royale CLAN TAG#URR8PPP
up vote
1
down vote
favorite
I have a string like this:
char buffer = "blablabla$GPTXT->SOME CODES HERE<-rn$GPRMCblablabla";
It is sent from an external device and changes every 1 second.
The string length is over 1000 bytes and consisted of some standard sentences beginning with "$GPXXX" and ending with CRLF. I've written the following function to extract a specific sentence:
int findString(char *src, char *dst, int desLen, char *what2find, char termChar)
char *temp;
temp = strstr(src, what2find); ;
if (temp == NULL)
return 0;
else
temp += strlen(what2find);
int j = 0;
while (j<(desLen-1))
if (temp[j]==termChar)
break;
dst[j]= temp[j];
j++;
dst[j] = '';
return 1;
So:
int main()
char buffer =
"$GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,*47rn
$GPGSA,A,3,04,05,,09,12,,,24,,,,,2.5,1.3,2.1*39rn
$GPGSV,2,1,08,01,40,083,46,02,17,308,41,12,07,344,39,14,22,228,45*75rn
$GPRMC,123519,A,4807.038,N,01131.000,E,022.4,084.4,230394,003.1,W*6Arn
$GPVTG,054.7,T,034.4,M,005.5,N,010.2,K*48rn";
char dst[50];
findString(buffer,dst,50,"$GPTXT",'$');
finds and return the desired sentence.
But this code has the following problems:
- It is very heuristic. I wonder if there exists some better solutions.
- It depends on a character (i.e. '$') for termination. It may terminate after finding a '$' but not "$GPXXX". The optimal solution may find the characters between two strings, e.g. "$GPTXT" and "$GPRMC". I don't know if it is optimally achievable.
Note that the host processor is a 4Mhz ARM MCU!
P.S: The above-mentioned function works fine in my project. I just want to widen my C programming knowledge!
P.S. 2: Both answers from Lundin and David C. Rankin are nice. Unfortunately I can not accept both of them as answer!
c strings array
New contributor
Iman H is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
 |Â
show 3 more comments
up vote
1
down vote
favorite
I have a string like this:
char buffer = "blablabla$GPTXT->SOME CODES HERE<-rn$GPRMCblablabla";
It is sent from an external device and changes every 1 second.
The string length is over 1000 bytes and consisted of some standard sentences beginning with "$GPXXX" and ending with CRLF. I've written the following function to extract a specific sentence:
int findString(char *src, char *dst, int desLen, char *what2find, char termChar)
char *temp;
temp = strstr(src, what2find); ;
if (temp == NULL)
return 0;
else
temp += strlen(what2find);
int j = 0;
while (j<(desLen-1))
if (temp[j]==termChar)
break;
dst[j]= temp[j];
j++;
dst[j] = '';
return 1;
So:
int main()
char buffer =
"$GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,*47rn
$GPGSA,A,3,04,05,,09,12,,,24,,,,,2.5,1.3,2.1*39rn
$GPGSV,2,1,08,01,40,083,46,02,17,308,41,12,07,344,39,14,22,228,45*75rn
$GPRMC,123519,A,4807.038,N,01131.000,E,022.4,084.4,230394,003.1,W*6Arn
$GPVTG,054.7,T,034.4,M,005.5,N,010.2,K*48rn";
char dst[50];
findString(buffer,dst,50,"$GPTXT",'$');
finds and return the desired sentence.
But this code has the following problems:
- It is very heuristic. I wonder if there exists some better solutions.
- It depends on a character (i.e. '$') for termination. It may terminate after finding a '$' but not "$GPXXX". The optimal solution may find the characters between two strings, e.g. "$GPTXT" and "$GPRMC". I don't know if it is optimally achievable.
Note that the host processor is a 4Mhz ARM MCU!
P.S: The above-mentioned function works fine in my project. I just want to widen my C programming knowledge!
P.S. 2: Both answers from Lundin and David C. Rankin are nice. Unfortunately I can not accept both of them as answer!
c strings array
New contributor
Iman H is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
1) More examples would help convey your goal. 2) Althoughsrc
is 1000s long, provide details about pre-"blablabla", "SOME CODES HERE<", and post-"blablabla" lengths. Do they each vary in length from 0 to 1000s? 3) Are you familiar withconst
andrestrict
?
– chux
2 hours ago
I think the key to writing the optimal function here is knowing how reliable the data is. Have you previously verified the input string, or is it just a raw buffer received from a serial bus?
– Lundin
1 hour ago
@Lundin Data verification in such situations is often done after the fact. Either the result is useful, or it isn't, but a result you will get.
– Mast
1 hour ago
@chux For example the src length is 1000. It is consisted of 10 sentences with 100 bytes length. I mean the src length is the sum of sentences' lengths. But everything may vary with time. For example one sentence may be 110 characters and another may become 90. The total is always the src length.
– Iman H
1 hour ago
@Lundin Data is raw. We receive it over serial bus.
– Iman H
1 hour ago
 |Â
show 3 more comments
up vote
1
down vote
favorite
up vote
1
down vote
favorite
I have a string like this:
char buffer = "blablabla$GPTXT->SOME CODES HERE<-rn$GPRMCblablabla";
It is sent from an external device and changes every 1 second.
The string length is over 1000 bytes and consisted of some standard sentences beginning with "$GPXXX" and ending with CRLF. I've written the following function to extract a specific sentence:
int findString(char *src, char *dst, int desLen, char *what2find, char termChar)
char *temp;
temp = strstr(src, what2find); ;
if (temp == NULL)
return 0;
else
temp += strlen(what2find);
int j = 0;
while (j<(desLen-1))
if (temp[j]==termChar)
break;
dst[j]= temp[j];
j++;
dst[j] = '';
return 1;
So:
int main()
char buffer =
"$GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,*47rn
$GPGSA,A,3,04,05,,09,12,,,24,,,,,2.5,1.3,2.1*39rn
$GPGSV,2,1,08,01,40,083,46,02,17,308,41,12,07,344,39,14,22,228,45*75rn
$GPRMC,123519,A,4807.038,N,01131.000,E,022.4,084.4,230394,003.1,W*6Arn
$GPVTG,054.7,T,034.4,M,005.5,N,010.2,K*48rn";
char dst[50];
findString(buffer,dst,50,"$GPTXT",'$');
finds and return the desired sentence.
But this code has the following problems:
- It is very heuristic. I wonder if there exists some better solutions.
- It depends on a character (i.e. '$') for termination. It may terminate after finding a '$' but not "$GPXXX". The optimal solution may find the characters between two strings, e.g. "$GPTXT" and "$GPRMC". I don't know if it is optimally achievable.
Note that the host processor is a 4Mhz ARM MCU!
P.S: The above-mentioned function works fine in my project. I just want to widen my C programming knowledge!
P.S. 2: Both answers from Lundin and David C. Rankin are nice. Unfortunately I can not accept both of them as answer!
c strings array
New contributor
Iman H is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
I have a string like this:
char buffer = "blablabla$GPTXT->SOME CODES HERE<-rn$GPRMCblablabla";
It is sent from an external device and changes every 1 second.
The string length is over 1000 bytes and consisted of some standard sentences beginning with "$GPXXX" and ending with CRLF. I've written the following function to extract a specific sentence:
int findString(char *src, char *dst, int desLen, char *what2find, char termChar)
char *temp;
temp = strstr(src, what2find); ;
if (temp == NULL)
return 0;
else
temp += strlen(what2find);
int j = 0;
while (j<(desLen-1))
if (temp[j]==termChar)
break;
dst[j]= temp[j];
j++;
dst[j] = '';
return 1;
So:
int main()
char buffer =
"$GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,*47rn
$GPGSA,A,3,04,05,,09,12,,,24,,,,,2.5,1.3,2.1*39rn
$GPGSV,2,1,08,01,40,083,46,02,17,308,41,12,07,344,39,14,22,228,45*75rn
$GPRMC,123519,A,4807.038,N,01131.000,E,022.4,084.4,230394,003.1,W*6Arn
$GPVTG,054.7,T,034.4,M,005.5,N,010.2,K*48rn";
char dst[50];
findString(buffer,dst,50,"$GPTXT",'$');
finds and return the desired sentence.
But this code has the following problems:
- It is very heuristic. I wonder if there exists some better solutions.
- It depends on a character (i.e. '$') for termination. It may terminate after finding a '$' but not "$GPXXX". The optimal solution may find the characters between two strings, e.g. "$GPTXT" and "$GPRMC". I don't know if it is optimally achievable.
Note that the host processor is a 4Mhz ARM MCU!
P.S: The above-mentioned function works fine in my project. I just want to widen my C programming knowledge!
P.S. 2: Both answers from Lundin and David C. Rankin are nice. Unfortunately I can not accept both of them as answer!
c strings array
c strings array
New contributor
Iman H is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
New contributor
Iman H is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
edited 50 mins ago
New contributor
Iman H is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
asked 2 hours ago
Iman H
62
62
New contributor
Iman H is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
New contributor
Iman H is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
Iman H is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
1) More examples would help convey your goal. 2) Althoughsrc
is 1000s long, provide details about pre-"blablabla", "SOME CODES HERE<", and post-"blablabla" lengths. Do they each vary in length from 0 to 1000s? 3) Are you familiar withconst
andrestrict
?
– chux
2 hours ago
I think the key to writing the optimal function here is knowing how reliable the data is. Have you previously verified the input string, or is it just a raw buffer received from a serial bus?
– Lundin
1 hour ago
@Lundin Data verification in such situations is often done after the fact. Either the result is useful, or it isn't, but a result you will get.
– Mast
1 hour ago
@chux For example the src length is 1000. It is consisted of 10 sentences with 100 bytes length. I mean the src length is the sum of sentences' lengths. But everything may vary with time. For example one sentence may be 110 characters and another may become 90. The total is always the src length.
– Iman H
1 hour ago
@Lundin Data is raw. We receive it over serial bus.
– Iman H
1 hour ago
 |Â
show 3 more comments
1) More examples would help convey your goal. 2) Althoughsrc
is 1000s long, provide details about pre-"blablabla", "SOME CODES HERE<", and post-"blablabla" lengths. Do they each vary in length from 0 to 1000s? 3) Are you familiar withconst
andrestrict
?
– chux
2 hours ago
I think the key to writing the optimal function here is knowing how reliable the data is. Have you previously verified the input string, or is it just a raw buffer received from a serial bus?
– Lundin
1 hour ago
@Lundin Data verification in such situations is often done after the fact. Either the result is useful, or it isn't, but a result you will get.
– Mast
1 hour ago
@chux For example the src length is 1000. It is consisted of 10 sentences with 100 bytes length. I mean the src length is the sum of sentences' lengths. But everything may vary with time. For example one sentence may be 110 characters and another may become 90. The total is always the src length.
– Iman H
1 hour ago
@Lundin Data is raw. We receive it over serial bus.
– Iman H
1 hour ago
1) More examples would help convey your goal. 2) Although
src
is 1000s long, provide details about pre-"blablabla", "SOME CODES HERE<", and post-"blablabla" lengths. Do they each vary in length from 0 to 1000s? 3) Are you familiar with const
and restrict
?– chux
2 hours ago
1) More examples would help convey your goal. 2) Although
src
is 1000s long, provide details about pre-"blablabla", "SOME CODES HERE<", and post-"blablabla" lengths. Do they each vary in length from 0 to 1000s? 3) Are you familiar with const
and restrict
?– chux
2 hours ago
I think the key to writing the optimal function here is knowing how reliable the data is. Have you previously verified the input string, or is it just a raw buffer received from a serial bus?
– Lundin
1 hour ago
I think the key to writing the optimal function here is knowing how reliable the data is. Have you previously verified the input string, or is it just a raw buffer received from a serial bus?
– Lundin
1 hour ago
@Lundin Data verification in such situations is often done after the fact. Either the result is useful, or it isn't, but a result you will get.
– Mast
1 hour ago
@Lundin Data verification in such situations is often done after the fact. Either the result is useful, or it isn't, but a result you will get.
– Mast
1 hour ago
@chux For example the src length is 1000. It is consisted of 10 sentences with 100 bytes length. I mean the src length is the sum of sentences' lengths. But everything may vary with time. For example one sentence may be 110 characters and another may become 90. The total is always the src length.
– Iman H
1 hour ago
@chux For example the src length is 1000. It is consisted of 10 sentences with 100 bytes length. I mean the src length is the sum of sentences' lengths. But everything may vary with time. For example one sentence may be 110 characters and another may become 90. The total is always the src length.
– Iman H
1 hour ago
@Lundin Data is raw. We receive it over serial bus.
– Iman H
1 hour ago
@Lundin Data is raw. We receive it over serial bus.
– Iman H
1 hour ago
 |Â
show 3 more comments
2 Answers
2
active
oldest
votes
up vote
2
down vote
As far as performance goes, there isn't much you can improve without doing manual optimization tricks. Such things are already implemented in the library functions though.
The main issue I see here is that you copy data into the destination before you know if the string actually contains a termination character. By doing so, you save a bit of time as you can copy and search at the same time. But you also end up copying data before the input has been verified.
What's best in your case, I don't know. It depends on how reliable your input is. If you have already verified it previously, then your copy+check in one might be the best choice. If it's some raw data from a serial bus (UART etc), it might be wisest to verify the data before you copy. I will show a version that does the verification first, it will be safer although possibly slightly slower than what you currently have.
General code review:
Style/best practices
- Pointer parameters to data that isn't modified should be
const
qualified. - Using a plain
int
for error handling isn't ideal. You actually have several possible errors here: missing search string, missing terminator, potential buffer overflow. Even if your program doesn't need to know what went wrong, it might ease debugging and it costs you nothing extra to add. The while loop could have been replaced with a for loop, for better readability:
for(size_t i=0; i<desLen-1; i++)
if(temp[i]==termChar)
break;
dst[i]=temp[i];
Performance
- The length of the search key could be determined at compile-time.
- Consider dropping the terminating character parameter if
what2find[0]
could be said to always contain it. - Some micro-optimizations with C99
restrict
are possible. I'll show an example below.
Here is a different version which contains more detailed error handling and checks for termination before copy:
#include <string.h>
#include <stdbool.h>
typedef enum
STRFIND_OK,
STRFIND_KEY_NOT_FOUND,
STRFIND_TERMINATOR_NOT_FOUND,
STRFIND_BUFFER_OVERFLOW,
strfind_result_t;
strfind_result_t strfind_cpy (const char* str,
size_t key_size,
const char key[key_size],
char terminator,
size_t dst_size,
char dst [dst_size])
char* start = strstr(str, key);
if(start == NULL)
return STRFIND_KEY_NOT_FOUND;
start += key_size-1;
char* end = strchr(start, terminator);
if(end == NULL)
return STRFIND_TERMINATOR_NOT_FOUND;
size_t length = (size_t)(end-start);
if(length+1 > dst_size)
return STRFIND_BUFFER_OVERFLOW;
memcpy(dst, start, length);
dst[length] = '';
return STRFIND_OK;
C99 pointer-to-VLA are used to ensure data size integrity of the buffers. If the verification of data isn't needed, strchr
could be replaced with a for loop like the one demonstrated above.
memcpy
is the fastest possible copy. It will be faster than copy character-by-character, since the library implementation will work on 32 bit chunks that your ARM likes better than copying individual bytes. So my code might actually be faster (or it may be slower), you'll have to benchmark it.
Further micro-optimization is possible with C99 restrict:
strfind_result_t strfind_cpy (const char* restrict str,
size_t key_size,
const char* restrict key,
char terminator,
size_t dst_size,
char* restrict dst);
This tells the compiler that none of the pointers passed point at the same buffer. This may improve performance ever so slightly, depending on how your compiler handles pointer aliasing internally. Check the disassembled code to see if restrict
gave any benefits.
Example of use:
#include <stdio.h>
int main (void)
const char data = "blablabla$GPTXT->SOME CODES HERE<-rn$GPRMCblablabla";
char buf[50];
strfind_result_t result;
result = strfind_cpy(data,
sizeof "$GPTXT",
"$GPTXT",
'$',
sizeof buf,
buf);
if(result == STRFIND_OK)
puts(buf);
Nice answer. What if I use two "strstr"s: find = strstr(buffer,"$GPTXT")+ strlen("$GPTXT"); find1= strstr(buffer,"$GPRMC"); strncpy(dst,find,(int)(find1-find)); dst[(int)(find1-find)] = '';
– Iman H
1 hour ago
@ImanH Well, if you need to use a second strstr that's another question, but it is very similar to this. You should however not usestrncpy
for any purpose, it is both slow and dangerous.memcpy
is superior to it in every single way. See Which functions from the standard library must (should) be avoided?
– Lundin
1 hour ago
So I can solve the second problem I mentioned above, in this way.
– Iman H
1 hour ago
@ImanH Yeah sure, just replace the terminator parameter with a second search key parameter.
– Lundin
1 hour ago
add a comment |Â
up vote
1
down vote
While you have a good presentation of your question, it is still a bit unclear exactly what you are looking to extract between $GPTXT
and '$'
given that you have the "rn"
prior to the terminating '$'
. It seems unlikely that you would want to retain the carriage return/line feed as part of your returned substring. My best interpretation is that you want "->SOME CODES HERE<-"
extracted from the string (drop a comment if this is incorrect).
If this is the case, you can use the rn
to your advantage in tokenizing the string to return that part between $GPTXT
and "rn$"
by using "rn$"
as the delimiters passed to strtok
. The only additional task needed in your findstr
function would be to step past your search string before calling strtok
.
In looking at the declaration for findstring
, I would tweak the parameters just a bit to make them consistent with most of the other string library functions. That being to reverse the source and dest parameters such that dest
comes before src
as in strcpy, etc...
I'm not a fan of one way over the other, but I have found keeping functions at least as consistent as possible helps avoid inadvertent parameter swaps.
I would also change your termChar
to a const char*
parameter to allow flexibility in passing the delimiter to use with strtok
.
An implementation might look like:
#include <stdio.h>
#include <string.h>
#define TERM 32
#define MAXC 1024
#define DELIM "rn$"
char *findstr (char *dest, const char *src, size_t len, const char *srch,
const char *delim)
char *p = strstr (src, srch); /* locate beginning of search in src */
size_t toklen = 0; /* token length to calculate once */
if (!p) /* validate or return NULL */
return NULL;
p += strlen (srch); /* step past search str */
/* tokenize p based on delim and validate length < len */
if ((p = strtok (p, delim)) && (toklen = strlen(p)) < len)
memcpy (dest, p, toklen+1); /* copy to dest w/nul-char */
else
return NULL; /* or return NULL */
return dest;
int main (void)
char buffer = "blablabla$GPTXT->SOME CODES HERE<-rn$GPRMCblablabla";
char sub[MAXC] = "";
if (findstr (sub, buffer, MAXC, "GPTXT", DELIM))
printf ("sub: '%s'n", sub);
(note: since you have already scanned to length of your token to validate it is less than len
, there is no need to scan again by using strcpy
to effect the token copy to dest
, simply using memcpy
will provide an ever-so-slightly more efficient copy)
Example Use/Output
While provides:
$ ./bin/findstr
sub: '->SOME CODES HERE<-'
If you are looking to parse something slightly different, let me know.
My take is that the input could either come from another device ("AT commands") or from a RS-232 terminal PC tool. In the latter case the CR+LF, or just LF, might be appended mostly as an annoying side-effect. I guess in that case they should just be discarded, rather than to use as part of the search key. Because another device is not likely to send them, unless it is part of some protocol specification.
– Lundin
1 hour ago
That makes the most sense. I was scratching my head on the reason they would be embedded (and cussing windoze at the same time). I guess they could be read from some port, but that raises the issue of whether an accumulator buffer should be employed to insure a complete line is read and not just the first-half, etc..
– David C. Rankin
1 hour ago
In my special use case, the CRLF must be always sent . So the format of sentences are always "$GPxxx______rn". But in some cases the sender, unfortunately randomly, does not send CR or LF.
– Iman H
1 hour ago
And, I can implement an interrupt method for UART receiving to parse the input while getting data. But such calculations are not suggested in interrupt routine.
– Iman H
1 hour ago
You are still covered withstrtok()
regardless of whether"rn"
is sent, so long as you include'$'
as part of the delimiter as well.strtok()
will parse the token based on the first delimiter located. But yes, in an interrupt you would not want to rely on a returned value. The type can be change by toint
without any loss of functionality and sincedest
is filled within the function there is no need for a return.
– David C. Rankin
1 hour ago
 |Â
show 1 more comment
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
2
down vote
As far as performance goes, there isn't much you can improve without doing manual optimization tricks. Such things are already implemented in the library functions though.
The main issue I see here is that you copy data into the destination before you know if the string actually contains a termination character. By doing so, you save a bit of time as you can copy and search at the same time. But you also end up copying data before the input has been verified.
What's best in your case, I don't know. It depends on how reliable your input is. If you have already verified it previously, then your copy+check in one might be the best choice. If it's some raw data from a serial bus (UART etc), it might be wisest to verify the data before you copy. I will show a version that does the verification first, it will be safer although possibly slightly slower than what you currently have.
General code review:
Style/best practices
- Pointer parameters to data that isn't modified should be
const
qualified. - Using a plain
int
for error handling isn't ideal. You actually have several possible errors here: missing search string, missing terminator, potential buffer overflow. Even if your program doesn't need to know what went wrong, it might ease debugging and it costs you nothing extra to add. The while loop could have been replaced with a for loop, for better readability:
for(size_t i=0; i<desLen-1; i++)
if(temp[i]==termChar)
break;
dst[i]=temp[i];
Performance
- The length of the search key could be determined at compile-time.
- Consider dropping the terminating character parameter if
what2find[0]
could be said to always contain it. - Some micro-optimizations with C99
restrict
are possible. I'll show an example below.
Here is a different version which contains more detailed error handling and checks for termination before copy:
#include <string.h>
#include <stdbool.h>
typedef enum
STRFIND_OK,
STRFIND_KEY_NOT_FOUND,
STRFIND_TERMINATOR_NOT_FOUND,
STRFIND_BUFFER_OVERFLOW,
strfind_result_t;
strfind_result_t strfind_cpy (const char* str,
size_t key_size,
const char key[key_size],
char terminator,
size_t dst_size,
char dst [dst_size])
char* start = strstr(str, key);
if(start == NULL)
return STRFIND_KEY_NOT_FOUND;
start += key_size-1;
char* end = strchr(start, terminator);
if(end == NULL)
return STRFIND_TERMINATOR_NOT_FOUND;
size_t length = (size_t)(end-start);
if(length+1 > dst_size)
return STRFIND_BUFFER_OVERFLOW;
memcpy(dst, start, length);
dst[length] = '';
return STRFIND_OK;
C99 pointer-to-VLA are used to ensure data size integrity of the buffers. If the verification of data isn't needed, strchr
could be replaced with a for loop like the one demonstrated above.
memcpy
is the fastest possible copy. It will be faster than copy character-by-character, since the library implementation will work on 32 bit chunks that your ARM likes better than copying individual bytes. So my code might actually be faster (or it may be slower), you'll have to benchmark it.
Further micro-optimization is possible with C99 restrict:
strfind_result_t strfind_cpy (const char* restrict str,
size_t key_size,
const char* restrict key,
char terminator,
size_t dst_size,
char* restrict dst);
This tells the compiler that none of the pointers passed point at the same buffer. This may improve performance ever so slightly, depending on how your compiler handles pointer aliasing internally. Check the disassembled code to see if restrict
gave any benefits.
Example of use:
#include <stdio.h>
int main (void)
const char data = "blablabla$GPTXT->SOME CODES HERE<-rn$GPRMCblablabla";
char buf[50];
strfind_result_t result;
result = strfind_cpy(data,
sizeof "$GPTXT",
"$GPTXT",
'$',
sizeof buf,
buf);
if(result == STRFIND_OK)
puts(buf);
Nice answer. What if I use two "strstr"s: find = strstr(buffer,"$GPTXT")+ strlen("$GPTXT"); find1= strstr(buffer,"$GPRMC"); strncpy(dst,find,(int)(find1-find)); dst[(int)(find1-find)] = '';
– Iman H
1 hour ago
@ImanH Well, if you need to use a second strstr that's another question, but it is very similar to this. You should however not usestrncpy
for any purpose, it is both slow and dangerous.memcpy
is superior to it in every single way. See Which functions from the standard library must (should) be avoided?
– Lundin
1 hour ago
So I can solve the second problem I mentioned above, in this way.
– Iman H
1 hour ago
@ImanH Yeah sure, just replace the terminator parameter with a second search key parameter.
– Lundin
1 hour ago
add a comment |Â
up vote
2
down vote
As far as performance goes, there isn't much you can improve without doing manual optimization tricks. Such things are already implemented in the library functions though.
The main issue I see here is that you copy data into the destination before you know if the string actually contains a termination character. By doing so, you save a bit of time as you can copy and search at the same time. But you also end up copying data before the input has been verified.
What's best in your case, I don't know. It depends on how reliable your input is. If you have already verified it previously, then your copy+check in one might be the best choice. If it's some raw data from a serial bus (UART etc), it might be wisest to verify the data before you copy. I will show a version that does the verification first, it will be safer although possibly slightly slower than what you currently have.
General code review:
Style/best practices
- Pointer parameters to data that isn't modified should be
const
qualified. - Using a plain
int
for error handling isn't ideal. You actually have several possible errors here: missing search string, missing terminator, potential buffer overflow. Even if your program doesn't need to know what went wrong, it might ease debugging and it costs you nothing extra to add. The while loop could have been replaced with a for loop, for better readability:
for(size_t i=0; i<desLen-1; i++)
if(temp[i]==termChar)
break;
dst[i]=temp[i];
Performance
- The length of the search key could be determined at compile-time.
- Consider dropping the terminating character parameter if
what2find[0]
could be said to always contain it. - Some micro-optimizations with C99
restrict
are possible. I'll show an example below.
Here is a different version which contains more detailed error handling and checks for termination before copy:
#include <string.h>
#include <stdbool.h>
typedef enum
STRFIND_OK,
STRFIND_KEY_NOT_FOUND,
STRFIND_TERMINATOR_NOT_FOUND,
STRFIND_BUFFER_OVERFLOW,
strfind_result_t;
strfind_result_t strfind_cpy (const char* str,
size_t key_size,
const char key[key_size],
char terminator,
size_t dst_size,
char dst [dst_size])
char* start = strstr(str, key);
if(start == NULL)
return STRFIND_KEY_NOT_FOUND;
start += key_size-1;
char* end = strchr(start, terminator);
if(end == NULL)
return STRFIND_TERMINATOR_NOT_FOUND;
size_t length = (size_t)(end-start);
if(length+1 > dst_size)
return STRFIND_BUFFER_OVERFLOW;
memcpy(dst, start, length);
dst[length] = '';
return STRFIND_OK;
C99 pointer-to-VLA are used to ensure data size integrity of the buffers. If the verification of data isn't needed, strchr
could be replaced with a for loop like the one demonstrated above.
memcpy
is the fastest possible copy. It will be faster than copy character-by-character, since the library implementation will work on 32 bit chunks that your ARM likes better than copying individual bytes. So my code might actually be faster (or it may be slower), you'll have to benchmark it.
Further micro-optimization is possible with C99 restrict:
strfind_result_t strfind_cpy (const char* restrict str,
size_t key_size,
const char* restrict key,
char terminator,
size_t dst_size,
char* restrict dst);
This tells the compiler that none of the pointers passed point at the same buffer. This may improve performance ever so slightly, depending on how your compiler handles pointer aliasing internally. Check the disassembled code to see if restrict
gave any benefits.
Example of use:
#include <stdio.h>
int main (void)
const char data = "blablabla$GPTXT->SOME CODES HERE<-rn$GPRMCblablabla";
char buf[50];
strfind_result_t result;
result = strfind_cpy(data,
sizeof "$GPTXT",
"$GPTXT",
'$',
sizeof buf,
buf);
if(result == STRFIND_OK)
puts(buf);
Nice answer. What if I use two "strstr"s: find = strstr(buffer,"$GPTXT")+ strlen("$GPTXT"); find1= strstr(buffer,"$GPRMC"); strncpy(dst,find,(int)(find1-find)); dst[(int)(find1-find)] = '';
– Iman H
1 hour ago
@ImanH Well, if you need to use a second strstr that's another question, but it is very similar to this. You should however not usestrncpy
for any purpose, it is both slow and dangerous.memcpy
is superior to it in every single way. See Which functions from the standard library must (should) be avoided?
– Lundin
1 hour ago
So I can solve the second problem I mentioned above, in this way.
– Iman H
1 hour ago
@ImanH Yeah sure, just replace the terminator parameter with a second search key parameter.
– Lundin
1 hour ago
add a comment |Â
up vote
2
down vote
up vote
2
down vote
As far as performance goes, there isn't much you can improve without doing manual optimization tricks. Such things are already implemented in the library functions though.
The main issue I see here is that you copy data into the destination before you know if the string actually contains a termination character. By doing so, you save a bit of time as you can copy and search at the same time. But you also end up copying data before the input has been verified.
What's best in your case, I don't know. It depends on how reliable your input is. If you have already verified it previously, then your copy+check in one might be the best choice. If it's some raw data from a serial bus (UART etc), it might be wisest to verify the data before you copy. I will show a version that does the verification first, it will be safer although possibly slightly slower than what you currently have.
General code review:
Style/best practices
- Pointer parameters to data that isn't modified should be
const
qualified. - Using a plain
int
for error handling isn't ideal. You actually have several possible errors here: missing search string, missing terminator, potential buffer overflow. Even if your program doesn't need to know what went wrong, it might ease debugging and it costs you nothing extra to add. The while loop could have been replaced with a for loop, for better readability:
for(size_t i=0; i<desLen-1; i++)
if(temp[i]==termChar)
break;
dst[i]=temp[i];
Performance
- The length of the search key could be determined at compile-time.
- Consider dropping the terminating character parameter if
what2find[0]
could be said to always contain it. - Some micro-optimizations with C99
restrict
are possible. I'll show an example below.
Here is a different version which contains more detailed error handling and checks for termination before copy:
#include <string.h>
#include <stdbool.h>
typedef enum
STRFIND_OK,
STRFIND_KEY_NOT_FOUND,
STRFIND_TERMINATOR_NOT_FOUND,
STRFIND_BUFFER_OVERFLOW,
strfind_result_t;
strfind_result_t strfind_cpy (const char* str,
size_t key_size,
const char key[key_size],
char terminator,
size_t dst_size,
char dst [dst_size])
char* start = strstr(str, key);
if(start == NULL)
return STRFIND_KEY_NOT_FOUND;
start += key_size-1;
char* end = strchr(start, terminator);
if(end == NULL)
return STRFIND_TERMINATOR_NOT_FOUND;
size_t length = (size_t)(end-start);
if(length+1 > dst_size)
return STRFIND_BUFFER_OVERFLOW;
memcpy(dst, start, length);
dst[length] = '';
return STRFIND_OK;
C99 pointer-to-VLA are used to ensure data size integrity of the buffers. If the verification of data isn't needed, strchr
could be replaced with a for loop like the one demonstrated above.
memcpy
is the fastest possible copy. It will be faster than copy character-by-character, since the library implementation will work on 32 bit chunks that your ARM likes better than copying individual bytes. So my code might actually be faster (or it may be slower), you'll have to benchmark it.
Further micro-optimization is possible with C99 restrict:
strfind_result_t strfind_cpy (const char* restrict str,
size_t key_size,
const char* restrict key,
char terminator,
size_t dst_size,
char* restrict dst);
This tells the compiler that none of the pointers passed point at the same buffer. This may improve performance ever so slightly, depending on how your compiler handles pointer aliasing internally. Check the disassembled code to see if restrict
gave any benefits.
Example of use:
#include <stdio.h>
int main (void)
const char data = "blablabla$GPTXT->SOME CODES HERE<-rn$GPRMCblablabla";
char buf[50];
strfind_result_t result;
result = strfind_cpy(data,
sizeof "$GPTXT",
"$GPTXT",
'$',
sizeof buf,
buf);
if(result == STRFIND_OK)
puts(buf);
As far as performance goes, there isn't much you can improve without doing manual optimization tricks. Such things are already implemented in the library functions though.
The main issue I see here is that you copy data into the destination before you know if the string actually contains a termination character. By doing so, you save a bit of time as you can copy and search at the same time. But you also end up copying data before the input has been verified.
What's best in your case, I don't know. It depends on how reliable your input is. If you have already verified it previously, then your copy+check in one might be the best choice. If it's some raw data from a serial bus (UART etc), it might be wisest to verify the data before you copy. I will show a version that does the verification first, it will be safer although possibly slightly slower than what you currently have.
General code review:
Style/best practices
- Pointer parameters to data that isn't modified should be
const
qualified. - Using a plain
int
for error handling isn't ideal. You actually have several possible errors here: missing search string, missing terminator, potential buffer overflow. Even if your program doesn't need to know what went wrong, it might ease debugging and it costs you nothing extra to add. The while loop could have been replaced with a for loop, for better readability:
for(size_t i=0; i<desLen-1; i++)
if(temp[i]==termChar)
break;
dst[i]=temp[i];
Performance
- The length of the search key could be determined at compile-time.
- Consider dropping the terminating character parameter if
what2find[0]
could be said to always contain it. - Some micro-optimizations with C99
restrict
are possible. I'll show an example below.
Here is a different version which contains more detailed error handling and checks for termination before copy:
#include <string.h>
#include <stdbool.h>
typedef enum
STRFIND_OK,
STRFIND_KEY_NOT_FOUND,
STRFIND_TERMINATOR_NOT_FOUND,
STRFIND_BUFFER_OVERFLOW,
strfind_result_t;
strfind_result_t strfind_cpy (const char* str,
size_t key_size,
const char key[key_size],
char terminator,
size_t dst_size,
char dst [dst_size])
char* start = strstr(str, key);
if(start == NULL)
return STRFIND_KEY_NOT_FOUND;
start += key_size-1;
char* end = strchr(start, terminator);
if(end == NULL)
return STRFIND_TERMINATOR_NOT_FOUND;
size_t length = (size_t)(end-start);
if(length+1 > dst_size)
return STRFIND_BUFFER_OVERFLOW;
memcpy(dst, start, length);
dst[length] = '';
return STRFIND_OK;
C99 pointer-to-VLA are used to ensure data size integrity of the buffers. If the verification of data isn't needed, strchr
could be replaced with a for loop like the one demonstrated above.
memcpy
is the fastest possible copy. It will be faster than copy character-by-character, since the library implementation will work on 32 bit chunks that your ARM likes better than copying individual bytes. So my code might actually be faster (or it may be slower), you'll have to benchmark it.
Further micro-optimization is possible with C99 restrict:
strfind_result_t strfind_cpy (const char* restrict str,
size_t key_size,
const char* restrict key,
char terminator,
size_t dst_size,
char* restrict dst);
This tells the compiler that none of the pointers passed point at the same buffer. This may improve performance ever so slightly, depending on how your compiler handles pointer aliasing internally. Check the disassembled code to see if restrict
gave any benefits.
Example of use:
#include <stdio.h>
int main (void)
const char data = "blablabla$GPTXT->SOME CODES HERE<-rn$GPRMCblablabla";
char buf[50];
strfind_result_t result;
result = strfind_cpy(data,
sizeof "$GPTXT",
"$GPTXT",
'$',
sizeof buf,
buf);
if(result == STRFIND_OK)
puts(buf);
edited 1 hour ago
answered 1 hour ago
Lundin
1,384621
1,384621
Nice answer. What if I use two "strstr"s: find = strstr(buffer,"$GPTXT")+ strlen("$GPTXT"); find1= strstr(buffer,"$GPRMC"); strncpy(dst,find,(int)(find1-find)); dst[(int)(find1-find)] = '';
– Iman H
1 hour ago
@ImanH Well, if you need to use a second strstr that's another question, but it is very similar to this. You should however not usestrncpy
for any purpose, it is both slow and dangerous.memcpy
is superior to it in every single way. See Which functions from the standard library must (should) be avoided?
– Lundin
1 hour ago
So I can solve the second problem I mentioned above, in this way.
– Iman H
1 hour ago
@ImanH Yeah sure, just replace the terminator parameter with a second search key parameter.
– Lundin
1 hour ago
add a comment |Â
Nice answer. What if I use two "strstr"s: find = strstr(buffer,"$GPTXT")+ strlen("$GPTXT"); find1= strstr(buffer,"$GPRMC"); strncpy(dst,find,(int)(find1-find)); dst[(int)(find1-find)] = '';
– Iman H
1 hour ago
@ImanH Well, if you need to use a second strstr that's another question, but it is very similar to this. You should however not usestrncpy
for any purpose, it is both slow and dangerous.memcpy
is superior to it in every single way. See Which functions from the standard library must (should) be avoided?
– Lundin
1 hour ago
So I can solve the second problem I mentioned above, in this way.
– Iman H
1 hour ago
@ImanH Yeah sure, just replace the terminator parameter with a second search key parameter.
– Lundin
1 hour ago
Nice answer. What if I use two "strstr"s: find = strstr(buffer,"$GPTXT")+ strlen("$GPTXT"); find1= strstr(buffer,"$GPRMC"); strncpy(dst,find,(int)(find1-find)); dst[(int)(find1-find)] = '';
– Iman H
1 hour ago
Nice answer. What if I use two "strstr"s: find = strstr(buffer,"$GPTXT")+ strlen("$GPTXT"); find1= strstr(buffer,"$GPRMC"); strncpy(dst,find,(int)(find1-find)); dst[(int)(find1-find)] = '';
– Iman H
1 hour ago
@ImanH Well, if you need to use a second strstr that's another question, but it is very similar to this. You should however not use
strncpy
for any purpose, it is both slow and dangerous. memcpy
is superior to it in every single way. See Which functions from the standard library must (should) be avoided?– Lundin
1 hour ago
@ImanH Well, if you need to use a second strstr that's another question, but it is very similar to this. You should however not use
strncpy
for any purpose, it is both slow and dangerous. memcpy
is superior to it in every single way. See Which functions from the standard library must (should) be avoided?– Lundin
1 hour ago
So I can solve the second problem I mentioned above, in this way.
– Iman H
1 hour ago
So I can solve the second problem I mentioned above, in this way.
– Iman H
1 hour ago
@ImanH Yeah sure, just replace the terminator parameter with a second search key parameter.
– Lundin
1 hour ago
@ImanH Yeah sure, just replace the terminator parameter with a second search key parameter.
– Lundin
1 hour ago
add a comment |Â
up vote
1
down vote
While you have a good presentation of your question, it is still a bit unclear exactly what you are looking to extract between $GPTXT
and '$'
given that you have the "rn"
prior to the terminating '$'
. It seems unlikely that you would want to retain the carriage return/line feed as part of your returned substring. My best interpretation is that you want "->SOME CODES HERE<-"
extracted from the string (drop a comment if this is incorrect).
If this is the case, you can use the rn
to your advantage in tokenizing the string to return that part between $GPTXT
and "rn$"
by using "rn$"
as the delimiters passed to strtok
. The only additional task needed in your findstr
function would be to step past your search string before calling strtok
.
In looking at the declaration for findstring
, I would tweak the parameters just a bit to make them consistent with most of the other string library functions. That being to reverse the source and dest parameters such that dest
comes before src
as in strcpy, etc...
I'm not a fan of one way over the other, but I have found keeping functions at least as consistent as possible helps avoid inadvertent parameter swaps.
I would also change your termChar
to a const char*
parameter to allow flexibility in passing the delimiter to use with strtok
.
An implementation might look like:
#include <stdio.h>
#include <string.h>
#define TERM 32
#define MAXC 1024
#define DELIM "rn$"
char *findstr (char *dest, const char *src, size_t len, const char *srch,
const char *delim)
char *p = strstr (src, srch); /* locate beginning of search in src */
size_t toklen = 0; /* token length to calculate once */
if (!p) /* validate or return NULL */
return NULL;
p += strlen (srch); /* step past search str */
/* tokenize p based on delim and validate length < len */
if ((p = strtok (p, delim)) && (toklen = strlen(p)) < len)
memcpy (dest, p, toklen+1); /* copy to dest w/nul-char */
else
return NULL; /* or return NULL */
return dest;
int main (void)
char buffer = "blablabla$GPTXT->SOME CODES HERE<-rn$GPRMCblablabla";
char sub[MAXC] = "";
if (findstr (sub, buffer, MAXC, "GPTXT", DELIM))
printf ("sub: '%s'n", sub);
(note: since you have already scanned to length of your token to validate it is less than len
, there is no need to scan again by using strcpy
to effect the token copy to dest
, simply using memcpy
will provide an ever-so-slightly more efficient copy)
Example Use/Output
While provides:
$ ./bin/findstr
sub: '->SOME CODES HERE<-'
If you are looking to parse something slightly different, let me know.
My take is that the input could either come from another device ("AT commands") or from a RS-232 terminal PC tool. In the latter case the CR+LF, or just LF, might be appended mostly as an annoying side-effect. I guess in that case they should just be discarded, rather than to use as part of the search key. Because another device is not likely to send them, unless it is part of some protocol specification.
– Lundin
1 hour ago
That makes the most sense. I was scratching my head on the reason they would be embedded (and cussing windoze at the same time). I guess they could be read from some port, but that raises the issue of whether an accumulator buffer should be employed to insure a complete line is read and not just the first-half, etc..
– David C. Rankin
1 hour ago
In my special use case, the CRLF must be always sent . So the format of sentences are always "$GPxxx______rn". But in some cases the sender, unfortunately randomly, does not send CR or LF.
– Iman H
1 hour ago
And, I can implement an interrupt method for UART receiving to parse the input while getting data. But such calculations are not suggested in interrupt routine.
– Iman H
1 hour ago
You are still covered withstrtok()
regardless of whether"rn"
is sent, so long as you include'$'
as part of the delimiter as well.strtok()
will parse the token based on the first delimiter located. But yes, in an interrupt you would not want to rely on a returned value. The type can be change by toint
without any loss of functionality and sincedest
is filled within the function there is no need for a return.
– David C. Rankin
1 hour ago
 |Â
show 1 more comment
up vote
1
down vote
While you have a good presentation of your question, it is still a bit unclear exactly what you are looking to extract between $GPTXT
and '$'
given that you have the "rn"
prior to the terminating '$'
. It seems unlikely that you would want to retain the carriage return/line feed as part of your returned substring. My best interpretation is that you want "->SOME CODES HERE<-"
extracted from the string (drop a comment if this is incorrect).
If this is the case, you can use the rn
to your advantage in tokenizing the string to return that part between $GPTXT
and "rn$"
by using "rn$"
as the delimiters passed to strtok
. The only additional task needed in your findstr
function would be to step past your search string before calling strtok
.
In looking at the declaration for findstring
, I would tweak the parameters just a bit to make them consistent with most of the other string library functions. That being to reverse the source and dest parameters such that dest
comes before src
as in strcpy, etc...
I'm not a fan of one way over the other, but I have found keeping functions at least as consistent as possible helps avoid inadvertent parameter swaps.
I would also change your termChar
to a const char*
parameter to allow flexibility in passing the delimiter to use with strtok
.
An implementation might look like:
#include <stdio.h>
#include <string.h>
#define TERM 32
#define MAXC 1024
#define DELIM "rn$"
char *findstr (char *dest, const char *src, size_t len, const char *srch,
const char *delim)
char *p = strstr (src, srch); /* locate beginning of search in src */
size_t toklen = 0; /* token length to calculate once */
if (!p) /* validate or return NULL */
return NULL;
p += strlen (srch); /* step past search str */
/* tokenize p based on delim and validate length < len */
if ((p = strtok (p, delim)) && (toklen = strlen(p)) < len)
memcpy (dest, p, toklen+1); /* copy to dest w/nul-char */
else
return NULL; /* or return NULL */
return dest;
int main (void)
char buffer = "blablabla$GPTXT->SOME CODES HERE<-rn$GPRMCblablabla";
char sub[MAXC] = "";
if (findstr (sub, buffer, MAXC, "GPTXT", DELIM))
printf ("sub: '%s'n", sub);
(note: since you have already scanned to length of your token to validate it is less than len
, there is no need to scan again by using strcpy
to effect the token copy to dest
, simply using memcpy
will provide an ever-so-slightly more efficient copy)
Example Use/Output
While provides:
$ ./bin/findstr
sub: '->SOME CODES HERE<-'
If you are looking to parse something slightly different, let me know.
My take is that the input could either come from another device ("AT commands") or from a RS-232 terminal PC tool. In the latter case the CR+LF, or just LF, might be appended mostly as an annoying side-effect. I guess in that case they should just be discarded, rather than to use as part of the search key. Because another device is not likely to send them, unless it is part of some protocol specification.
– Lundin
1 hour ago
That makes the most sense. I was scratching my head on the reason they would be embedded (and cussing windoze at the same time). I guess they could be read from some port, but that raises the issue of whether an accumulator buffer should be employed to insure a complete line is read and not just the first-half, etc..
– David C. Rankin
1 hour ago
In my special use case, the CRLF must be always sent . So the format of sentences are always "$GPxxx______rn". But in some cases the sender, unfortunately randomly, does not send CR or LF.
– Iman H
1 hour ago
And, I can implement an interrupt method for UART receiving to parse the input while getting data. But such calculations are not suggested in interrupt routine.
– Iman H
1 hour ago
You are still covered withstrtok()
regardless of whether"rn"
is sent, so long as you include'$'
as part of the delimiter as well.strtok()
will parse the token based on the first delimiter located. But yes, in an interrupt you would not want to rely on a returned value. The type can be change by toint
without any loss of functionality and sincedest
is filled within the function there is no need for a return.
– David C. Rankin
1 hour ago
 |Â
show 1 more comment
up vote
1
down vote
up vote
1
down vote
While you have a good presentation of your question, it is still a bit unclear exactly what you are looking to extract between $GPTXT
and '$'
given that you have the "rn"
prior to the terminating '$'
. It seems unlikely that you would want to retain the carriage return/line feed as part of your returned substring. My best interpretation is that you want "->SOME CODES HERE<-"
extracted from the string (drop a comment if this is incorrect).
If this is the case, you can use the rn
to your advantage in tokenizing the string to return that part between $GPTXT
and "rn$"
by using "rn$"
as the delimiters passed to strtok
. The only additional task needed in your findstr
function would be to step past your search string before calling strtok
.
In looking at the declaration for findstring
, I would tweak the parameters just a bit to make them consistent with most of the other string library functions. That being to reverse the source and dest parameters such that dest
comes before src
as in strcpy, etc...
I'm not a fan of one way over the other, but I have found keeping functions at least as consistent as possible helps avoid inadvertent parameter swaps.
I would also change your termChar
to a const char*
parameter to allow flexibility in passing the delimiter to use with strtok
.
An implementation might look like:
#include <stdio.h>
#include <string.h>
#define TERM 32
#define MAXC 1024
#define DELIM "rn$"
char *findstr (char *dest, const char *src, size_t len, const char *srch,
const char *delim)
char *p = strstr (src, srch); /* locate beginning of search in src */
size_t toklen = 0; /* token length to calculate once */
if (!p) /* validate or return NULL */
return NULL;
p += strlen (srch); /* step past search str */
/* tokenize p based on delim and validate length < len */
if ((p = strtok (p, delim)) && (toklen = strlen(p)) < len)
memcpy (dest, p, toklen+1); /* copy to dest w/nul-char */
else
return NULL; /* or return NULL */
return dest;
int main (void)
char buffer = "blablabla$GPTXT->SOME CODES HERE<-rn$GPRMCblablabla";
char sub[MAXC] = "";
if (findstr (sub, buffer, MAXC, "GPTXT", DELIM))
printf ("sub: '%s'n", sub);
(note: since you have already scanned to length of your token to validate it is less than len
, there is no need to scan again by using strcpy
to effect the token copy to dest
, simply using memcpy
will provide an ever-so-slightly more efficient copy)
Example Use/Output
While provides:
$ ./bin/findstr
sub: '->SOME CODES HERE<-'
If you are looking to parse something slightly different, let me know.
While you have a good presentation of your question, it is still a bit unclear exactly what you are looking to extract between $GPTXT
and '$'
given that you have the "rn"
prior to the terminating '$'
. It seems unlikely that you would want to retain the carriage return/line feed as part of your returned substring. My best interpretation is that you want "->SOME CODES HERE<-"
extracted from the string (drop a comment if this is incorrect).
If this is the case, you can use the rn
to your advantage in tokenizing the string to return that part between $GPTXT
and "rn$"
by using "rn$"
as the delimiters passed to strtok
. The only additional task needed in your findstr
function would be to step past your search string before calling strtok
.
In looking at the declaration for findstring
, I would tweak the parameters just a bit to make them consistent with most of the other string library functions. That being to reverse the source and dest parameters such that dest
comes before src
as in strcpy, etc...
I'm not a fan of one way over the other, but I have found keeping functions at least as consistent as possible helps avoid inadvertent parameter swaps.
I would also change your termChar
to a const char*
parameter to allow flexibility in passing the delimiter to use with strtok
.
An implementation might look like:
#include <stdio.h>
#include <string.h>
#define TERM 32
#define MAXC 1024
#define DELIM "rn$"
char *findstr (char *dest, const char *src, size_t len, const char *srch,
const char *delim)
char *p = strstr (src, srch); /* locate beginning of search in src */
size_t toklen = 0; /* token length to calculate once */
if (!p) /* validate or return NULL */
return NULL;
p += strlen (srch); /* step past search str */
/* tokenize p based on delim and validate length < len */
if ((p = strtok (p, delim)) && (toklen = strlen(p)) < len)
memcpy (dest, p, toklen+1); /* copy to dest w/nul-char */
else
return NULL; /* or return NULL */
return dest;
int main (void)
char buffer = "blablabla$GPTXT->SOME CODES HERE<-rn$GPRMCblablabla";
char sub[MAXC] = "";
if (findstr (sub, buffer, MAXC, "GPTXT", DELIM))
printf ("sub: '%s'n", sub);
(note: since you have already scanned to length of your token to validate it is less than len
, there is no need to scan again by using strcpy
to effect the token copy to dest
, simply using memcpy
will provide an ever-so-slightly more efficient copy)
Example Use/Output
While provides:
$ ./bin/findstr
sub: '->SOME CODES HERE<-'
If you are looking to parse something slightly different, let me know.
answered 1 hour ago


David C. Rankin
19617
19617
My take is that the input could either come from another device ("AT commands") or from a RS-232 terminal PC tool. In the latter case the CR+LF, or just LF, might be appended mostly as an annoying side-effect. I guess in that case they should just be discarded, rather than to use as part of the search key. Because another device is not likely to send them, unless it is part of some protocol specification.
– Lundin
1 hour ago
That makes the most sense. I was scratching my head on the reason they would be embedded (and cussing windoze at the same time). I guess they could be read from some port, but that raises the issue of whether an accumulator buffer should be employed to insure a complete line is read and not just the first-half, etc..
– David C. Rankin
1 hour ago
In my special use case, the CRLF must be always sent . So the format of sentences are always "$GPxxx______rn". But in some cases the sender, unfortunately randomly, does not send CR or LF.
– Iman H
1 hour ago
And, I can implement an interrupt method for UART receiving to parse the input while getting data. But such calculations are not suggested in interrupt routine.
– Iman H
1 hour ago
You are still covered withstrtok()
regardless of whether"rn"
is sent, so long as you include'$'
as part of the delimiter as well.strtok()
will parse the token based on the first delimiter located. But yes, in an interrupt you would not want to rely on a returned value. The type can be change by toint
without any loss of functionality and sincedest
is filled within the function there is no need for a return.
– David C. Rankin
1 hour ago
 |Â
show 1 more comment
My take is that the input could either come from another device ("AT commands") or from a RS-232 terminal PC tool. In the latter case the CR+LF, or just LF, might be appended mostly as an annoying side-effect. I guess in that case they should just be discarded, rather than to use as part of the search key. Because another device is not likely to send them, unless it is part of some protocol specification.
– Lundin
1 hour ago
That makes the most sense. I was scratching my head on the reason they would be embedded (and cussing windoze at the same time). I guess they could be read from some port, but that raises the issue of whether an accumulator buffer should be employed to insure a complete line is read and not just the first-half, etc..
– David C. Rankin
1 hour ago
In my special use case, the CRLF must be always sent . So the format of sentences are always "$GPxxx______rn". But in some cases the sender, unfortunately randomly, does not send CR or LF.
– Iman H
1 hour ago
And, I can implement an interrupt method for UART receiving to parse the input while getting data. But such calculations are not suggested in interrupt routine.
– Iman H
1 hour ago
You are still covered withstrtok()
regardless of whether"rn"
is sent, so long as you include'$'
as part of the delimiter as well.strtok()
will parse the token based on the first delimiter located. But yes, in an interrupt you would not want to rely on a returned value. The type can be change by toint
without any loss of functionality and sincedest
is filled within the function there is no need for a return.
– David C. Rankin
1 hour ago
My take is that the input could either come from another device ("AT commands") or from a RS-232 terminal PC tool. In the latter case the CR+LF, or just LF, might be appended mostly as an annoying side-effect. I guess in that case they should just be discarded, rather than to use as part of the search key. Because another device is not likely to send them, unless it is part of some protocol specification.
– Lundin
1 hour ago
My take is that the input could either come from another device ("AT commands") or from a RS-232 terminal PC tool. In the latter case the CR+LF, or just LF, might be appended mostly as an annoying side-effect. I guess in that case they should just be discarded, rather than to use as part of the search key. Because another device is not likely to send them, unless it is part of some protocol specification.
– Lundin
1 hour ago
That makes the most sense. I was scratching my head on the reason they would be embedded (and cussing windoze at the same time). I guess they could be read from some port, but that raises the issue of whether an accumulator buffer should be employed to insure a complete line is read and not just the first-half, etc..
– David C. Rankin
1 hour ago
That makes the most sense. I was scratching my head on the reason they would be embedded (and cussing windoze at the same time). I guess they could be read from some port, but that raises the issue of whether an accumulator buffer should be employed to insure a complete line is read and not just the first-half, etc..
– David C. Rankin
1 hour ago
In my special use case, the CRLF must be always sent . So the format of sentences are always "$GPxxx______rn". But in some cases the sender, unfortunately randomly, does not send CR or LF.
– Iman H
1 hour ago
In my special use case, the CRLF must be always sent . So the format of sentences are always "$GPxxx______rn". But in some cases the sender, unfortunately randomly, does not send CR or LF.
– Iman H
1 hour ago
And, I can implement an interrupt method for UART receiving to parse the input while getting data. But such calculations are not suggested in interrupt routine.
– Iman H
1 hour ago
And, I can implement an interrupt method for UART receiving to parse the input while getting data. But such calculations are not suggested in interrupt routine.
– Iman H
1 hour ago
You are still covered with
strtok()
regardless of whether "rn"
is sent, so long as you include '$'
as part of the delimiter as well. strtok()
will parse the token based on the first delimiter located. But yes, in an interrupt you would not want to rely on a returned value. The type can be change by to int
without any loss of functionality and since dest
is filled within the function there is no need for a return.– David C. Rankin
1 hour ago
You are still covered with
strtok()
regardless of whether "rn"
is sent, so long as you include '$'
as part of the delimiter as well. strtok()
will parse the token based on the first delimiter located. But yes, in an interrupt you would not want to rely on a returned value. The type can be change by to int
without any loss of functionality and since dest
is filled within the function there is no need for a return.– David C. Rankin
1 hour ago
 |Â
show 1 more comment
Iman H is a new contributor. Be nice, and check out our Code of Conduct.
Iman H is a new contributor. Be nice, and check out our Code of Conduct.
Iman H is a new contributor. Be nice, and check out our Code of Conduct.
Iman H is a new contributor. Be nice, and check out our Code of Conduct.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f206026%2fextracting-a-specific-substring-in-c%23new-answer', 'question_page');
);
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
1) More examples would help convey your goal. 2) Although
src
is 1000s long, provide details about pre-"blablabla", "SOME CODES HERE<", and post-"blablabla" lengths. Do they each vary in length from 0 to 1000s? 3) Are you familiar withconst
andrestrict
?– chux
2 hours ago
I think the key to writing the optimal function here is knowing how reliable the data is. Have you previously verified the input string, or is it just a raw buffer received from a serial bus?
– Lundin
1 hour ago
@Lundin Data verification in such situations is often done after the fact. Either the result is useful, or it isn't, but a result you will get.
– Mast
1 hour ago
@chux For example the src length is 1000. It is consisted of 10 sentences with 100 bytes length. I mean the src length is the sum of sentences' lengths. But everything may vary with time. For example one sentence may be 110 characters and another may become 90. The total is always the src length.
– Iman H
1 hour ago
@Lundin Data is raw. We receive it over serial bus.
– Iman H
1 hour ago