Extracting a specific substring in C

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP











up vote
1
down vote

favorite












I have a string like this:



char buffer = "blablabla$GPTXT->SOME CODES HERE<-rn$GPRMCblablabla";


It is sent from an external device and changes every 1 second.



The string length is over 1000 bytes and consisted of some standard sentences beginning with "$GPXXX" and ending with CRLF. I've written the following function to extract a specific sentence:



int findString(char *src, char *dst, int desLen, char *what2find, char termChar)

char *temp;
temp = strstr(src, what2find); ;
if (temp == NULL)
return 0;
else
temp += strlen(what2find);
int j = 0;
while (j<(desLen-1))

if (temp[j]==termChar)
break;
dst[j]= temp[j];
j++;

dst[j] = '';
return 1;



So:



 int main()

char buffer =
"$GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,*47rn
$GPGSA,A,3,04,05,,09,12,,,24,,,,,2.5,1.3,2.1*39rn
$GPGSV,2,1,08,01,40,083,46,02,17,308,41,12,07,344,39,14,22,228,45*75rn
$GPRMC,123519,A,4807.038,N,01131.000,E,022.4,084.4,230394,003.1,W*6Arn
$GPVTG,054.7,T,034.4,M,005.5,N,010.2,K*48rn";

char dst[50];
findString(buffer,dst,50,"$GPTXT",'$');



finds and return the desired sentence.



But this code has the following problems:



  1. It is very heuristic. I wonder if there exists some better solutions.

  2. It depends on a character (i.e. '$') for termination. It may terminate after finding a '$' but not "$GPXXX". The optimal solution may find the characters between two strings, e.g. "$GPTXT" and "$GPRMC". I don't know if it is optimally achievable.

Note that the host processor is a 4Mhz ARM MCU!



P.S: The above-mentioned function works fine in my project. I just want to widen my C programming knowledge!



P.S. 2: Both answers from Lundin and David C. Rankin are nice. Unfortunately I can not accept both of them as answer!










share|improve this question









New contributor




Iman H is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.



















  • 1) More examples would help convey your goal. 2) Although src is 1000s long, provide details about pre-"blablabla", "SOME CODES HERE<", and post-"blablabla" lengths. Do they each vary in length from 0 to 1000s? 3) Are you familiar with const and restrict?
    – chux
    2 hours ago











  • I think the key to writing the optimal function here is knowing how reliable the data is. Have you previously verified the input string, or is it just a raw buffer received from a serial bus?
    – Lundin
    1 hour ago










  • @Lundin Data verification in such situations is often done after the fact. Either the result is useful, or it isn't, but a result you will get.
    – Mast
    1 hour ago










  • @chux For example the src length is 1000. It is consisted of 10 sentences with 100 bytes length. I mean the src length is the sum of sentences' lengths. But everything may vary with time. For example one sentence may be 110 characters and another may become 90. The total is always the src length.
    – Iman H
    1 hour ago











  • @Lundin Data is raw. We receive it over serial bus.
    – Iman H
    1 hour ago














up vote
1
down vote

favorite












I have a string like this:



char buffer = "blablabla$GPTXT->SOME CODES HERE<-rn$GPRMCblablabla";


It is sent from an external device and changes every 1 second.



The string length is over 1000 bytes and consisted of some standard sentences beginning with "$GPXXX" and ending with CRLF. I've written the following function to extract a specific sentence:



int findString(char *src, char *dst, int desLen, char *what2find, char termChar)

char *temp;
temp = strstr(src, what2find); ;
if (temp == NULL)
return 0;
else
temp += strlen(what2find);
int j = 0;
while (j<(desLen-1))

if (temp[j]==termChar)
break;
dst[j]= temp[j];
j++;

dst[j] = '';
return 1;



So:



 int main()

char buffer =
"$GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,*47rn
$GPGSA,A,3,04,05,,09,12,,,24,,,,,2.5,1.3,2.1*39rn
$GPGSV,2,1,08,01,40,083,46,02,17,308,41,12,07,344,39,14,22,228,45*75rn
$GPRMC,123519,A,4807.038,N,01131.000,E,022.4,084.4,230394,003.1,W*6Arn
$GPVTG,054.7,T,034.4,M,005.5,N,010.2,K*48rn";

char dst[50];
findString(buffer,dst,50,"$GPTXT",'$');



finds and return the desired sentence.



But this code has the following problems:



  1. It is very heuristic. I wonder if there exists some better solutions.

  2. It depends on a character (i.e. '$') for termination. It may terminate after finding a '$' but not "$GPXXX". The optimal solution may find the characters between two strings, e.g. "$GPTXT" and "$GPRMC". I don't know if it is optimally achievable.

Note that the host processor is a 4Mhz ARM MCU!



P.S: The above-mentioned function works fine in my project. I just want to widen my C programming knowledge!



P.S. 2: Both answers from Lundin and David C. Rankin are nice. Unfortunately I can not accept both of them as answer!










share|improve this question









New contributor




Iman H is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.



















  • 1) More examples would help convey your goal. 2) Although src is 1000s long, provide details about pre-"blablabla", "SOME CODES HERE<", and post-"blablabla" lengths. Do they each vary in length from 0 to 1000s? 3) Are you familiar with const and restrict?
    – chux
    2 hours ago











  • I think the key to writing the optimal function here is knowing how reliable the data is. Have you previously verified the input string, or is it just a raw buffer received from a serial bus?
    – Lundin
    1 hour ago










  • @Lundin Data verification in such situations is often done after the fact. Either the result is useful, or it isn't, but a result you will get.
    – Mast
    1 hour ago










  • @chux For example the src length is 1000. It is consisted of 10 sentences with 100 bytes length. I mean the src length is the sum of sentences' lengths. But everything may vary with time. For example one sentence may be 110 characters and another may become 90. The total is always the src length.
    – Iman H
    1 hour ago











  • @Lundin Data is raw. We receive it over serial bus.
    – Iman H
    1 hour ago












up vote
1
down vote

favorite









up vote
1
down vote

favorite











I have a string like this:



char buffer = "blablabla$GPTXT->SOME CODES HERE<-rn$GPRMCblablabla";


It is sent from an external device and changes every 1 second.



The string length is over 1000 bytes and consisted of some standard sentences beginning with "$GPXXX" and ending with CRLF. I've written the following function to extract a specific sentence:



int findString(char *src, char *dst, int desLen, char *what2find, char termChar)

char *temp;
temp = strstr(src, what2find); ;
if (temp == NULL)
return 0;
else
temp += strlen(what2find);
int j = 0;
while (j<(desLen-1))

if (temp[j]==termChar)
break;
dst[j]= temp[j];
j++;

dst[j] = '';
return 1;



So:



 int main()

char buffer =
"$GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,*47rn
$GPGSA,A,3,04,05,,09,12,,,24,,,,,2.5,1.3,2.1*39rn
$GPGSV,2,1,08,01,40,083,46,02,17,308,41,12,07,344,39,14,22,228,45*75rn
$GPRMC,123519,A,4807.038,N,01131.000,E,022.4,084.4,230394,003.1,W*6Arn
$GPVTG,054.7,T,034.4,M,005.5,N,010.2,K*48rn";

char dst[50];
findString(buffer,dst,50,"$GPTXT",'$');



finds and return the desired sentence.



But this code has the following problems:



  1. It is very heuristic. I wonder if there exists some better solutions.

  2. It depends on a character (i.e. '$') for termination. It may terminate after finding a '$' but not "$GPXXX". The optimal solution may find the characters between two strings, e.g. "$GPTXT" and "$GPRMC". I don't know if it is optimally achievable.

Note that the host processor is a 4Mhz ARM MCU!



P.S: The above-mentioned function works fine in my project. I just want to widen my C programming knowledge!



P.S. 2: Both answers from Lundin and David C. Rankin are nice. Unfortunately I can not accept both of them as answer!










share|improve this question









New contributor




Iman H is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.











I have a string like this:



char buffer = "blablabla$GPTXT->SOME CODES HERE<-rn$GPRMCblablabla";


It is sent from an external device and changes every 1 second.



The string length is over 1000 bytes and consisted of some standard sentences beginning with "$GPXXX" and ending with CRLF. I've written the following function to extract a specific sentence:



int findString(char *src, char *dst, int desLen, char *what2find, char termChar)

char *temp;
temp = strstr(src, what2find); ;
if (temp == NULL)
return 0;
else
temp += strlen(what2find);
int j = 0;
while (j<(desLen-1))

if (temp[j]==termChar)
break;
dst[j]= temp[j];
j++;

dst[j] = '';
return 1;



So:



 int main()

char buffer =
"$GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,*47rn
$GPGSA,A,3,04,05,,09,12,,,24,,,,,2.5,1.3,2.1*39rn
$GPGSV,2,1,08,01,40,083,46,02,17,308,41,12,07,344,39,14,22,228,45*75rn
$GPRMC,123519,A,4807.038,N,01131.000,E,022.4,084.4,230394,003.1,W*6Arn
$GPVTG,054.7,T,034.4,M,005.5,N,010.2,K*48rn";

char dst[50];
findString(buffer,dst,50,"$GPTXT",'$');



finds and return the desired sentence.



But this code has the following problems:



  1. It is very heuristic. I wonder if there exists some better solutions.

  2. It depends on a character (i.e. '$') for termination. It may terminate after finding a '$' but not "$GPXXX". The optimal solution may find the characters between two strings, e.g. "$GPTXT" and "$GPRMC". I don't know if it is optimally achievable.

Note that the host processor is a 4Mhz ARM MCU!



P.S: The above-mentioned function works fine in my project. I just want to widen my C programming knowledge!



P.S. 2: Both answers from Lundin and David C. Rankin are nice. Unfortunately I can not accept both of them as answer!







c strings array






share|improve this question









New contributor




Iman H is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.











share|improve this question









New contributor




Iman H is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.









share|improve this question




share|improve this question








edited 50 mins ago





















New contributor




Iman H is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.









asked 2 hours ago









Iman H

62




62




New contributor




Iman H is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.





New contributor





Iman H is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.






Iman H is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.











  • 1) More examples would help convey your goal. 2) Although src is 1000s long, provide details about pre-"blablabla", "SOME CODES HERE<", and post-"blablabla" lengths. Do they each vary in length from 0 to 1000s? 3) Are you familiar with const and restrict?
    – chux
    2 hours ago











  • I think the key to writing the optimal function here is knowing how reliable the data is. Have you previously verified the input string, or is it just a raw buffer received from a serial bus?
    – Lundin
    1 hour ago










  • @Lundin Data verification in such situations is often done after the fact. Either the result is useful, or it isn't, but a result you will get.
    – Mast
    1 hour ago










  • @chux For example the src length is 1000. It is consisted of 10 sentences with 100 bytes length. I mean the src length is the sum of sentences' lengths. But everything may vary with time. For example one sentence may be 110 characters and another may become 90. The total is always the src length.
    – Iman H
    1 hour ago











  • @Lundin Data is raw. We receive it over serial bus.
    – Iman H
    1 hour ago
















  • 1) More examples would help convey your goal. 2) Although src is 1000s long, provide details about pre-"blablabla", "SOME CODES HERE<", and post-"blablabla" lengths. Do they each vary in length from 0 to 1000s? 3) Are you familiar with const and restrict?
    – chux
    2 hours ago











  • I think the key to writing the optimal function here is knowing how reliable the data is. Have you previously verified the input string, or is it just a raw buffer received from a serial bus?
    – Lundin
    1 hour ago










  • @Lundin Data verification in such situations is often done after the fact. Either the result is useful, or it isn't, but a result you will get.
    – Mast
    1 hour ago










  • @chux For example the src length is 1000. It is consisted of 10 sentences with 100 bytes length. I mean the src length is the sum of sentences' lengths. But everything may vary with time. For example one sentence may be 110 characters and another may become 90. The total is always the src length.
    – Iman H
    1 hour ago











  • @Lundin Data is raw. We receive it over serial bus.
    – Iman H
    1 hour ago















1) More examples would help convey your goal. 2) Although src is 1000s long, provide details about pre-"blablabla", "SOME CODES HERE<", and post-"blablabla" lengths. Do they each vary in length from 0 to 1000s? 3) Are you familiar with const and restrict?
– chux
2 hours ago





1) More examples would help convey your goal. 2) Although src is 1000s long, provide details about pre-"blablabla", "SOME CODES HERE<", and post-"blablabla" lengths. Do they each vary in length from 0 to 1000s? 3) Are you familiar with const and restrict?
– chux
2 hours ago













I think the key to writing the optimal function here is knowing how reliable the data is. Have you previously verified the input string, or is it just a raw buffer received from a serial bus?
– Lundin
1 hour ago




I think the key to writing the optimal function here is knowing how reliable the data is. Have you previously verified the input string, or is it just a raw buffer received from a serial bus?
– Lundin
1 hour ago












@Lundin Data verification in such situations is often done after the fact. Either the result is useful, or it isn't, but a result you will get.
– Mast
1 hour ago




@Lundin Data verification in such situations is often done after the fact. Either the result is useful, or it isn't, but a result you will get.
– Mast
1 hour ago












@chux For example the src length is 1000. It is consisted of 10 sentences with 100 bytes length. I mean the src length is the sum of sentences' lengths. But everything may vary with time. For example one sentence may be 110 characters and another may become 90. The total is always the src length.
– Iman H
1 hour ago





@chux For example the src length is 1000. It is consisted of 10 sentences with 100 bytes length. I mean the src length is the sum of sentences' lengths. But everything may vary with time. For example one sentence may be 110 characters and another may become 90. The total is always the src length.
– Iman H
1 hour ago













@Lundin Data is raw. We receive it over serial bus.
– Iman H
1 hour ago




@Lundin Data is raw. We receive it over serial bus.
– Iman H
1 hour ago










2 Answers
2






active

oldest

votes

















up vote
2
down vote













As far as performance goes, there isn't much you can improve without doing manual optimization tricks. Such things are already implemented in the library functions though.



The main issue I see here is that you copy data into the destination before you know if the string actually contains a termination character. By doing so, you save a bit of time as you can copy and search at the same time. But you also end up copying data before the input has been verified.



What's best in your case, I don't know. It depends on how reliable your input is. If you have already verified it previously, then your copy+check in one might be the best choice. If it's some raw data from a serial bus (UART etc), it might be wisest to verify the data before you copy. I will show a version that does the verification first, it will be safer although possibly slightly slower than what you currently have.




General code review:



Style/best practices



  • Pointer parameters to data that isn't modified should be const qualified.

  • Using a plain int for error handling isn't ideal. You actually have several possible errors here: missing search string, missing terminator, potential buffer overflow. Even if your program doesn't need to know what went wrong, it might ease debugging and it costs you nothing extra to add.


  • The while loop could have been replaced with a for loop, for better readability:



    for(size_t i=0; i<desLen-1; i++)

    if(temp[i]==termChar)

    break;

    dst[i]=temp[i];



Performance



  • The length of the search key could be determined at compile-time.

  • Consider dropping the terminating character parameter if what2find[0] could be said to always contain it.

  • Some micro-optimizations with C99 restrict are possible. I'll show an example below.


Here is a different version which contains more detailed error handling and checks for termination before copy:



#include <string.h>
#include <stdbool.h>

typedef enum

STRFIND_OK,
STRFIND_KEY_NOT_FOUND,
STRFIND_TERMINATOR_NOT_FOUND,
STRFIND_BUFFER_OVERFLOW,
strfind_result_t;

strfind_result_t strfind_cpy (const char* str,
size_t key_size,
const char key[key_size],
char terminator,
size_t dst_size,
char dst [dst_size])

char* start = strstr(str, key);
if(start == NULL)

return STRFIND_KEY_NOT_FOUND;


start += key_size-1;
char* end = strchr(start, terminator);
if(end == NULL)

return STRFIND_TERMINATOR_NOT_FOUND;


size_t length = (size_t)(end-start);
if(length+1 > dst_size)

return STRFIND_BUFFER_OVERFLOW;


memcpy(dst, start, length);
dst[length] = '';

return STRFIND_OK;



C99 pointer-to-VLA are used to ensure data size integrity of the buffers. If the verification of data isn't needed, strchr could be replaced with a for loop like the one demonstrated above.



memcpy is the fastest possible copy. It will be faster than copy character-by-character, since the library implementation will work on 32 bit chunks that your ARM likes better than copying individual bytes. So my code might actually be faster (or it may be slower), you'll have to benchmark it.



Further micro-optimization is possible with C99 restrict:



strfind_result_t strfind_cpy (const char* restrict str, 
size_t key_size,
const char* restrict key,
char terminator,
size_t dst_size,
char* restrict dst);


This tells the compiler that none of the pointers passed point at the same buffer. This may improve performance ever so slightly, depending on how your compiler handles pointer aliasing internally. Check the disassembled code to see if restrict gave any benefits.




Example of use:



#include <stdio.h>

int main (void)

const char data = "blablabla$GPTXT->SOME CODES HERE<-rn$GPRMCblablabla";
char buf[50];
strfind_result_t result;

result = strfind_cpy(data,
sizeof "$GPTXT",
"$GPTXT",
'$',
sizeof buf,
buf);

if(result == STRFIND_OK)

puts(buf);







share|improve this answer






















  • Nice answer. What if I use two "strstr"s: find = strstr(buffer,"$GPTXT")+ strlen("$GPTXT"); find1= strstr(buffer,"$GPRMC"); strncpy(dst,find,(int)(find1-find)); dst[(int)(find1-find)] = '';
    – Iman H
    1 hour ago











  • @ImanH Well, if you need to use a second strstr that's another question, but it is very similar to this. You should however not use strncpy for any purpose, it is both slow and dangerous. memcpy is superior to it in every single way. See Which functions from the standard library must (should) be avoided?
    – Lundin
    1 hour ago











  • So I can solve the second problem I mentioned above, in this way.
    – Iman H
    1 hour ago










  • @ImanH Yeah sure, just replace the terminator parameter with a second search key parameter.
    – Lundin
    1 hour ago

















up vote
1
down vote













While you have a good presentation of your question, it is still a bit unclear exactly what you are looking to extract between $GPTXT and '$' given that you have the "rn" prior to the terminating '$'. It seems unlikely that you would want to retain the carriage return/line feed as part of your returned substring. My best interpretation is that you want "->SOME CODES HERE<-" extracted from the string (drop a comment if this is incorrect).



If this is the case, you can use the rn to your advantage in tokenizing the string to return that part between $GPTXT and "rn$" by using "rn$" as the delimiters passed to strtok. The only additional task needed in your findstr function would be to step past your search string before calling strtok.



In looking at the declaration for findstring, I would tweak the parameters just a bit to make them consistent with most of the other string library functions. That being to reverse the source and dest parameters such that dest comes before src as in strcpy, etc... I'm not a fan of one way over the other, but I have found keeping functions at least as consistent as possible helps avoid inadvertent parameter swaps.



I would also change your termChar to a const char* parameter to allow flexibility in passing the delimiter to use with strtok.



An implementation might look like:



#include <stdio.h>
#include <string.h>

#define TERM 32
#define MAXC 1024
#define DELIM "rn$"

char *findstr (char *dest, const char *src, size_t len, const char *srch,
const char *delim)

char *p = strstr (src, srch); /* locate beginning of search in src */
size_t toklen = 0; /* token length to calculate once */

if (!p) /* validate or return NULL */
return NULL;

p += strlen (srch); /* step past search str */

/* tokenize p based on delim and validate length < len */
if ((p = strtok (p, delim)) && (toklen = strlen(p)) < len)
memcpy (dest, p, toklen+1); /* copy to dest w/nul-char */
else
return NULL; /* or return NULL */

return dest;


int main (void)

char buffer = "blablabla$GPTXT->SOME CODES HERE<-rn$GPRMCblablabla";
char sub[MAXC] = "";

if (findstr (sub, buffer, MAXC, "GPTXT", DELIM))
printf ("sub: '%s'n", sub);



(note: since you have already scanned to length of your token to validate it is less than len, there is no need to scan again by using strcpy to effect the token copy to dest, simply using memcpy will provide an ever-so-slightly more efficient copy)



Example Use/Output



While provides:



$ ./bin/findstr
sub: '->SOME CODES HERE<-'


If you are looking to parse something slightly different, let me know.






share|improve this answer




















  • My take is that the input could either come from another device ("AT commands") or from a RS-232 terminal PC tool. In the latter case the CR+LF, or just LF, might be appended mostly as an annoying side-effect. I guess in that case they should just be discarded, rather than to use as part of the search key. Because another device is not likely to send them, unless it is part of some protocol specification.
    – Lundin
    1 hour ago











  • That makes the most sense. I was scratching my head on the reason they would be embedded (and cussing windoze at the same time). I guess they could be read from some port, but that raises the issue of whether an accumulator buffer should be employed to insure a complete line is read and not just the first-half, etc..
    – David C. Rankin
    1 hour ago










  • In my special use case, the CRLF must be always sent . So the format of sentences are always "$GPxxx______rn". But in some cases the sender, unfortunately randomly, does not send CR or LF.
    – Iman H
    1 hour ago










  • And, I can implement an interrupt method for UART receiving to parse the input while getting data. But such calculations are not suggested in interrupt routine.
    – Iman H
    1 hour ago











  • You are still covered with strtok() regardless of whether "rn" is sent, so long as you include '$' as part of the delimiter as well. strtok() will parse the token based on the first delimiter located. But yes, in an interrupt you would not want to rely on a returned value. The type can be change by to int without any loss of functionality and since dest is filled within the function there is no need for a return.
    – David C. Rankin
    1 hour ago











Your Answer




StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
);
);
, "mathjax-editing");

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "196"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);






Iman H is a new contributor. Be nice, and check out our Code of Conduct.









 

draft saved


draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f206026%2fextracting-a-specific-substring-in-c%23new-answer', 'question_page');

);

Post as a guest






























2 Answers
2






active

oldest

votes








2 Answers
2






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
2
down vote













As far as performance goes, there isn't much you can improve without doing manual optimization tricks. Such things are already implemented in the library functions though.



The main issue I see here is that you copy data into the destination before you know if the string actually contains a termination character. By doing so, you save a bit of time as you can copy and search at the same time. But you also end up copying data before the input has been verified.



What's best in your case, I don't know. It depends on how reliable your input is. If you have already verified it previously, then your copy+check in one might be the best choice. If it's some raw data from a serial bus (UART etc), it might be wisest to verify the data before you copy. I will show a version that does the verification first, it will be safer although possibly slightly slower than what you currently have.




General code review:



Style/best practices



  • Pointer parameters to data that isn't modified should be const qualified.

  • Using a plain int for error handling isn't ideal. You actually have several possible errors here: missing search string, missing terminator, potential buffer overflow. Even if your program doesn't need to know what went wrong, it might ease debugging and it costs you nothing extra to add.


  • The while loop could have been replaced with a for loop, for better readability:



    for(size_t i=0; i<desLen-1; i++)

    if(temp[i]==termChar)

    break;

    dst[i]=temp[i];



Performance



  • The length of the search key could be determined at compile-time.

  • Consider dropping the terminating character parameter if what2find[0] could be said to always contain it.

  • Some micro-optimizations with C99 restrict are possible. I'll show an example below.


Here is a different version which contains more detailed error handling and checks for termination before copy:



#include <string.h>
#include <stdbool.h>

typedef enum

STRFIND_OK,
STRFIND_KEY_NOT_FOUND,
STRFIND_TERMINATOR_NOT_FOUND,
STRFIND_BUFFER_OVERFLOW,
strfind_result_t;

strfind_result_t strfind_cpy (const char* str,
size_t key_size,
const char key[key_size],
char terminator,
size_t dst_size,
char dst [dst_size])

char* start = strstr(str, key);
if(start == NULL)

return STRFIND_KEY_NOT_FOUND;


start += key_size-1;
char* end = strchr(start, terminator);
if(end == NULL)

return STRFIND_TERMINATOR_NOT_FOUND;


size_t length = (size_t)(end-start);
if(length+1 > dst_size)

return STRFIND_BUFFER_OVERFLOW;


memcpy(dst, start, length);
dst[length] = '';

return STRFIND_OK;



C99 pointer-to-VLA are used to ensure data size integrity of the buffers. If the verification of data isn't needed, strchr could be replaced with a for loop like the one demonstrated above.



memcpy is the fastest possible copy. It will be faster than copy character-by-character, since the library implementation will work on 32 bit chunks that your ARM likes better than copying individual bytes. So my code might actually be faster (or it may be slower), you'll have to benchmark it.



Further micro-optimization is possible with C99 restrict:



strfind_result_t strfind_cpy (const char* restrict str, 
size_t key_size,
const char* restrict key,
char terminator,
size_t dst_size,
char* restrict dst);


This tells the compiler that none of the pointers passed point at the same buffer. This may improve performance ever so slightly, depending on how your compiler handles pointer aliasing internally. Check the disassembled code to see if restrict gave any benefits.




Example of use:



#include <stdio.h>

int main (void)

const char data = "blablabla$GPTXT->SOME CODES HERE<-rn$GPRMCblablabla";
char buf[50];
strfind_result_t result;

result = strfind_cpy(data,
sizeof "$GPTXT",
"$GPTXT",
'$',
sizeof buf,
buf);

if(result == STRFIND_OK)

puts(buf);







share|improve this answer






















  • Nice answer. What if I use two "strstr"s: find = strstr(buffer,"$GPTXT")+ strlen("$GPTXT"); find1= strstr(buffer,"$GPRMC"); strncpy(dst,find,(int)(find1-find)); dst[(int)(find1-find)] = '';
    – Iman H
    1 hour ago











  • @ImanH Well, if you need to use a second strstr that's another question, but it is very similar to this. You should however not use strncpy for any purpose, it is both slow and dangerous. memcpy is superior to it in every single way. See Which functions from the standard library must (should) be avoided?
    – Lundin
    1 hour ago











  • So I can solve the second problem I mentioned above, in this way.
    – Iman H
    1 hour ago










  • @ImanH Yeah sure, just replace the terminator parameter with a second search key parameter.
    – Lundin
    1 hour ago














up vote
2
down vote













As far as performance goes, there isn't much you can improve without doing manual optimization tricks. Such things are already implemented in the library functions though.



The main issue I see here is that you copy data into the destination before you know if the string actually contains a termination character. By doing so, you save a bit of time as you can copy and search at the same time. But you also end up copying data before the input has been verified.



What's best in your case, I don't know. It depends on how reliable your input is. If you have already verified it previously, then your copy+check in one might be the best choice. If it's some raw data from a serial bus (UART etc), it might be wisest to verify the data before you copy. I will show a version that does the verification first, it will be safer although possibly slightly slower than what you currently have.




General code review:



Style/best practices



  • Pointer parameters to data that isn't modified should be const qualified.

  • Using a plain int for error handling isn't ideal. You actually have several possible errors here: missing search string, missing terminator, potential buffer overflow. Even if your program doesn't need to know what went wrong, it might ease debugging and it costs you nothing extra to add.


  • The while loop could have been replaced with a for loop, for better readability:



    for(size_t i=0; i<desLen-1; i++)

    if(temp[i]==termChar)

    break;

    dst[i]=temp[i];



Performance



  • The length of the search key could be determined at compile-time.

  • Consider dropping the terminating character parameter if what2find[0] could be said to always contain it.

  • Some micro-optimizations with C99 restrict are possible. I'll show an example below.


Here is a different version which contains more detailed error handling and checks for termination before copy:



#include <string.h>
#include <stdbool.h>

typedef enum

STRFIND_OK,
STRFIND_KEY_NOT_FOUND,
STRFIND_TERMINATOR_NOT_FOUND,
STRFIND_BUFFER_OVERFLOW,
strfind_result_t;

strfind_result_t strfind_cpy (const char* str,
size_t key_size,
const char key[key_size],
char terminator,
size_t dst_size,
char dst [dst_size])

char* start = strstr(str, key);
if(start == NULL)

return STRFIND_KEY_NOT_FOUND;


start += key_size-1;
char* end = strchr(start, terminator);
if(end == NULL)

return STRFIND_TERMINATOR_NOT_FOUND;


size_t length = (size_t)(end-start);
if(length+1 > dst_size)

return STRFIND_BUFFER_OVERFLOW;


memcpy(dst, start, length);
dst[length] = '';

return STRFIND_OK;



C99 pointer-to-VLA are used to ensure data size integrity of the buffers. If the verification of data isn't needed, strchr could be replaced with a for loop like the one demonstrated above.



memcpy is the fastest possible copy. It will be faster than copy character-by-character, since the library implementation will work on 32 bit chunks that your ARM likes better than copying individual bytes. So my code might actually be faster (or it may be slower), you'll have to benchmark it.



Further micro-optimization is possible with C99 restrict:



strfind_result_t strfind_cpy (const char* restrict str, 
size_t key_size,
const char* restrict key,
char terminator,
size_t dst_size,
char* restrict dst);


This tells the compiler that none of the pointers passed point at the same buffer. This may improve performance ever so slightly, depending on how your compiler handles pointer aliasing internally. Check the disassembled code to see if restrict gave any benefits.




Example of use:



#include <stdio.h>

int main (void)

const char data = "blablabla$GPTXT->SOME CODES HERE<-rn$GPRMCblablabla";
char buf[50];
strfind_result_t result;

result = strfind_cpy(data,
sizeof "$GPTXT",
"$GPTXT",
'$',
sizeof buf,
buf);

if(result == STRFIND_OK)

puts(buf);







share|improve this answer






















  • Nice answer. What if I use two "strstr"s: find = strstr(buffer,"$GPTXT")+ strlen("$GPTXT"); find1= strstr(buffer,"$GPRMC"); strncpy(dst,find,(int)(find1-find)); dst[(int)(find1-find)] = '';
    – Iman H
    1 hour ago











  • @ImanH Well, if you need to use a second strstr that's another question, but it is very similar to this. You should however not use strncpy for any purpose, it is both slow and dangerous. memcpy is superior to it in every single way. See Which functions from the standard library must (should) be avoided?
    – Lundin
    1 hour ago











  • So I can solve the second problem I mentioned above, in this way.
    – Iman H
    1 hour ago










  • @ImanH Yeah sure, just replace the terminator parameter with a second search key parameter.
    – Lundin
    1 hour ago












up vote
2
down vote










up vote
2
down vote









As far as performance goes, there isn't much you can improve without doing manual optimization tricks. Such things are already implemented in the library functions though.



The main issue I see here is that you copy data into the destination before you know if the string actually contains a termination character. By doing so, you save a bit of time as you can copy and search at the same time. But you also end up copying data before the input has been verified.



What's best in your case, I don't know. It depends on how reliable your input is. If you have already verified it previously, then your copy+check in one might be the best choice. If it's some raw data from a serial bus (UART etc), it might be wisest to verify the data before you copy. I will show a version that does the verification first, it will be safer although possibly slightly slower than what you currently have.




General code review:



Style/best practices



  • Pointer parameters to data that isn't modified should be const qualified.

  • Using a plain int for error handling isn't ideal. You actually have several possible errors here: missing search string, missing terminator, potential buffer overflow. Even if your program doesn't need to know what went wrong, it might ease debugging and it costs you nothing extra to add.


  • The while loop could have been replaced with a for loop, for better readability:



    for(size_t i=0; i<desLen-1; i++)

    if(temp[i]==termChar)

    break;

    dst[i]=temp[i];



Performance



  • The length of the search key could be determined at compile-time.

  • Consider dropping the terminating character parameter if what2find[0] could be said to always contain it.

  • Some micro-optimizations with C99 restrict are possible. I'll show an example below.


Here is a different version which contains more detailed error handling and checks for termination before copy:



#include <string.h>
#include <stdbool.h>

typedef enum

STRFIND_OK,
STRFIND_KEY_NOT_FOUND,
STRFIND_TERMINATOR_NOT_FOUND,
STRFIND_BUFFER_OVERFLOW,
strfind_result_t;

strfind_result_t strfind_cpy (const char* str,
size_t key_size,
const char key[key_size],
char terminator,
size_t dst_size,
char dst [dst_size])

char* start = strstr(str, key);
if(start == NULL)

return STRFIND_KEY_NOT_FOUND;


start += key_size-1;
char* end = strchr(start, terminator);
if(end == NULL)

return STRFIND_TERMINATOR_NOT_FOUND;


size_t length = (size_t)(end-start);
if(length+1 > dst_size)

return STRFIND_BUFFER_OVERFLOW;


memcpy(dst, start, length);
dst[length] = '';

return STRFIND_OK;



C99 pointer-to-VLA are used to ensure data size integrity of the buffers. If the verification of data isn't needed, strchr could be replaced with a for loop like the one demonstrated above.



memcpy is the fastest possible copy. It will be faster than copy character-by-character, since the library implementation will work on 32 bit chunks that your ARM likes better than copying individual bytes. So my code might actually be faster (or it may be slower), you'll have to benchmark it.



Further micro-optimization is possible with C99 restrict:



strfind_result_t strfind_cpy (const char* restrict str, 
size_t key_size,
const char* restrict key,
char terminator,
size_t dst_size,
char* restrict dst);


This tells the compiler that none of the pointers passed point at the same buffer. This may improve performance ever so slightly, depending on how your compiler handles pointer aliasing internally. Check the disassembled code to see if restrict gave any benefits.




Example of use:



#include <stdio.h>

int main (void)

const char data = "blablabla$GPTXT->SOME CODES HERE<-rn$GPRMCblablabla";
char buf[50];
strfind_result_t result;

result = strfind_cpy(data,
sizeof "$GPTXT",
"$GPTXT",
'$',
sizeof buf,
buf);

if(result == STRFIND_OK)

puts(buf);







share|improve this answer














As far as performance goes, there isn't much you can improve without doing manual optimization tricks. Such things are already implemented in the library functions though.



The main issue I see here is that you copy data into the destination before you know if the string actually contains a termination character. By doing so, you save a bit of time as you can copy and search at the same time. But you also end up copying data before the input has been verified.



What's best in your case, I don't know. It depends on how reliable your input is. If you have already verified it previously, then your copy+check in one might be the best choice. If it's some raw data from a serial bus (UART etc), it might be wisest to verify the data before you copy. I will show a version that does the verification first, it will be safer although possibly slightly slower than what you currently have.




General code review:



Style/best practices



  • Pointer parameters to data that isn't modified should be const qualified.

  • Using a plain int for error handling isn't ideal. You actually have several possible errors here: missing search string, missing terminator, potential buffer overflow. Even if your program doesn't need to know what went wrong, it might ease debugging and it costs you nothing extra to add.


  • The while loop could have been replaced with a for loop, for better readability:



    for(size_t i=0; i<desLen-1; i++)

    if(temp[i]==termChar)

    break;

    dst[i]=temp[i];



Performance



  • The length of the search key could be determined at compile-time.

  • Consider dropping the terminating character parameter if what2find[0] could be said to always contain it.

  • Some micro-optimizations with C99 restrict are possible. I'll show an example below.


Here is a different version which contains more detailed error handling and checks for termination before copy:



#include <string.h>
#include <stdbool.h>

typedef enum

STRFIND_OK,
STRFIND_KEY_NOT_FOUND,
STRFIND_TERMINATOR_NOT_FOUND,
STRFIND_BUFFER_OVERFLOW,
strfind_result_t;

strfind_result_t strfind_cpy (const char* str,
size_t key_size,
const char key[key_size],
char terminator,
size_t dst_size,
char dst [dst_size])

char* start = strstr(str, key);
if(start == NULL)

return STRFIND_KEY_NOT_FOUND;


start += key_size-1;
char* end = strchr(start, terminator);
if(end == NULL)

return STRFIND_TERMINATOR_NOT_FOUND;


size_t length = (size_t)(end-start);
if(length+1 > dst_size)

return STRFIND_BUFFER_OVERFLOW;


memcpy(dst, start, length);
dst[length] = '';

return STRFIND_OK;



C99 pointer-to-VLA are used to ensure data size integrity of the buffers. If the verification of data isn't needed, strchr could be replaced with a for loop like the one demonstrated above.



memcpy is the fastest possible copy. It will be faster than copy character-by-character, since the library implementation will work on 32 bit chunks that your ARM likes better than copying individual bytes. So my code might actually be faster (or it may be slower), you'll have to benchmark it.



Further micro-optimization is possible with C99 restrict:



strfind_result_t strfind_cpy (const char* restrict str, 
size_t key_size,
const char* restrict key,
char terminator,
size_t dst_size,
char* restrict dst);


This tells the compiler that none of the pointers passed point at the same buffer. This may improve performance ever so slightly, depending on how your compiler handles pointer aliasing internally. Check the disassembled code to see if restrict gave any benefits.




Example of use:



#include <stdio.h>

int main (void)

const char data = "blablabla$GPTXT->SOME CODES HERE<-rn$GPRMCblablabla";
char buf[50];
strfind_result_t result;

result = strfind_cpy(data,
sizeof "$GPTXT",
"$GPTXT",
'$',
sizeof buf,
buf);

if(result == STRFIND_OK)

puts(buf);








share|improve this answer














share|improve this answer



share|improve this answer








edited 1 hour ago

























answered 1 hour ago









Lundin

1,384621




1,384621











  • Nice answer. What if I use two "strstr"s: find = strstr(buffer,"$GPTXT")+ strlen("$GPTXT"); find1= strstr(buffer,"$GPRMC"); strncpy(dst,find,(int)(find1-find)); dst[(int)(find1-find)] = '';
    – Iman H
    1 hour ago











  • @ImanH Well, if you need to use a second strstr that's another question, but it is very similar to this. You should however not use strncpy for any purpose, it is both slow and dangerous. memcpy is superior to it in every single way. See Which functions from the standard library must (should) be avoided?
    – Lundin
    1 hour ago











  • So I can solve the second problem I mentioned above, in this way.
    – Iman H
    1 hour ago










  • @ImanH Yeah sure, just replace the terminator parameter with a second search key parameter.
    – Lundin
    1 hour ago
















  • Nice answer. What if I use two "strstr"s: find = strstr(buffer,"$GPTXT")+ strlen("$GPTXT"); find1= strstr(buffer,"$GPRMC"); strncpy(dst,find,(int)(find1-find)); dst[(int)(find1-find)] = '';
    – Iman H
    1 hour ago











  • @ImanH Well, if you need to use a second strstr that's another question, but it is very similar to this. You should however not use strncpy for any purpose, it is both slow and dangerous. memcpy is superior to it in every single way. See Which functions from the standard library must (should) be avoided?
    – Lundin
    1 hour ago











  • So I can solve the second problem I mentioned above, in this way.
    – Iman H
    1 hour ago










  • @ImanH Yeah sure, just replace the terminator parameter with a second search key parameter.
    – Lundin
    1 hour ago















Nice answer. What if I use two "strstr"s: find = strstr(buffer,"$GPTXT")+ strlen("$GPTXT"); find1= strstr(buffer,"$GPRMC"); strncpy(dst,find,(int)(find1-find)); dst[(int)(find1-find)] = '';
– Iman H
1 hour ago





Nice answer. What if I use two "strstr"s: find = strstr(buffer,"$GPTXT")+ strlen("$GPTXT"); find1= strstr(buffer,"$GPRMC"); strncpy(dst,find,(int)(find1-find)); dst[(int)(find1-find)] = '';
– Iman H
1 hour ago













@ImanH Well, if you need to use a second strstr that's another question, but it is very similar to this. You should however not use strncpy for any purpose, it is both slow and dangerous. memcpy is superior to it in every single way. See Which functions from the standard library must (should) be avoided?
– Lundin
1 hour ago





@ImanH Well, if you need to use a second strstr that's another question, but it is very similar to this. You should however not use strncpy for any purpose, it is both slow and dangerous. memcpy is superior to it in every single way. See Which functions from the standard library must (should) be avoided?
– Lundin
1 hour ago













So I can solve the second problem I mentioned above, in this way.
– Iman H
1 hour ago




So I can solve the second problem I mentioned above, in this way.
– Iman H
1 hour ago












@ImanH Yeah sure, just replace the terminator parameter with a second search key parameter.
– Lundin
1 hour ago




@ImanH Yeah sure, just replace the terminator parameter with a second search key parameter.
– Lundin
1 hour ago












up vote
1
down vote













While you have a good presentation of your question, it is still a bit unclear exactly what you are looking to extract between $GPTXT and '$' given that you have the "rn" prior to the terminating '$'. It seems unlikely that you would want to retain the carriage return/line feed as part of your returned substring. My best interpretation is that you want "->SOME CODES HERE<-" extracted from the string (drop a comment if this is incorrect).



If this is the case, you can use the rn to your advantage in tokenizing the string to return that part between $GPTXT and "rn$" by using "rn$" as the delimiters passed to strtok. The only additional task needed in your findstr function would be to step past your search string before calling strtok.



In looking at the declaration for findstring, I would tweak the parameters just a bit to make them consistent with most of the other string library functions. That being to reverse the source and dest parameters such that dest comes before src as in strcpy, etc... I'm not a fan of one way over the other, but I have found keeping functions at least as consistent as possible helps avoid inadvertent parameter swaps.



I would also change your termChar to a const char* parameter to allow flexibility in passing the delimiter to use with strtok.



An implementation might look like:



#include <stdio.h>
#include <string.h>

#define TERM 32
#define MAXC 1024
#define DELIM "rn$"

char *findstr (char *dest, const char *src, size_t len, const char *srch,
const char *delim)

char *p = strstr (src, srch); /* locate beginning of search in src */
size_t toklen = 0; /* token length to calculate once */

if (!p) /* validate or return NULL */
return NULL;

p += strlen (srch); /* step past search str */

/* tokenize p based on delim and validate length < len */
if ((p = strtok (p, delim)) && (toklen = strlen(p)) < len)
memcpy (dest, p, toklen+1); /* copy to dest w/nul-char */
else
return NULL; /* or return NULL */

return dest;


int main (void)

char buffer = "blablabla$GPTXT->SOME CODES HERE<-rn$GPRMCblablabla";
char sub[MAXC] = "";

if (findstr (sub, buffer, MAXC, "GPTXT", DELIM))
printf ("sub: '%s'n", sub);



(note: since you have already scanned to length of your token to validate it is less than len, there is no need to scan again by using strcpy to effect the token copy to dest, simply using memcpy will provide an ever-so-slightly more efficient copy)



Example Use/Output



While provides:



$ ./bin/findstr
sub: '->SOME CODES HERE<-'


If you are looking to parse something slightly different, let me know.






share|improve this answer




















  • My take is that the input could either come from another device ("AT commands") or from a RS-232 terminal PC tool. In the latter case the CR+LF, or just LF, might be appended mostly as an annoying side-effect. I guess in that case they should just be discarded, rather than to use as part of the search key. Because another device is not likely to send them, unless it is part of some protocol specification.
    – Lundin
    1 hour ago











  • That makes the most sense. I was scratching my head on the reason they would be embedded (and cussing windoze at the same time). I guess they could be read from some port, but that raises the issue of whether an accumulator buffer should be employed to insure a complete line is read and not just the first-half, etc..
    – David C. Rankin
    1 hour ago










  • In my special use case, the CRLF must be always sent . So the format of sentences are always "$GPxxx______rn". But in some cases the sender, unfortunately randomly, does not send CR or LF.
    – Iman H
    1 hour ago










  • And, I can implement an interrupt method for UART receiving to parse the input while getting data. But such calculations are not suggested in interrupt routine.
    – Iman H
    1 hour ago











  • You are still covered with strtok() regardless of whether "rn" is sent, so long as you include '$' as part of the delimiter as well. strtok() will parse the token based on the first delimiter located. But yes, in an interrupt you would not want to rely on a returned value. The type can be change by to int without any loss of functionality and since dest is filled within the function there is no need for a return.
    – David C. Rankin
    1 hour ago















up vote
1
down vote













While you have a good presentation of your question, it is still a bit unclear exactly what you are looking to extract between $GPTXT and '$' given that you have the "rn" prior to the terminating '$'. It seems unlikely that you would want to retain the carriage return/line feed as part of your returned substring. My best interpretation is that you want "->SOME CODES HERE<-" extracted from the string (drop a comment if this is incorrect).



If this is the case, you can use the rn to your advantage in tokenizing the string to return that part between $GPTXT and "rn$" by using "rn$" as the delimiters passed to strtok. The only additional task needed in your findstr function would be to step past your search string before calling strtok.



In looking at the declaration for findstring, I would tweak the parameters just a bit to make them consistent with most of the other string library functions. That being to reverse the source and dest parameters such that dest comes before src as in strcpy, etc... I'm not a fan of one way over the other, but I have found keeping functions at least as consistent as possible helps avoid inadvertent parameter swaps.



I would also change your termChar to a const char* parameter to allow flexibility in passing the delimiter to use with strtok.



An implementation might look like:



#include <stdio.h>
#include <string.h>

#define TERM 32
#define MAXC 1024
#define DELIM "rn$"

char *findstr (char *dest, const char *src, size_t len, const char *srch,
const char *delim)

char *p = strstr (src, srch); /* locate beginning of search in src */
size_t toklen = 0; /* token length to calculate once */

if (!p) /* validate or return NULL */
return NULL;

p += strlen (srch); /* step past search str */

/* tokenize p based on delim and validate length < len */
if ((p = strtok (p, delim)) && (toklen = strlen(p)) < len)
memcpy (dest, p, toklen+1); /* copy to dest w/nul-char */
else
return NULL; /* or return NULL */

return dest;


int main (void)

char buffer = "blablabla$GPTXT->SOME CODES HERE<-rn$GPRMCblablabla";
char sub[MAXC] = "";

if (findstr (sub, buffer, MAXC, "GPTXT", DELIM))
printf ("sub: '%s'n", sub);



(note: since you have already scanned to length of your token to validate it is less than len, there is no need to scan again by using strcpy to effect the token copy to dest, simply using memcpy will provide an ever-so-slightly more efficient copy)



Example Use/Output



While provides:



$ ./bin/findstr
sub: '->SOME CODES HERE<-'


If you are looking to parse something slightly different, let me know.






share|improve this answer




















  • My take is that the input could either come from another device ("AT commands") or from a RS-232 terminal PC tool. In the latter case the CR+LF, or just LF, might be appended mostly as an annoying side-effect. I guess in that case they should just be discarded, rather than to use as part of the search key. Because another device is not likely to send them, unless it is part of some protocol specification.
    – Lundin
    1 hour ago











  • That makes the most sense. I was scratching my head on the reason they would be embedded (and cussing windoze at the same time). I guess they could be read from some port, but that raises the issue of whether an accumulator buffer should be employed to insure a complete line is read and not just the first-half, etc..
    – David C. Rankin
    1 hour ago










  • In my special use case, the CRLF must be always sent . So the format of sentences are always "$GPxxx______rn". But in some cases the sender, unfortunately randomly, does not send CR or LF.
    – Iman H
    1 hour ago










  • And, I can implement an interrupt method for UART receiving to parse the input while getting data. But such calculations are not suggested in interrupt routine.
    – Iman H
    1 hour ago











  • You are still covered with strtok() regardless of whether "rn" is sent, so long as you include '$' as part of the delimiter as well. strtok() will parse the token based on the first delimiter located. But yes, in an interrupt you would not want to rely on a returned value. The type can be change by to int without any loss of functionality and since dest is filled within the function there is no need for a return.
    – David C. Rankin
    1 hour ago













up vote
1
down vote










up vote
1
down vote









While you have a good presentation of your question, it is still a bit unclear exactly what you are looking to extract between $GPTXT and '$' given that you have the "rn" prior to the terminating '$'. It seems unlikely that you would want to retain the carriage return/line feed as part of your returned substring. My best interpretation is that you want "->SOME CODES HERE<-" extracted from the string (drop a comment if this is incorrect).



If this is the case, you can use the rn to your advantage in tokenizing the string to return that part between $GPTXT and "rn$" by using "rn$" as the delimiters passed to strtok. The only additional task needed in your findstr function would be to step past your search string before calling strtok.



In looking at the declaration for findstring, I would tweak the parameters just a bit to make them consistent with most of the other string library functions. That being to reverse the source and dest parameters such that dest comes before src as in strcpy, etc... I'm not a fan of one way over the other, but I have found keeping functions at least as consistent as possible helps avoid inadvertent parameter swaps.



I would also change your termChar to a const char* parameter to allow flexibility in passing the delimiter to use with strtok.



An implementation might look like:



#include <stdio.h>
#include <string.h>

#define TERM 32
#define MAXC 1024
#define DELIM "rn$"

char *findstr (char *dest, const char *src, size_t len, const char *srch,
const char *delim)

char *p = strstr (src, srch); /* locate beginning of search in src */
size_t toklen = 0; /* token length to calculate once */

if (!p) /* validate or return NULL */
return NULL;

p += strlen (srch); /* step past search str */

/* tokenize p based on delim and validate length < len */
if ((p = strtok (p, delim)) && (toklen = strlen(p)) < len)
memcpy (dest, p, toklen+1); /* copy to dest w/nul-char */
else
return NULL; /* or return NULL */

return dest;


int main (void)

char buffer = "blablabla$GPTXT->SOME CODES HERE<-rn$GPRMCblablabla";
char sub[MAXC] = "";

if (findstr (sub, buffer, MAXC, "GPTXT", DELIM))
printf ("sub: '%s'n", sub);



(note: since you have already scanned to length of your token to validate it is less than len, there is no need to scan again by using strcpy to effect the token copy to dest, simply using memcpy will provide an ever-so-slightly more efficient copy)



Example Use/Output



While provides:



$ ./bin/findstr
sub: '->SOME CODES HERE<-'


If you are looking to parse something slightly different, let me know.






share|improve this answer












While you have a good presentation of your question, it is still a bit unclear exactly what you are looking to extract between $GPTXT and '$' given that you have the "rn" prior to the terminating '$'. It seems unlikely that you would want to retain the carriage return/line feed as part of your returned substring. My best interpretation is that you want "->SOME CODES HERE<-" extracted from the string (drop a comment if this is incorrect).



If this is the case, you can use the rn to your advantage in tokenizing the string to return that part between $GPTXT and "rn$" by using "rn$" as the delimiters passed to strtok. The only additional task needed in your findstr function would be to step past your search string before calling strtok.



In looking at the declaration for findstring, I would tweak the parameters just a bit to make them consistent with most of the other string library functions. That being to reverse the source and dest parameters such that dest comes before src as in strcpy, etc... I'm not a fan of one way over the other, but I have found keeping functions at least as consistent as possible helps avoid inadvertent parameter swaps.



I would also change your termChar to a const char* parameter to allow flexibility in passing the delimiter to use with strtok.



An implementation might look like:



#include <stdio.h>
#include <string.h>

#define TERM 32
#define MAXC 1024
#define DELIM "rn$"

char *findstr (char *dest, const char *src, size_t len, const char *srch,
const char *delim)

char *p = strstr (src, srch); /* locate beginning of search in src */
size_t toklen = 0; /* token length to calculate once */

if (!p) /* validate or return NULL */
return NULL;

p += strlen (srch); /* step past search str */

/* tokenize p based on delim and validate length < len */
if ((p = strtok (p, delim)) && (toklen = strlen(p)) < len)
memcpy (dest, p, toklen+1); /* copy to dest w/nul-char */
else
return NULL; /* or return NULL */

return dest;


int main (void)

char buffer = "blablabla$GPTXT->SOME CODES HERE<-rn$GPRMCblablabla";
char sub[MAXC] = "";

if (findstr (sub, buffer, MAXC, "GPTXT", DELIM))
printf ("sub: '%s'n", sub);



(note: since you have already scanned to length of your token to validate it is less than len, there is no need to scan again by using strcpy to effect the token copy to dest, simply using memcpy will provide an ever-so-slightly more efficient copy)



Example Use/Output



While provides:



$ ./bin/findstr
sub: '->SOME CODES HERE<-'


If you are looking to parse something slightly different, let me know.







share|improve this answer












share|improve this answer



share|improve this answer










answered 1 hour ago









David C. Rankin

19617




19617











  • My take is that the input could either come from another device ("AT commands") or from a RS-232 terminal PC tool. In the latter case the CR+LF, or just LF, might be appended mostly as an annoying side-effect. I guess in that case they should just be discarded, rather than to use as part of the search key. Because another device is not likely to send them, unless it is part of some protocol specification.
    – Lundin
    1 hour ago











  • That makes the most sense. I was scratching my head on the reason they would be embedded (and cussing windoze at the same time). I guess they could be read from some port, but that raises the issue of whether an accumulator buffer should be employed to insure a complete line is read and not just the first-half, etc..
    – David C. Rankin
    1 hour ago










  • In my special use case, the CRLF must be always sent . So the format of sentences are always "$GPxxx______rn". But in some cases the sender, unfortunately randomly, does not send CR or LF.
    – Iman H
    1 hour ago










  • And, I can implement an interrupt method for UART receiving to parse the input while getting data. But such calculations are not suggested in interrupt routine.
    – Iman H
    1 hour ago











  • You are still covered with strtok() regardless of whether "rn" is sent, so long as you include '$' as part of the delimiter as well. strtok() will parse the token based on the first delimiter located. But yes, in an interrupt you would not want to rely on a returned value. The type can be change by to int without any loss of functionality and since dest is filled within the function there is no need for a return.
    – David C. Rankin
    1 hour ago

















  • My take is that the input could either come from another device ("AT commands") or from a RS-232 terminal PC tool. In the latter case the CR+LF, or just LF, might be appended mostly as an annoying side-effect. I guess in that case they should just be discarded, rather than to use as part of the search key. Because another device is not likely to send them, unless it is part of some protocol specification.
    – Lundin
    1 hour ago











  • That makes the most sense. I was scratching my head on the reason they would be embedded (and cussing windoze at the same time). I guess they could be read from some port, but that raises the issue of whether an accumulator buffer should be employed to insure a complete line is read and not just the first-half, etc..
    – David C. Rankin
    1 hour ago










  • In my special use case, the CRLF must be always sent . So the format of sentences are always "$GPxxx______rn". But in some cases the sender, unfortunately randomly, does not send CR or LF.
    – Iman H
    1 hour ago










  • And, I can implement an interrupt method for UART receiving to parse the input while getting data. But such calculations are not suggested in interrupt routine.
    – Iman H
    1 hour ago











  • You are still covered with strtok() regardless of whether "rn" is sent, so long as you include '$' as part of the delimiter as well. strtok() will parse the token based on the first delimiter located. But yes, in an interrupt you would not want to rely on a returned value. The type can be change by to int without any loss of functionality and since dest is filled within the function there is no need for a return.
    – David C. Rankin
    1 hour ago
















My take is that the input could either come from another device ("AT commands") or from a RS-232 terminal PC tool. In the latter case the CR+LF, or just LF, might be appended mostly as an annoying side-effect. I guess in that case they should just be discarded, rather than to use as part of the search key. Because another device is not likely to send them, unless it is part of some protocol specification.
– Lundin
1 hour ago





My take is that the input could either come from another device ("AT commands") or from a RS-232 terminal PC tool. In the latter case the CR+LF, or just LF, might be appended mostly as an annoying side-effect. I guess in that case they should just be discarded, rather than to use as part of the search key. Because another device is not likely to send them, unless it is part of some protocol specification.
– Lundin
1 hour ago













That makes the most sense. I was scratching my head on the reason they would be embedded (and cussing windoze at the same time). I guess they could be read from some port, but that raises the issue of whether an accumulator buffer should be employed to insure a complete line is read and not just the first-half, etc..
– David C. Rankin
1 hour ago




That makes the most sense. I was scratching my head on the reason they would be embedded (and cussing windoze at the same time). I guess they could be read from some port, but that raises the issue of whether an accumulator buffer should be employed to insure a complete line is read and not just the first-half, etc..
– David C. Rankin
1 hour ago












In my special use case, the CRLF must be always sent . So the format of sentences are always "$GPxxx______rn". But in some cases the sender, unfortunately randomly, does not send CR or LF.
– Iman H
1 hour ago




In my special use case, the CRLF must be always sent . So the format of sentences are always "$GPxxx______rn". But in some cases the sender, unfortunately randomly, does not send CR or LF.
– Iman H
1 hour ago












And, I can implement an interrupt method for UART receiving to parse the input while getting data. But such calculations are not suggested in interrupt routine.
– Iman H
1 hour ago





And, I can implement an interrupt method for UART receiving to parse the input while getting data. But such calculations are not suggested in interrupt routine.
– Iman H
1 hour ago













You are still covered with strtok() regardless of whether "rn" is sent, so long as you include '$' as part of the delimiter as well. strtok() will parse the token based on the first delimiter located. But yes, in an interrupt you would not want to rely on a returned value. The type can be change by to int without any loss of functionality and since dest is filled within the function there is no need for a return.
– David C. Rankin
1 hour ago





You are still covered with strtok() regardless of whether "rn" is sent, so long as you include '$' as part of the delimiter as well. strtok() will parse the token based on the first delimiter located. But yes, in an interrupt you would not want to rely on a returned value. The type can be change by to int without any loss of functionality and since dest is filled within the function there is no need for a return.
– David C. Rankin
1 hour ago











Iman H is a new contributor. Be nice, and check out our Code of Conduct.









 

draft saved


draft discarded


















Iman H is a new contributor. Be nice, and check out our Code of Conduct.












Iman H is a new contributor. Be nice, and check out our Code of Conduct.











Iman H is a new contributor. Be nice, and check out our Code of Conduct.













 


draft saved


draft discarded














StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f206026%2fextracting-a-specific-substring-in-c%23new-answer', 'question_page');

);

Post as a guest













































































Comments

Popular posts from this blog

What does second last employer means? [closed]

List of Gilmore Girls characters

Confectionery