How should source code security be checked?

How can I check whether the source code of an open-source project is free of malicious content? For example, in a set of source files totalling 30,000 lines, there might be one or two lines containing a malicious statement (e.g. calling rm on arbitrary files).



These projects are not well known, and it cannot be assumed that they are well maintained. Therefore, reusing their source code securely cannot simply rely on blind trust (while it is reasonable to assume that it is safe to download, verify, compile and run CMake directly, it does not seem wise to blindly use an arbitrary library hosted on GitHub).



Someone suggested that I filter the source code to remove all non-ASCII and invisible characters (except trivial ones such as line breaks), then open each file in a text editor and manually read every line. This is time-consuming, requires full attention while reading the code, and is actually quite error-prone.
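The filtering step described above can be made less error-prone by reporting suspicious characters instead of deleting them, since deleting them silently changes the code you then review. Below is a minimal Python sketch, assuming UTF-8 source files; the function names and the suffix list are hypothetical and should be adapted to your project. Format characters such as bidirectional overrides (the trick behind the "Trojan Source" attacks) can make code display differently from how the compiler reads it, which is why they are flagged.

```python
import unicodedata
from pathlib import Path

# Control characters that legitimately appear in ordinary source files.
ALLOWED_CONTROL = {"\t", "\n", "\r"}


def suspicious_chars(text):
    """Yield (line, column, name) for every character worth a closer look.

    A character is flagged if it is outside ASCII, or if it is a control (Cc)
    or format (Cf) character other than tab, newline or carriage return.
    Format characters include zero-width spaces and bidirectional overrides.
    """
    # Split on "\n" only: str.splitlines() would silently treat characters such
    # as U+2028 as line breaks, hiding exactly what we want to report.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            invisible = unicodedata.category(ch) in ("Cc", "Cf") and ch not in ALLOWED_CONTROL
            if invisible or ord(ch) > 127:
                # Unnamed code points (e.g. most control characters) fall back to U+XXXX.
                yield line_no, col, unicodedata.name(ch, f"U+{ord(ch):04X}")


def scan_tree(root, suffixes=(".c", ".cc", ".cpp", ".h", ".hpp", ".java", ".swift", ".js", ".dart")):
    """Print file:line:col for every flagged character under root (hypothetical suffix list)."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            # errors="replace" turns invalid UTF-8 into U+FFFD, which is then flagged too.
            text = path.read_text(encoding="utf-8", errors="replace")
            for line_no, col, name in suspicious_chars(text):
                print(f"{path}:{line_no}:{col}: {name}")
```

For example, scan_tree("path/to/project") prints one file:line:column entry per flagged character. Legitimate non-ASCII text in comments or string literals will also be reported; the goal is a short list of places to inspect, not an automatic verdict.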



So I'm looking for general methods for handling this kind of situation. For example, are there any standard tools available? And what should I pay attention to if I really do have to read the code manually?











Tags: malware, source-code, tools






edited 1 hour ago by schroeder♦
asked 1 hour ago by tonychow0929











  • There are static code analysers. Have you looked into those tools?
    – schroeder♦
    1 hour ago










  • Yes, but I have a (possibly wrong) feeling that they use blacklisting rather than whitelisting (like antivirus software), which is of little use against specifically crafted malicious content.
    – tonychow0929
    1 hour ago






  • SAST is not just a pattern-based blacklisting tool; it's more complex than that. A mature SAST solution collects every input and output point of an application, builds every possible data flow between them, and then analyses every internal point where unintended behaviour, such as data tampering, could happen.
    – odo
    57 mins ago











  • For example, in ecosystems such as npm or Python, where developers routinely pull in dozens of packages, there is no review process for accepting a component. To make the question less general, are you focusing on a specific ecosystem?
    – J. Doe
    42 mins ago











  • Not quite. I mainly work on mobile applications, which involve many programming languages, e.g. Swift (with Xcode), Java (both Android and server side), C++ (shared code), JavaScript, Dart, etc.
    – tonychow0929
    31 mins ago


























2 Answers

There are automated and manual approaches.

For the automated approach, you could start with LGTM, a free static code analyser for open-source projects, and then move on to more advanced SAST solutions.



For the manual approach, you could build a threat model of your app and work through the OWASP ASVS checklist, starting with its most critical parts. If file deletion is part of your threat model, just run something like this:

grep -ir 'os.remove(' .
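The grep idea can be extended to the mix of languages mentioned in the comments (C++, Java, Swift, JavaScript, Dart). The following is a minimal Python sketch; the pattern list is a hypothetical starting point, not an authoritative blacklist. As the comments point out, a deliberately crafted payload can evade this kind of matching (for example through reflection or strings assembled at run time), so treat hits as places to start reading, not as evidence that the rest of the code is clean.

```python
import re
from pathlib import Path

# Hypothetical starting list; extend it from your own threat model.
# Each entry maps a label to a regex marking a line worth reading closely.
RISKY_PATTERNS = {
    "process execution": re.compile(
        r"\b(system|popen|exec[lv]p?e?)\s*\("                 # C/C++
        r"|Runtime\.getRuntime\(\)\.exec|\bProcessBuilder\b"  # Java
        r"|child_process"                                     # JavaScript (Node.js)
        r"|\bProcess\.(run|start)\b"                          # Dart
    ),
    "file deletion": re.compile(
        r"\bos\.(remove|unlink)\s*\(|\bunlink\s*\("           # Python, C
        r"|\bFiles\.delete|\bremoveItem\s*\("                 # Java, Swift (FileManager)
    ),
    "dynamic code evaluation": re.compile(r"\beval\s*\(|\bnew\s+Function\s*\("),
    "hard-coded URL": re.compile(r"https?://"),
}


def scan_file_text(text):
    """Return (line number, label, stripped line) for every line matching a pattern."""
    findings = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for label, rx in RISKY_PATTERNS.items():
            if rx.search(line):
                findings.append((line_no, label, line.strip()))
    return findings


def scan_tree(root, suffixes=(".c", ".cpp", ".h", ".java", ".swift", ".js", ".dart", ".py")):
    """Print every finding under root, grep-style (hypothetical suffix list)."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            for line_no, label, snippet in scan_file_text(text):
                print(f"{path}:{line_no}: [{label}] {snippet}")
```

Run scan_tree("path/to/project") and read each reported line in context. A SAST tool that tracks data flow goes much further than this line-by-line matching.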



Of course, it's better to combine both.







edited 46 mins ago
answered 1 hour ago by odo


If you use someone else's code, you are more or less at the mercy of the integrity mechanisms its maintainers provide; that's true of all software, not just open source.



For both commercial and packaged open-source software (e.g. rpm, deb), code signing is common: it proves that what you have received is what the signer intended you to receive.



For source code, checksums are usually used instead. But a checksum has little value unless it is obtained from a different source than the source code itself.
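A minimal sketch of the checksum check, assuming the project publishes a SHA-256 digest (the algorithm choice and the function names are assumptions for illustration). The key point from the answer is that expected_hex must come from a channel other than the one that served the archive.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_checksum(path: str, expected_hex: str) -> bool:
    """Compare a file against a digest obtained from a separate, independent channel."""
    # Normalise case and surrounding whitespace, as digests are often copied from web pages.
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, verify_checksum("project.tar.gz", digest) (hypothetical file and variable names) returns False if the archive no longer matches the independently published digest.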



Note that these mechanisms are only intended to protect against man-in-the-middle (MITM) attacks on the delivery of the application.




"use an arbitrary library hosted on GitHub"

...in which case every file and version has a hash published on GitHub. To subvert this, an attacker would need to compromise GitHub itself or the maintainer's GitHub account. I can fork anything on GitHub, but the fork is then attributed to me, and the original repository is unaffected unless the maintainer accepts my pull requests. You may have more confidence in the integrity of GitHub than in the maintainers of the code, in which case it would be reasonable to trust a hash published in the same place as the source code.
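To illustrate what a per-file hash on GitHub means: Git identifies each file version by a SHA-1 over a short header plus the file's bytes, so you can recompute it locally and compare it with the blob ID that git ls-tree lists for a commit you trust. This is a sketch assuming a repository that uses Git's default SHA-1 object format; the function names are hypothetical. Checkout filters such as line-ending conversion can change the bytes on disk, so a mismatch on a working-tree file is not by itself proof of tampering.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Return the object ID Git assigns to a file with these exact bytes.

    Git hashes the header b"blob <size>" plus a NUL byte, followed by the
    content; for a SHA-1 repository this matches what git hash-object prints.
    """
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def git_blob_id_of_file(path: str) -> str:
    """Hash a file from disk; compare the result with the blob ID from git ls-tree."""
    with open(path, "rb") as f:
        return git_blob_id(f.read())
```

For example, git_blob_id(b"") returns e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the well-known ID of Git's empty blob.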



None of these mechanisms protects against malware that was injected before the integrity check was applied.



Where you have access to the source code, you have the option of examining it (which is a lot easier than examining executables), and there are automated tools for doing so, such as those odo suggests.







edited 27 mins ago
answered 59 mins ago by symcbean
