How should source code security be checked?

How can I check that the source code of an open-source project contains nothing malicious? For example, in a set of source files totalling 30,000 lines, there might be one or two lines containing a malicious statement (e.g. a call to rm on arbitrary files).

These projects are not well known, and it cannot be assumed that they are well maintained. Therefore, the safety of reusing their source code cannot simply rest on blind trust. (While it seems a reasonable assumption that it is safe to download, verify, compile and run CMake itself, it doesn't sound good to blindly use an arbitrary library hosted on GitHub.)

Someone suggested that I filter the source code to remove all non-ASCII and invisible characters (except trivial ones such as line breaks), then open each file in a text editor and read every line manually. This is time-consuming, requires full attention throughout, and is actually quite error-prone.
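Rather than silently deleting such characters, it can help to list exactly where they occur, because zero-width characters and Unicode bidirectional overrides can make code display differently from how the compiler reads it. A minimal Python sketch, assuming UTF-8 source files; treating everything outside printable ASCII, tab and carriage return as suspicious is an assumption you will need to relax for files that legitimately contain non-ASCII text (string literals, comments in other languages):

```python
import sys
import unicodedata
from pathlib import Path

# Control characters treated as harmless in addition to printable ASCII (U+0020 to U+007E).
ALLOWED_CONTROL = {"\t", "\r"}


def suspicious_chars(text):
    """Yield (line_no, column, name) for every character outside printable ASCII, tab and CR."""
    # Split on "\n" only, so line numbers match what most editors show for LF/CRLF files.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in ALLOWED_CONTROL or " " <= ch <= "~":
                continue
            # Most control characters have no Unicode name, so fall back to the code point.
            yield line_no, col, unicodedata.name(ch, f"U+{ord(ch):04X}")


def scan_file(path):
    """Scan one file; bytes that are not valid UTF-8 show up as REPLACEMENT CHARACTER."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return list(suspicious_chars(text))


if __name__ == "__main__":
    for arg in sys.argv[1:]:
        for line_no, col, name in scan_file(arg):
            print(f"{arg}:{line_no}:{col}: {name}")
```

Pass file paths on the command line; each hit is printed as path:line:column: character name. A clean result only rules out hidden characters, not malicious logic written in plain ASCII.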

So I'm looking for general methods for handling this kind of situation. For example, are there any standard tools available? And what should I pay attention to if I really do have to read the code manually?

asked 1 hour ago by tonychow0929, edited 1 hour ago by schroeder♦

There are static code analysers. Have you looked into those tools?
– schroeder♦, 1 hour ago

Yes, but I have a (possibly wrong) feeling that they use blacklisting rather than whitelisting (much like antivirus software), which is of little use against specifically crafted malicious content.
– tonychow0929, 1 hour ago

SAST is not just a pattern-based blacklisting tool; it's more complex. A mature SAST solution collects every input and output point of an application, builds all possible data flows between them, and then analyses every internal point where unintended behaviour, such as data tampering, could occur.
– odo, 57 mins ago
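To make the source-to-sink idea concrete, here is a deliberately tiny data-flow sketch for Python code using the standard ast module. It is nowhere near a real SAST engine: treating input() as the only source and os.system, os.remove and eval as the only sinks is an assumption for illustration, and taint only flows through plain top-level assignments (no functions, branches, containers or import aliases).

```python
import ast

# Assumed, illustrative models: calls that introduce untrusted data, and calls that are dangerous.
SOURCES = {"input"}
SINKS = {"os.system", "os.remove", "eval"}


def call_name(call):
    """Return a dotted name such as 'os.remove' for a call, or None for anything more complex."""
    func = call.func
    parts = []
    while isinstance(func, ast.Attribute):
        parts.append(func.attr)
        func = func.value
    if isinstance(func, ast.Name):
        parts.append(func.id)
        return ".".join(reversed(parts))
    return None


def is_tainted(expr, tainted):
    """An expression is tainted if it calls a source or mentions an already tainted variable."""
    for node in ast.walk(expr):
        if isinstance(node, ast.Name) and node.id in tainted:
            return True
        if isinstance(node, ast.Call) and call_name(node) in SOURCES:
            return True
    return False


def find_tainted_sinks(source):
    """Return (line, sink) pairs where data derived from a source reaches a sink argument."""
    tree = ast.parse(source)
    tainted = set()
    findings = []
    for stmt in tree.body:  # top-level statements, in source order
        # Check every sink call anywhere inside this statement against the current taint set.
        for node in ast.walk(stmt):
            if isinstance(node, ast.Call) and call_name(node) in SINKS:
                args = [*node.args, *(kw.value for kw in node.keywords)]
                if any(is_tainted(arg, tainted) for arg in args):
                    findings.append((node.lineno, call_name(node)))
        # Propagate taint only through simple top-level assignments like "x = <tainted expr>".
        if isinstance(stmt, ast.Assign) and is_tainted(stmt.value, tainted):
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    tainted.add(target.id)
    return findings
```

The point is the shape of the analysis: unlike a blacklist of strings, it reports a sink only when untrusted data can actually reach it. Production tools apply the same idea across functions and files, with per-framework models of sources and sinks.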

For example, for npm or Python packages, which developers often pull in by the dozen, there is no review process for accepting a component. To make the question less general: do you have a specific ecosystem in mind?
– J. Doe, 42 mins ago

Not quite. I'm mainly working on mobile applications, and many programming languages are involved, e.g. Swift (with Xcode), Java (both Android and server side), C++ (shared code), JavaScript, Dart, etc.
– tonychow0929, 31 mins ago

Tags: malware, source-code, tools

2 Answers

There are automated and manual approaches.

For automated analysis, you could start with LGTM, a free static code analyser for open-source projects, and then move on to more sophisticated SAST solutions.

For manual review, you could build a threat model of your app and work through the OWASP ASVS checklist, starting with its most critical parts. If file deletion appears in your threat model, just run something like: grep -ir 'os.remove(' .
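A plain grep finds the literal text os.remove( but misses equivalent spellings such as "import os as o" followed by o.remove(path), or "from shutil import rmtree as wipe"; it also matches that text inside strings and comments. For Python code, a slightly more robust sketch resolves import aliases with the standard ast module before matching. The watch-list below is a hypothetical example; extend it with whatever operations appear in your threat model. It is still a pattern search, so deliberately obfuscated calls (for example via getattr or __import__ with computed strings) will slip through; treat hits as places to read carefully, not as a verdict.

```python
import ast
import sys
from pathlib import Path

# Hypothetical watch-list: fully qualified names of calls worth a manual look.
WATCHED = {
    "os.remove", "os.unlink", "os.rmdir", "shutil.rmtree",
    "os.system", "subprocess.run", "subprocess.Popen", "eval", "exec",
}


def resolve_aliases(tree):
    """Map local names to the fully qualified names they were imported as."""
    aliases = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.asname:  # "import os.path as p" binds p -> os.path
                    aliases[alias.asname] = alias.name
                else:  # "import os.path" binds only the top-level name os
                    top = alias.name.split(".")[0]
                    aliases[top] = top
        elif isinstance(node, ast.ImportFrom) and node.module:
            for alias in node.names:
                aliases[alias.asname or alias.name] = f"{node.module}.{alias.name}"
    return aliases


def dotted(expr):
    """Turn Name/Attribute chains like os.path.join into ['os', 'path', 'join']."""
    parts = []
    while isinstance(expr, ast.Attribute):
        parts.append(expr.attr)
        expr = expr.value
    if isinstance(expr, ast.Name):
        parts.append(expr.id)
        return list(reversed(parts))
    return None


def find_watched_calls(source):
    """Return sorted (line, qualified_name) pairs for calls that resolve to a watched function."""
    tree = ast.parse(source)
    aliases = resolve_aliases(tree)
    hits = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        parts = dotted(node.func)
        if not parts:
            continue
        # Replace the first component with what it was imported as, if anything.
        head = aliases.get(parts[0], parts[0])
        name = ".".join([head, *parts[1:]])
        if name in WATCHED:
            hits.append((node.lineno, name))
    return sorted(hits)


if __name__ == "__main__":
    for arg in sys.argv[1:]:
        for line, name in find_watched_calls(Path(arg).read_text(encoding="utf-8")):
            print(f"{arg}:{line}: {name}")
```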

Of course, it's best to combine both.

answered 1 hour ago by odo, edited 46 mins ago


If you use someone else's code, you are more or less at the mercy of the integrity mechanisms its maintainers provide; that's true of all software, not just open source.

For both commercial and packaged open-source software (e.g. rpm, deb), code signing is common: it proves that what you have received is what the signer intended you to receive.

For source code, checksums are usually used instead. But a checksum has little value unless it is obtained from a different source than the source code itself.

Note that these mechanisms are only intended to protect against MITM-style tampering with the application in transit.
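As a minimal illustration of the checksum check, assuming the publisher provides a SHA-256 digest that you obtained through a separate channel (the algorithm and command-line usage here are assumptions, not something every project offers):

```python
import hashlib
import sys


def sha256_of_file(path, chunk_size=1 << 16):
    """Hash a file in fixed-size chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_hex):
    """True if the file's SHA-256 equals the published digest (case and surrounding whitespace ignored)."""
    return sha256_of_file(path) == published_hex.strip().lower()


if __name__ == "__main__" and len(sys.argv) == 3:
    ok = matches_published(sys.argv[1], sys.argv[2])
    print("OK" if ok else "MISMATCH: do not use this file")
    sys.exit(0 if ok else 1)
```

Usage would be something like: python verify.py project.tar.gz PUBLISHED_DIGEST (both names are placeholders). A match only shows that you have the file whose digest was published, not that its contents are benign.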

"...use an arbitrary library hosted on GitHub"

...in which case every file and version has a hash published on GitHub. To subvert this, an attacker would need to compromise GitHub itself or the maintainer's GitHub account. I can fork anything on GitHub, but the fork is then attributed to me, and the original repository is unaffected unless the maintainer accepts my pull requests. You may have more confidence in the integrity of GitHub than in the code's maintainers, in which case it would be reasonable to trust a hash published in the same place as the source code.
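The hashes GitHub displays are Git object IDs. For a single file, Git's blob ID is the SHA-1 of a short header ("blob", the size in bytes, a NUL byte) followed by the file's bytes, so you can recompute it locally and compare it with the ID reported by git ls-tree or git hash-object. A minimal sketch, assuming a repository that uses Git's default SHA-1 object format (not the newer SHA-256 format):

```python
import hashlib
import sys
from pathlib import Path


def git_blob_id(data: bytes) -> str:
    """Compute the object ID Git assigns to a file with this content (SHA-1 object format)."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    # Compare each printed ID with the blob IDs listed by: git ls-tree -r <commit>
    for arg in sys.argv[1:]:
        print(git_blob_id(Path(arg).read_bytes()), arg)
```

In practice, pinning a specific commit ID is usually more convenient: a commit ID covers its tree, which in turn covers every blob, so checking out a known commit pins every file at once. Line-ending conversion or other Git filters can make checked-out bytes differ from the committed blob, so a mismatch on a single file is worth investigating rather than an automatic verdict.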

None of these mechanisms protects against malware that was injected before the integrity check was applied.

Where you have access to the source code, you can examine it (which is a lot easier than examining executables), and there are automated tools for doing so, such as those odo suggests.

answered 59 mins ago by symcbean, edited 27 mins ago