Continuous integration for scientific software

I'm not a software engineer; I'm a PhD student in geoscience.



Almost two years ago I started writing a piece of scientific software. I have never used continuous integration (CI), mainly because at first I didn't know it existed and I was the only person working on the software.



Now that the core of the software is working, other people are getting interested in it and want to contribute. The plan is that people at other universities will implement additions to the core software (I'm worried they could introduce bugs). The software has also grown quite complex and harder and harder to test, and I plan to keep working on it.



For these two reasons, I'm thinking more and more about using CI.
Since I never had a software engineering education and nobody around me has ever heard of CI (we are scientists, not programmers), I find it hard to get started on my project.



I have a couple of questions on which I would like some advice:



First, a short explanation of how the software works:



  • The software is controlled by one .xml file containing all required settings. You start the software by passing the path to the .xml file as an input argument; it then runs and creates a couple of files with the results. A single run takes ~30 seconds.


  • It is scientific software. Almost all of the functions take multiple input parameters, whose types are mostly quite complex classes. I have multiple .txt files with big catalogs that are used to create instances of these classes.


Now, on to my questions:



  1. Unit tests, integration tests, end-to-end tests?
    My software is now around 30,000 lines of code, with hundreds of functions and ~80 classes.
    It feels strange to me to start writing unit tests for hundreds of functions that are already implemented.
    So I thought about simply creating some test cases: prepare 10-20 different .xml files and let the software run. I guess this is what is called end-to-end testing? I often read that you should not do this, but maybe it is acceptable as a start when you already have working software? Or is it simply a dumb idea to try to add CI to already-working software?


  2. How do you write unit tests when the function parameters are difficult to create?
    Assume I have a function double fun(vector<Class_A> a, vector<Class_B> b); usually I would first need to read in multiple text files to create objects of type Class_A and Class_B. I thought about writing some dummy factory functions like Class_A create_dummy_object() that don't read the text files (see the sketch after this list). I also thought about implementing some kind of serialization. (I do not plan to test the creation of the class objects themselves, since they only depend on the text files.)


  3. How do you write tests when the results are highly variable? My software makes heavy use of big Monte Carlo simulations and works iteratively. Typically there are ~1000 iterations, and at each iteration ~500-20,000 object instances are created based on Monte Carlo draws. If even one result in one iteration differs slightly, all subsequent iterations turn out completely different. How do you deal with this? I guess this is a big point against end-to-end tests, since the end result is highly variable.
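
For illustration, the dummy factory I have in mind (from question 2) would look roughly like this; Class_A and its fields are simplified stand-ins for my real classes:

    #include <cstddef>
    #include <vector>

    // Simplified stand-in for one of my catalog-backed classes.
    struct Class_A {
        double value;
        int    category;
    };

    // Dummy factory: builds a small, hand-picked instance in memory
    // instead of parsing the big catalog .txt files.
    Class_A create_dummy_object()
    {
        Class_A a;
        a.value    = 1.0;   // chosen so expected outputs are easy to verify
        a.category = 0;
        return a;
    }

    std::vector<Class_A> create_dummy_vector(std::size_t n)
    {
        return std::vector<Class_A>(n, create_dummy_object());
    }

    int main()
    {
        return create_dummy_vector(10).size() == 10 ? 0 : 1;
    }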


Any other advice on CI is highly appreciated.

object-oriented c++ testing continuous-integration

asked 1 hour ago by user7431005, edited 1 hour ago by amon
  • How to handle a question that asks many things – gnat, 1 hour ago

  • How do you know that your software is working correctly? Can you find a way to automate that check so you can run it on every change? That should be your first step when introducing CI to an existing project. – Bart van Ingen Schenau, 44 mins ago

3 Answers

Answer by amon (score 2, answered 31 mins ago)

Testing scientific software is difficult, both because of the complex subject matter and because of typical scientific development processes (a.k.a. “hack it until it works”, which doesn't usually result in a testable design). This is a bit ironic considering that science should be reproducible. What changes compared to “normal” software is not whether tests are useful (yes!), but which kinds of tests are appropriate.



Handling randomness: all runs of your software MUST be reproducible. If you use Monte Carlo techniques, you must make it possible to provide a specific seed for the random number generator.



  • It is easy to forget this, e.g. when using C's rand() function, which depends on global state.

  • Ideally, a random number generator is passed as an explicit object through your functions. C++11's <random> standard library header makes this a lot easier (see the sketch after this list).

  • Instead of sharing random state across modules of the software, I've found it useful to create a second RNG which is seeded by a random number from the first RNG. Then, if the number of requests to the RNG by the other module changes, the sequence generated by the first RNG stays the same.
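
A minimal sketch of this seeding scheme, assuming C++11 (simulate_step and the two-module split are invented for illustration):

    #include <iostream>
    #include <random>

    // Hypothetical simulation step: the RNG is passed in explicitly,
    // so a fixed seed makes the whole run reproducible.
    double simulate_step(std::mt19937& rng)
    {
        std::normal_distribution<double> noise(0.0, 1.0);
        return noise(rng);
    }

    int main()
    {
        std::mt19937 master(42);  // fixed seed => reproducible run

        // Each module gets its own RNG, seeded from the master RNG.
        // If one module starts drawing more numbers, the sequences
        // seen by the other modules are unaffected.
        std::mt19937 rng_module_a(master());
        std::mt19937 rng_module_b(master());

        for (int i = 0; i < 3; ++i)
            std::cout << simulate_step(rng_module_a) << '\n';
        std::cout << simulate_step(rng_module_b) << '\n';
    }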

Integration tests are perfectly fine. They are good at verifying that different parts of your software play together correctly, and for running concrete scenarios.



  • As a minimum quality level, “it doesn't crash” can already be a good test result.

  • For stronger results, you will also have to check the results against some baseline. These checks will have to be somewhat tolerant, e.g. account for rounding errors (see the sketch after this list). It can also be helpful to compare summary statistics instead of full data rows.

  • If checking against a baseline would be too fragile, check that the outputs are valid and satisfy some general properties. These can be general (“selected locations must be at least 2km apart”) or scenario-specific, e.g. “a selected location must be within this area”.
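
For instance, a tolerant baseline check might look like the sketch below (the data, baseline value, and tolerance are made up for illustration):

    #include <cassert>
    #include <cmath>
    #include <vector>

    // Compare a summary statistic against a stored baseline with a
    // relative tolerance, instead of demanding bit-identical files.
    bool close_enough(double actual, double baseline, double rel_tol)
    {
        return std::fabs(actual - baseline) <= rel_tol * std::fabs(baseline);
    }

    double mean(const std::vector<double>& xs)
    {
        double sum = 0.0;
        for (double x : xs) sum += x;
        return sum / xs.size();
    }

    int main()
    {
        // Stand-in for results produced by one fixed-seed scenario run.
        std::vector<double> results = {2.9, 3.1, 3.4, 3.2};

        const double baseline_mean = 3.15;  // stored from a trusted run
        assert(close_enough(mean(results), baseline_mean, 0.01));
    }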

When running integration tests, it is a good idea to write a test runner as a separate program or script. This test runner performs necessary setup, runs the executable to be tested, checks any results, and cleans up afterwards.



Unit-test-style checks can be quite difficult to introduce into scientific software because the software has not been designed for that. In particular, unit tests get difficult when the system under test has many external dependencies/interactions. If the software is not purely object-oriented, it is generally not possible to mock/stub those dependencies. I've found it best to largely avoid unit tests for such software, except for pure math functions and other utility functions.
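
Such a test for a pure function can be as small as an executable whose exit code tells CI whether it passed; wrap_longitude below is a made-up example:

    #include <cassert>
    #include <cmath>

    // Pure utility function under test (hypothetical example):
    // wraps a longitude into the range [-180, 180).
    double wrap_longitude(double lon_deg)
    {
        lon_deg = std::fmod(lon_deg + 180.0, 360.0);
        if (lon_deg < 0.0) lon_deg += 360.0;
        return lon_deg - 180.0;
    }

    int main()
    {
        assert(wrap_longitude(0.0)    ==    0.0);
        assert(wrap_longitude(190.0)  == -170.0);
        assert(wrap_longitude(-190.0) ==  170.0);
        return 0;  // exit code 0 => test passed
    }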



Even a few tests are better than no tests. Combined with the check “it has to compile” that's already a good start into continuous integration. You can always come back and add more tests later. You can then prioritize areas of the code that are more likely to break, e.g. because they get more development activity. To see which parts of your code are not covered by unit tests, you can use code coverage tools.



Manual testing:
Especially for complex problem domains, you will not be able to test everything automatically. For example, I'm currently working on a stochastic search problem. If I test that my software always produces the same result, I can't improve it without breaking the tests. Instead, I've made it easier to do manual tests: I run the software with a fixed seed and get a visualization of the result (depending on your preferences, R, Python/Pyplot, and Matlab all make it easy to get high-quality visualizations of your data sets). I can use this visualization to verify that things did not go terribly wrong. Similarly, tracing the progress of your software via logging output can be a viable manual testing technique, at least if you can select the type of events to be logged.







Answer by AWhitePelican (score 1, answered 1 hour ago)

    1. It is never a dumb idea to add CI. From experience, I know this is the way to go when you have an open source project where people are free to contribute. CI allows you to stop people from adding or changing code if that code breaks your program, so it is almost invaluable for keeping a working codebase.



      When considering tests, you can certainly provide some end-to-end tests (I think they are a subcategory of integration tests) to be sure that your overall code flow works the way it should. You should also provide at least some basic unit tests to make sure individual functions output the right values, since integration tests alone can mask errors in individual functions.



    2. Test object creation is indeed quite difficult and laborious. You are right to want to make dummy objects. These objects should have default but edge-case values for which you know exactly what the output should be.


    3. The problem with books on this subject is that the CI landscape (and other parts of DevOps) evolves so quickly that anything in a book is probably out of date a few months later. I do not know of any books that could help you, but Google should, as always, be your saviour.


    4. You should run your tests yourself multiple times and do statistical analysis. That way you can implement test cases that take the median/average of multiple runs and compare it against your analysis, so you know which values are correct.
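
    A sketch of that idea, assuming each run is reproducible per seed (run_simulation, the expected value, and the tolerance are stand-ins you would get from your own analysis):

        #include <algorithm>
        #include <cassert>
        #include <cmath>
        #include <cstddef>
        #include <random>
        #include <vector>

        // Stand-in for one full simulation run with a given seed.
        double run_simulation(unsigned seed)
        {
            std::mt19937 rng(seed);
            std::normal_distribution<double> dist(5.0, 1.0);
            double sum = 0.0;
            for (int i = 0; i < 1000; ++i) sum += dist(rng);
            return sum / 1000.0;
        }

        double median(std::vector<double> xs)
        {
            std::sort(xs.begin(), xs.end());
            const std::size_t n = xs.size();
            return n % 2 ? xs[n / 2] : 0.5 * (xs[n / 2 - 1] + xs[n / 2]);
        }

        int main()
        {
            std::vector<double> results;
            for (unsigned seed = 0; seed < 25; ++seed)
                results.push_back(run_simulation(seed));

            // 5.0 and the tolerance would come from prior statistical analysis.
            assert(std::fabs(median(results) - 5.0) < 0.1);
        }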


    Some tips:



    • Use the integration of CI tools with your Git platform to stop broken code from entering your codebase.

    • Stop code from being merged before it has been peer-reviewed by other developers. This surfaces errors earlier and, again, stops broken code from entering your codebase.






Answer (score 1)

      1. Types of test





        • It feels kind of strange to me to start writing unit tests for hundreds of functions which are already implemented




          Think of it the other way round: if a patch touching several functions breaks one of your end-to-end tests, how are you going to figure out which one is the problem?



          It's much easier to write unit tests for individual functions than for the whole program. It's much easier to be sure you have good coverage of an individual function. It's much easier to refactor a function when you're sure the unit tests will catch any corner cases you broke.



          Writing unit tests for already-existing functions is perfectly normal for anyone who has worked on a legacy codebase. They're a good way to confirm your understanding of the functions in the first place and, once written, they're a good way to find unexpected changes of behaviour.



        • End-to-end tests are also worthwhile. If they're easier to write, by all means do those first and add unit tests ad-hoc to cover the functions you're most concerned about others breaking. You don't have to do it all at once.


        • Yes, adding CI to existing software is sensible, and normal.




      2. How to write unit tests



        If your objects are really expensive and/or complex, write mocks. You can just link the tests using mocks separately from the tests using real objects, instead of using polymorphism.



        In any case, you should have some easy way of creating instances - a function that creates dummy instances is common - but having tests for the real creation process is also sensible.




      3. Variable results



        You must have some invariants for the result. Test those, rather than a single numerical value (see the sketch below).



        You could provide a mock pseudorandom number generator if your Monte Carlo code accepts it as a parameter, which would make the results predictable at least for a well-known algorithm, but it's brittle unless it literally returns the same number every time.
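
        For example, an invariant-style check could look like this sketch (the quantity and bounds are invented for illustration):

            #include <cassert>
            #include <vector>

            // Invented invariant: simulated rock densities must stay within
            // physically plausible bounds, whatever the random seed was.
            void check_invariants(const std::vector<double>& densities)
            {
                assert(!densities.empty());
                for (double d : densities)
                    assert(d > 0.0 && d < 10000.0);  // kg/m^3
            }

            int main()
            {
                // Stand-in for values parsed from one result file.
                check_invariants({2650.0, 2700.0, 3300.0});
            }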







      share|improve this answer




















        Your Answer







        StackExchange.ready(function()
        var channelOptions =
        tags: "".split(" "),
        id: "131"
        ;
        initTagRenderer("".split(" "), "".split(" "), channelOptions);

        StackExchange.using("externalEditor", function()
        // Have to fire editor after snippets, if snippets enabled
        if (StackExchange.settings.snippets.snippetsEnabled)
        StackExchange.using("snippets", function()
        createEditor();
        );

        else
        createEditor();

        );

        function createEditor()
        StackExchange.prepareEditor(
        heartbeatType: 'answer',
        convertImagesToLinks: false,
        noModals: false,
        showLowRepImageUploadWarning: true,
        reputationToPostImages: null,
        bindNavPrevention: true,
        postfix: "",
        onDemand: true,
        discardSelector: ".discard-answer"
        ,immediatelyShowMarkdownHelp:true
        );



        );






        user7431005 is a new contributor. Be nice, and check out our Code of Conduct.









         

        draft saved


        draft discarded


















        StackExchange.ready(
        function ()
        StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fsoftwareengineering.stackexchange.com%2fquestions%2f379996%2fcontinuous-integration-for-scientific-software%23new-answer', 'question_page');

        );

        Post as a guest






























        3 Answers
        3






        active

        oldest

        votes








        3 Answers
        3






        active

        oldest

        votes









        active

        oldest

        votes






        active

        oldest

        votes








        up vote
        2
        down vote













        Testing scientific software is difficult, both because of the complex subject matter and because of typical scientific development processes (aka. hack it until it works, which doesn't usually result in a testable design). This is a bit ironic considering that science should be reproducible. What changes compared to “normal” software is not whether tests are useful (yes!), but which kinds of test are appropriate.



        Handling randomness: all runs of your software MUST be reproducible. If you use Monte Carlo techniques, you must make it possible to provide a specific seed for the random number generator.



        • It is easy to forget this e.g. when using C's rand() function which depends on global state.

        • Ideally, a random number generator is passed as an explicit object through your functions. C++11's random standard library header makes this a lot easier.

        • Instead of sharing random state across modules of the software, I've found it useful to create a second RNG which is seeded by a random number from the first RNG. Then, if the number of requests to the RNG by the other module changes, the sequence generated by the first RNG stays the same.

        Integration tests are perfectly fine. They are good at verifying that different parts of your software play together correctly, and for running concrete scenarios.



        • As a minimum quality level “it doesn't crash” can already be a good test result.

        • For stronger results, you will also have to check the results against some baseline. However, these checks will have to be somewhat tolerant, e.g. account for rounding errors. It can also be helpful to compare summary statistics instead of full data rows.

        • If checking against a baseline would be too fragile, check that the outputs are valid and satisfy some general properties. These can be general (“selected locations must be at least 2km apart”) or scenario-specific, e.g. “a selected location must be within this area”.

        When running integration tests, it is a good idea to write a test runner as a separate program or script. This test runner performs necessary setup, runs the executable to be tested, checks any results, and cleans up afterwards.



        Unit test style checks can be quite difficult to insert into scientific software because the software has not been designed for that. In particular, unit tests get difficult when the system under test has many external dependencies/interactions. If the software is not purely object-oriented, it is not generally possible to mock/stub those dependencies. I've found it best to largely avoid unit tests for such software, except for pure math functions and utility functions.



        Even a few tests are better than no tests. Combined with the check “it has to compile” that's already a good start into continuous integration. You can always come back and add more tests later. You can then prioritize areas of the code that are more likely to break, e.g. because they get more development activity. To see which parts of your code are not covered by unit tests, you can use code coverage tools.



        Manual testing:
        Especially for complex problem domains, you will not be able to test everything automatically. E.g. I'm currently working on a stochastic search problem. If I test that my software always produces the same result, I can't improve it without breaking the tests. Instead, I've made it easier to do manual tests: I run the software with a fixed seed and get a visualization of the result (depending on your preferences, R, Python/Pyplot, and Matlab all make it easy to get high-quality visualizations of your data sets). I can use this visualization to verify that things did not go terribly wrong. Similarly, tracing the progress of your software via logging output can be a viable manual testing technique, at least if I can select the type of events to be logged.






        share|improve this answer
























          up vote
          2
          down vote













          Testing scientific software is difficult, both because of the complex subject matter and because of typical scientific development processes (aka. hack it until it works, which doesn't usually result in a testable design). This is a bit ironic considering that science should be reproducible. What changes compared to “normal” software is not whether tests are useful (yes!), but which kinds of test are appropriate.



          Handling randomness: all runs of your software MUST be reproducible. If you use Monte Carlo techniques, you must make it possible to provide a specific seed for the random number generator.



          • It is easy to forget this e.g. when using C's rand() function which depends on global state.

          • Ideally, a random number generator is passed as an explicit object through your functions. C++11's random standard library header makes this a lot easier.

          • Instead of sharing random state across modules of the software, I've found it useful to create a second RNG which is seeded by a random number from the first RNG. Then, if the number of requests to the RNG by the other module changes, the sequence generated by the first RNG stays the same.

          Integration tests are perfectly fine. They are good at verifying that different parts of your software play together correctly, and for running concrete scenarios.



          • As a minimum quality level “it doesn't crash” can already be a good test result.

          • For stronger results, you will also have to check the results against some baseline. However, these checks will have to be somewhat tolerant, e.g. account for rounding errors. It can also be helpful to compare summary statistics instead of full data rows.

          • If checking against a baseline would be too fragile, check that the outputs are valid and satisfy some general properties. These can be general (“selected locations must be at least 2km apart”) or scenario-specific, e.g. “a selected location must be within this area”.

          When running integration tests, it is a good idea to write a test runner as a separate program or script. This test runner performs necessary setup, runs the executable to be tested, checks any results, and cleans up afterwards.



          Unit test style checks can be quite difficult to insert into scientific software because the software has not been designed for that. In particular, unit tests get difficult when the system under test has many external dependencies/interactions. If the software is not purely object-oriented, it is not generally possible to mock/stub those dependencies. I've found it best to largely avoid unit tests for such software, except for pure math functions and utility functions.



          Even a few tests are better than no tests. Combined with the check “it has to compile” that's already a good start into continuous integration. You can always come back and add more tests later. You can then prioritize areas of the code that are more likely to break, e.g. because they get more development activity. To see which parts of your code are not covered by unit tests, you can use code coverage tools.



          Manual testing:
          Especially for complex problem domains, you will not be able to test everything automatically. E.g. I'm currently working on a stochastic search problem. If I test that my software always produces the same result, I can't improve it without breaking the tests. Instead, I've made it easier to do manual tests: I run the software with a fixed seed and get a visualization of the result (depending on your preferences, R, Python/Pyplot, and Matlab all make it easy to get high-quality visualizations of your data sets). I can use this visualization to verify that things did not go terribly wrong. Similarly, tracing the progress of your software via logging output can be a viable manual testing technique, at least if I can select the type of events to be logged.






          share|improve this answer






















            up vote
            2
            down vote










            up vote
            2
            down vote









            Testing scientific software is difficult, both because of the complex subject matter and because of typical scientific development processes (aka. hack it until it works, which doesn't usually result in a testable design). This is a bit ironic considering that science should be reproducible. What changes compared to “normal” software is not whether tests are useful (yes!), but which kinds of test are appropriate.



            Handling randomness: all runs of your software MUST be reproducible. If you use Monte Carlo techniques, you must make it possible to provide a specific seed for the random number generator.



            • It is easy to forget this e.g. when using C's rand() function which depends on global state.

            • Ideally, a random number generator is passed as an explicit object through your functions. C++11's random standard library header makes this a lot easier.

            • Instead of sharing random state across modules of the software, I've found it useful to create a second RNG which is seeded by a random number from the first RNG. Then, if the number of requests to the RNG by the other module changes, the sequence generated by the first RNG stays the same.

            Integration tests are perfectly fine. They are good at verifying that different parts of your software play together correctly, and for running concrete scenarios.



            • As a minimum quality level “it doesn't crash” can already be a good test result.

            • For stronger results, you will also have to check the results against some baseline. However, these checks will have to be somewhat tolerant, e.g. account for rounding errors. It can also be helpful to compare summary statistics instead of full data rows.

            • If checking against a baseline would be too fragile, check that the outputs are valid and satisfy some general properties. These can be general (“selected locations must be at least 2km apart”) or scenario-specific, e.g. “a selected location must be within this area”.

            When running integration tests, it is a good idea to write a test runner as a separate program or script. This test runner performs necessary setup, runs the executable to be tested, checks any results, and cleans up afterwards.



            Unit test style checks can be quite difficult to insert into scientific software because the software has not been designed for that. In particular, unit tests get difficult when the system under test has many external dependencies/interactions. If the software is not purely object-oriented, it is not generally possible to mock/stub those dependencies. I've found it best to largely avoid unit tests for such software, except for pure math functions and utility functions.



            Even a few tests are better than no tests. Combined with the check “it has to compile” that's already a good start into continuous integration. You can always come back and add more tests later. You can then prioritize areas of the code that are more likely to break, e.g. because they get more development activity. To see which parts of your code are not covered by unit tests, you can use code coverage tools.



            Manual testing:
            Especially for complex problem domains, you will not be able to test everything automatically. E.g. I'm currently working on a stochastic search problem. If I test that my software always produces the same result, I can't improve it without breaking the tests. Instead, I've made it easier to do manual tests: I run the software with a fixed seed and get a visualization of the result (depending on your preferences, R, Python/Pyplot, and Matlab all make it easy to get high-quality visualizations of your data sets). I can use this visualization to verify that things did not go terribly wrong. Similarly, tracing the progress of your software via logging output can be a viable manual testing technique, at least if I can select the type of events to be logged.






            share|improve this answer












            Testing scientific software is difficult, both because of the complex subject matter and because of typical scientific development processes (aka. hack it until it works, which doesn't usually result in a testable design). This is a bit ironic considering that science should be reproducible. What changes compared to “normal” software is not whether tests are useful (yes!), but which kinds of test are appropriate.



            Handling randomness: all runs of your software MUST be reproducible. If you use Monte Carlo techniques, you must make it possible to provide a specific seed for the random number generator.



            • It is easy to forget this e.g. when using C's rand() function which depends on global state.

            • Ideally, a random number generator is passed as an explicit object through your functions. C++11's random standard library header makes this a lot easier.

            • Instead of sharing random state across modules of the software, I've found it useful to create a second RNG which is seeded by a random number from the first RNG. Then, if the number of requests to the RNG by the other module changes, the sequence generated by the first RNG stays the same.

            Integration tests are perfectly fine. They are good at verifying that different parts of your software play together correctly, and for running concrete scenarios.



            • As a minimum quality level “it doesn't crash” can already be a good test result.

            • For stronger results, you will also have to check the results against some baseline. However, these checks will have to be somewhat tolerant, e.g. account for rounding errors. It can also be helpful to compare summary statistics instead of full data rows.

            • If checking against a baseline would be too fragile, check that the outputs are valid and satisfy some general properties. These can be general (“selected locations must be at least 2km apart”) or scenario-specific, e.g. “a selected location must be within this area”.

            When running integration tests, it is a good idea to write a test runner as a separate program or script. This test runner performs necessary setup, runs the executable to be tested, checks any results, and cleans up afterwards.



            Unit test style checks can be quite difficult to insert into scientific software because the software has not been designed for that. In particular, unit tests get difficult when the system under test has many external dependencies/interactions. If the software is not purely object-oriented, it is not generally possible to mock/stub those dependencies. I've found it best to largely avoid unit tests for such software, except for pure math functions and utility functions.



            Even a few tests are better than no tests. Combined with the check “it has to compile” that's already a good start into continuous integration. You can always come back and add more tests later. You can then prioritize areas of the code that are more likely to break, e.g. because they get more development activity. To see which parts of your code are not covered by unit tests, you can use code coverage tools.



            Manual testing:
            Especially for complex problem domains, you will not be able to test everything automatically. E.g. I'm currently working on a stochastic search problem. If I test that my software always produces the same result, I can't improve it without breaking the tests. Instead, I've made it easier to do manual tests: I run the software with a fixed seed and get a visualization of the result (depending on your preferences, R, Python/Pyplot, and Matlab all make it easy to get high-quality visualizations of your data sets). I can use this visualization to verify that things did not go terribly wrong. Similarly, tracing the progress of your software via logging output can be a viable manual testing technique, at least if I can select the type of events to be logged.







            share|improve this answer












            share|improve this answer



            share|improve this answer










            answered 31 mins ago









            amon

            78.3k19149234




            78.3k19149234






















                up vote
                1
                down vote














                1. It is never a dumb idea to add CI. From experience I know this is the way to go when you have an open source project where people are free to contribute. CI allows you to stop people from adding or changing code if the code breaks your program, so it is almost invaluable in having a working codebase.



                  When considering tests, you can certainly provide some end-to-end tests (i think it is a subcategory of integration tests) to be sure that your code flow is working the way it should. You should provide at least some basic unit tests to make sure that functions output the right values, as part of the integration tests can compensate for other errors made during the test.



                2. Test object creation is something quite difficult and laboursome indeed. You are right in wanting to make dummy objects. These objects should have some default, but edge case, values for which you certainly know what the output should be.


                3. The problem with books about this subject is that the landscape of CI (and other parts of devops) evolves so quickly anything in a book is probably going to be out of date a few months later. I do not know of any books that could help you, but Google should, as always, be your saviour.


                4. You should run your tests yourself multiple times and do statistical analysis. That way you can implement some test cases where you take the median/average of multiple runs and compare it to your analysis, as to know what values are correct.


                Some tips:



                • Use the integration of CI tools in your GIT platform to stop broken code from entering your codebase.

                • stop merging of code before peer-review was done by other developers. This makes errors more easily known and again stops broken code from entering your codebase.





                share|improve this answer








                New contributor




                AWhitePelican is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                Check out our Code of Conduct.





















                  up vote
                  1
                  down vote














                  1. It is never a dumb idea to add CI. From experience I know this is the way to go when you have an open source project where people are free to contribute. CI allows you to stop people from adding or changing code if the code breaks your program, so it is almost invaluable in having a working codebase.



                    When considering tests, you can certainly provide some end-to-end tests (i think it is a subcategory of integration tests) to be sure that your code flow is working the way it should. You should provide at least some basic unit tests to make sure that functions output the right values, as part of the integration tests can compensate for other errors made during the test.



                  2. Test object creation is something quite difficult and laboursome indeed. You are right in wanting to make dummy objects. These objects should have some default, but edge case, values for which you certainly know what the output should be.


                  3. The problem with books about this subject is that the landscape of CI (and other parts of devops) evolves so quickly anything in a book is probably going to be out of date a few months later. I do not know of any books that could help you, but Google should, as always, be your saviour.


                  4. You should run your tests yourself multiple times and do statistical analysis. That way you can implement some test cases where you take the median/average of multiple runs and compare it to your analysis, as to know what values are correct.


                  Some tips:



                  • Use the integration of CI tools in your GIT platform to stop broken code from entering your codebase.

                  • stop merging of code before peer-review was done by other developers. This makes errors more easily known and again stops broken code from entering your codebase.





                  share|improve this answer








                  New contributor




                  AWhitePelican is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                  Check out our Code of Conduct.



















                    up vote
                    1
                    down vote










                    up vote
                    1
                    down vote










                    1. It is never a dumb idea to add CI. From experience I know this is the way to go when you have an open source project where people are free to contribute. CI allows you to stop people from adding or changing code if the code breaks your program, so it is almost invaluable in having a working codebase.



                      When considering tests, you can certainly provide some end-to-end tests (i think it is a subcategory of integration tests) to be sure that your code flow is working the way it should. You should provide at least some basic unit tests to make sure that functions output the right values, as part of the integration tests can compensate for other errors made during the test.



                    2. Test object creation is something quite difficult and laboursome indeed. You are right in wanting to make dummy objects. These objects should have some default, but edge case, values for which you certainly know what the output should be.


                    3. The problem with books about this subject is that the landscape of CI (and other parts of devops) evolves so quickly anything in a book is probably going to be out of date a few months later. I do not know of any books that could help you, but Google should, as always, be your saviour.


                    4. You should run your tests yourself multiple times and do statistical analysis. That way you can implement some test cases where you take the median/average of multiple runs and compare it to your analysis, as to know what values are correct.


                    Some tips:



                    • Use the integration of CI tools in your GIT platform to stop broken code from entering your codebase.

                    • stop merging of code before peer-review was done by other developers. This makes errors more easily known and again stops broken code from entering your codebase.





                    share|improve this answer








                    New contributor




                    AWhitePelican is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                    Check out our Code of Conduct.










                    1. It is never a dumb idea to add CI. From experience I know this is the way to go when you have an open source project where people are free to contribute. CI allows you to stop people from adding or changing code if the code breaks your program, so it is almost invaluable in having a working codebase.



                      When considering tests, you can certainly provide some end-to-end tests (i think it is a subcategory of integration tests) to be sure that your code flow is working the way it should. You should provide at least some basic unit tests to make sure that functions output the right values, as part of the integration tests can compensate for other errors made during the test.



                    2. Test object creation is something quite difficult and laboursome indeed. You are right in wanting to make dummy objects. These objects should have some default, but edge case, values for which you certainly know what the output should be.


                    3. The problem with books about this subject is that the landscape of CI (and other parts of devops) evolves so quickly anything in a book is probably going to be out of date a few months later. I do not know of any books that could help you, but Google should, as always, be your saviour.


                    4. You should run your tests yourself multiple times and do statistical analysis. That way you can implement some test cases where you take the median/average of multiple runs and compare it to your analysis, as to know what values are correct.


                    Some tips:



                    • Use the integration of CI tools in your GIT platform to stop broken code from entering your codebase.

                    • stop merging of code before peer-review was done by other developers. This makes errors more easily known and again stops broken code from entering your codebase.






                    share|improve this answer








                    New contributor




                    AWhitePelican is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                    Check out our Code of Conduct.









                    share|improve this answer



                    share|improve this answer






                    New contributor




                    AWhitePelican is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                    Check out our Code of Conduct.









                    answered 1 hour ago









                    AWhitePelican

                    191




                    191




                    New contributor




                    AWhitePelican is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                    Check out our Code of Conduct.





                    New contributor





                    AWhitePelican is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                    Check out our Code of Conduct.






                    AWhitePelican is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                    Check out our Code of Conduct.




















                        up vote
                        1
                        down vote














                        1. Types of test





                          • It feels kind of strange to me to start writing unit tests for hundreds of functions which are already implemented




                            Think of it the other way round: if a patch touching several functions breaks one of your end-to-end tests, how are you going to figure out which one is the problem?



                            It's much easier to write unit tests for individual functions than for the whole program. It's much easier to be sure you have good coverage of an individual function. It's much easier to refactor a function when you're sure the unit tests will catch any corner cases you broke.



                            Writing unit tests for already-existing functions is perfectly normal for anyone who has worked on a legacy codebase. They're a good way to confirm your understanding of the functions in the first place and, once written, they're a good way to find unexpected changes of behaviour.



                          • End-to-end tests are also worthwhile. If they're easier to write, by all means do those first and add unit tests ad-hoc to cover the functions you're most concerned about others breaking. You don't have to do it all at once.


                          • Yes, adding CI to existing software is sensible, and normal.




                        2. How to write unit tests



                          If your objects are really expensive and/or complex, write mocks. You can just link the tests using mocks separately from the tests using real objects, instead of using polymorphism.



                          You should anyway have some easy way of creating instances - a function to create dummy instances is common - but having tests for the real creation process is also sensible.




                        3. Variable results



                          Your results must satisfy some invariants (conserved quantities, physical bounds, statistical moments within tolerance). Test those, rather than a single numerical value.



                          You could provide a mock pseudorandom number generator if your Monte Carlo code accepts one as a parameter, which would make the results predictable at least for a well-known algorithm, but it's brittle unless it literally returns the same number every time.
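
                          A minimal sketch of both variants (fixed seed and constant mock), assuming your Monte Carlo code takes the generator as a template parameter; ConstantGenerator and simulate_step are invented names:

                              #include <cstdint>
                              #include <random>

                              // Fake generator satisfying the UniformRandomBitGenerator
                              // requirements: every draw returns the same value.
                              struct ConstantGenerator {
                                  using result_type = std::uint32_t;
                                  static constexpr result_type min() { return 0; }
                                  static constexpr result_type max() { return 0xFFFFFFFF; }
                                  result_type operator()() { return 12345u; }
                              };

                              // Hypothetical Monte Carlo step drawing from the injected generator.
                              template <typename Rng>
                              double simulate_step(Rng& rng) {
                                  std::uniform_real_distribution<double> dist(0.0, 1.0);
                                  return dist(rng);
                              }

                              int main() {
                                  std::mt19937 seeded(42);      // fixed seed: reproducible run
                                  ConstantGenerator constant;   // fully predictable draws
                                  double a = simulate_step(seeded);
                                  double b = simulate_step(constant);
                                  return (a >= 0.0 && a < 1.0 && b >= 0.0 && b < 1.0) ? 0 : 1;
                              }

                          A fixed-seed std::mt19937 gives reproducible results for a given standard library, but the std:: distributions are not guaranteed to produce identical sequences across implementations, so don't bake exact values into tests that must pass on several platforms.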







                        answered 1 hour ago by Useless (8,339)