Continuous integration for scientific software

I'm not a software engineer; I'm a PhD student in geoscience.



Almost two years ago I started writing a piece of scientific software. I have never used continuous integration (CI), mainly because at first I didn't know it existed and I was the only person working on the software.



Now that the core of the software is working, other people are getting interested in it and want to contribute. The plan is that people at other universities will implement additions to the core software (I'm worried they could introduce bugs). The software has also grown quite complex and harder and harder to test, and I plan to keep working on it.



For these two reasons, I'm thinking more and more about using CI.
Since I never had a software engineering education and nobody around me has ever heard of CI (we are scientists, not programmers), I find it hard to get started on my project.



I have a couple of questions on which I would like some advice:



First, a short explanation of how the software works:



  • The software is controlled by one .xml file containing all required settings. You start the software by passing the path to the .xml file as an input argument; it then runs and creates a couple of files with the results. A single run takes ~30 seconds.


  • It is scientific software. Almost all of the functions take multiple input parameters, whose types are mostly quite complex classes. I have multiple .txt files with big catalogs that are used to create instances of these classes.


Now, on to my questions:



  1. Unit tests, integration tests, end-to-end tests?
    My software is now around 30,000 lines of code, with hundreds of functions and ~80 classes.
    It feels strange to me to start writing unit tests for hundreds of functions that are already implemented.
    So I thought about simply creating some test cases: prepare 10-20 different .xml files and let the software run. I guess this is what is called end-to-end testing? I often read that you should not do this, but maybe it is acceptable as a start when you already have working software? Or is it simply a dumb idea to try to add CI to already-working software?


  2. How do you write unit tests when the function parameters are difficult to create?
    Assume I have a function double fun(vector<Class_A> a, vector<Class_B> b); usually I would first need to read in multiple text files to create objects of type Class_A and Class_B. I thought about writing some dummy factory functions like Class_A create_dummy_object() that don't read the text files (see the sketch after this list). I also thought about implementing some kind of serialization. (I do not plan to test the creation of the class objects themselves, since they only depend on the text files.)


  3. How do you write tests when the results are highly variable? My software makes heavy use of big Monte Carlo simulations and works iteratively. Typically there are ~1000 iterations, and at each iteration ~500-20,000 object instances are created based on Monte Carlo draws. If even one result in one iteration differs slightly, all subsequent iterations turn out completely different. How do you deal with this? I guess this is a big point against end-to-end tests, since the end result is highly variable.
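
For illustration, the dummy factory I have in mind (from question 2) would look roughly like this; Class_A and its fields are simplified stand-ins for my real classes:

    #include <cstddef>
    #include <vector>

    // Simplified stand-in for one of my catalog-backed classes.
    struct Class_A {
        double value;
        int    category;
    };

    // Dummy factory: builds a small, hand-picked instance in memory
    // instead of parsing the big catalog .txt files.
    Class_A create_dummy_object()
    {
        Class_A a;
        a.value    = 1.0;   // chosen so expected outputs are easy to verify
        a.category = 0;
        return a;
    }

    std::vector<Class_A> create_dummy_vector(std::size_t n)
    {
        return std::vector<Class_A>(n, create_dummy_object());
    }

    int main()
    {
        return create_dummy_vector(10).size() == 10 ? 0 : 1;
    }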


Any other advice on CI is highly appreciated.

object-oriented c++ testing continuous-integration

asked 1 hour ago by user7431005, edited 1 hour ago by amon
  • How to handle a question that asks many things – gnat, 1 hour ago

  • How do you know that your software is working correctly? Can you find a way to automate that check so you can run it on every change? That should be your first step when introducing CI to an existing project. – Bart van Ingen Schenau, 44 mins ago

3 Answers

Answer by amon (score 2, answered 31 mins ago)

Testing scientific software is difficult, both because of the complex subject matter and because of typical scientific development processes (a.k.a. “hack it until it works”, which doesn't usually result in a testable design). This is a bit ironic considering that science should be reproducible. What changes compared to “normal” software is not whether tests are useful (yes!), but which kinds of tests are appropriate.



Handling randomness: all runs of your software MUST be reproducible. If you use Monte Carlo techniques, you must make it possible to provide a specific seed for the random number generator.



  • It is easy to forget this, e.g. when using C's rand() function, which depends on global state.

  • Ideally, a random number generator is passed as an explicit object through your functions. C++11's <random> standard library header makes this a lot easier (see the sketch after this list).

  • Instead of sharing random state across modules of the software, I've found it useful to create a second RNG which is seeded by a random number from the first RNG. Then, if the number of requests to the RNG by the other module changes, the sequence generated by the first RNG stays the same.
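
A minimal sketch of this seeding scheme, assuming C++11 (simulate_step and the two-module split are invented for illustration):

    #include <iostream>
    #include <random>

    // Hypothetical simulation step: the RNG is passed in explicitly,
    // so a fixed seed makes the whole run reproducible.
    double simulate_step(std::mt19937& rng)
    {
        std::normal_distribution<double> noise(0.0, 1.0);
        return noise(rng);
    }

    int main()
    {
        std::mt19937 master(42);  // fixed seed => reproducible run

        // Each module gets its own RNG, seeded from the master RNG.
        // If one module starts drawing more numbers, the sequences
        // seen by the other modules are unaffected.
        std::mt19937 rng_module_a(master());
        std::mt19937 rng_module_b(master());

        for (int i = 0; i < 3; ++i)
            std::cout << simulate_step(rng_module_a) << '\n';
        std::cout << simulate_step(rng_module_b) << '\n';
    }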

Integration tests are perfectly fine. They are good at verifying that different parts of your software play together correctly, and for running concrete scenarios.



  • As a minimum quality level, “it doesn't crash” can already be a good test result.

  • For stronger results, you will also have to check the results against some baseline. These checks will have to be somewhat tolerant, e.g. account for rounding errors (see the sketch after this list). It can also be helpful to compare summary statistics instead of full data rows.

  • If checking against a baseline would be too fragile, check that the outputs are valid and satisfy some general properties. These can be general (“selected locations must be at least 2km apart”) or scenario-specific, e.g. “a selected location must be within this area”.
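
For instance, a tolerant baseline check might look like the sketch below (the data, baseline value, and tolerance are made up for illustration):

    #include <cassert>
    #include <cmath>
    #include <vector>

    // Compare a summary statistic against a stored baseline with a
    // relative tolerance, instead of demanding bit-identical files.
    bool close_enough(double actual, double baseline, double rel_tol)
    {
        return std::fabs(actual - baseline) <= rel_tol * std::fabs(baseline);
    }

    double mean(const std::vector<double>& xs)
    {
        double sum = 0.0;
        for (double x : xs) sum += x;
        return sum / xs.size();
    }

    int main()
    {
        // Stand-in for results produced by one fixed-seed scenario run.
        std::vector<double> results = {2.9, 3.1, 3.4, 3.2};

        const double baseline_mean = 3.15;  // stored from a trusted run
        assert(close_enough(mean(results), baseline_mean, 0.01));
    }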

When running integration tests, it is a good idea to write a test runner as a separate program or script. This test runner performs necessary setup, runs the executable to be tested, checks any results, and cleans up afterwards.



Unit-test-style checks can be quite difficult to introduce into scientific software because the software has not been designed for that. In particular, unit tests get difficult when the system under test has many external dependencies/interactions. If the software is not purely object-oriented, it is generally not possible to mock/stub those dependencies. I've found it best to largely avoid unit tests for such software, except for pure math functions and other utility functions.
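
Such a test for a pure function can be as small as an executable whose exit code tells CI whether it passed; wrap_longitude below is a made-up example:

    #include <cassert>
    #include <cmath>

    // Pure utility function under test (hypothetical example):
    // wraps a longitude into the range [-180, 180).
    double wrap_longitude(double lon_deg)
    {
        lon_deg = std::fmod(lon_deg + 180.0, 360.0);
        if (lon_deg < 0.0) lon_deg += 360.0;
        return lon_deg - 180.0;
    }

    int main()
    {
        assert(wrap_longitude(0.0)    ==    0.0);
        assert(wrap_longitude(190.0)  == -170.0);
        assert(wrap_longitude(-190.0) ==  170.0);
        return 0;  // exit code 0 => test passed
    }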



Even a few tests are better than no tests. Combined with the check “it has to compile” that's already a good start into continuous integration. You can always come back and add more tests later. You can then prioritize areas of the code that are more likely to break, e.g. because they get more development activity. To see which parts of your code are not covered by unit tests, you can use code coverage tools.



Manual testing:
Especially for complex problem domains, you will not be able to test everything automatically. For example, I'm currently working on a stochastic search problem. If I test that my software always produces the same result, I can't improve it without breaking the tests. Instead, I've made it easier to do manual tests: I run the software with a fixed seed and get a visualization of the result (depending on your preferences, R, Python/Pyplot, and Matlab all make it easy to get high-quality visualizations of your data sets). I can use this visualization to verify that things did not go terribly wrong. Similarly, tracing the progress of your software via logging output can be a viable manual testing technique, at least if you can select the type of events to be logged.







Answer by AWhitePelican (score 1, answered 1 hour ago)

    1. It is never a dumb idea to add CI. From experience, I know this is the way to go when you have an open source project where people are free to contribute. CI allows you to stop people from adding or changing code if that code breaks your program, so it is almost invaluable for keeping a working codebase.



      When considering tests, you can certainly provide some end-to-end tests (I think they are a subcategory of integration tests) to be sure that your overall code flow works the way it should. You should also provide at least some basic unit tests to make sure individual functions output the right values, since integration tests alone can mask errors in individual functions.



    2. Test object creation is indeed quite difficult and laborious. You are right to want to make dummy objects. These objects should have default but edge-case values for which you know exactly what the output should be.


    3. The problem with books on this subject is that the CI landscape (and other parts of DevOps) evolves so quickly that anything in a book is probably out of date a few months later. I do not know of any books that could help you, but Google should, as always, be your saviour.


    4. You should run your tests yourself multiple times and do statistical analysis. That way you can implement test cases that take the median/average of multiple runs and compare it against your analysis, so you know which values are correct.
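
    A sketch of that idea, assuming each run is reproducible per seed (run_simulation, the expected value, and the tolerance are stand-ins you would get from your own analysis):

        #include <algorithm>
        #include <cassert>
        #include <cmath>
        #include <cstddef>
        #include <random>
        #include <vector>

        // Stand-in for one full simulation run with a given seed.
        double run_simulation(unsigned seed)
        {
            std::mt19937 rng(seed);
            std::normal_distribution<double> dist(5.0, 1.0);
            double sum = 0.0;
            for (int i = 0; i < 1000; ++i) sum += dist(rng);
            return sum / 1000.0;
        }

        double median(std::vector<double> xs)
        {
            std::sort(xs.begin(), xs.end());
            const std::size_t n = xs.size();
            return n % 2 ? xs[n / 2] : 0.5 * (xs[n / 2 - 1] + xs[n / 2]);
        }

        int main()
        {
            std::vector<double> results;
            for (unsigned seed = 0; seed < 25; ++seed)
                results.push_back(run_simulation(seed));

            // 5.0 and the tolerance would come from prior statistical analysis.
            assert(std::fabs(median(results) - 5.0) < 0.1);
        }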


    Some tips:



    • Use the integration of CI tools with your Git platform to stop broken code from entering your codebase.

    • Stop code from being merged before it has been peer-reviewed by other developers. This surfaces errors earlier and, again, stops broken code from entering your codebase.






Answer (score 1)

      1. Types of test





        • It feels kind of strange to me to start writing unit tests for hundreds of functions which are already implemented




          Think of it the other way round: if a patch touching several functions breaks one of your end-to-end tests, how are you going to figure out which one is the problem?



          It's much easier to write unit tests for individual functions than for the whole program. It's much easier to be sure you have good coverage of an individual function. It's much easier to refactor a function when you're sure the unit tests will catch any corner cases you broke.



          Writing unit tests for already-existing functions is perfectly normal for anyone who has worked on a legacy codebase. They're a good way to confirm your understanding of the functions in the first place and, once written, they're a good way to find unexpected changes of behaviour.



        • End-to-end tests are also worthwhile. If they're easier to write, by all means do those first and add unit tests ad-hoc to cover the functions you're most concerned about others breaking. You don't have to do it all at once.


        • Yes, adding CI to existing software is sensible, and normal.




      2. How to write unit tests



        If your objects are really expensive and/or complex, write mocks. You can just link the tests using mocks separately from the tests using real objects, instead of using polymorphism.



        In any case, you should have some easy way of creating instances - a function that creates dummy instances is common - but having tests for the real creation process is also sensible.




      3. Variable results



        You must have some invariants for the result. Test those, rather than a single numerical value (see the sketch below).



        You could provide a mock pseudorandom number generator if your Monte Carlo code accepts it as a parameter, which would make the results predictable at least for a well-known algorithm, but it's brittle unless it literally returns the same number every time.
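
        For example, an invariant-style check could look like this sketch (the quantity and bounds are invented for illustration):

            #include <cassert>
            #include <vector>

            // Invented invariant: simulated rock densities must stay within
            // physically plausible bounds, whatever the random seed was.
            void check_invariants(const std::vector<double>& densities)
            {
                assert(!densities.empty());
                for (double d : densities)
                    assert(d > 0.0 && d < 10000.0);  // kg/m^3
            }

            int main()
            {
                // Stand-in for values parsed from one result file.
                check_invariants({2650.0, 2700.0, 3300.0});
            }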







      share|improve this answer




















        Your Answer







        StackExchange.ready(function()
        var channelOptions =
        tags: "".split(" "),
        id: "131"
        ;
        initTagRenderer("".split(" "), "".split(" "), channelOptions);

        StackExchange.using("externalEditor", function()
        // Have to fire editor after snippets, if snippets enabled
        if (StackExchange.settings.snippets.snippetsEnabled)
        StackExchange.using("snippets", function()
        createEditor();
        );

        else
        createEditor();

        );

        function createEditor()
        StackExchange.prepareEditor(
        heartbeatType: 'answer',
        convertImagesToLinks: false,
        noModals: false,
        showLowRepImageUploadWarning: true,
        reputationToPostImages: null,
        bindNavPrevention: true,
        postfix: "",
        onDemand: true,
        discardSelector: ".discard-answer"
        ,immediatelyShowMarkdownHelp:true
        );



        );






        user7431005 is a new contributor. Be nice, and check out our Code of Conduct.









         

        draft saved


        draft discarded


















        StackExchange.ready(
        function ()
        StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fsoftwareengineering.stackexchange.com%2fquestions%2f379996%2fcontinuous-integration-for-scientific-software%23new-answer', 'question_page');

        );

        Post as a guest






























        3 Answers
        3






        active

        oldest

        votes








        3 Answers
        3






        active

        oldest

        votes









        active

        oldest

        votes






        active

        oldest

        votes








        up vote
        2
        down vote













        Testing scientific software is difficult, both because of the complex subject matter and because of typical scientific development processes (aka. hack it until it works, which doesn't usually result in a testable design). This is a bit ironic considering that science should be reproducible. What changes compared to “normal” software is not whether tests are useful (yes!), but which kinds of test are appropriate.



        Handling randomness: all runs of your software MUST be reproducible. If you use Monte Carlo techniques, you must make it possible to provide a specific seed for the random number generator.



        • It is easy to forget this e.g. when using C's rand() function which depends on global state.

        • Ideally, a random number generator is passed as an explicit object through your functions. C++11's random standard library header makes this a lot easier.

        • Instead of sharing random state across modules of the software, I've found it useful to create a second RNG which is seeded by a random number from the first RNG. Then, if the number of requests to the RNG by the other module changes, the sequence generated by the first RNG stays the same.

        Integration tests are perfectly fine. They are good at verifying that different parts of your software play together correctly, and for running concrete scenarios.



        • As a minimum quality level “it doesn't crash” can already be a good test result.

        • For stronger results, you will also have to check the results against some baseline. However, these checks will have to be somewhat tolerant, e.g. account for rounding errors. It can also be helpful to compare summary statistics instead of full data rows.

        • If checking against a baseline would be too fragile, check that the outputs are valid and satisfy some general properties. These can be general (“selected locations must be at least 2km apart”) or scenario-specific, e.g. “a selected location must be within this area”.

        When running integration tests, it is a good idea to write a test runner as a separate program or script. This test runner performs necessary setup, runs the executable to be tested, checks any results, and cleans up afterwards.



        Unit test style checks can be quite difficult to insert into scientific software because the software has not been designed for that. In particular, unit tests get difficult when the system under test has many external dependencies/interactions. If the software is not purely object-oriented, it is not generally possible to mock/stub those dependencies. I've found it best to largely avoid unit tests for such software, except for pure math functions and utility functions.



        Even a few tests are better than no tests. Combined with the check “it has to compile” that's already a good start into continuous integration. You can always come back and add more tests later. You can then prioritize areas of the code that are more likely to break, e.g. because they get more development activity. To see which parts of your code are not covered by unit tests, you can use code coverage tools.



        Manual testing:
        Especially for complex problem domains, you will not be able to test everything automatically. E.g. I'm currently working on a stochastic search problem. If I test that my software always produces the same result, I can't improve it without breaking the tests. Instead, I've made it easier to do manual tests: I run the software with a fixed seed and get a visualization of the result (depending on your preferences, R, Python/Pyplot, and Matlab all make it easy to get high-quality visualizations of your data sets). I can use this visualization to verify that things did not go terribly wrong. Similarly, tracing the progress of your software via logging output can be a viable manual testing technique, at least if I can select the type of events to be logged.






        share|improve this answer
























          up vote
          2
          down vote













          Testing scientific software is difficult, both because of the complex subject matter and because of typical scientific development processes (aka. hack it until it works, which doesn't usually result in a testable design). This is a bit ironic considering that science should be reproducible. What changes compared to “normal” software is not whether tests are useful (yes!), but which kinds of test are appropriate.



          Handling randomness: all runs of your software MUST be reproducible. If you use Monte Carlo techniques, you must make it possible to provide a specific seed for the random number generator.



          • It is easy to forget this e.g. when using C's rand() function which depends on global state.

          • Ideally, a random number generator is passed as an explicit object through your functions. C++11's random standard library header makes this a lot easier.

          • Instead of sharing random state across modules of the software, I've found it useful to create a second RNG which is seeded by a random number from the first RNG. Then, if the number of requests to the RNG by the other module changes, the sequence generated by the first RNG stays the same.

          Integration tests are perfectly fine. They are good at verifying that different parts of your software play together correctly, and for running concrete scenarios.



          • As a minimum quality level “it doesn't crash” can already be a good test result.

          • For stronger results, you will also have to check the results against some baseline. However, these checks will have to be somewhat tolerant, e.g. account for rounding errors. It can also be helpful to compare summary statistics instead of full data rows.

          • If checking against a baseline would be too fragile, check that the outputs are valid and satisfy some general properties. These can be general (“selected locations must be at least 2km apart”) or scenario-specific, e.g. “a selected location must be within this area”.

          When running integration tests, it is a good idea to write a test runner as a separate program or script. This test runner performs necessary setup, runs the executable to be tested, checks any results, and cleans up afterwards.



          Unit test style checks can be quite difficult to insert into scientific software because the software has not been designed for that. In particular, unit tests get difficult when the system under test has many external dependencies/interactions. If the software is not purely object-oriented, it is not generally possible to mock/stub those dependencies. I've found it best to largely avoid unit tests for such software, except for pure math functions and utility functions.



          Even a few tests are better than no tests. Combined with the check “it has to compile” that's already a good start into continuous integration. You can always come back and add more tests later. You can then prioritize areas of the code that are more likely to break, e.g. because they get more development activity. To see which parts of your code are not covered by unit tests, you can use code coverage tools.



          Manual testing:
          Especially for complex problem domains, you will not be able to test everything automatically. E.g. I'm currently working on a stochastic search problem. If I test that my software always produces the same result, I can't improve it without breaking the tests. Instead, I've made it easier to do manual tests: I run the software with a fixed seed and get a visualization of the result (depending on your preferences, R, Python/Pyplot, and Matlab all make it easy to get high-quality visualizations of your data sets). I can use this visualization to verify that things did not go terribly wrong. Similarly, tracing the progress of your software via logging output can be a viable manual testing technique, at least if I can select the type of events to be logged.






          share|improve this answer






















            up vote
            2
            down vote










            up vote
            2
            down vote









            Testing scientific software is difficult, both because of the complex subject matter and because of typical scientific development processes (aka. hack it until it works, which doesn't usually result in a testable design). This is a bit ironic considering that science should be reproducible. What changes compared to “normal” software is not whether tests are useful (yes!), but which kinds of test are appropriate.



            Handling randomness: all runs of your software MUST be reproducible. If you use Monte Carlo techniques, you must make it possible to provide a specific seed for the random number generator.



            • It is easy to forget this e.g. when using C's rand() function which depends on global state.

            • Ideally, a random number generator is passed as an explicit object through your functions. C++11's random standard library header makes this a lot easier.

            • Instead of sharing random state across modules of the software, I've found it useful to create a second RNG which is seeded by a random number from the first RNG. Then, if the number of requests to the RNG by the other module changes, the sequence generated by the first RNG stays the same.

            Integration tests are perfectly fine. They are good at verifying that different parts of your software play together correctly, and for running concrete scenarios.



            • As a minimum quality level “it doesn't crash” can already be a good test result.

            • For stronger results, you will also have to check the results against some baseline. However, these checks will have to be somewhat tolerant, e.g. account for rounding errors. It can also be helpful to compare summary statistics instead of full data rows.

            • If checking against a baseline would be too fragile, check that the outputs are valid and satisfy some general properties. These can be general (“selected locations must be at least 2km apart”) or scenario-specific, e.g. “a selected location must be within this area”.

            When running integration tests, it is a good idea to write a test runner as a separate program or script. This test runner performs necessary setup, runs the executable to be tested, checks any results, and cleans up afterwards.



            Unit test style checks can be quite difficult to insert into scientific software because the software has not been designed for that. In particular, unit tests get difficult when the system under test has many external dependencies/interactions. If the software is not purely object-oriented, it is not generally possible to mock/stub those dependencies. I've found it best to largely avoid unit tests for such software, except for pure math functions and utility functions.



            Even a few tests are better than no tests. Combined with the check “it has to compile” that's already a good start into continuous integration. You can always come back and add more tests later. You can then prioritize areas of the code that are more likely to break, e.g. because they get more development activity. To see which parts of your code are not covered by unit tests, you can use code coverage tools.



            Manual testing:
            Especially for complex problem domains, you will not be able to test everything automatically. E.g. I'm currently working on a stochastic search problem. If I test that my software always produces the same result, I can't improve it without breaking the tests. Instead, I've made it easier to do manual tests: I run the software with a fixed seed and get a visualization of the result (depending on your preferences, R, Python/Pyplot, and Matlab all make it easy to get high-quality visualizations of your data sets). I can use this visualization to verify that things did not go terribly wrong. Similarly, tracing the progress of your software via logging output can be a viable manual testing technique, at least if I can select the type of events to be logged.






            share|improve this answer












            Testing scientific software is difficult, both because of the complex subject matter and because of typical scientific development processes (aka. hack it until it works, which doesn't usually result in a testable design). This is a bit ironic considering that science should be reproducible. What changes compared to “normal” software is not whether tests are useful (yes!), but which kinds of test are appropriate.



            Handling randomness: all runs of your software MUST be reproducible. If you use Monte Carlo techniques, you must make it possible to provide a specific seed for the random number generator.



            • It is easy to forget this e.g. when using C's rand() function which depends on global state.

            • Ideally, a random number generator is passed as an explicit object through your functions. C++11's random standard library header makes this a lot easier.

            • Instead of sharing random state across modules of the software, I've found it useful to create a second RNG which is seeded by a random number from the first RNG. Then, if the number of requests to the RNG by the other module changes, the sequence generated by the first RNG stays the same.

            Integration tests are perfectly fine. They are good at verifying that different parts of your software play together correctly, and for running concrete scenarios.



            • As a minimum quality level “it doesn't crash” can already be a good test result.

            • For stronger results, you will also have to check the results against some baseline. However, these checks will have to be somewhat tolerant, e.g. account for rounding errors. It can also be helpful to compare summary statistics instead of full data rows.

            • If checking against a baseline would be too fragile, check that the outputs are valid and satisfy some general properties. These can be general (“selected locations must be at least 2km apart”) or scenario-specific, e.g. “a selected location must be within this area”.

            When running integration tests, it is a good idea to write a test runner as a separate program or script. This test runner performs necessary setup, runs the executable to be tested, checks any results, and cleans up afterwards.



            Unit test style checks can be quite difficult to insert into scientific software because the software has not been designed for that. In particular, unit tests get difficult when the system under test has many external dependencies/interactions. If the software is not purely object-oriented, it is not generally possible to mock/stub those dependencies. I've found it best to largely avoid unit tests for such software, except for pure math functions and utility functions.



            Even a few tests are better than no tests. Combined with the check “it has to compile” that's already a good start into continuous integration. You can always come back and add more tests later. You can then prioritize areas of the code that are more likely to break, e.g. because they get more development activity. To see which parts of your code are not covered by unit tests, you can use code coverage tools.



            Manual testing:
            Especially for complex problem domains, you will not be able to test everything automatically. E.g. I'm currently working on a stochastic search problem. If I test that my software always produces the same result, I can't improve it without breaking the tests. Instead, I've made it easier to do manual tests: I run the software with a fixed seed and get a visualization of the result (depending on your preferences, R, Python/Pyplot, and Matlab all make it easy to get high-quality visualizations of your data sets). I can use this visualization to verify that things did not go terribly wrong. Similarly, tracing the progress of your software via logging output can be a viable manual testing technique, at least if I can select the type of events to be logged.







            share|improve this answer












            share|improve this answer



            share|improve this answer










            answered 31 mins ago









            amon

            78.3k19149234




            78.3k19149234






















                up vote
                1
                down vote














                1. It is never a dumb idea to add CI. From experience I know this is the way to go when you have an open source project where people are free to contribute. CI allows you to stop people from adding or changing code if the code breaks your program, so it is almost invaluable in having a working codebase.



                  When considering tests, you can certainly provide some end-to-end tests (i think it is a subcategory of integration tests) to be sure that your code flow is working the way it should. You should provide at least some basic unit tests to make sure that functions output the right values, as part of the integration tests can compensate for other errors made during the test.



                2. Test object creation is something quite difficult and laboursome indeed. You are right in wanting to make dummy objects. These objects should have some default, but edge case, values for which you certainly know what the output should be.


                3. The problem with books about this subject is that the landscape of CI (and other parts of devops) evolves so quickly anything in a book is probably going to be out of date a few months later. I do not know of any books that could help you, but Google should, as always, be your saviour.


                4. You should run your tests yourself multiple times and do statistical analysis. That way you can implement some test cases where you take the median/average of multiple runs and compare it to your analysis, as to know what values are correct.


                Some tips:



                • Use the integration of CI tools in your GIT platform to stop broken code from entering your codebase.

                • stop merging of code before peer-review was done by other developers. This makes errors more easily known and again stops broken code from entering your codebase.





                share|improve this answer








                New contributor




                AWhitePelican is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                Check out our Code of Conduct.





















                  up vote
                  1
                  down vote














                  1. It is never a dumb idea to add CI. From experience I know this is the way to go when you have an open source project where people are free to contribute. CI allows you to stop people from adding or changing code if the code breaks your program, so it is almost invaluable in having a working codebase.



                    When considering tests, you can certainly provide some end-to-end tests (i think it is a subcategory of integration tests) to be sure that your code flow is working the way it should. You should provide at least some basic unit tests to make sure that functions output the right values, as part of the integration tests can compensate for other errors made during the test.



                  2. Test object creation is something quite difficult and laboursome indeed. You are right in wanting to make dummy objects. These objects should have some default, but edge case, values for which you certainly know what the output should be.


                  3. The problem with books about this subject is that the landscape of CI (and other parts of devops) evolves so quickly anything in a book is probably going to be out of date a few months later. I do not know of any books that could help you, but Google should, as always, be your saviour.


                  4. You should run your tests yourself multiple times and do statistical analysis. That way you can implement some test cases where you take the median/average of multiple runs and compare it to your analysis, as to know what values are correct.


                  Some tips:



                  • Use the integration of CI tools in your GIT platform to stop broken code from entering your codebase.

                  • stop merging of code before peer-review was done by other developers. This makes errors more easily known and again stops broken code from entering your codebase.





                  share|improve this answer








                  New contributor




                  AWhitePelican is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                  Check out our Code of Conduct.



















                    up vote
                    1
                    down vote










                    up vote
                    1
                    down vote










                    1. It is never a dumb idea to add CI. From experience I know this is the way to go when you have an open source project where people are free to contribute. CI allows you to stop people from adding or changing code if the code breaks your program, so it is almost invaluable in having a working codebase.



                      When considering tests, you can certainly provide some end-to-end tests (i think it is a subcategory of integration tests) to be sure that your code flow is working the way it should. You should provide at least some basic unit tests to make sure that functions output the right values, as part of the integration tests can compensate for other errors made during the test.



                    2. Test object creation is something quite difficult and laboursome indeed. You are right in wanting to make dummy objects. These objects should have some default, but edge case, values for which you certainly know what the output should be.


                    3. The problem with books about this subject is that the landscape of CI (and other parts of devops) evolves so quickly anything in a book is probably going to be out of date a few months later. I do not know of any books that could help you, but Google should, as always, be your saviour.


                    4. You should run your tests yourself multiple times and do statistical analysis. That way you can implement some test cases where you take the median/average of multiple runs and compare it to your analysis, as to know what values are correct.


                    Some tips:



                    • Use the integration of CI tools in your GIT platform to stop broken code from entering your codebase.

                    • stop merging of code before peer-review was done by other developers. This makes errors more easily known and again stops broken code from entering your codebase.





                    share|improve this answer








                    New contributor




                    AWhitePelican is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                    Check out our Code of Conduct.










                    1. It is never a dumb idea to add CI. From experience I know this is the way to go when you have an open source project where people are free to contribute. CI allows you to stop people from adding or changing code if the code breaks your program, so it is almost invaluable in having a working codebase.



                      When considering tests, you can certainly provide some end-to-end tests (i think it is a subcategory of integration tests) to be sure that your code flow is working the way it should. You should provide at least some basic unit tests to make sure that functions output the right values, as part of the integration tests can compensate for other errors made during the test.



                    2. Test object creation is something quite difficult and laboursome indeed. You are right in wanting to make dummy objects. These objects should have some default, but edge case, values for which you certainly know what the output should be.


                    3. The problem with books about this subject is that the landscape of CI (and other parts of devops) evolves so quickly anything in a book is probably going to be out of date a few months later. I do not know of any books that could help you, but Google should, as always, be your saviour.


                    4. You should run your tests yourself multiple times and do statistical analysis. That way you can implement some test cases where you take the median/average of multiple runs and compare it to your analysis, as to know what values are correct.


                    Some tips:



                    • Use the integration of CI tools in your GIT platform to stop broken code from entering your codebase.

                    • stop merging of code before peer-review was done by other developers. This makes errors more easily known and again stops broken code from entering your codebase.






                    share|improve this answer








                    New contributor




                    AWhitePelican is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                    Check out our Code of Conduct.









                    share|improve this answer



                    share|improve this answer






                    New contributor




                    AWhitePelican is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                    Check out our Code of Conduct.









                    answered 1 hour ago









                    AWhitePelican

                    191




                    191




                    New contributor




                    AWhitePelican is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                    Check out our Code of Conduct.





                    New contributor





                    AWhitePelican is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                    Check out our Code of Conduct.






                    AWhitePelican is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                    Check out our Code of Conduct.




















                        up vote
                        1
                        down vote














                        1. Types of test





                          • It feels kind of strange to me to start writing unit tests for hundreds of functions which are already implemented




                            Think of it the other way round: if a patch touching several functions breaks one of your end-to-end tests, how are you going to figure out which one is the problem?



                            It's much easier to write unit tests for individual functions than for the whole program. It's much easier to be sure you have good coverage of an individual function. It's much easier to refactor a function when you're sure the unit tests will catch any corner cases you broke.



                            Writing unit tests for already-existing functions is perfectly normal for anyone who has worked on a legacy codebase. They're a good way to confirm your understanding of the functions in the first place and, once written, they're a good way to find unexpected changes of behaviour.



                          • End-to-end tests are also worthwhile. If they're easier to write, by all means do those first and add unit tests ad-hoc to cover the functions you're most concerned about others breaking. You don't have to do it all at once.


                          • Yes, adding CI to existing software is sensible, and normal.




                        2. How to write unit tests



                          If your objects are really expensive and/or complex, write mocks. You can just link the tests using mocks separately from the tests using real objects, instead of using polymorphism.



                          You should anyway have some easy way of creating instances - a function to create dummy instances is common - but having tests for the real creation process is also sensible.




                        3. Variable results



                          Your results must satisfy some invariants (conserved quantities, physical bounds, statistical moments within tolerance). Test those, rather than a single numerical value.



                          You could provide a mock pseudorandom number generator if your Monte Carlo code accepts one as a parameter, which would make the results predictable at least for a well-known algorithm, but it's brittle unless it literally returns the same number every time.
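
                          A minimal sketch of both variants (fixed seed and constant mock), assuming your Monte Carlo code takes the generator as a template parameter; ConstantGenerator and simulate_step are invented names:

                              #include <cstdint>
                              #include <random>

                              // Fake generator satisfying the UniformRandomBitGenerator
                              // requirements: every draw returns the same value.
                              struct ConstantGenerator {
                                  using result_type = std::uint32_t;
                                  static constexpr result_type min() { return 0; }
                                  static constexpr result_type max() { return 0xFFFFFFFF; }
                                  result_type operator()() { return 12345u; }
                              };

                              // Hypothetical Monte Carlo step drawing from the injected generator.
                              template <typename Rng>
                              double simulate_step(Rng& rng) {
                                  std::uniform_real_distribution<double> dist(0.0, 1.0);
                                  return dist(rng);
                              }

                              int main() {
                                  std::mt19937 seeded(42);      // fixed seed: reproducible run
                                  ConstantGenerator constant;   // fully predictable draws
                                  double a = simulate_step(seeded);
                                  double b = simulate_step(constant);
                                  return (a >= 0.0 && a < 1.0 && b >= 0.0 && b < 1.0) ? 0 : 1;
                              }

                          A fixed-seed std::mt19937 gives reproducible results for a given standard library, but the std:: distributions are not guaranteed to produce identical sequences across implementations, so don't bake exact values into tests that must pass on several platforms.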







                        answered 1 hour ago by Useless (8,339)