How do you deal with “bugs” that can never be reproduced?



















































Suppose your team writes a software system that is running fine.



One day one of the engineers mistakenly runs some SQL queries that change some of the DB data, then forgets about it.



After some time you discover the corrupted/erroneous data and everyone scratches their heads as to which part of the code caused this and why, to no avail. Meanwhile the project manager insists that we find the part of the code that caused it.



How do you deal with this?







project-management
















asked 2 hours ago by Nicholas Kyriakides











  • If the engineer forgot about it, how do you know that's what happened? How do you know it was corrupted by someone running a script, and not by a bug?
    – DaveG
    55 mins ago










  • He had an epiphany after a day or two. This is a hypothetical in case he never did remember, which could easily have been the case.
    – Nicholas Kyriakides
    53 mins ago











  • So in this case, is the project manager still thinking this is due to a bug, despite one of the engineers saying it's due to a manual SQL query? Or does the manager just want better error checking & procedures to try to catch this sort of thing before it's a problem?
    – DaveG
    48 mins ago











  • This is a hypothetical. I'm sure the PM would have us chase this as much as we can if he never did remember. I know I would.
    – Nicholas Kyriakides
    46 mins ago


























3 Answers

















































































































Obviously, no project manager will invest an infinite amount of time in such a problem. What they want is to prevent the same situation from happening again.



To achieve this goal, even if one cannot find the root cause of such a failure, it is often possible to take measures for



  • detecting this kind of failure earlier if it recurs

  • making it less likely the same failure will happen again

  • making the system more robust against the specific kind of inconsistency

For example, more detailed logging, more fine-grained error handling, and immediate error signaling could help prevent the same error from striking again, or help find the root cause. If your system allows database triggers, it may be possible to add a trigger that prevents the inconsistency from being introduced in the first place.
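
As a rough illustration of the trigger idea, here is a minimal sketch using Python's built-in `sqlite3` module. The `orders` table, its columns, the trigger name and the "total must not be negative" rule are all invented for the example; the real invariant depends on your schema, and trigger syntax differs between database products.

```python
import sqlite3

# Hypothetical schema for illustration only: table, columns, trigger name
# and the business rule are assumptions, not taken from the question.
SCHEMA = """
CREATE TABLE orders (
    id     INTEGER PRIMARY KEY,
    status TEXT    NOT NULL,
    total  INTEGER NOT NULL
);

-- Reject any update that would leave an order with a negative total,
-- no matter whether it comes from application code or a manual query.
CREATE TRIGGER orders_reject_negative_total
BEFORE UPDATE OF total ON orders
FOR EACH ROW
WHEN NEW.total < 0
BEGIN
    SELECT RAISE(ABORT, 'orders.total must not be negative');
END;
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the sample schema and one row."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO orders (id, status, total) VALUES (1, 'paid', 500)")
    conn.commit()
    return conn


def try_update_total(conn: sqlite3.Connection, order_id: int, new_total: int) -> bool:
    """Return True if the update was accepted, False if the database rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "UPDATE orders SET total = ? WHERE id = ?",
                (new_total, order_id),
            )
        return True
    except sqlite3.DatabaseError:
        return False
```

For a rule this simple a `CHECK` constraint would do the same job; a trigger becomes more interesting when the rule has to compare the old and new values or look at other tables.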



Think about which kind of action might be appropriate in your situation and suggest it to the team. I am sure your project manager will be pleased.















answered 1 hour ago by Doc Brown, edited 1 hour ago











  • Is there an established procedure that you're giving me an overview of here, or is this just based on experience/common sense?
    – Nicholas Kyriakides
    59 mins ago





































A production database should have full access logging and role-based access controls. You should then have hard evidence as to WHO did WHAT WHEN to the database, which moves the attention from the code to poor operational security.
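
To make the "WHAT and WHEN" part concrete, here is a minimal sketch of a trigger-based audit table, again with Python's built-in `sqlite3` module. The `accounts` and `accounts_audit` tables and the trigger name are hypothetical. SQLite has no user accounts, so the WHO column is left out here; on a server database with per-user logins you would normally rely on the product's own audit logging or record the session user as well.

```python
import sqlite3

# Hypothetical schema for illustration only. The audit table is filled by a
# trigger, so manual queries are recorded just like application writes.
SCHEMA = """
CREATE TABLE accounts (
    id      INTEGER PRIMARY KEY,
    balance INTEGER NOT NULL
);

CREATE TABLE accounts_audit (
    audit_id    INTEGER PRIMARY KEY AUTOINCREMENT,
    account_id  INTEGER NOT NULL,
    old_balance INTEGER,
    new_balance INTEGER,
    changed_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);

CREATE TRIGGER accounts_audit_update
AFTER UPDATE OF balance ON accounts
FOR EACH ROW
BEGIN
    INSERT INTO accounts_audit (account_id, old_balance, new_balance)
    VALUES (OLD.id, OLD.balance, NEW.balance);
END;
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the sample schema and one account."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO accounts (id, balance) VALUES (1, 100)")
    conn.commit()
    return conn


def history(conn: sqlite3.Connection, account_id: int) -> list:
    """Return (old_balance, new_balance, changed_at) rows for one account, oldest first."""
    return conn.execute(
        "SELECT old_balance, new_balance, changed_at "
        "FROM accounts_audit WHERE account_id = ? ORDER BY audit_id",
        (account_id,),
    ).fetchall()
```

Because the history is keyed by row rather than by time, you can start from the record that looks wrong and read back when it changed, even if nobody knows when the corruption happened.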
















answered 1 hour ago by Don Gilman











  • It sounds like they may not know exactly when the data corruption occurred, which could make it difficult to figure out what logs they need to investigate.
    – Nathanael
    1 hour ago


































  1. Explain to your project manager that you think the most likely cause is manual database access.

  2. If they still want you to look for the code that caused this, go and have another look at the code.

  3. Come back in a couple of hours (or some other appropriate time) and say you can't find any code that would have caused this, so you still believe the most likely cause is manual database access.

  4. If they still want you to look for the code, ask how much time they would like you to spend on this. Subtly remind them that you won't be working on feature X, bug Y or enhancement Z while you're doing this.

  5. Spend as much time as they ask. If you still think the most likely cause is manual database access, tell them this.

  6. If they still want you to look for the code, escalate the issue as this has clearly become an unproductive use of your team's time.

You may also want to consider whether you should add extra processes to reduce the likelihood of manual database access causing this kind of issue in the future.

















answered 1 hour ago by Philip Kendall











  • I had no idea one of the engineers did a manual update, and engineers almost never run queries directly on the database. This one just did, as a one-off thing, and forgot about it. We spent a day and were preparing to spend a full week on finding out what's wrong. My question is what happens if you can't find the cause and can't suggest what the potential cause might be.
    – Nicholas Kyriakides
    1 hour ago











  • "My question is what happens if you can't find the cause and can't suggest what the potential cause might be" This is the exact reason the 'won't fix - can't duplicate' flag was invented.
    – esoterik
    29 mins ago

































 













































































