How would you measure KPIs for an “IT Infrastructure” team?

In my current job I oversee two teams whose performance I'm responsible for measuring: the "IT Help Desk" team and the "IT Infrastructure" team.

For the former, the KPIs are relatively easy to measure: average time between ticket creation and the team's first response, time elapsed between escalation stages, and satisfaction ratings from users.



However, for the latter I don't think I have similar metrics:



  • Project deliveries depend heavily on supplier lead time.

  • Some problems need solutions that require (1) research, (2) coordination with the Application team, or (3) both.

  • In addition to 'standard' troubleshooting, many outputs/deliverables from the "IT Infra" team take the form of concepts and planning. Executing and finalizing these depends heavily on point #2 above.

So, if I am to create measurable KPIs for the "IT Infrastructure" team, what measurements would you suggest?




To help give you ideas, here is a sampling of what the team has been doing:

  • Troubleshooting network outages, which 99% of the time were caused by problems with third parties (ISPs)

  • Troubleshooting Office 365 hybrid problems (a majority of our users still use on-premises Exchange)

  • Performing server upgrades/maintenance (e.g., adding RAM) on staging and production servers (in accordance with the downtime schedule agreed with the Application team)

  • Planning and arranging the tender for UTM procurement

  • Planning and carrying out a P2V migration (the actual migration date/time depends on the schedule agreed with the Application team)






  • Could you give some examples of the concepts and planning, please? I wonder whether capacity, system uptime and other factors could serve as measurable metrics.
    – JB King
    Jan 7 '15 at 6:53










  • @JBKing okay I'll provide some examples of what the team has been doing the past year.
    – pepoluan
    Jan 7 '15 at 6:55
















asked Jan 7 '15 at 6:42 by pepoluan (edited Jan 7 '15 at 7:05)





















2 Answers













I've managed a variety of infrastructure and automation teams and have used these kinds of KPIs. When picking KPIs, remember the point. The organization wants to know a) what value you're providing to the company, b) what areas need more or less investment/attention, and c) whether you're improving or not. Different organizations have different values, and also different amounts of tolerance for fine-grained information. (Do they want "Security: green/yellow/red" or do they want to know the write IO of all the prod databases? I've been in both kinds of orgs.) Your best first step is to ask your customer groups "what, exactly, do you care about us providing to you?" and craft KPIs to fit. A laundry list of some possible ones:



  1. Uptime of the services the team is responsible for, both including and excluding maintenance windows. Also, the number and severity of production incidents, potentially broken down per area (network, email, servers...).

  2. Time to resolution of issues.

  3. Velocity (overall throughput of work). It helps if you measure this; I run all my infrastructure teams on Scrum now because I've found it helps me measure and improve it.

  4. Security (number of vulnerabilities found in scans, by criticality).

  5. Number of production deployments.

  6. Number of users/hits/workloads processed (the closer to $$ the better). Pretend you're an actual IT services company showing your KPIs - wouldn't you use how many customers you have, or how many hits your SaaS app gets, as a KPI? (In fact, "think of yourself as a small company" is a good mindset to get into when determining which KPIs others would care about.)

  7. Capital and expense cost and changes (cost reduction is popular, but usually needs to be shown in context with #6 because cost naturally grows with usage).

  8. Engineer satisfaction (no really, survey it; run an NPS survey).

  9. Many more...
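
As a rough illustration of item 1, here is a minimal Python sketch of reporting uptime both including and excluding maintenance windows. The outage records, the `planned` flag, and the 30-day period are made-up assumptions for the example, not data from any real monitoring system; treat it as one possible way to do the arithmetic rather than a standard formula.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    minutes: float  # how long the service was unavailable
    planned: bool   # True if it fell inside an agreed maintenance window


def uptime_percent(period_minutes, outages, exclude_planned):
    """Availability over the period, as a percentage.

    With exclude_planned=True, planned maintenance is removed from both the
    downtime and the period, so only unplanned outages count against the team.
    """
    planned = sum(o.minutes for o in outages if o.planned)
    unplanned = sum(o.minutes for o in outages if not o.planned)
    if exclude_planned:
        scheduled = period_minutes - planned
        return 100.0 * (scheduled - unplanned) / scheduled
    return 100.0 * (period_minutes - planned - unplanned) / period_minutes


# Hypothetical 30-day month: one planned RAM upgrade, one ISP outage.
PERIOD = 30 * 24 * 60  # minutes in the reporting period
outages = [
    Outage(minutes=120, planned=True),
    Outage(minutes=45, planned=False),
]

including = uptime_percent(PERIOD, outages, exclude_planned=False)
excluding = uptime_percent(PERIOD, outages, exclude_planned=True)
print(f"Uptime including maintenance: {including:.3f}%")
print(f"Uptime excluding maintenance: {excluding:.3f}%")
```

Reporting both figures side by side shows how much of the downtime was agreed in advance and how much was not.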

First, let's address the "But, but... it depends on suppliers and devs and other people" concern. You need to work with your team (and yourself?) to break the "it's not completely under our control" habit, because that's all it is: an excuse. Take as an example a sales rep with a sales goal. Is it all up to her whether she hits her sales goal? By no means! She's dependent on those potential customers, but also on the features and quality of the product, and on support from SEs and other groups. But her goal is still "$ sold" because that's the chief goal of Sales. They can't do it alone, but the point of a KPI isn't "what you can do alone"; it's "what you are responsible for getting done, regardless of the web of other stakeholders involved." Please don't go to your boss with the "not under our control" line of reasoning, because you will just get a dressing down for your trouble.



Sometimes IT people don't understand KPIs. The goal isn't to have them at 100% or be perfect. The goal is to have, first, a measure and a target for that measure, and then to improve continually so you can raise the target. It's also not a "blame" tool. If you find some newly minted security issues (e.g. Heartbleed) in a scan, is that your "fault"? Is the system less secure than it was yesterday? No, but that's not relevant. It shows that there's suddenly some work to be done, and that the best known state of your security is now worse than was thought yesterday.



Having said that, one of the best things you can do to mitigate some of these concerns is to implement some DevOps concepts, where the core responsibilities become shared KPIs between the development and infrastructure organizations. I did this at one company, where the director of development and I both went to our business owner (the Web marketing VP) and worked out common KPIs that were important to the business owner. They ended up being Performance, Uptime, Velocity, Cost, and % Effort Spent on Maintenance vs. New Development, if I recall correctly. We were then both responsible for those. There's nothing like shared responsibility to stop blame shifting. He couldn't treat production outages as "an Infrastructure problem"; he had to get his devs to be more proactive about not making dumb design decisions. I couldn't treat new feature development as "a development problem"; we innovated around how we could help the devs create, test, and release their code more quickly and with less friction.






  • Thanks for the insights! I'm still not sure if DevOps and/or Scrum are suitable for my team, but you did raise important points for me to contemplate over this weekend.
    – pepoluan
    Jan 9 '15 at 10:28






























One approach that I have tried before is to implement an SLA (Service Level Agreement) for the IT infrastructure team, and then judge the team on its compliance with (or deviation from) that SLA.



The SLA should contain at least the following:



  1. Service Roster (for the services that are provided by, or supported by the infrastructure team).


  2. Criticality, including RPO/RTO.


  3. Uptime requirements.


  4. Client requirements.


  5. DR/BC plan (in some industries, this is mandated).


  6. Overhead planning.


For monitoring the SLA you can use your existing system reporting applications (for uptime, throughput, etc.)
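
To make the idea of judging compliance concrete, here is a small Python sketch that compares measured figures against SLA terms per service. Everything in it is hypothetical: the service names, the uptime targets, the RTO values, and the measured numbers are invented for illustration, and a real SLA would carry more terms than these two.

```python
from dataclasses import dataclass


@dataclass
class ServiceSla:
    name: str
    uptime_target: float  # agreed minimum uptime, in percent
    rto_minutes: int      # agreed Recovery Time Objective


@dataclass
class Measured:
    uptime: float                # percent, taken from monitoring reports
    worst_recovery_minutes: int  # longest restore time in the period


def sla_report(slas, measurements):
    """Return {service name: list of breached terms}; empty list = compliant."""
    report = {}
    for sla in slas:
        m = measurements[sla.name]
        breaches = []
        if m.uptime < sla.uptime_target:
            breaches.append(f"uptime {m.uptime}% < {sla.uptime_target}%")
        if m.worst_recovery_minutes > sla.rto_minutes:
            breaches.append(
                f"recovery {m.worst_recovery_minutes} min > RTO {sla.rto_minutes} min"
            )
        report[sla.name] = breaches
    return report


# Hypothetical service roster and one period's measurements.
slas = [ServiceSla("email", 99.5, 240), ServiceSla("file-server", 99.0, 480)]
measurements = {
    "email": Measured(uptime=99.7, worst_recovery_minutes=90),
    "file-server": Measured(uptime=98.6, worst_recovery_minutes=300),
}

report = sla_report(slas, measurements)
compliant = sum(1 for breaches in report.values() if not breaches)
print(f"{compliant}/{len(report)} services met their SLA")
for name, breaches in report.items():
    print(name, "OK" if not breaches else "; ".join(breaches))
```

The share of services meeting their SLA in a period can then serve as a single headline KPI, with the per-service breaches as the detail behind it.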



A similar exercise at a previous client yielded a lot of insight into the infrastructure team (they appreciated this because it made others realize how critical they are), and it gave the business a good plan for where to dedicate resources to business requirements, since it now had a service map and a criticality plan.






  • Ah, very helpful! I'd totally forgotten about RPO/RTO... I had been thinking about how to measure the 'quality' of some of the Infra work, and a quick search led me to a treasure trove of ideas. The other points are also nice. I'll certainly have an interesting weekend pondering all the insights. Thanks!
    – pepoluan
    Jan 9 '15 at 10:30










Your Answer







StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "423"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
noCode: true, onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);








 

draft saved


draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fworkplace.stackexchange.com%2fquestions%2f40047%2fhow-would-you-measure-kpi-of-it-infrastructure-team%23new-answer', 'question_page');

);

Post as a guest






























2 Answers
2






active

oldest

votes








2 Answers
2






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
3
down vote













I've managed a variety of infrastructure and automation teams and have used these kinds of KPIs. When picking KPIs, remember the point. The organization wants to know a) what value you're providing to the company, b) what areas need more or less investment/attention, and c) whether you're improving or not. Different organizations have different values, and also different amounts of tolerance for more fine grained information (Do they want "Security: green/yellow/red" or do they want to know the write IO of all the prod databases? I've been in both kinds of orgs). Your best first step is to ask your customer groups "what, exactly, do you care about us providing to you?" and craft KPIs to fit. A laundry list of some possible ones are...



  1. Uptime of the services they are responsible for, both including and excluding maintenance windows. Also, number and severity of production incidents, potentially broken down per area (network, email, servers...)

  2. Time to resolution of issues.

  3. Velocity (overall throughput of work). Helps if you measure this, I run all my infrastructure teams on Scrum now because I've found it helps me measure and improve this.

  4. Security (number of vulnerabilities found in scans by criticality)

  5. Number of production deployments

  6. Number of users/hits/workloads processed (the closer to $$ the better). Pretend you're an actual IT services company showing your KPIs - wouldn't you use how many customers you have or how many hits your SaaS app gets or whatever as a KPI? (In fact, "think of yourself as a small company" is a good mindset to get into when determining what KPIs you'd report that others would care about).

  7. Capital and expense cost and changes (cost reduction is popular, but usually needs to be shown in context with #6 because cost naturally grows with usage)

  8. Engineer satisfaction (no really, survey it, do a NPS)

  9. Many more...

First let's address the concern of "But but... It depends on suppliers and devs and other people" issue. You also need to work with your team (and yourself?) to break them of the "it's not completely under our control" excuse. Because that's all it is, an excuse. Take as an example a sales rep with a sales goal. Is it all up to her whether she hits her sales goal? By no means! She's dependent on those potential customers, but also on the features and quality of the product, on support from SEs and other groups. But her goal is still "$ sold" because that's the chief goal of Sales. They can't do it alone, but the point of a KPI isn't "what you can do alone" it's "what you are responsible for getting done, regardless of the web of other stakeholders involved." Please don't go to your boss with that line of reasoning because you will just get a dressing down for your trouble.



Sometimes IT people don't understand KPIs. The goal isn't to have them at 100% or be perfect. The goal is to have, first, a measure and a goal on that measure, and then to be able to continually improve to raise the goal. It's also not a "blame" tool. If you find some newly minted security issues (e.g. heartbleed) in a scan, is that your "fault?" Is the system less secure than it was yesterday? No, but that's not relevant. It shows that there's suddenly some work done and that the best known state of your security is now worse than was thought yesterday.



However, having said that, one of the best things you can do to mitigate some of these concerns is to implement some DevOps concepts, where the core responsibilies are a shared KPI between the development and infrastructure organizations. I did this at one company, where I and the director of development both went to our business owner (Web marketing VP) and we worked out common KPIs that were important to the business owner. They ended up being Performance, Uptime, Velocity, Cost, %Effort Spent On Maintenance vs New Development, if I recall correctly. We were then both responsible for those. Nothing like shared responsibility to stop blame shifting. He couldn't treat production outages as "an Infrastructure problem," he had to get his devs more proactive about not making dumb design decisions. I couldn't treat new feature development as "a development problem," we innovated around how we could help the devs create, test, and release their code more quickly and with less friction.






share|improve this answer




















  • Thanks for the insights! I'm still not sure if DevOps and/or Scrum are suitable for my team, but you did raise important points for me to contemplate over this weekend.
    – pepoluan
    Jan 9 '15 at 10:28














up vote
3
down vote













I've managed a variety of infrastructure and automation teams and have used these kinds of KPIs. When picking KPIs, remember the point. The organization wants to know a) what value you're providing to the company, b) what areas need more or less investment/attention, and c) whether you're improving or not. Different organizations have different values, and also different amounts of tolerance for more fine grained information (Do they want "Security: green/yellow/red" or do they want to know the write IO of all the prod databases? I've been in both kinds of orgs). Your best first step is to ask your customer groups "what, exactly, do you care about us providing to you?" and craft KPIs to fit. A laundry list of some possible ones are...



  1. Uptime of the services they are responsible for, both including and excluding maintenance windows. Also, number and severity of production incidents, potentially broken down per area (network, email, servers...)

  2. Time to resolution of issues.

  3. Velocity (overall throughput of work). Helps if you measure this, I run all my infrastructure teams on Scrum now because I've found it helps me measure and improve this.

  4. Security (number of vulnerabilities found in scans by criticality)

  5. Number of production deployments

  6. Number of users/hits/workloads processed (the closer to $$ the better). Pretend you're an actual IT services company showing your KPIs - wouldn't you use how many customers you have or how many hits your SaaS app gets or whatever as a KPI? (In fact, "think of yourself as a small company" is a good mindset to get into when determining what KPIs you'd report that others would care about).

  7. Capital and expense cost and changes (cost reduction is popular, but usually needs to be shown in context with #6 because cost naturally grows with usage)

  8. Engineer satisfaction (no really, survey it, do a NPS)

  9. Many more...

First let's address the concern of "But but... It depends on suppliers and devs and other people" issue. You also need to work with your team (and yourself?) to break them of the "it's not completely under our control" excuse. Because that's all it is, an excuse. Take as an example a sales rep with a sales goal. Is it all up to her whether she hits her sales goal? By no means! She's dependent on those potential customers, but also on the features and quality of the product, on support from SEs and other groups. But her goal is still "$ sold" because that's the chief goal of Sales. They can't do it alone, but the point of a KPI isn't "what you can do alone" it's "what you are responsible for getting done, regardless of the web of other stakeholders involved." Please don't go to your boss with that line of reasoning because you will just get a dressing down for your trouble.



Sometimes IT people don't understand KPIs. The goal isn't to have them at 100% or be perfect. The goal is to have, first, a measure and a goal on that measure, and then to be able to continually improve to raise the goal. It's also not a "blame" tool. If you find some newly minted security issues (e.g. heartbleed) in a scan, is that your "fault?" Is the system less secure than it was yesterday? No, but that's not relevant. It shows that there's suddenly some work done and that the best known state of your security is now worse than was thought yesterday.



However, having said that, one of the best things you can do to mitigate some of these concerns is to implement some DevOps concepts, where the core responsibilies are a shared KPI between the development and infrastructure organizations. I did this at one company, where I and the director of development both went to our business owner (Web marketing VP) and we worked out common KPIs that were important to the business owner. They ended up being Performance, Uptime, Velocity, Cost, %Effort Spent On Maintenance vs New Development, if I recall correctly. We were then both responsible for those. Nothing like shared responsibility to stop blame shifting. He couldn't treat production outages as "an Infrastructure problem," he had to get his devs more proactive about not making dumb design decisions. I couldn't treat new feature development as "a development problem," we innovated around how we could help the devs create, test, and release their code more quickly and with less friction.






share|improve this answer




















  • Thanks for the insights! I'm still not sure if DevOps and/or Scrum are suitable for my team, but you did raise important points for me to contemplate over this weekend.
    – pepoluan
    Jan 9 '15 at 10:28












up vote
3
down vote










up vote
3
down vote









I've managed a variety of infrastructure and automation teams and have used these kinds of KPIs. When picking KPIs, remember the point. The organization wants to know a) what value you're providing to the company, b) what areas need more or less investment/attention, and c) whether you're improving or not. Different organizations have different values, and also different amounts of tolerance for more fine grained information (Do they want "Security: green/yellow/red" or do they want to know the write IO of all the prod databases? I've been in both kinds of orgs). Your best first step is to ask your customer groups "what, exactly, do you care about us providing to you?" and craft KPIs to fit. A laundry list of some possible ones are...



  1. Uptime of the services they are responsible for, both including and excluding maintenance windows. Also, number and severity of production incidents, potentially broken down per area (network, email, servers...)

  2. Time to resolution of issues.

  3. Velocity (overall throughput of work). Helps if you measure this, I run all my infrastructure teams on Scrum now because I've found it helps me measure and improve this.

  4. Security (number of vulnerabilities found in scans by criticality)

  5. Number of production deployments

  6. Number of users/hits/workloads processed (the closer to $$ the better). Pretend you're an actual IT services company showing your KPIs - wouldn't you use how many customers you have or how many hits your SaaS app gets or whatever as a KPI? (In fact, "think of yourself as a small company" is a good mindset to get into when determining what KPIs you'd report that others would care about).

  7. Capital and expense cost and changes (cost reduction is popular, but usually needs to be shown in context with #6 because cost naturally grows with usage)

  8. Engineer satisfaction (no really, survey it, do a NPS)

  9. Many more...

First let's address the concern of "But but... It depends on suppliers and devs and other people" issue. You also need to work with your team (and yourself?) to break them of the "it's not completely under our control" excuse. Because that's all it is, an excuse. Take as an example a sales rep with a sales goal. Is it all up to her whether she hits her sales goal? By no means! She's dependent on those potential customers, but also on the features and quality of the product, on support from SEs and other groups. But her goal is still "$ sold" because that's the chief goal of Sales. They can't do it alone, but the point of a KPI isn't "what you can do alone" it's "what you are responsible for getting done, regardless of the web of other stakeholders involved." Please don't go to your boss with that line of reasoning because you will just get a dressing down for your trouble.



Sometimes IT people don't understand KPIs. The goal isn't to have them at 100% or be perfect. The goal is to have, first, a measure and a goal on that measure, and then to be able to continually improve to raise the goal. It's also not a "blame" tool. If you find some newly minted security issues (e.g. heartbleed) in a scan, is that your "fault?" Is the system less secure than it was yesterday? No, but that's not relevant. It shows that there's suddenly some work done and that the best known state of your security is now worse than was thought yesterday.



However, having said that, one of the best things you can do to mitigate some of these concerns is to implement some DevOps concepts, where the core responsibilies are a shared KPI between the development and infrastructure organizations. I did this at one company, where I and the director of development both went to our business owner (Web marketing VP) and we worked out common KPIs that were important to the business owner. They ended up being Performance, Uptime, Velocity, Cost, %Effort Spent On Maintenance vs New Development, if I recall correctly. We were then both responsible for those. Nothing like shared responsibility to stop blame shifting. He couldn't treat production outages as "an Infrastructure problem," he had to get his devs more proactive about not making dumb design decisions. I couldn't treat new feature development as "a development problem," we innovated around how we could help the devs create, test, and release their code more quickly and with less friction.






share|improve this answer












I've managed a variety of infrastructure and automation teams and have used these kinds of KPIs. When picking KPIs, remember the point. The organization wants to know a) what value you're providing to the company, b) what areas need more or less investment/attention, and c) whether you're improving or not. Different organizations have different values, and also different amounts of tolerance for more fine grained information (Do they want "Security: green/yellow/red" or do they want to know the write IO of all the prod databases? I've been in both kinds of orgs). Your best first step is to ask your customer groups "what, exactly, do you care about us providing to you?" and craft KPIs to fit. A laundry list of some possible ones are...



  1. Uptime of the services they are responsible for, both including and excluding maintenance windows. Also, number and severity of production incidents, potentially broken down per area (network, email, servers...)

  2. Time to resolution of issues.

  3. Velocity (overall throughput of work). Helps if you measure this, I run all my infrastructure teams on Scrum now because I've found it helps me measure and improve this.

  4. Security (number of vulnerabilities found in scans by criticality)

  5. Number of production deployments

  6. Number of users/hits/workloads processed (the closer to $$ the better). Pretend you're an actual IT services company showing your KPIs - wouldn't you use how many customers you have or how many hits your SaaS app gets or whatever as a KPI? (In fact, "think of yourself as a small company" is a good mindset to get into when determining what KPIs you'd report that others would care about).

  7. Capital and expense cost and changes (cost reduction is popular, but usually needs to be shown in context with #6 because cost naturally grows with usage)

  8. Engineer satisfaction (no really, survey it, do an NPS)

  9. Many more...
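To make item 1 concrete, here is a minimal sketch of computing uptime both including and excluding maintenance windows. The function names, the sample outages, and the convention used for "excluding" (agreed maintenance time is taken out of the scheduled time, and downtime inside a window is not counted) are assumptions for illustration; your SLA may define it differently. It also assumes outages don't overlap each other and maintenance windows don't overlap each other.

```python
from datetime import datetime, timedelta

ZERO = timedelta(0)


def overlap(a_start, a_end, b_start, b_end):
    """Duration shared by two intervals; zero if they do not intersect."""
    return max(min(a_end, b_end) - max(a_start, b_start), ZERO)


def uptime_percentages(period_start, period_end, outages, maintenance_windows):
    """Return (uptime including maintenance, uptime excluding maintenance) in percent."""
    total = period_end - period_start
    # Clip every outage to the reporting period.
    clipped = [(max(s, period_start), min(e, period_end)) for s, e in outages]
    downtime = sum((max(e - s, ZERO) for s, e in clipped), ZERO)
    # Downtime that fell inside an agreed maintenance window.
    planned_downtime = sum(
        (overlap(s, e, ms, me) for s, e in clipped for ms, me in maintenance_windows),
        ZERO,
    )
    # Maintenance time inside the period is removed from the scheduled time.
    maintenance_time = sum(
        (overlap(period_start, period_end, ms, me) for ms, me in maintenance_windows),
        ZERO,
    )
    including = 100 * (1 - downtime / total)
    excluding = 100 * (1 - (downtime - planned_downtime) / (total - maintenance_time))
    return including, excluding


# Made-up sample data: a 30-day period with one unplanned 4-hour outage and
# one 2-hour outage that happened inside a 4-hour maintenance window.
period = (datetime(2015, 1, 1), datetime(2015, 1, 31))
outages = [
    (datetime(2015, 1, 10, 9), datetime(2015, 1, 10, 13)),
    (datetime(2015, 1, 17, 22), datetime(2015, 1, 18, 0)),
]
windows = [(datetime(2015, 1, 17, 22), datetime(2015, 1, 18, 2))]

including, excluding = uptime_percentages(*period, outages, windows)
print(round(including, 2), round(excluding, 2))  # 99.17 99.44
```

Reporting both numbers side by side shows how much of the downtime was planned, which is usually the first thing the business asks.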

First, let's address the "But but... it depends on suppliers and devs and other people" concern. You need to work with your team (and yourself?) to break them of the "it's not completely under our control" excuse. Because that's all it is, an excuse. Take as an example a sales rep with a sales goal. Is it all up to her whether she hits her sales goal? By no means! She's dependent on those potential customers, but also on the features and quality of the product, and on support from SEs and other groups. But her goal is still "$ sold" because that's the chief goal of Sales. They can't do it alone, but the point of a KPI isn't "what you can do alone"; it's "what you are responsible for getting done, regardless of the web of other stakeholders involved." Please don't go to your boss with that line of reasoning, because you will just get a dressing down for your trouble.



Sometimes IT people don't understand KPIs. The goal isn't to have them at 100% or be perfect. The goal is to have, first, a measure and a goal on that measure, and then to be able to continually improve and raise the goal. It's also not a "blame" tool. If you find some newly minted security issues (e.g. Heartbleed) in a scan, is that your "fault"? Is the system less secure than it was yesterday? No, but that's not relevant. It shows that there's suddenly some work to be done and that the best known state of your security is now worse than was thought yesterday.



However, having said that, one of the best things you can do to mitigate some of these concerns is to implement some DevOps concepts, where the core responsibilities are a shared KPI between the development and infrastructure organizations. I did this at one company, where the director of development and I both went to our business owner (Web marketing VP) and we worked out common KPIs that were important to the business owner. They ended up being Performance, Uptime, Velocity, Cost, and % Effort Spent on Maintenance vs. New Development, if I recall correctly. We were then both responsible for those. Nothing like shared responsibility to stop blame shifting. He couldn't treat production outages as "an Infrastructure problem"; he had to get his devs more proactive about not making dumb design decisions. I couldn't treat new feature development as "a development problem"; we innovated around how we could help the devs create, test, and release their code more quickly and with less friction.

















answered Jan 7 '15 at 16:42









mxyzplk












  • Thanks for the insights! I'm still not sure if DevOps and/or Scrum are suitable for my team, but you did raise important points for me to contemplate over this weekend.
    – pepoluan
    Jan 9 '15 at 10:28





























One approach that I've tried before is to implement an SLA (Service Level Agreement) for the IT infrastructure team, and then judge them on compliance with (or deviation from) that SLA.



The SLA should contain at least the following:



  1. Service Roster (for the services that are provided by, or supported by the infrastructure team).


  2. Criticality, including RPO/RTO (recovery point objective / recovery time objective).


  3. Uptime requirements.


  4. Client requirements.


  5. DR/BC plan (in some industries, this is mandated).


  6. Overhead planning.


To monitor the SLA, you can use your existing system reporting applications (for uptime, throughput, etc.).
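As a rough illustration of judging compliance with (or deviation from) the SLA, here is a small sketch that compares measured figures against per-service targets. The service names, targets, and measurements are invented for the example, and the `ServiceTarget` and `sla_report` names are hypothetical; in practice the measured values would come from whatever monitoring or reporting tool you already have.

```python
from dataclasses import dataclass


@dataclass
class ServiceTarget:
    """One row of the service roster, with its agreed targets (all values hypothetical)."""
    name: str
    criticality: str
    uptime_target: float  # percent per reporting period
    rto_hours: float      # recovery time objective


def sla_report(targets, measured_uptime, worst_recovery_hours):
    """Compare measurements to targets and report the deviation per service."""
    rows = []
    for t in targets:
        uptime = measured_uptime[t.name]
        # A service with no incident in the period has no recovery time to judge.
        recovery = worst_recovery_hours.get(t.name, 0.0)
        rows.append({
            "service": t.name,
            "criticality": t.criticality,
            "uptime_deviation": round(uptime - t.uptime_target, 3),
            "uptime_met": uptime >= t.uptime_target,
            "rto_met": recovery <= t.rto_hours,
        })
    return rows


targets = [
    ServiceTarget("email", "high", uptime_target=99.9, rto_hours=4),
    ServiceTarget("file-share", "medium", uptime_target=99.5, rto_hours=8),
]
measured_uptime = {"email": 99.95, "file-share": 99.2}
worst_recovery_hours = {"file-share": 10}

for row in sla_report(targets, measured_uptime, worst_recovery_hours):
    print(row)
```

Reporting the deviation rather than a plain pass/fail keeps the trend visible from one period to the next.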



A similar exercise at a previous client produced a lot of insight into the infrastructure team (they appreciated this because it made others realize how critical they are), and it gave the business a good plan going forward on where to dedicate resources for business requirements (as they now had a service map and a criticality plan).


























  • Ah, very helpful! Totally forgotten RPO/RTO there... I had been thinking on how to measure 'quality' of some of the Infra works, and a quick search led me to a treasure trove of ideas. The other points are also nice. I'll certainly have an interesting weekend to ponder all the insights. Thanks!
    – pepoluan
    Jan 9 '15 at 10:30
























answered Jan 8 '15 at 5:19









Burhan Khalid
























 
