How would you measure the KPIs of an "IT Infrastructure" team?
In my current employment, I oversee two teams whose performance I'm responsible for measuring: the "IT Help Desk" team and the "IT Infrastructure" team.
For the former, KPIs are relatively easy to measure: average time between ticket creation and team response, time elapsed between escalation stages, and user satisfaction ratings.
For the latter, however, I don't think I have similar metrics, because:
- Project delivery depends heavily on supplier lead times.
- Some problems need solutions that require (1) research, (2) coordination with the Application team, or (3) both.
- In addition to 'standard' troubleshooting, many of the "IT Infra" team's outputs/deliverables take the form of concepts and plans, and executing/finalizing them depends heavily on the coordination described in the previous point.
So, if I were to create measurable KPIs for the "IT Infrastructure" team, which KPI measurements would you suggest?
To give you some ideas, here is a sample of what the team has been doing:
- Troubleshooting network outages, 99% of which were caused by third-party (ISP) problems
- Troubleshooting Office365 Hybrid problems (the majority of our users still use on-premises Exchange)
- Performing server upgrades/maintenance (e.g., adding RAM) on staging and production servers, according to the downtime schedule agreed with the Application team
- Planning and arranging the tender for UTM procurement
- Planning and carrying out a P2V migration (the actual migration date/time depends on the schedule agreed with the Application team)
management performance-reviews
asked Jan 7 '15 at 6:42 by pepoluan (edited Jan 7 '15 at 7:05)
Could you give some examples of the concepts and planning, please? I wonder whether capacity, system uptime, and other factors could serve as measurable metrics.
– JB King
Jan 7 '15 at 6:53
@JBKing Okay, I'll provide some examples of what the team has been doing over the past year.
– pepoluan
Jan 7 '15 at 6:55
2 Answers
I've managed a variety of infrastructure and automation teams and have used these kinds of KPIs. When picking KPIs, remember their purpose. The organization wants to know a) what value you're providing to the company, b) which areas need more or less investment/attention, and c) whether you're improving. Different organizations have different values, and also different tolerances for fine-grained information (do they want "Security: green/yellow/red", or do they want to know the write IO of all the prod databases? I've been in both kinds of orgs). Your best first step is to ask your customer groups "What, exactly, do you care about us providing to you?" and craft KPIs to fit. Here is a laundry list of some possible ones:
- Uptime of the services the team is responsible for, both including and excluding maintenance windows. Also, the number and severity of production incidents, potentially broken down by area (network, email, servers...).
- Time to resolution of issues.
- Velocity (overall throughput of work). It helps to have a way to measure this; I now run all my infrastructure teams on Scrum because I've found it helps me measure and improve velocity.
- Security (number of vulnerabilities found in scans, by criticality).
- Number of production deployments.
- Number of users/hits/workloads processed (the closer to $$ the better). Pretend you're an actual IT services company showing its KPIs: wouldn't you use how many customers you have, or how many hits your SaaS app gets, as a KPI? (In fact, "think of yourself as a small company" is a good mindset when deciding which KPIs to report that others would care about.)
- Capital and expense costs and how they change (cost reduction is popular, but it usually needs to be shown in the context of the users/hits/workloads volume above, because cost naturally grows with usage).
- Engineer satisfaction (no, really: survey it, and do an NPS).
- Many more...
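Uptime "including and excluding maintenance windows", time to resolution, and an engineer NPS are easy to state but easy to compute inconsistently from month to month. The sketch below shows one way to do it, assuming outages are exported as hypothetical `(start, end, planned)` records for a fixed reporting period; the record layout and function names are illustrative, not taken from any monitoring product. Time to resolution is approximated here by unplanned outage duration, and NPS uses the standard definition (% promoters scoring 9-10 minus % detractors scoring 0-6).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Outage:
    """One period of unavailability for a service (hypothetical record layout)."""
    start: datetime
    end: datetime
    planned: bool  # True if it happened inside an agreed maintenance window


def availability(outages, period_start, period_end, include_planned=True):
    """Uptime percentage over the reporting period.

    include_planned=True counts maintenance windows as downtime.
    include_planned=False removes maintenance windows from both the downtime
    and the measured period. Assumes outages don't overlap and lie inside
    the period.
    """
    period = period_end - period_start
    unplanned = sum((o.end - o.start for o in outages if not o.planned), timedelta())
    planned = sum((o.end - o.start for o in outages if o.planned), timedelta())
    if include_planned:
        downtime, measured = unplanned + planned, period
    else:
        downtime, measured = unplanned, period - planned
    return 100.0 * (1 - downtime / measured)


def mean_time_to_resolve(outages):
    """Average duration of unplanned outages, a simple time-to-resolution proxy."""
    durations = [o.end - o.start for o in outages if not o.planned]
    if not durations:
        return timedelta()
    return sum(durations, timedelta()) / len(durations)


def net_promoter_score(scores):
    """NPS from 0-10 answers: % promoters (9-10) minus % detractors (0-6)."""
    if not scores:
        raise ValueError("no survey responses")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100.0 * (promoters - detractors) / len(scores)


if __name__ == "__main__":
    start, end = datetime(2015, 1, 1), datetime(2015, 2, 1)
    outages = [
        Outage(datetime(2015, 1, 5, 9), datetime(2015, 1, 5, 11), planned=False),   # ISP outage
        Outage(datetime(2015, 1, 10, 22), datetime(2015, 1, 11, 2), planned=True),  # RAM upgrade
    ]
    print(f"uptime incl. maintenance: {availability(outages, start, end):.3f}%")
    print(f"uptime excl. maintenance: {availability(outages, start, end, include_planned=False):.3f}%")
    print("mean time to resolve:", mean_time_to_resolve(outages))
    print("engineer NPS:", net_promoter_score([10, 10, 9, 8, 6]))
```

Reporting both uptime figures separates downtime caused by agreed maintenance (such as the RAM upgrade) from unplanned incidents (such as an ISP outage), which is the distinction the answer is after. A ticket-based time to resolution (ticket creation to closure) can be averaged the same way.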
First, let's address the "but... it depends on suppliers and devs and other people" concern. You need to work with your team (and yourself?) to break the habit of using "it's not completely under our control" as an excuse, because that's all it is: an excuse. Take as an example a sales rep with a sales goal. Is it entirely up to her whether she hits that goal? By no means! She depends on her potential customers, but also on the features and quality of the product, and on support from SEs and other groups. Yet her goal is still "$ sold", because that's the chief goal of Sales. She can't do it alone, but the point of a KPI isn't "what you can do alone"; it's "what you are responsible for getting done, regardless of the web of other stakeholders involved." Please don't go to your boss with the "it's not under our control" argument, because you will just get a dressing-down for your trouble.
Sometimes IT people don't understand KPIs. The goal isn't to have them at 100% or to be perfect. The goal is first to have a measure and a goal for that measure, and then to improve continually so you can raise the goal. A KPI is also not a "blame" tool. If a scan finds some newly disclosed security issues (e.g., Heartbleed), is that your "fault"? Is the system less secure than it was yesterday? No, but that's beside the point. It shows that there is suddenly work to be done, and that the best-known state of your security is worse than you thought it was yesterday.
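One way to apply "a measure, a goal on that measure, then improvement" to the security KPI is to track open scan findings by criticality against a per-severity target each period, and to report the trend alongside it. This is a sketch under assumptions: the finding format, the severity labels, and the target values are all hypothetical, since real scanners export their own formats.

```python
SEVERITIES = ("critical", "high", "medium", "low")


def count_by_severity(findings):
    """Count open findings per severity label (case-insensitive)."""
    counts = {sev: 0 for sev in SEVERITIES}
    for finding in findings:
        sev = finding["severity"].lower()
        if sev not in counts:
            raise KeyError(f"unknown severity: {finding['severity']!r}")
        counts[sev] += 1
    return counts


def kpi_report(counts, targets):
    """For each severity, compare open findings with the agreed maximum."""
    return {
        sev: {"open": counts[sev], "target": targets[sev], "met": counts[sev] <= targets[sev]}
        for sev in SEVERITIES
    }


def trend(previous, current, severities=("critical", "high")):
    """Compare the total of the chosen severities between two periods."""
    before = sum(previous[s] for s in severities)
    after = sum(current[s] for s in severities)
    if after < before:
        return "improving"
    if after > before:
        return "worsening"
    return "flat"


if __name__ == "__main__":
    # Hypothetical targets: maximum open findings allowed per severity.
    targets = {"critical": 0, "high": 5, "medium": 20, "low": 50}
    last_month = count_by_severity([{"severity": "High"}] * 4)
    # A newly disclosed critical issue shows up in this month's scan.
    this_month = count_by_severity(
        [{"severity": "Critical"}, {"severity": "High"}, {"severity": "High"}]
    )
    for sev, row in kpi_report(this_month, targets).items():
        print(sev, row)
    print("trend:", trend(last_month, this_month))
```

In this example the critical-plus-high total falls from 4 to 3, so the trend reads as improving even though the zero-critical target is missed. Showing both views keeps the KPI about the known state of security rather than about blame.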
That said, one of the best things you can do to mitigate some of these concerns is to adopt some DevOps concepts, where core responsibilities are KPIs shared between the development and infrastructure organizations. I did this at one company: the director of development and I went together to our business owner (the Web marketing VP), and we worked out common KPIs that mattered to the business owner. If I recall correctly, they ended up being Performance, Uptime, Velocity, Cost, and %Effort Spent On Maintenance vs New Development. We were then both responsible for those. Nothing stops blame-shifting like shared responsibility. He couldn't treat production outages as "an Infrastructure problem"; he had to get his devs to be more proactive about not making dumb design decisions. I couldn't treat new feature development as "a development problem"; we innovated on how we could help the devs create, test, and release their code more quickly and with less friction.
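The "%Effort Spent On Maintenance vs New Development" KPI can be computed from whatever work tracking the teams already use. A minimal sketch, assuming each completed work item carries a category tag and an effort value (hours or story points); the tag names and the `(category, effort)` pair format are hypothetical.

```python
from collections import defaultdict


def effort_split(work_items):
    """Percentage of total effort per category.

    Each item is a (category, effort) pair, e.g. ("maintenance", 3).
    Effort can be hours or story points, as long as it is consistent.
    """
    totals = defaultdict(float)
    for category, effort in work_items:
        if effort < 0:
            raise ValueError("effort cannot be negative")
        totals[category] += effort
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {cat: 100.0 * value / grand_total for cat, value in totals.items()}


if __name__ == "__main__":
    sprint = [
        ("maintenance", 5),       # e.g. patching, RAM upgrades
        ("new_development", 8),
        ("maintenance", 3),
        ("new_development", 4),
    ]
    for category, pct in sorted(effort_split(sprint).items()):
        print(f"{category}: {pct:.1f}%")
```

Because both teams' work items feed the same number, neither side can improve it by pushing work onto the other, which is the point of a shared KPI.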
answered Jan 7 '15 at 16:42
mxyzplk
7,16412234
Thanks for the insights! I'm still not sure whether DevOps and/or Scrum are suitable for my team, but you did raise important points for me to contemplate this weekend.
– pepoluan
Jan 9 '15 at 10:28
One approach that I have tried before is to implement an SLA (Service Level Agreement) for the IT infrastructure team, and then judge the team on its compliance with (or deviation from) that SLA.
The SLA should contain at least the following:
- Service roster (the services that are provided or supported by the infrastructure team).
- Criticality, including RPO/RTO (Recovery Point Objective / Recovery Time Objective).
- Uptime requirements.
- Client requirements.
- DR/BC (disaster recovery / business continuity) plan (in some industries, this is mandated).
- Overhead planning.
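To make the roster and criticality items concrete, here is a minimal sketch of one roster entry carrying RPO/RTO and an uptime target, plus a check of whether the current backup age or the last restore test would breach those objectives. The `ServiceEntry` class, the `rpo_rto_breaches` helper, the criticality labels, and all the numbers are hypothetical; the service name is used only as an example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ServiceEntry:
    """One row of a hypothetical SLA service roster."""
    name: str
    criticality: str          # e.g. "high", "medium", "low" (labels are assumptions)
    rpo: timedelta            # maximum tolerable data loss
    rto: timedelta            # maximum tolerable time to restore service
    uptime_target_pct: float  # e.g. 99.9


def rpo_rto_breaches(entry, last_good_backup, last_restore_test_duration, now):
    """Return which recovery objectives the current state would violate."""
    breaches = []
    if now - last_good_backup > entry.rpo:
        breaches.append("RPO")  # newest restorable data is older than allowed
    if last_restore_test_duration > entry.rto:
        breaches.append("RTO")  # last DR restore test took longer than allowed
    return breaches


# Hypothetical values, for illustration only.
exchange = ServiceEntry("On-premise Exchange", "high",
                        rpo=timedelta(hours=4), rto=timedelta(hours=8),
                        uptime_target_pct=99.9)
result = rpo_rto_breaches(exchange,
                          last_good_backup=datetime(2015, 1, 9, 6, 0),
                          last_restore_test_duration=timedelta(hours=5),
                          now=datetime(2015, 1, 9, 12, 0))
print(result)  # backup is 6 h old against a 4 h RPO; restore took 5 h against an 8 h RTO
```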
To monitor the SLA, you can use your existing system reporting tools (for uptime, throughput, etc.).
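Judging the team on compliance with (or deviation from) an uptime target boils down to simple arithmetic over the outage data your reporting tools already collect. A minimal sketch, assuming outages are available as a list of durations per reporting period; the `sla_report` helper, the 99.9% target, and the outage figures are hypothetical. Whether outages caused by third parties such as ISPs count against the budget is a policy decision for the SLA itself.

```python
from datetime import timedelta


def availability_pct(period, downtime):
    """Measured availability over a reporting period, as a percentage."""
    return 100.0 * (period - downtime) / period  # timedelta / timedelta -> float


def sla_report(target_pct, period, outages):
    """Compare measured availability against the SLA target."""
    downtime = sum(outages, timedelta())
    measured = availability_pct(period, downtime)
    allowed = period * (1 - target_pct / 100.0)  # downtime budget for the period
    return {
        "measured_pct": round(measured, 3),
        "met": measured >= target_pct,
        "downtime_budget_left": allowed - downtime,  # negative means the SLA was missed
    }


# Hypothetical 30-day month with two outages totalling 35 minutes.
report = sla_report(99.9, timedelta(days=30),
                    [timedelta(minutes=25), timedelta(minutes=10)])
print(report)
```

Reporting the remaining downtime budget, not just a pass/fail flag, shows how far from (or how close to) the target the team is, which is the "deviation" half of the assessment.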
Doing a similar exercise at a previous client yielded a lot of insight into the infrastructure team (the team appreciated this because it made others realize how critical they are), and it gave the business a good plan going forward on where to dedicate resources to meet business requirements (because they now had a service map and a criticality plan).
answered Jan 8 '15 at 5:19
Burhan Khalid
3,64811423
Ah, very helpful! I had totally forgotten about RPO/RTO... I had been thinking about how to measure the 'quality' of some of the Infra work, and a quick search led me to a treasure trove of ideas. The other points are also nice. I'll certainly have an interesting weekend pondering all the insights. Thanks!
– pepoluan
Jan 9 '15 at 10:30
Could you give some examples of the concepts and planning, please? I wonder whether there are ways to measure capacity, system uptime, and other factors that could serve as measurable metrics.
– JB King
Jan 7 '15 at 6:53
@JBKing Okay, I'll provide some examples of what the team has been doing over the past year.
– pepoluan
Jan 7 '15 at 6:55