
Showing posts from January, 2010

Standard Operating Procedure (SOP) for IT Operations

This is a template, with examples, for writing Standard Operating Procedures (SOP) for IT Operations. It is available for download here.




Cactus fifteen forty-nine (time-lapse video)

Wouldn't it be great if every major IT incident had a time-lapse video? That would be an awesome tool! Below is the full story...

Grading the resources involved in major incidents

It is recommended that the resources involved in handling a major incident are graded as part of a continuous improvement program. This is one way of doing that grading. The maximum possible score is 32: the grade is calculated by totaling the scores from the eight areas and expressing the sum as a percentage of that maximum. The eight areas are:
Identification and business impact – have the resources correctly identified the major incident and described what happened at the correct level of detail? Has the impacted service been correctly identified from the service catalogue? Was the business impact obtained or measured?
Conditions – what business, IT or environmental conditions were present during the incident, and did the resources describe them in suitable detail?
Expanded Incident Lifecycle – are all the times in the expanded incident lifecycle recorded, and are they realistic? Were they recorded in the incident reference at the service desk?
Resolu…
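The grading arithmetic can be sketched as below. This is a minimal illustration, not the author's tool: it assumes each of the eight areas is scored from 0 to 4 (which matches the stated maximum of 32), and because this excerpt names only three areas, the other five keys (`area_4` to `area_8`) are hypothetical placeholders.

```python
# Assumption: each area is scored 0..4, so 8 areas give the stated maximum of 32.
MAX_PER_AREA = 4
NUM_AREAS = 8
MAX_SCORE = MAX_PER_AREA * NUM_AREAS  # 32


def grade_percentage(scores: dict[str, int]) -> float:
    """Total the eight area scores and express the sum as a percentage of 32."""
    if len(scores) != NUM_AREAS:
        raise ValueError(f"expected {NUM_AREAS} area scores, got {len(scores)}")
    for area, score in scores.items():
        if not 0 <= score <= MAX_PER_AREA:
            raise ValueError(f"{area}: score {score} is outside 0..{MAX_PER_AREA}")
    return 100.0 * sum(scores.values()) / MAX_SCORE


example = {
    "identification_and_business_impact": 4,
    "conditions": 3,
    "expanded_incident_lifecycle": 2,
    # The remaining five areas are not named in this excerpt: hypothetical placeholders.
    "area_4": 4,
    "area_5": 4,
    "area_6": 3,
    "area_7": 3,
    "area_8": 1,
}
print(grade_percentage(example))  # 75.0 (24 out of 32)
```

Rejecting a missing area, rather than silently treating it as zero, keeps an incomplete review from looking like a poor one.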

Incident User Metric (How big was it really?)

The Incident User Metric (IUM) is a mechanism for measuring incidents objectively, which allows problem managers to classify them as minor, normal or major. Most incidents that affect a significant number of IT customers are potential major incidents, but what constitutes a major incident and what does not? The key is in the IUM. Once a large enough sample pool has been built (more than 10 incidents), the average IUM is calculated and used as the norm. A minor incident is one whose IUM is more than 40% below the norm. A major incident is one whose IUM is more than 40% above the norm. A normal incident is one whose IUM is within 40% of the norm.
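The classification step can be sketched as follows. This is a hedged illustration that assumes the IUM values have already been calculated, and it reads the thresholds as a band of 40% either side of the average of the sample pool; the function name `classify_incident` is hypothetical.

```python
from statistics import mean

MIN_SAMPLE = 11  # the text asks for a sample pool of more than 10 incidents
BAND = 0.40      # assumed: 40% either side of the norm


def classify_incident(ium: float, history: list[float]) -> str:
    """Classify an incident's IUM against the average (norm) of past incidents.

    Assumed reading of the thresholds: more than 40% below the norm is minor,
    more than 40% above the norm is major, anything in between is normal.
    """
    if len(history) < MIN_SAMPLE:
        raise ValueError("need a sample pool of more than 10 incidents")
    norm = mean(history)
    if ium < norm * (1 - BAND):
        return "minor"
    if ium > norm * (1 + BAND):
        return "major"
    return "normal"


history = [100.0] * 11  # illustrative pool with a norm of 100
print(classify_incident(50.0, history))   # minor
print(classify_incident(150.0, history))  # major
print(classify_incident(120.0, history))  # normal
```

Refusing to classify until the pool is big enough mirrors the text's point that the norm is only meaningful after enough incidents have been measured.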

This metric is determined as follows:
What is the opportunity cost to the company of a 1-minute outage, based on the effect on productivity? (Put another way, what is the company's total salary bill for 1 minute?)
What was the length of the outage?
What percentage of the IT customer population was impacted?
Is it a lesse…
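The first three questions can be combined into a rough cost figure, sketched below. This is only an assumption about how they fit together: it supposes the cost scales linearly with outage length and with the share of IT customers impacted. The remaining factors are cut off in this excerpt and are not modelled, so this is not the full IUM formula, and `outage_cost` is a hypothetical name.

```python
def outage_cost(cost_per_minute: float, outage_minutes: float, pct_impacted: float) -> float:
    """Hypothetical combination of the first three IUM questions.

    cost_per_minute: opportunity cost of a 1-minute outage (e.g. salary bill per minute)
    outage_minutes:  length of the outage
    pct_impacted:    percentage (0..100) of the IT customer population impacted
    Assumes a simple linear model; later factors in the original are omitted.
    """
    if cost_per_minute < 0 or outage_minutes < 0:
        raise ValueError("cost and duration must be non-negative")
    if not 0 <= pct_impacted <= 100:
        raise ValueError("pct_impacted must be between 0 and 100")
    return cost_per_minute * outage_minutes * pct_impacted / 100


# Illustrative figures only: 500 per minute, a 90-minute outage, 25% of users affected.
print(outage_cost(500.0, 90, 25))  # 11250.0
```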

Risk management for IT (CRAMM Lite)

Meerkats are one of the more risk-aware animals. One or more meerkats stand sentry (lookout) while the others forage or play, to warn them of approaching danger. When a predator is spotted, the meerkat acting as sentry gives a warning bark, and the other members of the gang run and hide in one of the many bolt holes spread across their territory. The sentry meerkat is the first to reappear from the burrow and search for predators, barking constantly to keep the others underground. If there is no threat, the sentry stops signaling and the others feel safe to emerge. So, in the spirit of the meerkat, I present CRAMM Lite.
CRAMM provides a staged and disciplined approach embracing both technical (e.g. IT hardware and software) and non-technical (e.g. physical and human) aspects of security. In order to assess these components, CRAMM is divided into three stages:
Asset identification and valuation
Threat and vulnerability assessment
Countermeasure selectio…
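The three stages can be pictured as a small pipeline, sketched below. This is a generic simplification and not CRAMM's own method: the 1 to 10 value scale, the 1 to 5 threat and vulnerability scales, the multiplicative risk score and the threshold are all assumptions for illustration, whereas CRAMM itself uses its own risk tables and countermeasure library.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    value: int          # stage 1, valuation: assumed scale 1 (low) to 10 (high)
    threat: int         # stage 2, threat level: assumed scale 1 to 5
    vulnerability: int  # stage 2, vulnerability level: assumed scale 1 to 5

    def risk(self) -> int:
        # Simple multiplicative score; not CRAMM's actual risk matrix.
        return self.value * self.threat * self.vulnerability


def select_for_countermeasures(assets: list[Asset], threshold: int) -> list[str]:
    """Stage 3 (simplified): assets whose risk meets the threshold, highest risk first."""
    ranked = sorted(assets, key=lambda a: a.risk(), reverse=True)
    return [a.name for a in ranked if a.risk() >= threshold]


# Hypothetical assets and ratings, for illustration only.
assets = [
    Asset("payroll server", value=9, threat=3, vulnerability=4),  # risk 108
    Asset("intranet wiki", value=3, threat=2, vulnerability=2),   # risk 12
    Asset("email gateway", value=7, threat=4, vulnerability=3),   # risk 84
]
print(select_for_countermeasures(assets, threshold=50))  # ['payroll server', 'email gateway']
```

Like the sentry meerkat, the point is to spend attention where the danger is greatest: countermeasures are chosen for the highest-risk assets first.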

Service Outage Analysis

An outage analysis is conducted on the impacted service. Two areas are assessed. Each area has a maximum score of 4, and the service outage score is the total of both areas expressed as a percentage of the maximum.
Period - the measurement is based on elapsed time.
Consequence - determined by financial means or business perception.
Measurement scale
Service period classification
(4) Critical - App, server or link (network or voice) unavailable for more than 4 hours or degraded for more than 1 day - negative business delivery for more than 1 month.
(3) Major - App, server or link (network or voice) unavailable for more than 1 hour or degraded for more than 4 hours - negative business delivery for more than 1 week.
(2) Moderate - App, server or link (network or voice) unavailable for more than 30 minutes or degraded for more than 1 hour - negative business delivery for more than 1 day.
(1) Minor - App, server or link (network or voice) unavailable for more than 5 minutes or degraded for more than 30 minute…
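The period scale and the overall percentage can be sketched as below. Assumptions: only the "unavailable" and "degraded" durations are checked (the "negative business delivery" criterion is left out), the Minor degraded threshold of 30 minutes is read from truncated text, and a score of 0 below the Minor band is assumed because the scale shown stops at 1. The function names are hypothetical.

```python
HOUR = 60
DAY = 24 * HOUR

# (score, unavailable threshold, degraded threshold) in minutes, most severe first.
PERIOD_BANDS = [
    (4, 4 * HOUR, 1 * DAY),   # Critical
    (3, 1 * HOUR, 4 * HOUR),  # Major
    (2, 30, 1 * HOUR),        # Moderate
    (1, 5, 30),               # Minor (degraded threshold taken from truncated text)
]


def period_score(unavailable_minutes: float, degraded_minutes: float) -> int:
    """Service period classification from elapsed time; 0 below Minor is an assumption."""
    for score, unavailable_limit, degraded_limit in PERIOD_BANDS:
        if unavailable_minutes > unavailable_limit or degraded_minutes > degraded_limit:
            return score
    return 0


def outage_percentage(period: int, consequence: int) -> float:
    """Total of the two areas (each 0..4) as a percentage of the maximum of 8."""
    for score in (period, consequence):
        if not 0 <= score <= 4:
            raise ValueError("each area is scored 0..4")
    return 100.0 * (period + consequence) / 8


print(period_score(90, 0))      # 3: unavailable for more than 1 hour
print(period_score(0, 45))      # 1: degraded for more than 30 minutes
print(outage_percentage(3, 2))  # 62.5
```

Checking the bands from most to least severe means an outage is classified by the worst threshold it crosses, whether through unavailability or degradation.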

Business Impact Analysis

The resulting impact on the company is measured to determine the perspective of the IT customer. Five areas are assessed. Each area has a maximum score of 4, and the classification is the total of all areas expressed as a percentage of the maximum.
Scope - percentage of customers affected.
Credibility - internal and external negative consequences for the company.
Operations - business interference.
Urgency - time planning.
Prioritization - resource reaction.
Scope scoring
(4) More than 50% of customers affected
(3) More than 25% of customers affected
(2) Less than 25% of customers affected*
(1) Less than 1% of users affected
(0) Single IT customer affected
Credibility scoring
(4) Areas outside the company will be affected negatively
(3) Company affected negatively
(2) Multiple business units affected negatively
(1) Single business unit affected negatively
(0) No credibility issue*
Operations scoring
(4) Interferes with core business functions
(3) Interferes with business activities*
(2) Significant interference with c…
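The scope scale and the overall classification can be sketched as follows. This is a hedged illustration: how exact boundaries such as 25% or 50% are handled is an assumption, since the scale only says "more than" or "less than", and the area keys and function names are hypothetical. The other four areas are taken as already-scored inputs.

```python
AREAS = ("scope", "credibility", "operations", "urgency", "prioritization")
MAX_PER_AREA = 4


def scope_score(customers_affected: int, total_customers: int) -> int:
    """Scope scoring from the share of IT customers affected (boundary handling assumed)."""
    if total_customers <= 0 or not 0 < customers_affected <= total_customers:
        raise ValueError("need 0 < customers_affected <= total_customers")
    if customers_affected == 1:
        return 0  # single IT customer affected
    pct = 100.0 * customers_affected / total_customers
    if pct > 50:
        return 4
    if pct > 25:
        return 3
    if pct >= 1:
        return 2
    return 1  # less than 1% of users affected


def bia_percentage(scores: dict[str, int]) -> float:
    """Total of the five area scores as a percentage of the maximum of 20."""
    missing = [area for area in AREAS if area not in scores]
    if missing:
        raise ValueError(f"missing area scores: {missing}")
    return 100.0 * sum(scores[area] for area in AREAS) / (MAX_PER_AREA * len(AREAS))


print(scope_score(300, 1000))  # 3: 30% of customers affected
print(bia_percentage({"scope": 3, "credibility": 2, "operations": 3,
                      "urgency": 2, "prioritization": 1}))  # 55.0
```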

South African robots

In South Africa, we call traffic lights robots. Usually, there are beggars...

...and this was doubly strange as the car in front of me had a numberplate of madam!