About Me

Experienced Information Technology leader, author, system administrator, and systems architect.

Saturday, April 13, 2013

Fighting Fires and Wasting Time

When your staff spends time fighting fires, they have less time to improve the environment. This Teamquest study illustrates the problem.

According to this study, each IT professional encounters about 20 "unexpected" issues per week. Each issue takes an average of 5 team members to address, and each of those people spends an average of one hour resolving it.

This is a staggering amount of time, when you add it up. In a typical environment, 30% of staff time is already spent on maintenance and mundane tasks. Only 8% is dedicated to proactive efforts, such as capacity planning, data management, and problem prevention.
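To put rough numbers on that, here is a minimal sketch using the study's averages. It assumes a 40-hour work week and reads "20 issues per week" as the number of issues each person gets pulled into. Both are assumptions made for illustration, not figures from the study.

```python
# Averages quoted from the study.
ISSUES_PER_PERSON_PER_WEEK = 20
PEOPLE_PER_ISSUE = 5
HOURS_PER_PERSON_PER_ISSUE = 1.0

# Assumption for illustration only (not from the study).
WORK_WEEK_HOURS = 40.0


def person_hours_per_issue(people=PEOPLE_PER_ISSUE,
                           hours=HOURS_PER_PERSON_PER_ISSUE):
    """Total staff time consumed by a single unexpected issue."""
    return people * hours


def firefighting_share(issues=ISSUES_PER_PERSON_PER_WEEK,
                       hours=HOURS_PER_PERSON_PER_ISSUE,
                       week=WORK_WEEK_HOURS):
    """Fraction of one person's week spent on unexpected issues."""
    return issues * hours / week


if __name__ == "__main__":
    print(f"Person-hours per issue: {person_hours_per_issue():.0f}")
    print(f"Share of one person's week: {firefighting_share():.0%}")
```

Under those assumptions, every issue costs five person-hours, and each person loses about half of the week to firefighting. Plug in your own team's numbers to see how your environment compares.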

What actions are you taking in your environment to address problems before they occur? Try to identify and characterize the most common types of fires your team has to fight. Rank them based on how much time they waste, and look at what would be required to prevent them.

Once you have cost/benefit information in hand, devote resources to preventing problems before they occur. Go after the lowest-hanging fruit first, then use the resources you free up to go after the others. Your team will appreciate the stress reduction, as well as the time they can now devote to more interesting and meaningful projects.
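One way to organize that cost/benefit information is sketched below. The fire types, incident counts, and prevention costs are hypothetical sample data, and "payback months" is just one possible way to define low-hanging fruit; substitute whatever your own records show.

```python
from dataclasses import dataclass


@dataclass
class FireType:
    name: str
    incidents_per_month: float
    person_hours_per_incident: float
    prevention_cost_hours: float  # one-time effort to prevent this fire

    @property
    def hours_wasted_per_month(self) -> float:
        return self.incidents_per_month * self.person_hours_per_incident

    @property
    def payback_months(self) -> float:
        """Months until the prevention effort pays for itself."""
        if self.hours_wasted_per_month == 0:
            return float("inf")
        return self.prevention_cost_hours / self.hours_wasted_per_month


def rank_by_waste(fires):
    """Most time-consuming fire types first."""
    return sorted(fires, key=lambda f: f.hours_wasted_per_month, reverse=True)


def lowest_hanging_fruit(fires):
    """Fastest-payback prevention projects first."""
    return sorted(fires, key=lambda f: f.payback_months)


# Hypothetical sample data, for illustration only.
fires = [
    FireType("full disk on log volume", 6, 5, 10),
    FireType("expired certificate", 1, 8, 4),
    FireType("failed nightly backup", 4, 3, 40),
]

if __name__ == "__main__":
    for f in lowest_hanging_fruit(fires):
        print(f"{f.name}: {f.hours_wasted_per_month:.0f} h/month wasted, "
              f"pays back in {f.payback_months:.1f} months")
```

Ranking by waste and ranking by payback can give different orders. In this sample data the failed backups waste more time than the expired certificates, but the certificate fix pays for itself much sooner.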

Friday, April 12, 2013

What is it you want, anyway?

Clarity is the key to getting what you want. I don't just mean clarity in asking someone else for what you want. You also need to be clear with yourself.

What do you want? What do you need? What is your bottom line?

Now, think about the person you are asking for help. If you can provide them something of value, you are part of the way to setting up a relationship of reciprocity with them.

In IT, a lot of the things that we need are things that we will need again tomorrow, and the day after that. It is in your best interest to find a way to build a reciprocal relationship of trust with the other person.

Thursday, April 11, 2013

Troubleshooting Intermittent Problems

Intermittent problems are extremely difficult to troubleshoot. Any reproducible problem can be troubleshot, if for no other reason than that each individual component can be proven to not be the problem through experimentation. Problems that are not reproducible cannot be approached in the same way.

Problems present as intermittent for one of two reasons:

  1. We have not identified the real cause of the problem.
  2. The problem is being caused by failing or flaky hardware.

The first possibility should be addressed by going back to brainstorming hypotheses.

It may be helpful to bring a fresh perspective into the brainstorming session, either by bringing in different people, or by sleeping on the problem.

The second possibility is tougher. There are hardware diagnostic tests that can be run to try to identify the failing piece of hardware.

The first thing to do is to perform general maintenance on the system. Re-seat memory chips, processors, expansion boards and hard drives.

Once general maintenance has been performed, test suites like SunVTS can perform stress-testing on a system to try to trigger the failure and identify the failing part.

It may be the case, however, that the costs associated with this level of troubleshooting are prohibitive. In this case, we may want to attempt to shotgun the problem.

Shotgunning is the practice of replacing potentially failing parts without having identified them as actually being flaky. In general, parts are replaced by price point, with the cheapest parts being replaced first.

Though we are likely to inadvertently replace working parts, the cost of the replacement may be cheaper than the costs of the alternatives (like the downtime cost associated with stress testing).

When parts are removed during shotgunning, it is important to discard them rather than keep them as spares. Any part you remove as part of a troubleshooting exercise is questionable. (After all, what if a power surge caused multiple parts to fail? Or what if there was a cascading failure?) It does not make sense to have questionable parts in inventory; such parts would be useless for troubleshooting, and putting questionable parts into service just generates additional downtime down the road.

Shotgunning may violate your service contract if performed without the knowledge and consent of your service provider.

Regardless of the method used to deal with intermittent problems, it is essential to keep good records. Relationships between our problem and other events may only become clear when we look at patterns over time. We may only be confident that we have really resolved the problem if we can demonstrate that we've gone well beyond the usual re-occurrence frequency without the problem re-emerging.
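As a rough illustration of that last check, the sketch below compares the quiet period since a fix against the longest gap previously recorded between occurrences. The function names, the sample dates, and the safety factor of 3 are all assumptions chosen for the example; pick a threshold that suits how costly a false "resolved" would be for you.

```python
from datetime import date


def longest_gap_days(occurrences):
    """Longest gap, in days, between consecutive recorded occurrences."""
    ordered = sorted(occurrences)
    gaps = [(b - a).days for a, b in zip(ordered, ordered[1:])]
    if not gaps:
        raise ValueError("need at least two recorded occurrences")
    return max(gaps)


def looks_resolved(occurrences, fix_date, today, safety_factor=3):
    """True if the quiet period since the fix is well beyond the usual gap.

    safety_factor is an arbitrary illustrative threshold, not a standard.
    """
    if any(d > fix_date for d in occurrences):
        return False  # the problem has come back since the fix
    quiet_days = (today - fix_date).days
    return quiet_days >= safety_factor * longest_gap_days(occurrences)


# Hypothetical incident log, for illustration only.
log = [date(2013, 1, 3), date(2013, 1, 10), date(2013, 1, 24), date(2013, 2, 2)]
fix = date(2013, 2, 3)

if __name__ == "__main__":
    print(looks_resolved(log, fix, today=date(2013, 3, 1)))
    print(looks_resolved(log, fix, today=date(2013, 3, 20)))
```

With this log the longest gap is 14 days, so the check stays false until 42 quiet days have passed. Passing the check does not prove the problem is gone; it only says the records no longer look like the old pattern.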

Book Review: Time Management for System Administrators

Tom's book is a quick read, which is definitely what you want from a book about time management. He starts with hints about how to manage time in an operational setting. His suggestions are down to earth and eminently implementable.

His title states that the book is for SAs, but the tips can be helpful to anyone who works in a high-interruption, high-stress environment.

Tuesday, April 9, 2013

Resolving Conflicts

Resolving conflicts within a team is probably one of the least favorite tasks for any team manager. Some conflicts should be allowed to work themselves out, but other conflicts can consume the team if they are left unchecked.

Not all conflict is bad. Constructive conflicts can take place in an atmosphere of mutual respect, where the parties involved express opinions and work towards a resolution that is better than what any one of them would have proposed alone. Destructive conflicts, however, need to be watched and managed.

There are several ways to approach a conflict. They are listed here from most effective to least effective.

  • Confrontation. The people involved directly confront the problem and try to work it through.
  • Compromise. The parties use give-and-take to try to give everyone part of what they want.
  • Smoothing. Areas of agreement are emphasized over areas of disagreement.
  • Forcing. A “solution” is imposed from higher up the hierarchy. When this method is overused, a manager will probably be seen as autocratic.
  • Withdrawal. Ignoring the problem is usually the least effective way to deal with a problem. Most problems get worse when ignored, not better.

There are times when each of these approaches is valid, but the strongest managers will try to directly confront problems when possible and reach compromise solutions when necessary. Forcing may be needed from time to time, but should be used sparingly.

Book Review: Agile Management

Medinilla has captured the essence of the new management paradigm. A lot of the old command-and-control sorts of tactics that we inherited from the industrial revolution simply do not work in the information age.

He does not get bogged down in the details of different Agile methodologies, but espouses bringing the soul of the Agile Manifesto to the organization as a whole.

Small groups within companies have implemented Agile and benefited from it, but many Agile proponents are allergic to management. Medinilla's book contains good advice and information for organizations that are looking to take advantage of Agile on a larger scale.

Monday, April 8, 2013

Book Review: The Practice of System and Network Administration

System Administration as a Profession

"The Practice of System and Network Administration" is different from most of the other technical books on a professional SA's bookshelf. This book is about how to become a professional system administrator.

The profession is about more than knowing obscure options to different commands. To become a professional, a system administrator needs to change mindset from a straight-ahead techie to a member of the team who has specialized expertise.

System administration has not always been recognized as a profession. System administrators themselves are partly to blame for that. We have tended to focus strictly on technology and not on how to structure our work to benefit both ourselves and the organizations we work for. Limoncelli, Hogan and Chalup have put together a great standard reference for people who are ready to transition to being professional system administrators.

Sunday, April 7, 2013

Book Review: Managing Humans

In his first chapter, Lopp tells us what kind of book he is looking to create. He describes a tavern where colleagues get together to resolve the problems of the universe. The tone is irreverent, engaging, and informative.

Lopp provides pointers about how to handle different situations, but his focus is on communicating attitudes and values more than on providing direct advice.

Some types of lessons are best learned by sharing stories. As in his popular blog, Lopp's anecdotes provide us with a career's worth of insight into what it takes to connect with and manage people.