Developing a Troubleshooting Strategy (Cisco Wireless LAN Controllers)

When you think about a wireless network, especially one involving Lightweight Access Point Protocol (LWAPP) or Control and Provisioning of Wireless Access Points (CAPWAP), the topology can be very large. Troubleshooting a wireless issue can intimidate even a seasoned engineer. The issue might not even be wireless in origin, yet it can ultimately affect all wireless connectivity or the quality of the connection. The first question is simple to ask but might be the most difficult to answer: Where do I start?

Developing a troubleshooting strategy can be a lifesaver. Strategies usually work well on issues that have been around for a while or that are intermittent. Depending on the issue, your strategy might change to best suit what is currently going on. No matter which way you look at it, the best choice is to have a plan ready to go. You can always modify your strategy if the parameters of the problem change while you’re troubleshooting. It’s easier to be in a situation in which your strategy needs extensive modification than to be without one.

Production Versus Nonproduction Outages

A network problem typically falls into one of the following two categories, either of which can occur as a production or nonproduction outage:

■ Outage renders the network completely useless or inoperable: Believe it or not, this does provide some positive aspects to troubleshooting. Network activity that would usually require a maintenance or change window can now be accomplished at any time because the network is down. A network-down scenario is usually easier to identify and fix because the issue is constant.


■ Outage renders the network partially impaired: Issues that fall into this category are usually smaller in magnitude, but not always. For example, your wireless laptop users might be able to access all network resources with the exception of the printers. Another example would be if your 7921 voice users have degraded voice quality. Users can still receive and place calls, but it might be difficult to understand the other party.

Step 1: Gathering Data About the Problem

No matter what issue you encounter, the one resource that helps any situation is information about the issue and knowledge of the environment. Information aids in your understanding of what you are potentially dealing with—the scope, magnitude, and other facets that could be influencing the issue at hand. No matter what problem you start to troubleshoot, information gathering should always be the first step. In most cases you do not even realize you have done that.

Step 2: Identifying the Problem

Identifying and isolating the problem can be a major headache in itself, especially in a centralized wireless network using LWAPP or CAPWAP.

Wired networks alone can encompass quite a few network resources. Figure 1-1 shows an example of what you might see in a typical wireless network setup.

If you add the components of a wireless network to a wired network, you have a large number of network resources to consider:

■ Multiple LANs

■ Large LANs

■ Multiple VLANs (inter-VLAN routing)

■ WANs

■ Routing protocols

■ Multicast

■ Hot Standby Router Protocol (HSRP)

■ EtherChannel

This list is just a small sample of the network resources and issues you need to investigate on the wired side. Do not forget that this is a wireless deployment, so you also have to look at the wireless pieces:

■ Interference

■ Access points (APs)

■ Controllers

■ Antennas

■ Authentication equipment (RADIUS servers, APs, or Wireless LAN Controllers [WLC], and so on)

■ Client-related problems

Figure 1-1 Resources in a Typical Network

Step 3: Isolating the Problem

A key piece of troubleshooting is identifying the source of the issue, and a network topology can be a valuable tool for doing so. Judging from all the items listed previously, you have your work cut out for you. While narrowing the list of possible culprits, never permanently rule anything out; at some point you might have to revisit a resource that you looked at initially. Anyone who has troubleshot networking issues for some time has been part of a problem that was misdiagnosed, or has had to take responsibility for an incorrect action or an incorrect identification of the problem.

A valuable piece of advice to remember is to always look at the big picture when searching for the root cause of the problem. Never let the symptoms of the problem mislead you.

Network Topology

A network topology can be a great visual roadmap of all the routes and equipment that are used. A network topology can isolate the issue even further and once again inform you of what pieces are or are not involved.

One of the most important steps is to develop a network diagram of the current network on which you are troubleshooting the issue. This can really put the network and its components into perspective. To build your network topology, use network diagram drawing software such as Microsoft Visio, SmartDraw, or similar tools. After the foundation is built, you can update it when needed. This can prove to be useful, especially if you have to contact a third-party support vendor. Your network topology is at your disposal and benefits others. Ideally, when troubleshooting, this drawing is already present or is included in any service requests.

What does the network diagram need to contain? The answer varies with the network size and type. A good diagram helps you track devices and quickly connect to any of them. What is going to be useful in helping you solve the issue? Consider the following commonly used items:

■ Device type diagrams (routers, switches, and so on)

■ Model numbers

■ IP addresses

■ Subnets, VLANs, and so on

■ Routing areas

■ Protocols (Frame Relay, ATM, and so on)

■ Interfaces, port numbers, and so on

■ Software version

■ Passwords

In addition, for the wireless portion of the network, you might need the following to generate a comprehensive topology:

■ Mobility groups

■ Radio frequency (RF) groups

■ Radiation patterns of APs

■ Access point channel information

■ Access point power information

■ Physical barriers or RF barriers

■ AP group VLANs (if applicable)

Note AP group VLANs, along with WLAN override, have been replaced by the AP groups functionality in version 5.2.
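The items in the two lists above can also be kept as structured records next to the drawing, so that any device can be looked up quickly during an outage. The following is a minimal Python sketch, not a prescribed format: the `DeviceRecord` class, its field names, the `find_by_ip` helper, and all sample values are hypothetical and chosen only for illustration. Passwords are deliberately left out of the sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DeviceRecord:
    """One device on the topology diagram (field names are illustrative)."""
    name: str
    device_type: str              # router, switch, WLC, AP, and so on
    model: str
    ip_address: str
    software_version: str
    vlans: List[int] = field(default_factory=list)
    # Wireless-only details; left as None for wired devices.
    mobility_group: Optional[str] = None
    rf_group: Optional[str] = None
    channel: Optional[int] = None
    power_level: Optional[int] = None


def find_by_ip(inventory, ip_address):
    """Return the first record with the given IP address, or None."""
    for record in inventory:
        if record.ip_address == ip_address:
            return record
    return None


# Placeholder values only; substitute the details of your own network.
inventory = [
    DeviceRecord("core-sw1", "switch", "example-model", "10.0.0.2",
                 "example-version", vlans=[10, 20]),
    DeviceRecord("wlc1", "WLC", "example-model", "10.0.0.3",
                 "example-version", mobility_group="campus", rf_group="campus"),
    DeviceRecord("ap-lobby", "AP", "example-model", "10.0.20.15",
                 "example-version", vlans=[20], channel=6, power_level=3),
]

match = find_by_ip(inventory, "10.0.0.3")
print(match.name if match else "not found")
```

Keeping the wireless fields optional lets wired and wireless devices share one record type; whether that suits your network depends on its size and on how the diagram is maintained.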

You can also generate this information by using a Wireless Control System (WCS) if you have one. The WCS and the Wireless Location Appliance, as seen in Figure 1-2, can be useful in many ways. The Cisco 3300 Series Mobility Services Engine is a combination of hardware and software. The Mobility Services Engine is an appliance-based solution that supports a suite of software services to provide centralized and scalable service delivery. The Mobility Services Engine transforms the wireless LAN into a mobility network by abstracting the application layer from the network layer, effectively allowing for the delivery of mobile applications across different types of networks.

Figure 1-2 Cisco Wireless Control System and Wireless Location Appliance

Note The 2700 (wireless location appliance) has been deprecated and is being replaced by the 3300 Series Mobility Services Engine.

The WCS contains useful information and can be quite helpful.

However, because troubleshooting depends on real-time information, WCS can be suboptimal at times. WCS takes snapshots at configured intervals to update its database. If any changes are made, the administrator has to wait until the next update interval or manually trigger an update to see the change. WCS is not required to run a wireless network. It is a standalone management database that operates on a server. It acts as a third-party device and is passive unless it is used to make configuration changes and so on. Figure 1-3 demonstrates how WCS is integrated into networks.

Figure 1-3 Cisco Wireless Control System Integrated into a Network

Depending on the size of the network, you might have multiple topology pages and maps. There is nothing wrong with this; having too much information is not a bad position to be in. Not everything listed is required or set in stone. The items are listed to give you a good starting point and additional options to consider. You should always get as much information as you need to troubleshoot your issue.

Gathering General Information

Information is valuable in any form and is always vital. The best way to determine what information you might need for your network issue is to imagine that you are talking to someone over the phone. That is usually the most challenging environment because you are not physically there. Imagine what questions you would ask to educate yourself so that you could provide the next course of action or help solve the problem. This list can give you an idea of the information that is likely to be needed. If you are the network administrator or owner, you should obtain the following information:

■ Details about what the user actually experienced or is currently experiencing

■ Information about the scope of the issue and how many users are affected

■ Frequency of the issue

■ Configurations of devices

■ A network topology

■ Any error messages, message logs, or syslog information

■ Debug requirements

■ MAC addresses/IP addresses for debugs or any other utility/application that might need them

■ Any additional information/resources for the next troubleshooting steps

This is a good list to get you started. By no means is this list set in stone; you should modify it to fit the issue. If you have to contact a third party for support, it is beneficial to have this information, and in many cases, this information can decrease network outage time. It all comes down to what works for you.
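One way to make sure nothing on the list is forgotten before you start, or before you call a support vendor, is to treat it as a checklist and flag the gaps. The sketch below is only an illustration: the item names in `REQUIRED_ITEMS`, the `missing_items` helper, and the sample report are all hypothetical and should be adapted to the issue at hand.

```python
# Checklist keys mirror the list above; the names themselves are illustrative.
REQUIRED_ITEMS = [
    "user_experience",     # what the user experienced or is experiencing
    "scope",               # how many users are affected
    "frequency",           # how often the issue occurs
    "device_configs",
    "topology",
    "logs",                # error messages, message logs, syslog
    "debug_requirements",
    "addresses",           # MAC/IP addresses needed for debugs
]


def missing_items(report):
    """Return the checklist items that are absent or empty in the report."""
    return [item for item in REQUIRED_ITEMS if not report.get(item)]


# A partially completed report with made-up values.
report = {
    "user_experience": "Laptop users cannot reach the printers",
    "scope": "About 20 users on one floor",
    "frequency": "",
    "topology": "floor3-topology.vsd",
}

print(missing_items(report))
# ['frequency', 'device_configs', 'logs', 'debug_requirements', 'addresses']
```

An empty string counts as missing here, which is a design choice: a field that someone created but never filled in is no more useful than one that was never created.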

You will encounter network issues for which you simply do not have sufficient information, or the right kind of information, to even begin troubleshooting. In many cases, you will need multiple tools set up and in place so that when the problem happens again you can collect all the necessary data. The key point is that many network issues require additional work to gather the information needed to proceed to the next step in troubleshooting. That next step might be acquiring further information or taking corrective action.

Frequency of the Issue

When discussing time with regard to a problem, you must consider a few factors. Time can be a valuable asset when trying to troubleshoot an issue. The frequency of the problem is important if the entire network is not down. Some issues that you can run into might occur only once a month. This can help set expectations on what information to acquire during the time the issue exists. The problem duration is also valuable because you know what can and cannot be done during this time frame.

In summary, you need to answer four questions in the most accurate and efficient manner:

■ How long has the problem been going on?

■ When did it start?

■ How often does it occur?

■ When the problem occurs, how long does it last?

The answers to these questions provide valuable information for the troubleshooting process. They also direct action for the next step you need to take in solving the problem. A subsequent question might be this: Were there network changes before or at the time the problem started? You open the door for numerous other questions while educating yourself, taking one step closer to the problem solution.
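If the incidents have been logged with start and end times, the four questions above reduce to simple arithmetic. The following Python sketch assumes such a log exists; the `summarize_occurrences` function, its result keys, and the sample timestamps are hypothetical and serve only as an illustration.

```python
from datetime import datetime, timedelta


def summarize_occurrences(occurrences, now):
    """Answer the four timing questions from (start, end) datetime pairs."""
    if not occurrences:
        raise ValueError("at least one occurrence is required")
    ordered = sorted(occurrences)
    first_start = ordered[0][0]
    durations = [end - start for start, end in ordered]
    # Time between the starts of consecutive incidents.
    gaps = [later[0] - earlier[0] for earlier, later in zip(ordered, ordered[1:])]
    return {
        "started": first_start,                        # When did it start?
        "ongoing_for": now - first_start,              # How long has it been going on?
        "average_interval":                            # How often does it occur?
            sum(gaps, timedelta()) / len(gaps) if gaps else None,
        "average_duration":                            # How long does it last?
            sum(durations, timedelta()) / len(durations),
    }


# Made-up incident log: three outages, one week apart.
occurrences = [
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 9, 10)),
    (datetime(2024, 1, 8, 9, 0), datetime(2024, 1, 8, 9, 20)),
    (datetime(2024, 1, 15, 9, 0), datetime(2024, 1, 15, 9, 30)),
]
summary = summarize_occurrences(occurrences, now=datetime(2024, 1, 16, 9, 0))
for question, answer in summary.items():
    print(question, answer)
```

A regular interval like the one in this sample is itself a clue worth following up, for example by checking what else in the network runs on the same schedule.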

Step 4: Analyzing the Data Collected About the Problem

Now that you have collected data from various sources, you must analyze it to find the root cause of, or a workaround for, your problem. In many scenarios, your support vendor will ask for or gather this information to aid in troubleshooting. If part of your plan is to engage your support vendor, it is a good idea to have already gathered this information. This saves you quite a bit of time in the long run and decreases the overall time to locate and resolve the issue. For any piece of hardware, get to know your support vendor and what the vendor might or might not ask.

Tip Get to know your vendor and what this person might ask to help solve your issue. Having this material ahead of time reduces troubleshooting and resolution time.

Another good idea is to get experience and knowledge of the common troubleshooting tools that you might use to aid in problem resolution. An example of this is using sniffer tools to read packet captures or the debugging system of the WLC.

Narrow the List of Possible Causes

After you analyze the data collected from monitoring tools, logs, and so on, you are in a position to logically narrow the list of possible causes of your problem. It is usually a good idea to start large and then work your way down to something more manageable. When the problem is identified to the point that you can reasonably apply additional test methods, you can thoroughly investigate that particular cause and really put it to the test. In many cases, common sense alone is enough to reduce the list by 50 percent to 75 percent.

Determining the Proper Troubleshooting Tool

A plethora of troubleshooting tools is available. Most products sold on the market usually contain their own troubleshooting tools, debugs, or some form of diagnostic system. The large number of troubleshooting tools can make it extremely difficult to select which ones are best suited for the job. This topic lays out the best tools, debugs, and troubleshooting tips to help you solve most issues that may arise. That way you are better prepared for whatever problem might surface—expected or unexpected.

Summary

Most network issues are reported with a generic description, for example, "All users on the wireless network are experiencing slow response to an application." You must be logical when reporting and troubleshooting the problem. It will be difficult to troubleshoot every user if someone reports that all users are experiencing latency. In many cases, there will be a working model and a nonworking model. One example would be a problem on a particular switch. If you have multiple switches in your network, you can compare a working switch to the switch that has the issue. The advantage of this model is that even if you have no idea what is occurring, you can always take a packet capture on the working and the nonworking switch and compare them packet by packet. In another example, you could look at a problem with a client PC. You would start by listing the differences between the working and nonworking machine.

Tip When comparing equipment, try to find pieces that are close or identical.

You want to try to find machines that are inherently close to each other. The differences between each piece of equipment could invalidate your research and results.

After you have the list of differences between a working and nonworking PC, examine each difference by itself. You do this by removing the differences one at a time. If you remove more than one, you run the risk of solving the problem, without knowing which difference was the cause. One major flaw in the strategy is that you do not always have an accurate picture of the correctly running machine.
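The working-versus-nonworking comparison can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a tool: each machine is represented as a plain dictionary of settings, and `list_differences`, `find_cause`, the setting names, and the `is_healthy` check are all hypothetical. In practice `is_healthy` would be whatever real test reproduces the problem.

```python
MISSING = object()  # marks a setting that one machine does not have at all


def list_differences(working, nonworking):
    """Return {setting: (working_value, nonworking_value)} for every mismatch."""
    differences = {}
    for key in sorted(set(working) | set(nonworking)):
        good = working.get(key, MISSING)
        bad = nonworking.get(key, MISSING)
        if good != bad:
            differences[key] = (good, bad)
    return differences


def find_cause(working, nonworking, is_healthy):
    """Remove one difference at a time, retesting after each change.

    Returns the setting whose change made the test pass, or None.
    """
    trial = dict(nonworking)
    for setting, (good, _) in list_differences(working, nonworking).items():
        if good is MISSING:
            trial.pop(setting, None)
        else:
            trial[setting] = good
        if is_healthy(trial):
            return setting
    return None


# Made-up client settings for a working and a nonworking PC.
working = {"driver": "2.1", "power_save": "off", "supplicant": "native"}
nonworking = {"driver": "1.9", "power_save": "on", "supplicant": "native"}


def is_healthy(config):
    """Stand-in for a real test; here the fault is assumed to be power save."""
    return config.get("power_save") == "off"


print(list_differences(working, nonworking))
print(find_cause(working, nonworking, is_healthy))  # power_save
```

Because the changes accumulate, the returned setting is the one that tipped the machine into a working state; an earlier change might also have been a necessary part of the fix, which is one more reason to record every step.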

Troubleshooting methodology is critical when any network problem arises. You need to have the quickest and most efficient method in your head and at your fingertips. The difference could cost you resources and considerable time.
