Troubleshooting Roaming Faults Part 1

Roaming is a complex service. To establish a roaming call, the home network, the intermediate international carriers, and the visited network must all function correctly under varying conditions, including network load and the number and location of roamers. Wireless service providers perform comprehensive tests before service launch to ensure that all the features available to roamers work, but the stability of the service is constantly put to the test by changes in the networks involved. New software releases, patches, bug fixes, reconfigurations, and routing changes are all prone to errors and may cause a roaming service breakdown.

Common network problems

Some of the common problems encountered are as follows. The list is by no means exhaustive.

■ Routing table errors

■ E.212 to E.214 translation errors

■ Signaling link outage at local end

■ Signaling link outage at remote end

■ Remote PLMN HLR outage

■ MSRN shortage at MSC

■ GT not updated

■ APN configuration

■ Timing issues, timer expires

■ SCCP routing problem

■ ISUP call routing problem

■ Subscriber data in the HLR not correct

■ Radio coverage

■ Mobile station configuration
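The E.212 to E.214 translation listed above is a frequent source of faults, so it helps to see what the translation does. The following is a minimal sketch, assuming the usual rule that the MCC and MNC of the IMSI are replaced by a country code and network code, the MSIN is appended, and the result is truncated to 15 digits. The table entries, the function name `imsi_to_mgt`, and the prefixes are made up for illustration; real values come from each partner's IR.21.

```python
# Hypothetical translation table: MCC+MNC prefix -> CC+NDC prefix.
# These values are invented for illustration; real entries come from IR.21.
E212_TO_E214 = {
    "99901": "99955",    # 3-digit MCC + 2-digit MNC
    "999002": "99966",   # 3-digit MCC + 3-digit MNC
}

MAX_E214_DIGITS = 15


def imsi_to_mgt(imsi: str, table=E212_TO_E214) -> str:
    """Translate an E.212 IMSI into an E.214 mobile global title (sketch)."""
    if not imsi.isdigit() or len(imsi) > 15:
        raise ValueError(f"not a valid IMSI: {imsi!r}")
    # Try the longer prefix first so a 6-digit MCC+MNC wins over a 5-digit one.
    for prefix_len in (6, 5):
        prefix = imsi[:prefix_len]
        if prefix in table:
            mgt = table[prefix] + imsi[prefix_len:]
            # Drop trailing MSIN digits if the result exceeds 15 digits.
            return mgt[:MAX_E214_DIGITS]
    # A missing entry is the kind of fault that shows up as a translation error.
    raise LookupError(f"no E.214 translation for IMSI prefix {imsi[:6]}")
```

A missing or outdated table entry surfaces here as a `LookupError`; in a live network the equivalent symptom is a location update that never reaches the home HLR.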

Information gathering on symptoms

The first step in resolving any roaming problem is to gather information on the symptoms. A pattern often emerges that makes localizing the fault much easier. Some useful questions to answer are as follows.

General

Who is impacted?

■ Outbound roamers

■ Inbound roamers

What is the problem statement?

■ Unable to connect to the network.

■ It takes a long time to connect to the network.

■ Unable to receive incoming calls.

■ Unable to establish outgoing calls to local subscribers in the visited network.

■ Unable to establish international outgoing calls.

■ Unable to send or receive SMS/MMS.

■ Unable to access the Internet in the visited network.

■ Unable to access WAP.

Is the problem isolated to one roamer or does it affect a group of roamers?

Is the problem isolated to one specific partner or does it affect a group of roaming partners?

Do the symptoms occur regularly or intermittently? Is there any pattern, e.g., time of day?

Is the symptom repeatable?

Own network

Was there any network reconfiguration in the recent past?

Is there any network migration or expansion happening now?

Is the symptom related to certain network elements, e.g., subscribers registered in one particular HLR, roamers currently visiting a geographical area covered by a specific MSC/VLR, or SMS submission to a specific SMSC?

Do symptoms point to a certain APN?

Foreign network

Is there any new IR.21 information exchange?

Is my network updated with the latest IR.21 information from the partner networks?

Are the symptoms related to one specific VPLMN, a group of VPLMNs, or all VPLMNs?
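Several of the questions above ask whether the symptoms cluster around one roamer, one partner, or one time of day. The following sketch shows one way to look for such a pattern in a set of fault reports. The record fields (`partner`, `direction`, `hour`, `symptom`), the function name, and the 80 percent threshold are assumptions made for illustration only.

```python
from collections import Counter


def dominant_value(reports, field, threshold=0.8):
    """Return the value of `field` shared by at least `threshold` of the
    reports, or None if no single value dominates."""
    if not reports:
        return None
    counts = Counter(report[field] for report in reports)
    value, count = counts.most_common(1)[0]
    return value if count / len(reports) >= threshold else None


# Hypothetical fault reports collected from customer care.
reports = [
    {"partner": "PLMN-A", "direction": "outbound", "hour": 2, "symptom": "no incoming calls"},
    {"partner": "PLMN-A", "direction": "outbound", "hour": 2, "symptom": "no incoming calls"},
    {"partner": "PLMN-A", "direction": "outbound", "hour": 3, "symptom": "no incoming calls"},
    {"partner": "PLMN-A", "direction": "outbound", "hour": 14, "symptom": "no SMS"},
]

for field in ("partner", "direction", "hour", "symptom"):
    print(field, "->", dominant_value(reports, field))
```

A field that returns a value points at where to look first; a field that returns `None` suggests the fault is not tied to that dimension.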

Diagnostic tools

Signaling protocol analysis. Signaling carries rich information on services, network, and subscribers. Monitoring the signaling links that carry international roaming traffic often gives valuable clues for localizing, diagnosing, and possibly resolving faults. One can use stand-alone protocol analyzers, distributed protocol analyzers, or link-monitoring solutions. Link-monitoring solutions offer significant advantages over stand-alone analyzers, including the following.

■ Centralized monitoring

■ Ability to monitor a few links to several hundred or even a thousand links

■ Correlation of call legs across links, e.g., ISUP and MAP correlation
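The ISUP and MAP correlation mentioned above can be sketched as follows. The sketch assumes that the MSRN returned in a MAP provide roaming number result later appears as the called number in the ISUP IAM toward the visited network, and pairs the two records when they fall within a short time window. The record layout, the function name, and the 30-second window are hypothetical; a commercial link-monitoring system will use its own, richer correlation keys.

```python
def correlate_prn_with_iam(prn_results, iams, window_s=30.0):
    """Pair each MAP provide-roaming-number result with the first ISUP IAM
    sent to the same MSRN within `window_s` seconds (illustrative only)."""
    pairs, unmatched = [], []
    for prn in prn_results:
        match = next(
            (
                iam
                for iam in iams
                if iam["called_number"] == prn["msrn"]
                and 0 <= iam["time"] - prn["time"] <= window_s
            ),
            None,
        )
        if match is None:
            unmatched.append(prn)   # MSRN allocated, but no call leg seen
        else:
            pairs.append((prn, match))
    return pairs, unmatched


# Hypothetical records captured on MAP and ISUP links (time in seconds).
prn_results = [
    {"time": 10.0, "msrn": "6590000001"},
    {"time": 12.0, "msrn": "6590000002"},
]
iams = [{"time": 10.8, "called_number": "6590000001"}]

pairs, unmatched = correlate_prn_with_iam(prn_results, iams)
print(len(pairs), "correlated,", len(unmatched), "without a matching IAM")
```

A roaming number that never shows up in an IAM on the monitored links is a hint that the fault lies in ISUP call routing rather than in MAP signaling.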

Monitoring points. The objective of monitoring is to have complete visibility of international SCCP traffic to partner networks. The decision on monitoring points depends on the network configuration. For example, if a wireless operator is also operating an international gateway with SCCP routing support, the links connecting to international carrier/hubs or links directly to partner networks are monitored (Figure 10-12, network boundary A).

Figure 10-12 Monitoring points.

In the other case, where the wireless operator does not have SCCP routing capabilities, the interconnect links from the GMSC to the international SCCP gateway are monitored (Figure 10-12, network boundary B).

The links are tapped by using nonintrusive probes. Bridging isolation techniques ensure that a failure of the monitoring equipment has no impact on the network.

The Gp links from SGSNs to partner networks are also monitored to capture critical information on sessions initiated by inbound roamers. Usually network operators connect to partner networks using services from a global roaming exchange (GRX). In this case the Gp links from SGSN to GRX are monitored.

IREG testers. IREG testers are the most commonly used equipment for testing and verifying inbound roaming. This equipment is loaded with SIMs from partner networks to simulate inbound roamers. Because the testing is done from the end-user point of view, the diagnostic information available is very limited.

For outbound roaming testing, a tester requests its counterpart in the partner network to perform the tests and send back the results. The wireless service providers are obliged to perform the tests periodically and on request under the GSM MoU agreement.

Understanding protocol errors

The HPLMN and VPLMN entities, e.g., the HLR and VLR, communicate with each other by using the MAP protocol. An understanding of MAP operations, the associated errors, and the diagnostic information they carry helps in diagnosing, isolating, and resolving roaming faults.

TCAP provides non-circuit-related information transfer capabilities for a variety of applications, such as MAP. Applications in the VPLMN, such as the VLR and MSC, use MAP services, which in turn use TCAP to invoke MAP procedures at the HPLMN HLR and other entities. MAP is specifically designed for mobile networks. TCAP relies on the SCCP to deliver signaling messages to other entities across the network. TCAP uses only the connectionless services of the SCCP, which means that TCAP messages are carried in SCCP unitdata (UDT) messages.

Figure 10-13 shows the protocol stack for the communication between the VPLMN and the HPLMN and vice versa. Each of the protocol layers handles errors from its users or providers and takes appropriate action as defined in SCCP, TCAP, and MAP specifications. The following sections describe this aspect in more detail.

A TCAP message consists of two parts, i.e., a transaction sublayer and a component sublayer. The transaction sublayer is responsible for managing the exchange of messages containing components between two TCAP entities. The component sublayer is responsible for component handling between originating and terminating TC users, i.e., HLR, VLR, etc. Figure 10-14 shows the transaction and component sublayers and the associated message types.

Transaction layer error handling. The transaction sublayer uses the abort message to terminate a transaction. The transaction is aborted following an abnormal condition detected by the transaction sublayer or because of a request by the component sublayer.

Figure 10-13 Protocol stack.

Figure 10-14 Transaction and component sublayer.

A reason for terminating the transaction may or may not be given. Two distinct types of cause code identify the source of the termination.

P-abort cause codes are used when the termination request is initiated by the service provider, i.e., TCAP in this case.

U-abort cause codes are used when the user initiates the termination request, i.e., MAP in this case.

Figure 10-15 shows a VLR sending an abort request to an HLR with a P-abort cause code. The transaction is identified by transaction ID 3b00e8. The TCAP in the MSC/VLR initiated the abort in response to a previously received continue message with an unrecognized transaction ID. Table 10-6 shows the provider and user abort causes.
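When reading a trace, it can help to decode an abort message by hand. The following is a minimal sketch of such a decoder. It assumes the ITU-T TCAP encoding as commonly documented: abort message tag 67 (hex), destination transaction ID tag 49, P-abort cause tag 4A, and cause values 0 to 4 in the order listed in Table 10-6. The sample bytes are constructed by hand around the transaction ID quoted above; they are not the capture shown in Figure 10-15, and the function name is hypothetical.

```python
# Assumed tag and cause values (ITU-T TCAP); verify against the specification.
ABORT_TAG = 0x67
DTID_TAG = 0x49
P_ABORT_CAUSE_TAG = 0x4A

P_ABORT_CAUSES = {
    0: "Unrecognized message type",
    1: "Unrecognized transaction ID",
    2: "Badly formatted transaction portion",
    3: "Incorrect transaction portion",
    4: "Resource limitation",
}


def decode_abort(data: bytes) -> dict:
    """Decode the transaction portion of a TCAP abort (short-form lengths only)."""
    if len(data) < 2 or data[0] != ABORT_TAG:
        raise ValueError("not a TCAP abort message")
    if data[1] & 0x80:
        raise ValueError("long-form lengths are not handled in this sketch")
    body = data[2:2 + data[1]]
    result = {"dtid": None, "abort_type": "U-abort", "cause": None}
    i = 0
    while i + 2 <= len(body):
        tag, length = body[i], body[i + 1]
        if length & 0x80:
            raise ValueError("long-form lengths are not handled in this sketch")
        value = body[i + 2:i + 2 + length]
        if tag == DTID_TAG:
            result["dtid"] = value.hex()
        elif tag == P_ABORT_CAUSE_TAG and length == 1:
            # A P-abort cause element marks a provider-initiated abort.
            result["abort_type"] = "P-abort"
            result["cause"] = P_ABORT_CAUSES.get(value[0], f"Unknown ({value[0]})")
        i += 2 + length
    return result


# Hand-built example: abort for transaction 3b00e8 with P-abort cause 1.
sample = bytes.fromhex("670849033b00e84a0101")
print(decode_abort(sample))
```

An abort that carries no P-abort cause element is reported here as a U-abort; the reason, if any, is then found in the dialog portion, which this sketch does not decode.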

Component portion error handling

Reject component. If an application is not able to process a component, a reject component is used to convey the nature of the problem to the requester. Table 10-7 lists problem categories and associated problem codes.

Figure 10-15 TCAP abort with P-abort cause code.

Return error component. A return error is sent back if an operation requested in the invoke component cannot be completed. A return error does not necessarily mean a protocol error; it also covers other causes that prevent an operation from being completed. For example, if a subscriber does not subscribe to roaming services but tries to register in a foreign network, the HPLMN HLR will return an error indicating roaming not allowed. Figure 10-16 shows a subscriber from one of the Singapore networks trying to roam in the Malaysian network. The VPLMN did not allow this subscriber to roam because the HPLMN HLR returned an error with the error code roaming not allowed (see Figure 10-17).

TABLE 10-6 Abort and Cause Code

Transaction portion message type	Operation code (hex)	Cause
Abort	67	P-abort
		■ Unrecognized message type
		■ Unrecognized transaction ID/type
		■ Badly formatted transaction portion
		■ Incorrect transaction portion
		■ Resource limitation
		U-abort
		The reason is sent within the dialog portion. A common abort cause is application context not supported. This indicates incompatibility in supported protocol version between sender and receiving entity.

TABLE 10-7 Reject Component Problem Codes

Problem code tag	Problem codes	Remarks
General problem	Unrecognized component; Mistyped component; Badly structured component
Invoke problem	Duplicate invoke ID; Unrecognized operation; Mistyped parameter; Resource limitation; Initiating release; Unrecognized linked ID; Linked response unexpected; Unexpected linked operation	Service not supported; Abnormal event detected by peer; Response rejected by peer; Response rejected by peer
Return result problem	Unrecognized invoke ID; Return result unexpected; Mistyped parameter	Response rejected by peer; Unexpected response from the peer; Response rejected by peer
Return error problem	Unrecognized invoke ID; Return error unexpected; Unrecognized error; Unexpected error; Mistyped parameter	Response rejected by peer; Response rejected by peer; Response rejected by peer; Response rejected by peer; Response rejected by peer
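The problem codes in Table 10-7 appear in a trace as a tag and a small integer, so a lookup table is a convenient aid when reading a reject component. The sketch below assumes the ITU-T TCAP encoding, in which the four problem categories use tags 80 to 83 (hex) and the codes are numbered from 0 in the order shown in the table. These numeric values and the function name are assumptions to be checked against the specification.

```python
# Assumed problem tags and code numbering (ITU-T TCAP); illustrative only.
PROBLEM_CODES = {
    0x80: ("General problem", [
        "Unrecognized component",
        "Mistyped component",
        "Badly structured component",
    ]),
    0x81: ("Invoke problem", [
        "Duplicate invoke ID",
        "Unrecognized operation",
        "Mistyped parameter",
        "Resource limitation",
        "Initiating release",
        "Unrecognized linked ID",
        "Linked response unexpected",
        "Unexpected linked operation",
    ]),
    0x82: ("Return result problem", [
        "Unrecognized invoke ID",
        "Return result unexpected",
        "Mistyped parameter",
    ]),
    0x83: ("Return error problem", [
        "Unrecognized invoke ID",
        "Return error unexpected",
        "Unrecognized error",
        "Unexpected error",
        "Mistyped parameter",
    ]),
}


def describe_problem(tag: int, code: int) -> tuple:
    """Return (category, problem) for a reject component's problem tag and code."""
    if tag not in PROBLEM_CODES:
        return ("Unknown problem tag", f"code {code}")
    category, names = PROBLEM_CODES[tag]
    if 0 <= code < len(names):
        return (category, names[code])
    return (category, f"Unknown code {code}")


print(describe_problem(0x81, 1))
```

Under these assumptions, an invoke problem with code 1 reads as an unrecognized operation, which often indicates that the peer does not support the requested service.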

In general, return errors are categorized in the following groups:

■ Generic errors: System failure, data missing, unexpected data value, etc.

■ Identification or numbering errors: Unknown subscriber, unallocated roaming number, etc.

■ Subscription errors: Roaming not allowed, illegal services, etc.

■ Short message errors: Subscriber busy for MT-SMS, SM delivery failure, etc.
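The grouping above can be turned into a small lookup that labels the error code seen in a return error component. The sketch covers only a handful of the errors named in this section. The numeric codes are the values commonly documented for MAP and are stated here as assumptions; the function name is hypothetical, and the category strings simply repeat the groups listed above.

```python
# Assumed MAP error codes (verify against the MAP specification).
# Maps error code -> (error name, category from the list above).
MAP_RETURN_ERRORS = {
    1: ("Unknown subscriber", "identification or numbering"),
    8: ("Roaming not allowed", "subscription"),
    31: ("Subscriber busy for MT-SMS", "short message"),
    32: ("SM delivery failure", "short message"),
    34: ("System failure", "generic"),
    35: ("Data missing", "generic"),
    36: ("Unexpected data value", "generic"),
}


def classify_return_error(code: int) -> tuple:
    """Return (name, category) for a MAP return error code, if known."""
    return MAP_RETURN_ERRORS.get(code, (f"Error code {code}", "not in this sketch"))


# Example: the error returned by the HPLMN HLR in the roaming scenario above.
print(classify_return_error(8))
```

Sorting captured return errors by category in this way quickly separates subscription issues, which are usually resolved by provisioning, from generic errors that point to a network or protocol fault.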

The possible MAP return errors associated with each MAP operation code are listed in Table 10-8. An understanding of MAP return errors is very helpful in resolving roaming issues. Table 10-9 lists the return errors and their descriptions.