IPv6 Routing Table in the BSD Kernel

Consider the simple network topology depicted in Figure 1-41. Of particular interest is the FreeBSD IPv6 router, which connects two Ethernet links through interfaces ne0 and ne1. The other router on the link attached to interface ne0, whose link-local address is fe80::1, connects the entire example network to the Internet and provides the default route for the FreeBSD router.

FIGURE 1-41

On the other hand, the link attached to ne1 has a global IPv6 prefix, 2001:db8:0:1000::/64. There is another router on that link whose link-local address is also fe80::1; it is the gateway to a different subnet, 2001:db8:0:2000::/64. Even though the two additional routers have the same link-local address, there is no conflict since they are on different links.

Assume the FreeBSD router has enough routes to reach all the visible networks in Figure 1-41. Then it should have IPv6 routing entries as shown in Listing 1-1.

Listing 1-1

The default route and the route to 2001:db8:0:2000::/64 are indirect routes, which have the gateway flag (G), and are likely to be learned from the other routers via some routing protocol. Notice that the gateway addresses are differentiated with the appropriate link zone index, as represented by the extended format with the percent character (e.g., %ne0). As can be seen in this example, the gateway address of an IPv6 indirect route is usually a link-local address, since all interior routing protocols for IPv6 use link-local addresses for exchanging routes with adjacent routers and those addresses are often used as the gateway address; refer to Section 1.4.2 for RIPng and to Section 1.6.5 for OSPFv3.

Other network routes are direct routes to an interface and have the cloning flag (C). Specific routes under the direct routes are cloned as necessary, and store the corresponding link-layer address as the gateway. An example of such routes is the one for 2001:db8:0:1000:203:47ff:fea5:3085.

In the kernel, each routing entry is represented as an rtentry{} structure, which is defined in the route.h header file. Figure 1-42 depicts major members of this structure corresponding to some characteristic entries in Listing 1-1. The first entry is an example of an indirect route for 2001:db8:0:2000::/64. The middle entry is a direct route for fe80::%ne1/64. Finally, the last entry is the route for fe80::1%ne1, cloned from the middle entry.

As one might notice, the link index of a link-local destination or gateway is represented differently. This will be explained in more detail in Section 1.8.1.

The pair of rt_key and rt_mask (which are actually function macros taking an rtentry{} structure and returning pointers to sockaddr{} structures) defines a single IPv6 prefix and acts as a key in the routing table. In the BSD routing table, a network prefix is always defined as a pair of an address and a network mask. This also applies to IPv6 even though IPv6 does not support the notion of general network masks, especially non-contiguous ones.

In the actual routing table, some trailing parts of a network mask are often redundant and truncated. For instance, in order to represent prefix "/64", it suffices to have the non-zero fields of the sin6_addr member, assuming the rest of the structure is all zero. Examples in Figure 1-42 show the truncated form.
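The relation between a prefix length and its mask can be illustrated with a short user-space sketch. It is not kernel code: the function names prefixlen_to_mask() and mask_significant_bytes() are hypothetical, and the second function counts only the bytes of the 128-bit address field, whereas the kernel expresses the truncation through the length of the whole sockaddr{} structure, which this sketch does not model.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <netinet/in.h>

/*
 * Hypothetical helper: fill *mask with a contiguous netmask that has
 * plen leading one bits (0 <= plen <= 128).
 */
static void
prefixlen_to_mask(unsigned plen, struct in6_addr *mask)
{
	assert(plen <= 128);
	memset(mask, 0, sizeof(*mask));
	for (unsigned i = 0; i < plen / 8; i++)
		mask->s6_addr[i] = 0xff;
	if (plen % 8 != 0)
		mask->s6_addr[plen / 8] =
		    (unsigned char)(0xff << (8 - plen % 8));
}

/*
 * Hypothetical helper: number of leading address bytes that remain
 * once the trailing all-zero bytes of the mask are dropped.
 */
static size_t
mask_significant_bytes(const struct in6_addr *mask)
{
	size_t n = sizeof(mask->s6_addr);

	while (n > 0 && mask->s6_addr[n - 1] == 0)
		n--;
	return n;
}
```

For a "/64" prefix, the sketch yields a mask whose first 8 bytes are 0xff, and reports that only those 8 bytes carry information; the remaining 8 bytes are the redundant trailing part.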

FIGURE 1-42

The rt_mask member is NULL for host routes (ones that have the "H" flag set in Listing 1-1). In Figure 1-42, the last entry is an example of this case.
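How a key and mask pair selects destinations, and how a NULL mask turns the entry into a host route, can be sketched as follows. This is a simplified user-space model under stated assumptions: struct toy_route and toy_route_matches() are hypothetical names, the structure is not rtentry{}, and the comparison is a plain per-entry test rather than the tree-based lookup the kernel performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <netinet/in.h>

/* Hypothetical stand-in for the key/mask pair of one routing entry. */
struct toy_route {
	struct in6_addr key;		/* destination address or prefix */
	const struct in6_addr *mask;	/* NULL for a host route */
};

/* Return true if dst is covered by the given entry. */
static bool
toy_route_matches(const struct toy_route *rt, const struct in6_addr *dst)
{
	assert(rt != NULL && dst != NULL);

	/* Host route: all 128 bits must be identical. */
	if (rt->mask == NULL)
		return memcmp(&rt->key, dst, sizeof(*dst)) == 0;

	/* Network route: compare only the bits selected by the mask. */
	for (size_t i = 0; i < sizeof(dst->s6_addr); i++) {
		if ((dst->s6_addr[i] & rt->mask->s6_addr[i]) !=
		    (rt->key.s6_addr[i] & rt->mask->s6_addr[i]))
			return false;
	}
	return true;
}
```

With a key of fe80:2:: and a 64-bit mask, any address whose first 64 bits are fe80:2:0:0 matches; an entry for fe80:2::1 with a NULL mask matches that single address only.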

The rt_gateway member is a pointer to a sockaddr{} structure, which specifies the next hop of this entry. As mentioned above, this can be a link-layer sockaddr{} structure, in which case the entry is usually associated with a Neighbor Cache entry via the rt_llinfo member. In that case, the rt_gateway member points to a sockaddr_dl{} structure, which stores the link-layer (here, Ethernet) address of the destination.

The rt_ifp member is a pointer to an ifnet{} structure, specifying the outgoing interface to the next hop.

Scope Zone Representation in the Routing Table

Notice that the link index(*) of link-local addresses in Figure 1-42 is embedded in the 128-bit address field. For example, link-local address fe80::1%ne1 in Listing 1-1 is represented as fe80:2::1 in Figure 1-42. Since link-local addresses may not be unique on different links and the single routing table must contain all the possibly ambiguous addresses, it is necessary to specify the associated link of a link-local address in the routing table in some way. In theory, this could be done using the dedicated sin6_scope_id field because the BSD’s routing table can generally handle addresses with sockaddr{} structures.

(*) The KAME implementation assumes a one-to-one mapping between links and interfaces whereas links are larger in scope than interfaces from a pure architectural point of view [RFC4007]. This assumption allows link indices to be represented as interface identifiers of the outgoing interface. In the example routing table shown in Listing 1-1, the interface index is represented as in link#1, where 1 is the index.

Because the addresses in the routing table actually carry the index in the embedded form, applications that exchange addresses with the kernel outside the ordinary socket API have to deal with that form themselves. Routing daemons, which generally handle addresses passed from a routing socket (see Section 1.9.1), are a common example of such applications. They also need to embed the appropriate scope zone index in an address before passing it to the kernel through a routing socket. Another class of applications that suffers from the embedded format is one that directly refers to the kernel memory in order to manage the kernel routing table or interface address structures, such as the netstat program.

Figure 1-43 is a copy of the figure shown in the Core Protocols topic, highlighting the main applications discussed in this section: routing and management applications. It should be noted that a routing application usually also acts as a "normal application" when it sends or receives routing messages via an AF_INET6 socket. In addition, it often uses the source address of an inbound routing message as the next hop to some destination and installs the corresponding route in the kernel via a routing socket. This means the application needs to convert the standard form of the address into the embedded form by itself.
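The conversion from the standard form to the embedded form can be sketched as below. This is an illustration only, not code taken from any routing daemon: embed_link_index() is a hypothetical name, only unicast link-local addresses are handled, and clearing sin6_scope_id after embedding is an assumption of this sketch rather than a documented requirement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <netinet/in.h>

/*
 * Hypothetical helper: move the link index from sin6_scope_id into the
 * third and fourth bytes of a link-local address.
 * Returns 1 if the address was converted, 0 if it is not link-local,
 * and -1 if the index does not fit in 16 bits.
 */
static int
embed_link_index(struct sockaddr_in6 *sin6)
{
	assert(sin6 != NULL);

	if (!IN6_IS_ADDR_LINKLOCAL(&sin6->sin6_addr))
		return 0;
	if (sin6->sin6_scope_id > 0xffff)
		return -1;

	/* Store the index in network byte order in bytes 2 and 3. */
	sin6->sin6_addr.s6_addr[2] = (uint8_t)(sin6->sin6_scope_id >> 8);
	sin6->sin6_addr.s6_addr[3] = (uint8_t)(sin6->sin6_scope_id & 0xff);

	/* Assumption of this sketch: the separate field is then cleared. */
	sin6->sin6_scope_id = 0;
	return 1;
}
```

Applied to fe80::1 with a scope ID of 2, the sketch produces the 128-bit value fe80:2::1, which is the form shown for fe80::1%ne1 in Figure 1-42.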

FIGURE 1-43

One may notice that the actual output from netstat hides the embedded form of link-local addresses from users as shown in Listing 1-1. This is because the program applies a special filter to IPv6 addresses of some narrower scopes before printing those addresses.

Listing 1-2 shows this filter. If the specified address has the link-local or interface-local scope(*), the 128-bit IPv6 address is in the kernel internal form, embedding the scope zone index as a 16-bit integer in its third and fourth bytes. Lines 584 and 585 extract the embedded index, copy it to the sin6_scope_id field, and clear the embedded value. Later, separate routines will call the getnameinfo() function, which converts the scoped address into a textual representation using the extended notation.

(*) The interface-local scope was previously called node-local; it was renamed in [RFC3513]. Unfortunately, the standard API has not caught up with this change, so portable applications need to keep using the old terminology.
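The kind of filter described above can be approximated by the following sketch. It is not the netstat source, and its lines do not correspond to the line numbers cited in the text; unembed_scope_index() is a hypothetical name. It uses the standard IN6_IS_ADDR_* macros, including the old "node-local" spelling mentioned in the footnote.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <netinet/in.h>

/*
 * Hypothetical helper: if the address has link-local or interface-local
 * scope, move the 16-bit index embedded in bytes 2 and 3 into
 * sin6_scope_id and clear the embedded value.
 */
static void
unembed_scope_index(struct sockaddr_in6 *sin6)
{
	struct in6_addr *a;

	assert(sin6 != NULL);
	a = &sin6->sin6_addr;

	if (IN6_IS_ADDR_LINKLOCAL(a) || IN6_IS_ADDR_MC_LINKLOCAL(a) ||
	    IN6_IS_ADDR_MC_NODELOCAL(a)) {
		/* The embedded index is stored in network byte order. */
		sin6->sin6_scope_id =
		    ((uint32_t)a->s6_addr[2] << 8) | a->s6_addr[3];
		a->s6_addr[2] = 0;
		a->s6_addr[3] = 0;
	}
}
```

After this step the structure holds the address in the standard form, so a function such as getnameinfo() can render it in the extended notation with the zone after the percent character.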

Listing 1-2

As detailed in the Core Protocols topic, such a workaround is a bad practice; applications should not care about the kernel-specific details for many reasons. It complicates application programs and can easily be a source of bugs. Despite those defects, this is something that such special application programs must endure as a matter of fact.
