Network Analysis Tip of the Month - June 2004
Corruption in a Wired Environment
In last month's Tip of the Month, we discussed corruption in a wireless environment.
This month, we'll cover corruption in a wired environment. EtherPeek breaks
corrupted packets down into four categories:
Runt, Oversize, CRC, and Frame Alignment. A runt packet is less than Ethernet's
minimum packet size of 64 bytes; an oversize packet is longer than Ethernet's
maximum packet size of 1518 bytes; a frame alignment packet does not end on
an eight-bit boundary (there are seven or fewer bits after the end of the last
byte); and a CRC packet ends on an eight-bit boundary and has a valid length,
but the four-byte checksum at the end of the packet is incorrect. Since these
errors all occur at layers one and two of the OSI model, EtherPeek doesn't detect
them directly, but relies on the hardware to report that the error has occurred
when the packet is captured.
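The four categories can be expressed as a small decision function. The following Python sketch is only an illustration of the definitions above, not EtherPeek's actual logic: the names classify_frame and fcs_ok are hypothetical, the order in which the checks are applied is an assumption (the definitions do not say how a frame that is both short and misaligned is counted), and the checksum helper assumes the frame is available as bytes with the four-byte checksum stored least significant byte first, as captures conventionally present it.

```python
import zlib

MIN_FRAME_BYTES = 64     # Ethernet minimum packet size
MAX_FRAME_BYTES = 1518   # Ethernet maximum packet size


def fcs_ok(frame: bytes) -> bool:
    """Check the trailing four-byte checksum of a captured frame.

    Assumption: the checksum is the standard CRC-32 of everything before
    it, stored least significant byte first.
    """
    if len(frame) < 4:
        return False
    expected = zlib.crc32(frame[:-4]) & 0xFFFFFFFF
    return expected == int.from_bytes(frame[-4:], "little")


def classify_frame(bit_length: int, crc_ok: bool):
    """Return the corruption category, or None for a good frame.

    Check order (an assumption): length first, then alignment, then CRC.
    """
    whole_bytes, extra_bits = divmod(bit_length, 8)
    if whole_bytes < MIN_FRAME_BYTES:
        return "Runt"
    if whole_bytes > MAX_FRAME_BYTES:
        return "Oversize"
    if extra_bits:
        # One to seven stray bits after the last complete byte.
        return "Frame Alignment"
    if not crc_ok:
        return "CRC"
    return None
```

For example, classify_frame(64 * 8 + 3, True) falls into the Frame Alignment category because three bits trail the last full byte, while classify_frame(64 * 8, False) is a CRC error: valid length, byte-aligned, bad checksum.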
That leads to the first challenge of error packet analysis: most drivers don't
pass corrupted frames up to EtherPeek or even report that the corrupted packet
occurred! Unless you're using one of our error capture drivers with a supported
card, the NIC will prevent EtherPeek from even seeing the corrupted packets,
and what you can't see, you can't analyze. The second challenge is that, even
if you're using error capture drivers, in a switched environment, you must rely
on the switch to forward packets to the analyzer through a mirror port. Since
most switches don't pass error packets through a mirror port, again, the analyzer
won't be able to capture them and you can't analyze them.
These two challenges come as a rude awakening to old-time analysts who cut their
teeth on coaxial or hub-based Ethernet. In those environments, analysis of error
packets could provide valuable information about how the packets were being
corrupted. See "AA" or "55" at the end of the packet? Your
collision domain is probably too big, causing late collisions. See
consistently short packets (fewer than about ten bytes) always truncated at the
same byte offset? It's probably a reflection in the coax--check the terminators
and make sure you haven't exceeded your cable's bend radius. See packets truncated
at a random length with no "AA" or "55" patterns? That's
probably electromagnetic interference. In shared environments, error packet
analysis was crucial.
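Those rules of thumb can be written down as a rough triage routine. The Python sketch below is a hypothetical illustration only: the function name guess_cause, the "more than half the samples" threshold, and the treatment of anything that fits no rule as inconclusive are all assumptions, and real captures will rarely be this clean.

```python
JAM_BYTES = (b"\xaa", b"\x55")   # the "AA" / "55" patterns
SHORT_LIMIT = 10                 # "fewer than about ten bytes"


def guess_cause(fragments):
    """Guess a likely cause from a list of corrupted frames (bytes objects).

    Thresholds are illustrative assumptions, not published guidance.
    """
    if not fragments:
        return "inconclusive"

    # How many samples end in an AA or 55 byte?
    jam_tails = sum(1 for f in fragments if f[-1:] in JAM_BYTES)
    lengths = {len(f) for f in fragments}

    if jam_tails * 2 > len(fragments):
        return "late collisions"
    if len(fragments) > 1 and len(lengths) == 1 and max(lengths) < SHORT_LIMIT:
        # Always truncated at the same short offset.
        return "reflection"
    if jam_tails == 0 and len(lengths) > 1:
        # Truncated at varying lengths, no AA/55 patterns.
        return "electromagnetic interference"
    return "inconclusive"
```

Feeding it five identical four-byte fragments yields "reflection", which matches the coax advice above: check the terminators and the bend radius.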
Although it's more difficult to capture error packets today, I maintain that
this is no great loss, because the nature of today's networks makes corrupted
packets much less likely and limits the possible causes of corrupted packets.
Because the Cat-5, 6, and 7 twisted pair wiring that is used in modern Ethernet
networks is so much more noise-resistant than coax and Cat-3 cable, and because
switched networks result in much smaller collision domains than shared, hub-based
networks, corrupted packets are much less likely today than they were seven
to ten years ago. Short of a few common causes, corrupted packet rates in today's
networks should be low enough that analysis of the few packets that are corrupted
is unnecessary.
When corrupted packet rates do exceed the recommended guidelines of
one corrupted packet per megabyte of data or one corrupted packet per 100 packets,
the problem can almost always be traced to one of these causes: bad hardware,
a cable install that doesn't meet specifications (pulling the shielding back
too far on the cable, untwisting the pairs too far, or stapling over the cable
so that the pairs are crushed together), or an extremely strong source of electromagnetic
interference (such as a large motor). Even in these cases, analysis of the corrupted
packets themselves is likely to be unnecessary. Because switched segments are
so small, it should be relatively easy to identify which switch port is experiencing
the high rate of corruption--through the switch's management interface, for
example--and then trace the cable to the end station, looking for obvious sources
of electromagnetic interference or bad cabling. Bad hardware can be detected
by switching out the hardware with another unit or moving the client to a different
switch port. If the problems continue, it's not the hardware!
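As a worked example of the guideline, the Python sketch below checks per-port counters against both thresholds. It is a minimal sketch under stated assumptions: the function name exceeds_guideline is hypothetical, a megabyte is taken to be 1,000,000 bytes, and the counters are assumed to have been read by hand from the switch's management interface (no particular switch API is implied).

```python
BYTES_PER_MEGABYTE = 1_000_000   # assumption: decimal megabyte


def exceeds_guideline(corrupted, total_packets, total_bytes):
    """True if the port exceeds one corrupted packet per megabyte of data
    or one corrupted packet per 100 packets."""
    per_megabyte = (
        corrupted / (total_bytes / BYTES_PER_MEGABYTE) if total_bytes else 0.0
    )
    per_100_packets = (
        corrupted / (total_packets / 100) if total_packets else 0.0
    )
    return per_megabyte > 1.0 or per_100_packets > 1.0


# Hypothetical counters: port -> (corrupted, total packets, total bytes)
ports = {
    "port 1": (5, 1_000, 10_000_000),    # 0.5 per MB, 0.5 per 100 packets
    "port 2": (20, 1_000, 50_000_000),   # 0.4 per MB, 2 per 100 packets
}
suspects = [name for name, counts in ports.items() if exceeds_guideline(*counts)]
```

Here only "port 2" ends up in suspects, so that is the cable run to trace first.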
Of course, this analysis is not universally true. If you have unmanaged switches,
error capture may be the only way to detect corruption on a segment. If the
cabling run is not accessible, for example because the cable is pulled through
a wiring conduit, analysis of the corrupted packets may be a more effective
means of isolating the cause of corruption than physically tracing the run.
Nevertheless, I believe that in most modern Ethernet networks, analysis of the
error packets themselves is not the best value proposition for solving the problem
of corruption on the wire, and the lack of ability to capture error packets
is not a major limitation.