The monitoring of error indications in a computer system in order to log the occurrences and send alerts to system administrators and field service. Fault management software keeps track of hardware faults, such as memory parity errors (see ECC memory), as well as software crashes. Analyzing the frequency and type of such errors is intended to trigger a repair order before a total breakdown occurs. See fault and fault tolerant.
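As an illustration of how error frequency might be turned into an early repair signal, the following minimal sketch counts faults per component and fault type inside a sliding time window. The class name FaultTracker, the default threshold of three occurrences, and the one-hour window are assumptions made for this example, not features of any particular fault management product.

```python
from collections import defaultdict, deque


class FaultTracker:
    """Hypothetical tracker: flags a component when one fault type recurs too often."""

    def __init__(self, threshold=3, window_seconds=3600.0):
        self.threshold = threshold            # occurrences that justify a repair order (assumed)
        self.window_seconds = window_seconds  # how far back occurrences are counted (assumed)
        self._events = defaultdict(deque)     # (component, fault_type) -> recent timestamps

    def record(self, component, fault_type, timestamp):
        """Log one fault; return True if a repair order should be raised."""
        events = self._events[(component, fault_type)]
        events.append(timestamp)
        # Drop occurrences that have aged out of the window.
        while events and timestamp - events[0] > self.window_seconds:
            events.popleft()
        return len(events) >= self.threshold
```

Calling record("dimm0", "parity", t) for each logged parity error would return True once the third error arrives within the window, at which point an alert or repair order could be issued. Real systems typically weigh fault type and severity as well as raw counts.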
An element of network management, fault management includes the detection of alarms and alerts, test and acceptance, and network recovery. Network elements (NEs) generate alarms and alerts to indicate catastrophic failures or severe performance degradations. A network management system receives and correlates alarms and alerts from multiple NEs, and may then disable a failed port and enable another, or reroute traffic around a failed switch or router after testing the alternate route. See also NE and network management.
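Correlation of alarms from multiple NEs can be sketched roughly as follows. This is a simplified illustration, assuming each alarm names the resource it suspects; the Alarm fields, the 30-second window, and the rule that two distinct NEs must agree are all hypothetical choices for this example rather than part of any standard.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    source_ne: str    # NE that raised the alarm
    resource: str     # port, link, switch, or router the alarm points at
    severity: str     # for example "critical" or "major"
    timestamp: float  # seconds


def correlate(alarms, window_seconds=30.0, min_sources=2):
    """Return resources reported by at least min_sources distinct NEs within the window."""
    by_resource = defaultdict(list)
    for alarm in alarms:
        by_resource[alarm.resource].append(alarm)

    suspects = []
    for resource, group in by_resource.items():
        group.sort(key=lambda a: a.timestamp)
        start = 0
        for end in range(len(group)):
            # Shrink the window until it spans at most window_seconds.
            while group[end].timestamp - group[start].timestamp > window_seconds:
                start += 1
            sources = {a.source_ne for a in group[start:end + 1]}
            if len(sources) >= min_sources:
                suspects.append(resource)
                break
    return sorted(suspects)
```

A resource returned by correlate would be a candidate for recovery action, such as disabling the failed port or rerouting traffic once the alternate route has been tested. Requiring agreement between NEs is one way to avoid acting on a single spurious alarm.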