Hardware Error Analysis and Severity Characterization in Linux-Based Server Systems
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 84421
Hardware Error Analysis and Severity Characterization in Linux-Based Server Systems

Authors: Nikolaos Georgoulopoulos, Alkis Hatzopoulos, Konstantinos Karamitsios, Konstantinos Kotrotsios, Alexandros I. Metsai

Abstract:

In modern server systems, business critical applications run in different types of infrastructure, such as cloud systems, physical machines and virtualization. Often, due to high load and over time, various hardware faults occur in servers that translate to errors, resulting to malfunction or even server breakdown. CPU, RAM and hard drive (HDD) are the hardware parts that concern server administrators the most regarding errors. In this work, selected RAM, HDD and CPU errors, that have been observed or can be simulated in kernel ring buffer log files from two groups of Linux servers, are investigated. Moreover, a severity characterization is given for each error type. Better understanding of such errors can lead to more efficient analysis of kernel logs that are usually exploited for fault diagnosis and prediction. In addition, this work summarizes ways of simulating hardware errors in RAM and HDD, in order to test the error detection and correction mechanisms of a Linux server.

Keywords: hardware errors, Kernel logs, Linux servers, RAM, hard disk, CPU

Procedia PDF Downloads 116