Kernel: millions of lines of code working in harmony, and then something disturbs it like hell. What could it be? The world of the kernel topples over, upside down, and it screeches from its very core: OOPS! An SOS call. Let's dive in.
Human behavior after seeing an OOPS: “What is this? So cryptic, ambiguous, partial, nerdy...” and so on.
As per kernel literature, an “Oops” is what the kernel throws at us when it finds something faulty, or an exception, in kernel code. It is somewhat like a segfault in user space. An Oops dumps its message on the console; the message contains the processor status and the CPU registers at the time the fault occurred. The offending process that triggered the Oops is killed without releasing its locks or cleaning up its structures. Sometimes the system cannot even resume normal operation; this is called an unstable state. Once an Oops has occurred, the system cannot be trusted any further.
To help kernel developers out a bit, the Kerneloops.org website collects these crash signatures in a big database. The website recently registered its hundred-thousandth oops, just about the time it celebrated its first birthday. The oops database gives kernel programmers a better chance of determining what led to the chaos in memory, so that they can hopefully solve the problem more quickly. Oops number 100,000 at Kerneloops turned out to be caused by the Wi-Fi driver for the Intel Wireless WiFi Link 4965AGN and PRO/Wireless 3945ABG adapters.
To classify, there are two types of kernel panics: hard panics (Aiee!) and soft panics (Oops!). Interesting, isn’t it? The more interesting part is zeroing in on the root cause and finding a way to debug it. That takes real macho talent; it is not child's play. In other words, kernel debugging itself is out of scope here, but I will definitely provide good pointers for self-learning in case you feel motivated.
Most Common Reasons
Only modules located within kernel space can directly cause the kernel to panic. To see which modules are dynamically loaded, run lsmod. It shows all dynamically loaded modules (Dialogic® drivers, LiS, SCSI drivers, filesystems, etc.). In addition to these dynamically loaded modules, components that are built into the kernel (memory map, etc.) can also cause a panic.
A common driver scenario plays out here: a driver crashes within its interrupt handler, usually because it tried to dereference a null pointer there. When this happens, that driver cannot handle any new interrupts, and eventually the system crashes.
Although hard panics and soft panics differ in nature, this document discusses how to deal with hard panics only, as soft panics are handled along similar lines.
Kernel BSOD Symptoms:
1) Machine is completely locked up and unusable
2) Num Lock / Caps Lock / Scroll Lock keys usually blink
3) If in console mode, a dump is displayed on the monitor (including the phrase “Aiee!”)
4) Similar to the Windows® Blue Screen of Death
Where is the Oops?
As per the kernel documentation:
Normally the Oops text is read from the kernel buffers by klogd and
handed to syslogd which writes it to a syslog file, typically
/var/log/messages (depends on /etc/syslog.conf). Sometimes klogd dies,
in which case you can run dmesg > file to read the data from the kernel
buffers and save it. Or you can cat /proc/kmsg > file; however, you
have to break in to stop the transfer, because kmsg is a "never ending file".
If the machine has crashed so badly that you cannot enter commands or
the disk is not available, then you have three options:
(1) Hand copy the text from the screen and type it in after the machine
has restarted. Messy but it is the only option if you have not
planned for a crash. Alternatively, you can take a picture of
the screen with a digital camera - not nice, but better than
nothing. If the messages scroll off the top of the console, you
may find that booting with a higher resolution (e.g., vga=791)
will allow you to read more of the text. (Caveat: This needs vesafb,
so won't help for 'early' oopses)
(2) Boot with a serial console (see Documentation/serial-console.txt),
run a null modem to a second machine and capture the output there
using your favourite communication program. Minicom works well.
(3) Use Kdump (see Documentation/kdump/kdump.txt) and
extract the kernel ring buffer from old memory using the dmesg
gdbmacro in Documentation/kdump/gdbmacros.txt.
So for the kernel debugging process, we have to use tools such as kdump, LKCD (Linux Kernel Crash Dump), KDB (Kernel Debugger), ksymoops, the kernel source code and, most important of all, your brain!