How to Debug a Crashing System

1. Create a NULL Modem Cable

The NULL modem cable should have DB9 connectors, FEMALE on both ends. Instructions for building the cable can be found on this link: click here

For more information on a dumb NULL modem cable without handshaking, click here.

2. LAPTOP: Terminal Setup

Connect the null modem cable to a laptop serial port.
(Note: If the laptop does not have a serial port, one can use a USB<->Serial DB9 port adapter.)

Start up HyperTerminal in Windows or Minicom in Linux.

Configure the terminal for the COM port number associated with the serial port of the laptop.
The connection parameters are as follows:
Bits Per Second: 115200
Data Bits: 8
Parity: None
Stop Bits: 1
Flow Control: None
Start the terminal program with the above parameters.

On Windows: Select Transfer from Menu and Choose Capture File
On Linux: Select Capture to File option
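
On Linux, the Minicom session and its capture file can also be started from the command line. The following is a minimal sketch, not a definitive recipe: the device name /dev/ttyS0, the capture file name serial_capture.log and the helper name minicom_cmd are assumptions made for this example (a USB<->Serial adapter usually shows up as /dev/ttyUSB0). Flow Control still has to be set to None in the Minicom setup menu (minicom -s).

```shell
#!/bin/sh
# Assumed names; override them from the environment to match your laptop.
SERIAL_DEV="${SERIAL_DEV:-/dev/ttyS0}"
CAPTURE_FILE="${CAPTURE_FILE:-serial_capture.log}"

# Print the minicom command line for a 115200 baud session with capture.
#   -D  serial device
#   -b  baud rate
#   -C  capture file (everything shown on screen is also written here)
minicom_cmd() {
    echo "minicom -D $SERIAL_DEV -b 115200 -C $CAPTURE_FILE"
}

minicom_cmd
# To actually open the session, run the printed command on the laptop.
```

Printing the command first makes it easy to check the device name before opening the port.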

Connect the other end of the NULL modem cable to the serial port of the SERVER.

3. SERVER: Configure Linux Boot Start For Serial Output

The boot loader config file is /etc/grub.conf (GRUB) or /etc/lilo.conf (LILO).

For /etc/grub.conf:
------------CUT HERE--------------------
default=0
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title CentOS 4.0 (2.6.9-5.0.3.ELsmp)
    root (hd0,0)
    kernel /vmlinuz-2.6.9-5.0.3.ELsmp ro root=LABEL=/1 rhgb quiet console=ttyS0,115200n8 console=tty0
    initrd /initrd-2.6.9-5.0.3.ELsmp.img
-----------CUT HERE-----------------------

Note the addition of "console=ttyS0,115200n8 console=tty0" on the kernel line. These options must be on the same line as the kernel command; moving them down to a line of their own will cause an error.
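
Before rebooting, it can help to confirm that the console options really sit on a kernel line. The following is a rough sketch under stated assumptions: has_serial_console is a helper name made up for this example, and the check only looks for the exact option string used in this guide.

```shell
#!/bin/sh
# has_serial_console FILE
# Succeeds if a line starting with "kernel" in FILE carries the serial console option.
# A console= option placed on a line of its own does not match.
has_serial_console() {
    grep -E -q '^[[:space:]]*kernel[[:space:]].*console=ttyS0,115200n8' "$1"
}

GRUB_CONF="${GRUB_CONF:-/etc/grub.conf}"   # path taken from this guide
if [ -r "$GRUB_CONF" ]; then
    if has_serial_console "$GRUB_CONF"; then
        echo "serial console found on a kernel line in $GRUB_CONF"
    else
        echo "no console=ttyS0,115200n8 on any kernel line in $GRUB_CONF"
    fi
fi
```

After the SERVER has been rebooted, cat /proc/cmdline shows the options the running kernel was actually booted with.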


5. SERVER: Configure Login Prompt from Serial Port

In /etc/inittab, add the LAST line shown below (the agetty entry for ttyS0):
---------------------cut here-------------------------
# Run gettys in standard runlevels
1:2345:respawn:/sbin/mingetty tty1
2:2345:respawn:/sbin/mingetty tty2
3:2345:respawn:/sbin/mingetty tty3
4:2345:respawn:/sbin/mingetty tty4
5:2345:respawn:/sbin/mingetty tty5
6:2345:respawn:/sbin/mingetty tty6
7:2345:respawn:/sbin/agetty -L 115200 ttyS0
---------------------cut here-------------------------

This will allow a user on the LAPTOP to log into the SERVER via the serial port.
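
The inittab change can be scripted so that running it twice does not add a duplicate entry. This is a sketch only: add_serial_getty is a hypothetical helper name, and it assumes the id "7" is not already used by another inittab entry.

```shell
#!/bin/sh
# add_serial_getty FILE
# Append the agetty entry from this guide unless an uncommented ttyS0 entry already exists.
add_serial_getty() {
    if ! grep -q '^[^#]*ttyS0' "$1"; then
        echo '7:2345:respawn:/sbin/agetty -L 115200 ttyS0' >> "$1"
    fi
}

# Usage (as root, on the SERVER):
#   add_serial_getty /etc/inittab
#   telinit q        # ask init to re-read /etc/inittab
```

With a SysV init, telinit q makes the new getty start without waiting for the reboot step.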


6. SERVER: Configure Root Access from Serial Port

In /etc/securetty add the following line to the end of the file.
---------------------cut here-------------------------
ttyS0
---------------------cut here-------------------------

This will allow root login access from the serial port.

This is not really a security concern, because the user has to be physically at the machine.
However, for extra security, one can remove this line again after debugging.
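
Adding the securetty line for debugging and removing it afterwards could be sketched as follows. The function names allow_root_serial and deny_root_serial are made up for this example, and the sketch assumes the file ends with a newline.

```shell
#!/bin/sh
# allow_root_serial FILE: add a ttyS0 line unless one is already present.
allow_root_serial() {
    grep -q -x 'ttyS0' "$1" || echo 'ttyS0' >> "$1"
}

# deny_root_serial FILE: remove every line that is exactly "ttyS0".
deny_root_serial() {
    scratch=$(mktemp) || return 1
    # grep exits non-zero when no other lines remain; that is not an error here.
    grep -v -x 'ttyS0' "$1" > "$scratch" || true
    cat "$scratch" > "$1"    # rewrite in place so owner and permissions are kept
    rm -f "$scratch"
}

# Usage (as root, on the SERVER):
#   allow_root_serial /etc/securetty    # before debugging
#   deny_root_serial  /etc/securetty    # after debugging
```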


7. REBOOT SERVER


8. LAPTOP: Terminal Output

After a SERVER reboot one should see kernel startup messages on the Laptop Terminal!

Note that not all kernel messages will be displayed, only the important high-priority messages as well as any ERROR messages generated by the kernel.
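
Which messages reach the serial console depends on the kernel console log level. As a rough sketch, assuming a Linux /proc filesystem: the first field of /proc/sys/kernel/printk is the current console log level, and only messages with a numerically lower priority value are printed on the console. console_loglevel is a helper name invented for this example.

```shell
#!/bin/sh
# console_loglevel [FILE]
# Print the first field of the printk settings (the console log level).
console_loglevel() {
    awk '{ print $1 }' "${1:-/proc/sys/kernel/printk}"
}

if [ -r /proc/sys/kernel/printk ]; then
    echo "console log level: $(console_loglevel)"
fi

# To send all kernel messages to the console while debugging (as root):
#   dmesg -n 8
```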


If the kernel startup messages are clearly displayed on the terminal during bootup, the current setup is ready to capture the kernel OOPS (Error) message during a SERVER HANG.

You should also see a LOGIN PROMPT on the TERMINAL SCREEN.

9. LAPTOP: Log In via Serial Port

Log into the SERVER via Serial Access

Run:  tail -f /var/log/messages

Wait for Crash!

Once the hang is captured on the screen, close the FILE being captured:
On Windows: Select Transfer from Menu and Click on Close Captured File
On Linux: Close Captured File

10. Send the captured kernel OOPS file to Sangoma Technical Support

If a kernel OOPS (Stack Dump) is captured on a SERVER HANG, the problem lies either in the kernel/drivers or in the system.

If NO kernel OOPS (Stack Dump) is captured on a HANG, then the problem is in low-level HARDWARE, e.g. the PCI bus or system power.

11. Send output of ifconfig

e.g.: ifconfig > ifconfig.output.txt
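
To send everything in one piece, the capture file and the ifconfig output can be packed into a single archive. This is only a sketch: bundle_for_support, support.tar.gz and serial_capture.log are names assumed for this example, not something Sangoma Technical Support requires.

```shell
#!/bin/sh
# bundle_for_support ARCHIVE FILE...
# Pack the given files into a gzip-compressed tar archive.
bundle_for_support() {
    archive=$1
    shift
    tar -czf "$archive" "$@"
}

# Usage:
#   ifconfig > ifconfig.output.txt
#   bundle_for_support support.tar.gz serial_capture.log ifconfig.output.txt
```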