[DEBUG] Linux: resolving a stuck df command

Symptoms:

Running the df command hangs
The cd / command cannot be used to enter the / (root) directory
The ls / command cannot be used to list the / (root) directory

Analysis:

A network storage that is mounted on this system (for example an NFS share) may have become disconnected

Solution:

Step One: Use the mount command to check whether any disconnected network storage is still mounted

# mount

(Note: At this point the output shows at least one remote storage mounted on a local directory. If we cd into this directory, the shell reports that the target is busy)
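
The check in Step One can be narrowed down to network file systems. The following is a minimal sketch, assuming that only the nfs, nfs4 and cifs types count as network storage and that a 5 second limit is acceptable. The function names list_network_mounts and probe_network_mounts are made up for this illustration, and the probe can itself block on a badly stuck mount, so its output is only a hint.

```shell
#!/bin/sh
# Print the mount points of network file systems found in a mounts table.
# $1: path to the mounts table (defaults to /proc/mounts).
# Assumption: only nfs, nfs4 and cifs are treated as network storage here.
list_network_mounts() {
    awk '$3 ~ /^(nfs|nfs4|cifs)$/ { print $2 }' "${1:-/proc/mounts}"
}

# Probe every network mount with a time limit and report whether it answers.
probe_network_mounts() {
    list_network_mounts "$@" | while read -r mount_point; do
        if timeout -s KILL 5 stat -t "$mount_point" >/dev/null 2>&1; then
            echo "responding: $mount_point"
        else
            echo "possibly stuck: $mount_point"
        fi
    done
}

# Usage (not run automatically):
#   probe_network_mounts
```

Mount points that contain spaces are escaped in /proc/mounts and are not handled by this sketch.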

Step Two: Use the umount command to unmount the disconnected network storage

# umount -f <nfs storage which is stuck>

Or:

# umount -l <nfs storage which is stuck>

Or:

# umount -f -l <nfs storage which is stuck>
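
The alternatives above can be chained so that the lazy unmount is only used when the forced one fails. This is a rough sketch rather than a recommended procedure: the function name force_unmount and the DRY_RUN variable are hypothetical. According to the umount manual page, -f forces an unmount (intended for an unreachable NFS system) and -l detaches the file system now and cleans up the references once it is no longer busy.

```shell
#!/bin/sh
# Try a forced unmount first and fall back to a lazy unmount if that fails.
# $1: mount point of the stuck storage.
# DRY_RUN=1 only prints the command line instead of running it.
force_unmount() {
    target=$1
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "umount -f $target || umount -l $target"
        return 0
    fi
    umount -f "$target" || umount -l "$target"
}

# Usage (not run automatically, requires root):
#   DRY_RUN=1 force_unmount /mnt/nfs
#   force_unmount /mnt/nfs
```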

[CONTENT] Linux recommended swap size

Content:

For RHEL 6, RHEL 7, RHEL 8, RHEL 9

RAM size         | Recommended swap size    | Recommended swap size if allowing for hibernation
From 0 to 2GB    | 2 times the RAM size     | 3 times the RAM size
From 2GB to 8GB  | The same size as the RAM | 2 times the RAM size
From 8GB to 64GB | At least 4GB             | 1.5 times the RAM size
From 64GB        | At least 4GB             | Hibernation is not recommended

RHEL 6, RHEL 7, RHEL 8, RHEL 9 Recommended Swap Size Table

Note: A swap of 100GB is recommended for systems with over 140 logical processors or over 3TB of RAM
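
The table can be turned into a small calculation. The sketch below works under two assumptions: the upper bound of each range is inclusive (exactly 2GB of RAM falls into the first row), and the 100GB note for very large systems is ignored. The function name recommended_swap_mb is hypothetical. For the "At least 4GB" rows it prints the minimum, not an exact size.

```shell
#!/bin/sh
# Print the recommended swap size in MB for a given amount of RAM.
# $1: RAM size in MB; $2: "hibernate" to allow for hibernation (optional).
# Assumption: the upper bound of each range is inclusive.
recommended_swap_mb() {
    ram_mb=$1
    if [ "${2:-}" = "hibernate" ]; then
        if   [ "$ram_mb" -le 2048 ];  then echo $((ram_mb * 3))
        elif [ "$ram_mb" -le 8192 ];  then echo $((ram_mb * 2))
        elif [ "$ram_mb" -le 65536 ]; then echo $((ram_mb * 3 / 2))
        else echo "hibernation not recommended"
        fi
    else
        if   [ "$ram_mb" -le 2048 ]; then echo $((ram_mb * 2))
        elif [ "$ram_mb" -le 8192 ]; then echo "$ram_mb"
        else echo 4096   # "At least 4GB": a minimum, not an exact size
        fi
    fi
}

# Usage (not run automatically):
#   recommended_swap_mb "$(awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo)"
```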

Reference:

https://access.redhat.com/solutions/15244

[STEPS] Enabling Linux Kdump (for collecting information when the kernel crashes) (CentOS 7 & Rocky Linux 8 & RHEL 7 & RHEL 8 version)

Step One: Enable Kdump

1.1 Make sure that the crash and kernel-debuginfo packages are installed

# rpm -q crash || yum install crash ; rpm -q kernel-debuginfo || yum install kernel-debuginfo

1.2 Set the location where the kernel crash information is stored

# vim /etc/kdump.conf

Modify the following content:

......
path /var/crash
......


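Notes:
1) The default location is /var/crash
2) Change this to the directory where the kernel crash information should be stored
3) To be safe, the location that stores the kernel crash information should have more free space than the size of the RAM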
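
Note 3 can be checked before a crash ever happens. The following is a minimal sketch, assuming the dump is written to a local file system and that the path line in /etc/kdump.conf has the form shown above. The function names kdump_path and has_room_for_dump are made up for this illustration.

```shell
#!/bin/sh
# Print the dump directory configured in kdump.conf, or /var/crash if none is set.
# $1: path to the configuration file (defaults to /etc/kdump.conf).
kdump_path() {
    configured=$(awk '$1 == "path" { print $2 }' "${1:-/etc/kdump.conf}" 2>/dev/null | tail -n 1)
    echo "${configured:-/var/crash}"
}

# Succeed if the free space is larger than the RAM size.
# $1: free space in KB; $2: RAM size in KB.
has_room_for_dump() {
    [ "$1" -gt "$2" ]
}

# Usage (not run automatically):
#   dir=$(kdump_path)
#   free_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
#   ram_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
#   has_room_for_dump "$free_kb" "$ram_kb" || echo "warning: $dir may be too small"
```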

1.3 Restart the kdump service and enable it at boot

# systemctl restart kdump ; systemctl enable kdump

1.4 Make sure that the kdump service is running

# systemctl status kdump

(Note: If the output contains operational or Active: active (exited), Kdump is enabled)
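
Besides reading the systemctl output, the kernel itself reports whether a crash kernel is loaded through /sys/kernel/kexec_crash_loaded, which contains 1 when it is. The sketch below wraps that check; the function name crash_kernel_loaded is hypothetical, and the file argument only exists so that the function can be tried against a test file.

```shell
#!/bin/sh
# Succeed if the status file reports that a crash kernel is loaded (content "1").
# $1: status file (defaults to /sys/kernel/kexec_crash_loaded).
crash_kernel_loaded() {
    [ "$(cat "${1:-/sys/kernel/kexec_crash_loaded}" 2>/dev/null)" = "1" ]
}

# Usage (not run automatically):
#   systemctl is-active kdump
#   if crash_kernel_loaded; then echo "crash kernel loaded"; else echo "crash kernel NOT loaded"; fi
```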

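Step Two: Set the conditions that trigger the collection of kernel crash information

2.1 Automatically collect kernel crash information when the kernel crashes

2.1.1 Edit the /etc/sysctl.conf file

# vim /etc/sysctl.conf

Add the following content:

......
kernel.hung_task_panic=1

2.1.2 Apply the changes made to the /etc/sysctl.conf file

# sysctl -p /etc/sysctl.conf

2.1.3 When the kernel crashes, the system automatically collects the kernel crash information and reboots

(Steps omitted)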


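Note:
1) A dump is generated automatically only when a task hangs or a processor thread soft-locks (a soft lockup triggers a panic only if kernel.softlockup_panic = 1 is also set)
2) The system reboots automatically during this process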
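
The manual edit of /etc/sysctl.conf can also be scripted so that running it twice does not leave duplicate lines. This is a minimal sketch, assuming GNU sed (for the -i option) and a plain key = value file; the function name set_sysctl_conf is made up for this illustration.

```shell
#!/bin/sh
# Set "key = value" in a sysctl configuration file, replacing an existing entry if present.
# $1: key; $2: value; $3: configuration file (defaults to /etc/sysctl.conf).
set_sysctl_conf() {
    key=$1
    value=$2
    file=${3:-/etc/sysctl.conf}
    # Escape the dots so that the key is matched literally by grep and sed.
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -Eq "^[[:space:]]*${pattern}[[:space:]]*=" "$file" 2>/dev/null; then
        sed -i -E "s|^[[:space:]]*${pattern}[[:space:]]*=.*|${key} = ${value}|" "$file"
    else
        printf '%s = %s\n' "$key" "$value" >> "$file"
    fi
}

# Usage (not run automatically, requires root):
#   set_sysctl_conf kernel.hung_task_panic 1
#   sysctl -p /etc/sysctl.conf
```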

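2.2 Use the magic key to collect kernel crash information when the kernel crashes

2.2.1 Edit the /etc/sysctl.conf file

# vim /etc/sysctl.conf

Add the following content:

......
kernel.sysrq = 1

2.2.2 Apply the changes made to the /etc/sysctl.conf file

# sysctl -p /etc/sysctl.conf

2.2.3 When the kernel crashes, use the magic key to collect the kernel crash information and let the system reboot automatically

Press and hold the following three keys one after another:

ALT + PRINTSCREEN + C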


Note:
1) This process makes the system reboot automatically
2) A system that merely hangs has not necessarily hit a kernel panic

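2.3 Use the hardware to send an NMI to collect kernel crash information when the kernel crashes

2.3.1 Edit the /etc/sysctl.conf file

# vim /etc/sysctl.conf

Add the following content:

......
kernel.unknown_nmi_panic = 1
kernel.panic_on_unrecovered_nmi = 1
kernel.panic_on_io_nmi = 1

2.3.2 Apply the changes made to the /etc/sysctl.conf file

# sysctl -p /etc/sysctl.conf

2.3.3 When the kernel crashes, contact the hardware technical support to send an NMI from the hardware and collect the kernel crash information

(Steps omitted)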
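
Before relying on an NMI, it may be worth confirming that the three parameters are really active. The sketch below reads them from /proc/sys/kernel and reports any that are not set to 1; the function name report_unset_nmi_params is hypothetical, and the directory argument only exists so that the function can be tried against test files.

```shell
#!/bin/sh
# Print every NMI-related kernel parameter whose current value is not 1.
# $1: directory holding the parameter files (defaults to /proc/sys/kernel).
report_unset_nmi_params() {
    base=${1:-/proc/sys/kernel}
    for name in unknown_nmi_panic panic_on_unrecovered_nmi panic_on_io_nmi; do
        value=$(cat "$base/$name" 2>/dev/null)
        [ "$value" = "1" ] || echo "$name is '${value:-missing}', expected 1"
    done
}

# Usage (not run automatically):
#   report_unset_nmi_params
```

No output means that all three parameters are set to 1.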

Step Three: Manually trigger a kernel crash to test Kdump

# echo c > /proc/sysrq-trigger

(Note: This operation causes the system to reboot)
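
After the system has come back up, the test only counts as passed if a dump was actually written. The following is a minimal sketch for listing the dumps, assuming the default /var/crash location and that each dump sits in its own subdirectory in a file named vmcore. The function name list_vmcores is made up for this illustration.

```shell
#!/bin/sh
# Print the vmcore files found under a crash directory, one per line.
# $1: crash directory (defaults to /var/crash).
list_vmcores() {
    find "${1:-/var/crash}" -type f -name vmcore 2>/dev/null | sort
}

# Usage (not run automatically):
#   list_vmcores
#   # A dump can then be opened with the crash utility, assuming kernel-debuginfo
#   # for the running kernel is installed:
#   #   crash /usr/lib/debug/lib/modules/$(uname -r)/vmlinux <path to vmcore>
```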

References:

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/system_design_guide/installing-and-configuring-kdump_system-design-guide
https://access.redhat.com/solutions/916043
https://access.redhat.com/solutions/3698411
https://access.redhat.com/solutions/6038
https://access.redhat.com/solutions/23069