[STEP] Using the Linux page_owner troubleshooting tool (recording memory usage)

Content:

Step One: Check whether page_owner is enabled

1.1 Check with the dmesg command whether page_owner is enabled

# dmesg | grep page_owner
[    1.149165] page_owner is disabled

(Note: a message like this means that page_owner is not enabled)

1.2 Check through the /sys/kernel/debug/ directory whether page_owner is enabled

# ls -l /sys/kernel/debug/page_owner
ls: cannot access /sys/kernel/debug/page_owner: No such file or directory.

(Note: if the /sys/kernel/debug/page_owner file does not exist, page_owner is not enabled)
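The two checks above can be combined into one small function. The following is a minimal sketch, not part of the original procedure: the function name page_owner_status and its two optional path arguments are hypothetical, and the defaults assume the kernel command line is readable at /proc/cmdline and that debugfs is mounted at /sys/kernel/debug.

```shell
#!/bin/bash
# Hypothetical helper: report whether page_owner looks enabled.
# $1: file holding the kernel command line (default: /proc/cmdline)
# $2: path of the page_owner debugfs file (default: /sys/kernel/debug/page_owner)
page_owner_status() {
        local cmdline_file=${1:-/proc/cmdline}
        local debugfs_file=${2:-/sys/kernel/debug/page_owner}

        # Both conditions must hold: the boot parameter is set
        # and the debugfs file exists
        if grep -qw 'page_owner=on' "$cmdline_file" 2> /dev/null && [ -e "$debugfs_file" ]; then
                echo enabled
        else
                echo disabled
        fi
}

page_owner_status
```

Run it as root, because /sys/kernel/debug is normally not readable by other users and the function would then report "disabled".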

Step Two: Enable page_owner

2.1 Enable page_owner

# grubby --args="page_owner=on" --update-kernel=0

(Caution: enabling page_owner consumes some additional memory)

2.2 Reboot the system

# reboot

2.3 Confirm that page_owner is enabled

2.3.1 Confirm with the dmesg command that page_owner is enabled

# dmesg | grep page_owner
[    0.000000] Command line: BOOT_IMAGE=(hd0,gpt2)/vmlinuz-4.18.0-425.19.2.el8_7.x86_64 root=/dev/mapper/rootvg-rootlv ro ipv6.disable=1 audit=1 audit_backlog_limit=8192 crashkernel=auto resume=/dev/mapper/rootvg-swaplv rd.lvm.lv=rootvg/rootlv rd.lvm.lv=rootvg/swaplv rhgb quiet rd.shell=0 page_owner=on
[    0.000000] Kernel command line: BOOT_IMAGE=(hd0,gpt2)/vmlinuz-4.18.0-425.19.2.el8_7.x86_64 root=/dev/mapper/rootvg-rootlv ro ipv6.disable=1 audit=1 audit_backlog_limit=8192 crashkernel=auto resume=/dev/mapper/rootvg-swaplv rd.lvm.lv=rootvg/rootlv rd.lvm.lv=rootvg/swaplv rhgb quiet rd.shell=0 page_owner=o

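(Note: output like this, where the kernel command line contains the page_owner parameter, means that page_owner is enabled)

2.3.2 Confirm through the /sys/kernel/debug/ directory that page_owner is enabled
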
# ls -l /sys/kernel/debug/page_owner
-r--------. 1 root root 0 Apr 13 14:36 /sys/kernel/debug/page_owner

(Note: if the /sys/kernel/debug/page_owner file exists, page_owner is enabled)

Step Three: Analyze the records produced by page_owner

3.1 Export the records produced by page_owner

# cat /sys/kernel/debug/page_owner > page_owner_full.txt

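(Note: this example exports the records produced by page_owner to a file named page_owner_full.txt)

Caution:
1) This command produces a very large file
2) This command keeps running until it is cancelled manually
3) To cancel it, press Ctrl and C at the same time or use the kill command
4) If memory usage changes quickly, let the command run longer; otherwise a shorter run is enough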
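Instead of cancelling the export by hand, the run time can be capped with the coreutils timeout command. The following is only a sketch under that assumption: the function name dump_with_limit and its arguments are hypothetical, and the 60-second default is an arbitrary example value.

```shell
#!/bin/bash
# Hypothetical helper: copy a source file to a destination file and
# stop after a time limit.
# $1: source (for example /sys/kernel/debug/page_owner)
# $2: destination file
# $3: time limit in seconds (default: 60, an arbitrary example value)
dump_with_limit() {
        local src=$1 dst=$2 limit=${3:-60}
        local rc=0

        timeout "$limit" cat "$src" > "$dst" || rc=$?

        # timeout exits with status 124 when it had to stop the command;
        # treat that as a normal end of the capture
        if [ "$rc" -eq 124 ]; then
                echo "capture stopped after ${limit}s" >&2
                rc=0
        fi
        return "$rc"
}
```

For example, "dump_with_limit /sys/kernel/debug/page_owner page_owner_full.txt 120" would capture for at most two minutes. Check the free disk space first, because the output file can still grow very large.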

3.2 Parse the records produced by page_owner

# page_owner_sort page_owner_full.txt sorted_page_owner.txt
loaded 42903
sorting ....
culling

(Note: this example analyzes the file page_owner_full.txt and writes the result to the file sorted_page_owner.txt)

3.3 View the sorted page_owner records

# less sorted_page_owner.txt
1 times:
Page allocated via order 0, mask 0x0(), pid 1, tgid 1 (swapper/0), ts 48952109 ns, free_ts 0 ns
PFN 4096 type Unmovable Block 8 type Unmovable Flags 0xfffffc0000100(slab|node=0|zone=1|lastcpupid=0x1fffff)
 register_early_stack+0x28/0x60
 init_page_owner+0x30/0x2d0
 kernel_init_freeable+0x13c/0x232
 kernel_init+0xa/0x108

1 times:
Page allocated via order 0, mask 0x0(), pid 1, tgid 1 (swapper/0), ts 48952566 ns, free_ts 0 ns
PFN 4097 type Unmovable Block 8 type Unmovable Flags 0xfffffc0000100(slab|node=0|zone=1|lastcpupid=0x1fffff)
 register_early_stack+0x28/0x60
 init_page_owner+0x30/0x2d0
 kernel_init_freeable+0x13c/0x232
 kernel_init+0xa/0x108
......

(Note: this example views the analysis result in the file sorted_page_owner.txt)
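The sorted result can also be summarized instead of read page by page. The sketch below assumes the record layout shown above: a header line beginning with "N times" followed by a line "Page allocated via order K, ...", so that one record covers N * 2^K pages. The function name page_owner_total_pages is hypothetical, and other kernel versions may print a different layout.

```shell
#!/bin/bash
# Hypothetical helper: sum the pages described by a page_owner_sort result.
# $1: result file (for example sorted_page_owner.txt)
page_owner_total_pages() {
        awk '
                # Header of a record: "N times..." (N identical allocations)
                /^[0-9]+ times/ { count = $1 }

                # Allocation line: the fifth field is the order, e.g. "0,"
                /^Page allocated via order/ {
                        order = $5
                        sub(/,/, "", order)
                        total += count * 2 ^ order
                }

                END { print total + 0 }
        ' "$1"
}
```

Running "page_owner_total_pages sorted_page_owner.txt" prints the number of pages covered by all records. Multiplying that number by the page size (see "getconf PAGESIZE") gives an approximate byte count.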

Step Four: Disable page_owner

4.1 Disable page_owner

# grubby --remove-args="page_owner=on" --update-kernel=0

(Note: disabling page_owner frees the additional memory that it was using)

4.2 Reboot the system

# reboot

4.3 Confirm that page_owner is disabled

4.3.1 Confirm with the dmesg command that page_owner is disabled

# dmesg | grep page_owner
[    2.022585] page_owner is disabled

(Note: a message like this means that page_owner is disabled)

4.3.2 Confirm through the /sys/kernel/debug/ directory that page_owner is disabled

# ls -l /sys/kernel/debug/page_owner
ls: cannot access '/sys/kernel/debug/page_owner': No such file or directory

(Note: if the /sys/kernel/debug/page_owner file does not exist, page_owner is disabled)

References:

https://access.redhat.com/solutions/5609521

[CONTENT] Linux maximum number of processes setting

Case One: Set the maximum number of processes for all users

# vim /etc/security/limits.conf

Add the following

......
* soft nproc 10240
* hard nproc 10240

(Note: this example sets the maximum number of processes to 10240 for all users. The limit on processes is the nproc item; nofile would limit the number of open files instead)

Case Two: Set the maximum number of processes for one group

# vim /etc/security/limits.conf

Add the following

......
@mingyuzhu soft nproc 10240
@mingyuzhu hard nproc 10240

(Note: this example sets the maximum number of processes to 10240 for the group mingyuzhu. The limit on processes is the nproc item; nofile would limit the number of open files instead)

Case Three: Set the maximum number of processes for one user

# vim /etc/security/limits.conf

Add the following

......
mingyuzhu soft nproc 10240
mingyuzhu hard nproc 10240

(Note: this example sets the maximum number of processes to 10240 for the user mingyuzhu. The limit on processes is the nproc item; nofile would limit the number of open files instead)
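To double-check what was written, the entries for one domain can be read back from the file. The following is a small sketch: the function name show_limit is hypothetical, and it assumes the usual four-column layout of limits.conf (domain, type, item, value).

```shell
#!/bin/bash
# Hypothetical helper: print the type and value of every entry in a
# limits.conf-style file that matches a domain and an item.
# $1: file (for example /etc/security/limits.conf)
# $2: domain (*, @group or user name)
# $3: item (for example nproc)
show_limit() {
        local file=$1 domain=$2 item=$3

        awk -v d="$domain" -v i="$item" '
                /^[[:space:]]*#/ { next }          # skip comment lines
                $1 == d && $3 == i { print $2, $4 }
        ' "$file"
}
```

For example, "show_limit /etc/security/limits.conf '@mingyuzhu' nproc" lists the soft and hard values configured for that group. The limits take effect for new login sessions, where "ulimit -u" shows the soft process limit and "ulimit -Hu" shows the hard one.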

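[TOOL] Shell: display common system information

Introduction

Basic information

Author: 朱明宇
Name: Display common system information
Purpose: Display common system information

Usage

1. Fill in the values between the separator lines of the script
2. Make the script executable
3. Run the script

Variables between the separator lines of the script

1. times=5 # number of times the system information is displayed
2. sleeptime=0.3 # delay in seconds between most output lines

Notes

1. The sysstat package must be installed
2. The user running the script must be able to run the sudo ip a s command
3. The user running the script must be able to run the sudo ss -ntulap command
4. On a KVM virtualization host, the number of virtual machines is shown only if the user running the script can run the sudo virsh list command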
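Before running the script, it can help to confirm that the commands it calls are installed. The following is a minimal sketch: the function name missing_commands is hypothetical, and the command list in the last line is only an example that should be adjusted to the parts of the script in use.

```shell
#!/bin/bash
# Hypothetical helper: print every command from the argument list
# that cannot be found in PATH.
missing_commands() {
        local cmd
        for cmd in "$@"; do
                command -v "$cmd" > /dev/null 2>&1 || echo "$cmd"
        done
}

# Example list of commands; sar and iostat come from the sysstat package
missing_commands sar iostat ip ss top free df
```

If nothing is printed, all listed commands were found. This does not check the sudo permissions mentioned in the notes; "sudo -l" lists the commands the current user may run.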

Script

#!/bin/bash

####################### Separator ########################
times=5
sleeptime=0.3
####################### Separator ########################

nowtime=1

# Print the whole report "times" times
while (( nowtime <= times ))
do
        # Opening banner
        echo -e "Start Monitoring: \c"
        for i in {1..94}
        do
                echo -e "#\c"
                sleep 0.01
        done
        echo

        sleep "$sleeptime"

        # Host name
        host=$(hostname)
        echo -e "Name:\t\t\t\t\t\t\t \033[1m$host\033[0m"

        # IPv4 addresses, excluding the loopback address
        ip=$(sudo ip a s | awk '/[1-2]?[0-9]{0,2}\.[1-2]?[0-9]{0,2}/&&!/127.0.0.1/{print $2}' | awk -F/ '{print $1}')
        for iip in $ip
        do
                sleep "$sleeptime"
                echo -e "IP Address:\t\t\t\t\t\t \033[1m$iip\033[0m"
        done

        sleep "$sleeptime"

        # Total CPU usage: 100 minus the idle percentage reported by top
        cpu=$(top -bn 1 | awk -F',' '/^%Cpu/{print $4 }' | awk '{print $1}' | awk '{print 100-$1}')
        echo -e "CPU Usage (Total):\t\t\t\t\t \033[1m$cpu%\033[0m"

        sleep "$sleeptime"

        # Total memory usage: used / total, printed with one decimal place
        mem=$(free | awk '/^Mem/{printf "%.1f", $3/$2 * 100.0}')
        echo -e "Memory Usage (Total):\t\t\t\t\t \033[1m$mem%\033[0m"

        # Usage of every mounted /dev file system, except run and boot mounts
        directory=$(df -h | grep -v run | grep -v boot | awk '$1~/\/dev/{print $6}')
        for idirectory in $directory
        do
                sleep "$sleeptime"
                # Match the mount point exactly so that /home does not also match /data/home
                directoryusage=$(df -h | awk -v m="$idirectory" '$1~/\/dev/ && $6==m {print $5}')
                # Short mount points get one more tab to keep the columns aligned
                if [ "$idirectory" = / ] || [ "$idirectory" = /sda ] || [ "$idirectory" = /sdb ];then
                        echo -e "Directory Usage ($idirectory):\t\t\t\t\t \033[1m$directoryusage\033[0m"
                else
                        echo -e "Directory Usage ($idirectory):\t\t\t\t \033[1m$directoryusage\033[0m"
                fi
        done

        # Number of running virtual machines, only if sudo virsh list is allowed
        if sudo -l | grep -q 'virsh list';then
                sleep "$sleeptime"
                virtual=$(sudo virsh list | grep -Ec '[0-9]')
                echo -e "Number of Virtual Machines (Total):\t\t\t \033[1m$virtual\033[0m"
        fi

        sleep "$sleeptime"

        # Number of logged-in users
        user=$(who | wc -l)
        echo -e "Number of User Logins (Total):\t\t\t\t \033[1m$user\033[0m"

        # Number of installed RPM packages
        soft=$(rpm -qa | wc -l)
        echo -e "Number of Softwares (Total):\t\t\t\t \033[1m$soft\033[0m"

        sleep "$sleeptime"

        # Number of sockets listed by ss, without the header line
        port=$(sudo ss -ntulap | tail -n +2 | wc -l)
        echo -e "Number of Open Ports (Total):\t\t\t\t \033[1m$port\033[0m"

        # Network throughput per interface (needs sar from sysstat and ifconfig)
        if which sar &> /dev/null;then
                networkcard=$(ifconfig | awk -F: '/flags/ && $1!="lo"{print $1}')
                for inetworkcard in $networkcard
                do
                        # Assumes the "Average:" line of sar -n DEV has rxkB/s and txkB/s
                        # in columns 5 and 6; LC_ALL=C keeps the output untranslated
                        netstat=$(LC_ALL=C sar -n DEV 1 1 | awk -v dev="$inetworkcard" '$1=="Average:" && $2==dev {print $5/1000, $6/1000}')
                        networkread="${netstat% *} MB/s"
                        networkwrite="${netstat#* } MB/s"
                        # Interface names other than eth* are longer and get one tab less
                        if ! echo "$inetworkcard" | grep -q eth;then
                                echo -e "Network Card IO ($inetworkcard):\t\t\t\t \033[1m$networkread\033[0m (Read)\t\033[1m$networkwrite\033[0m (Write)"
                        else
                                echo -e "Network Card IO ($inetworkcard):\t\t\t\t\t \033[1m$networkread\033[0m (Read)\t\033[1m$networkwrite\033[0m (Write)"
                        fi
                done
        fi

        # Disk throughput per device (needs iostat from sysstat)
        if which iostat &> /dev/null;then
                disk=$(iostat -d -k 1 1 | awk '!/^$/&&!/Device/&&!/Linux/{print $1}')
                for idisk in $disk
                do
                        sleep "$sleeptime"
                        # Columns 3 and 4 of iostat -d -k are kB_read/s and kB_wrtn/s
                        diskstat=$(iostat -d -k 1 1 | awk -v dev="$idisk" '$1==dev {print $3/1000, $4/1000}')
                        diskread="${diskstat% *} MB/s"
                        diskwrite="${diskstat#* } MB/s"
                        echo -e "Disk IO (/dev/$idisk):\t\t\t\t\t \033[1m$diskread\033[0m (Read)\t\033[1m$diskwrite\033[0m (Write)"
                done
        fi

        # Closing banner of this round
        echo -e "Complete Monitoring: \c"
        for i in {1..91}
        do
                echo -e "#\c"
                sleep 0.01
        done
        echo
        sleep "$sleeptime"

        let nowtime++
done

# Final banner after the last round
echo -e "Terminate Monitoring: \c"
for i in {1..90}
do
        echo -e "#\c"
        sleep 0.01
done
echo

exit