Original link: https://www.kawabangga.com/posts/5236
I ran into a problem today, so I am writing it down.
It started when my colleagues told me that node-exporter was down on two machines.
A ticket came in, so there was work to do.
I checked, and it was indeed down. The process runs under systemd, so I used journalctl to look at the log, and what I found was surprising:
Looking at the last three lines, the process actually started successfully and the log even printed Succeeded, but then it exited immediately.
Why? I checked memory first: usage had been very low the whole time and there was no OOM log, so it was unlikely to be OOM. The machine's other metrics, such as CPU and disk, were normal, so a hardware problem was unlikely too.
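One way to double-check the "no OOM" conclusion is to scan the kernel log for OOM-killer messages. The following is a minimal sketch, assuming the kernel log text has already been captured (for example from dmesg or journalctl -k). The function name find_oom_lines and the sample lines are made up for illustration, and the markers are common Linux OOM phrases rather than an exhaustive list.

```python
# Common substrings in Linux OOM-killer kernel messages (not exhaustive).
OOM_MARKERS = ("out of memory", "oom-kill", "invoked oom-killer")


def find_oom_lines(log_text):
    """Return the lines of log_text that look like OOM-killer messages."""
    return [
        line
        for line in log_text.splitlines()
        if any(marker in line.lower() for marker in OOM_MARKERS)
    ]


# Hypothetical sample input, not taken from the machines in this post.
sample = "\n".join([
    "kernel: usb 1-1: new high-speed USB device",
    "kernel: Out of memory: Killed process 4242 (example)",
])
print(find_oom_lines(sample))
```

An empty result does not prove there was no OOM kill (the log may have rotated), but it is a quick first filter.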
I then ran the process directly from the shell instead of under systemd, and everything worked. So the process was not exiting on its own: either systemd was at fault, or something else was killing it.
Since the process does not exit on its own, the only way for anything else to stop it is to send it a signal. So next I used killsnoop (a BPF-based tool) to see who was sending signals to my process and which signals they were.
The following is the startup log. Matching its pid against the killsnoop output above shows who sent signals to this process:
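As you can see, it was systemd that sent it signals 15 and 18. Signal 15 is SIGTERM, which is most likely what ended my process. Signal 18 is SIGCONT on x86 Linux, which systemd sends right after the termination signal when it stops a unit.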
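Signal numbers like these can be translated into names with Python's standard signal module. This is a small sketch; the helper name signal_name is made up for illustration, and the mapping is platform dependent, so the numbers in this post should be looked up on a Linux machine of the same architecture.

```python
import signal


def signal_name(number):
    """Return the symbolic name for a signal number on this platform."""
    try:
        return signal.Signals(number).name
    except ValueError:
        # The number is not a valid signal on this platform.
        return "unknown({})".format(number)


# The two signal numbers reported by killsnoop in this post.
for number in (15, 18):
    print(number, signal_name(number))
```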
So why would systemd send this signal right after the process starts? One scenario I could think of: the process forks a copy after starting and the parent exits, so systemd kills the remaining processes. But the same configuration runs fine on other machines, and as far as I know node-exporter has no such fork logic, so this seemed unlikely. I decided to verify it anyway, using execsnoop. This eBPF program shows every command executed on the machine (the history of spawned processes).
It turned up a systemctl stop node-exporter command.
Well, well: I did not see node-exporter fork, but I did see a strange systemctl stop node-exporter, so someone was working against me.
The parent process of this command is pid=177579. Use pstree to see who it is:
It is a bash with no arguments, running under sudo su. It looks like someone logged in over ssh, ran systemctl stop node-exporter, and walked away.
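The same parent-chain lookup that pstree performs can be approximated by walking /proc. The sketch below is Linux-only and assumes the standard Name and PPid fields of /proc/<pid>/status. The function names read_status and ancestry are made up for illustration, and the demo walks the current Python process rather than the pid from this post.

```python
import os


def read_status(pid):
    """Return (name, parent_pid) parsed from /proc/<pid>/status (Linux only)."""
    name, ppid = "?", 0
    with open("/proc/{}/status".format(pid)) as f:
        for line in f:
            key, _, value = line.partition(":")
            if key == "Name":
                name = value.strip()
            elif key == "PPid":
                ppid = int(value.strip())
    return name, ppid


def ancestry(pid):
    """Return [(pid, name), ...] from pid up to the top of its process tree."""
    chain = []
    while pid > 0:
        name, ppid = read_status(pid)
        chain.append((pid, name))
        pid = ppid  # a parent pid of 0 means there is nothing further up
    return chain


if __name__ == "__main__":
    # Demo on the current process; pass another pid to inspect it instead.
    for pid, name in ancestry(os.getpid()):
        print(pid, name)
```

A chain that passes through bash, su, sudo, and sshd is the kind of result that points to an interactive login, as in the pstree output described above.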
Use strace to see what this bash is doing:
Sure enough, it sleeps for 1s, then stops my process, and repeats this in a loop.
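Based on that description, the stray shell was probably running something equivalent to the loop below. This is a reconstruction in Python for illustration only: the original was a bash loop whose exact text is not shown here, and the function name stop_forever and its parameters are hypothetical. The command runner and the sleep function are injectable so the sketch can be exercised without touching a real service.

```python
import subprocess
import time


def stop_forever(unit, run=subprocess.run, sleep=time.sleep, iterations=None):
    """Repeatedly stop a systemd unit, pausing one second between attempts.

    iterations=None loops forever, like the stray shell; a number bounds it.
    Returns the number of stop attempts made.
    """
    count = 0
    while iterations is None or count < iterations:
        sleep(1)                          # the one-second sleep seen in strace
        run(["systemctl", "stop", unit])  # then stop the unit again
        count += 1
    return count
```

A loop like this explains the symptom in the log: systemd restarts the service, the service reports a successful start, and within a second the next stop request arrives.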
Killing this bash process solved the problem.
As for why someone would run something like this on the machine, I asked around, and it turned out to be someone's ad hoc hack.