liyapeng / log_collect
Commit 8e1c52ea authored Jan 22, 2024 by liyapeng
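Added a check for processes in the D state, running nvidia-bug-report.sh with different options depending on the result to avoid hanging; also added collection of NVRM information.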
1 parent 400b596a
Showing 1 changed file with 7 additions and 1 deletion
log_collect.sh
@@ -66,13 +66,19 @@ GPULogCollect() {
     if [ -f /usr/bin/nvidia-bug-report.sh ] && [ $gpuInstanceState -gt 0 ]; then
         mkdir -p ${LOG_FILE_PATH}/GPULogCollect
         echo "Start to collect gpu log for instance $(hostname) by nvidia-bug-report.sh"
-        nvidia-bug-report.sh --safe-mode --output-file ${LOG_FILE_PATH}/GPULogCollect/nvidia-bug-report.log.gz.gz
+        d_processes=$(ps aux | awk '$8=="D" {print}')
+        if [ -n "$d_processes" ]; then
+            nvidia-bug-report.sh --safe-mode --extra-system-data --output-file ${LOG_FILE_PATH}/GPULogCollect/nvidia-bug-report.log.gz
+        else
+            nvidia-bug-report.sh --safe-mode --output-file ${LOG_FILE_PATH}/GPULogCollect/nvidia-bug-report.log.gz
+        fi
         timeout 30 nvidia-smi > ${LOG_FILE_PATH}/GPULogCollect/nvidia-smi.log
         nvidia-smi topo -m >> ${LOG_FILE_PATH}/GPULogCollect/nvidia-smi.log
         lspci -d 10de: | egrep "VGA|3D" > ${LOG_FILE_PATH}/GPULogCollect/lspci-nvidia.log
         lspci -vvv -t >> ${LOG_FILE_PATH}/GPULogCollect/lspci-nvidia.log
         lspci -vvv >> ${LOG_FILE_PATH}/GPULogCollect/lspci-nvidia.log
         dmesg -T > ${LOG_FILE_PATH}/GPULogCollect/dmesg-gpu.log
+        journalctl | grep -i nvrm > journalctl_nvrm.txt
         # get slot info
         touch ${LOG_FILE_PATH}/GPULogCollect/slot-info.txt
         nvidia-smi --query-gpu=index,gpu_name,gpu_bus_id,uuid --format=csv > ${LOG_FILE_PATH}/GPULogCollect/slot-info.txt
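A note on the D-state check in this commit: in `ps aux` output the STAT column (field 8) often carries modifier characters after the state letter (for example `D+`, `Ds`, `D<`), so the exact comparison `$8=="D"` only matches processes whose STAT is exactly `D`. The sketch below shows one possible way to match any state that begins with `D`. The helper names `filter_d_state` and `list_d_state_processes` are hypothetical and are not part of log_collect.sh; the sketch assumes a procps-style `ps` that accepts `-o field=` to suppress headers.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not in log_collect.sh): reads lines whose first
# field is a process state and keeps those that start with "D"
# (uninterruptible sleep), including variants such as "D+" or "Ds".
filter_d_state() {
    awk '$1 ~ /^D/ {print}'
}

# Hypothetical helper: lists state, PID and command name of every
# process currently in D state. Each "-o field=" prints the column
# with an empty header, so no header line is emitted.
list_d_state_processes() {
    ps -e -o stat= -o pid= -o comm= | filter_d_state
}
```

Inside GPULogCollect() this could stand in for the `ps aux | awk` pipeline, for example `d_processes=$(list_d_state_processes)` followed by the same `[ -n "$d_processes" ]` test. Which nvidia-bug-report.sh options to use in each branch remains the decision made in the commit above.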