Stress-testing NVIDIA GPUs on Linux
1. Test environment
OS: CentOS 7
GPU: NVIDIA Tesla V100-SXM2 x 4
CPU: Intel(R) Xeon(R) Gold 6242 CPU @ 2.80GHz
GPU tools: CUDA Toolkit
2. Downloading gpu_burn
Download the following file from http://wili.cc/blog/gpu-burn.html:
http://wili.cc/blog/entries/gpu-burn/gpu_burn-1.0.tar.gz
# curl -O http://wili.cc/blog/entries/gpu-burn/gpu_burn-1.0.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  7289  100  7289    0     0   4666      0  0:00:01  0:00:01 --:--:--  4669
3. Compiling and running
Extracting the downloaded archive yields the following three files.
# tar xvpzf gpu_burn-1.0.tar.gz
Makefile
gpu_burn-drv.cpp
compare.cu
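Before building, it can be worth confirming that the archive really contains those three files. The following is only a minimal sketch: check_archive is a made-up helper name, not something shipped with gpu_burn, and it assumes the archive stores the files without a leading directory, as the tar output above suggests.

```shell
# check_archive is a hypothetical helper, not part of gpu_burn.
# It lists the archive without extracting it and fails if any of the
# three expected files is missing.
check_archive() {
    archive=$1
    listing=$(tar tzf "$archive") || return 1
    for f in Makefile gpu_burn-drv.cpp compare.cu; do
        # -x matches the whole line, -F treats the name as a fixed string
        printf '%s\n' "$listing" | grep -qxF "$f" || {
            echo "missing: $f" >&2
            return 1
        }
    done
}

# Usage: check_archive gpu_burn-1.0.tar.gz && tar xvpzf gpu_burn-1.0.tar.gz
```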
Now compile it with the make command.
# make
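The build needs nvcc from the CUDA Toolkit, so if make fails, the first thing to check is whether nvcc can be found. A small sketch of that check follows; find_nvcc is a hypothetical helper, and /usr/local/cuda is only an assumed default install location that may differ on your system.

```shell
# find_nvcc is a hypothetical helper. It prints the path of the first nvcc
# it finds, looking in PATH first and then under an assumed install prefix.
find_nvcc() {
    prefix=${1:-/usr/local/cuda}   # assumed default CUDA Toolkit location
    if command -v nvcc >/dev/null 2>&1; then
        command -v nvcc
    elif [ -x "$prefix/bin/nvcc" ]; then
        echo "$prefix/bin/nvcc"
    else
        echo "nvcc not found; install the CUDA Toolkit first" >&2
        return 1
    fi
}

# Usage: find_nvcc && make
```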
Run gpu_burn.
# ./gpu_burn
Run length not specified in the command line. Burning for 10 secs
GPU 0: Tesla V100-SXM2-32GB (UUID: GPU-4d5883b8-3425-a57d-f118-16a0e641e7a3)
GPU 1: Tesla V100-SXM2-32GB (UUID: GPU-6cf94ffe-8930-d4bc-3ea6-4fc7b521235a)
GPU 2: Tesla V100-SXM2-32GB (UUID: GPU-345d135f-3217-7863-35e3-4fdd3ca75eb4)
GPU 3: Tesla V100-SXM2-32GB (UUID: GPU-d9e03e1a-3d3f-9f6b-7b23-ad26921e4481)
Initialized device 0 with 32510 MB of memory (32123 MB available, using 28911 MB of it), using FLOATS
Initialized device 3 with 32510 MB of memory (32123 MB available, using 28911 MB of it), using FLOATS
Initialized device 2 with 32510 MB of memory (32123 MB available, using 28911 MB of it), using FLOATS
Initialized device 1 with 32510 MB of memory (32123 MB available, using 28911 MB of it), using FLOATS
50.0% proc'd: 1804 (6968 Gflop/s) - 0 (0 Gflop/s) - 0 (0 Gflop/s) - 0 (0 Gflop/s) errors: 0 - 0 - 0 - 0 temps: 33 C - 35 C - 33 C - 35 C
    Summary at: Mon Mar 2 15:10:17 KST 2020
70.0% proc'd: 3608 (13498 Gflop/s) - 1804 (6756 Gflop/s) - 1804 (6749 Gflop/s) - 1804 (6704 Gflop/s) errors: 0 - 0 - 0 - 0 temps: 54 C - 57 C - 57 C - 58 C
    Summary at: Mon Mar 2 15:10:19 KST 2020
90.0% proc'd: 5412 (13432 Gflop/s) - 3608 (13518 Gflop/s) - 3608 (13413 Gflop/s) - 3608 (13033 Gflop/s) errors: 0 - 0 - 0 - 0 temps: 54 C - 57 C - 57 C - 58 C
    Summary at: Mon Mar 2 15:10:21 KST 2020
100.0% proc'd: 7216 (13407 Gflop/s) - 5412 (13501 Gflop/s) - 5412 (13339 Gflop/s) - 5412 (13037 Gflop/s) errors: 0 - 0 - 0 - 0 temps: 59 C - 58 C - 58 C - 56 C
Killing processes.. done
Tested 4 GPUs:
    GPU 0: OK
    GPU 1: OK
    GPU 2: OK
    GPU 3: OK
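The run ends with one verdict line per GPU ("GPU 0: OK" and so on), which is convenient to check from a script. Here is a rough sketch of such a check; count_failed_gpus is a hypothetical helper, and it simply treats any verdict other than OK as a failure.

```shell
# count_failed_gpus is a hypothetical helper. It reads gpu_burn output on
# stdin and prints how many per-GPU verdict lines do not end in "OK".
# The pattern matches only lines of the form "GPU <n>: <WORD>", so the
# device listing lines that carry a UUID are ignored.
count_failed_gpus() {
    awk '/^[ \t]*GPU [0-9]+: [A-Z]+[ \t]*$/ {
             if ($NF != "OK") bad++
         }
         END { print bad + 0 }'
}

# Usage: ./gpu_burn 10 | count_failed_gpus
```

A result of 0 means every tested GPU reported OK.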
To run an actual stress test, pass the run length in seconds as an argument.
For example, to test for one hour:
# ./gpu_burn 3600
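Since the run length has to be given in seconds, a tiny conversion helper can save some mental arithmetic for longer runs. This is just a sketch; to_seconds and its h/m/s suffixes are my own convention, not something gpu_burn understands.

```shell
# to_seconds is a hypothetical helper that converts a duration such as
# 90s, 30m, or 2h into the plain number of seconds passed to gpu_burn.
to_seconds() {
    case $1 in
        *h) echo $(( ${1%h} * 3600 )) ;;
        *m) echo $(( ${1%m} * 60 )) ;;
        *s) echo "${1%s}" ;;
        *)  echo "$1" ;;      # no suffix: already in seconds
    esac
}

# Usage: ./gpu_burn "$(to_seconds 1h)"
```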
4. Other notes
Several tools can monitor GPUs, such as gpustat, gpumonitor, and glances, but if you have installed the driver and utilities needed to use an NVIDIA GPU, nvidia-smi is available right away.
Running nvidia-smi during the stress test shows GPU utilization at 100%, as below.
# nvidia-smi
Mon Mar  2 15:22:40 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:89:00.0 Off |                    0 |
| N/A   63C    P0   294W / 300W |  29282MiB / 32510MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  Off  | 00000000:8A:00.0 Off |                    0 |
| N/A   73C    P0   298W / 300W |  29282MiB / 32510MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  Off  | 00000000:B2:00.0 Off |                    0 |
| N/A   63C    P0   175W / 300W |  29282MiB / 32510MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2...  Off  | 00000000:B3:00.0 Off |                    0 |
| N/A   73C    P0   293W / 300W |  29282MiB / 32510MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     47138      C   ./gpu_burn                                 29271MiB |
|    1     47164      C   ./gpu_burn                                 29271MiB |
|    2     47165      C   ./gpu_burn                                 29271MiB |
|    3     47166      C   ./gpu_burn                                 29271MiB |
+-----------------------------------------------------------------------------+
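For longer runs, the table view is awkward to watch by eye, and nvidia-smi can print the same readings as CSV through its --query-gpu option. The sketch below pairs that with a simple filter; sample_gpus and hot_gpus are hypothetical helper names, and the temperature limit is whatever threshold you choose, not a vendor recommendation.

```shell
# sample_gpus prints one CSV line per GPU: index, temperature (C),
# utilization (%), power draw (W).
sample_gpus() {
    nvidia-smi --query-gpu=index,temperature.gpu,utilization.gpu,power.draw \
               --format=csv,noheader,nounits
}

# hot_gpus is a hypothetical filter: it reads that CSV on stdin and prints
# the index of every GPU at or above the temperature limit given as $1.
hot_gpus() {
    awk -F', *' -v limit="$1" '$2 + 0 >= limit + 0 { print $1 }'
}

# Usage: sample_gpus | hot_gpus 70
```

With the readings shown above and a limit of 70, this would list GPUs 1 and 3.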