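How I tracked down and fixed a CPU utilization problem on an Extreme switch.
Model: Extreme X430-48t
Symptom: CPU at 0% idle, and the switch became slow to respond.
Checking resource usage with the top command showed 0.0% idle. Most of the CPU was going to the bcmRX and tbcm_msm_txX (tbcm_msm_tx1, tbcm_msm_tx0) kernel threads, and 80.0% of CPU time was spent in softirq (sirq).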
Switch.1 # top
Mem: 193584K used, 61432K free, 0K shrd, 0K buff, 78476K cached
CPU: 0.0% usr 20.0% sys 0.0% nic 0.0% idle 0.0% io 0.0% irq 80.0% sirq
Load average: 7.42 7.58 7.38 4/152 1532
PID PPID USER STAT RSS %MEM CPU %CPU COMMAND
1110 2 root RW 0 0.0 0 42.1 [bcmRX]
1107 2 root DW< 0 0.0 0 42.1 [tbcm_msm_tx1]
1532 1531 root R 648 0.2 0 10.5 top -d 3
1105 2 root DW< 0 0.0 0 5.2 [tbcm_msm_tx0]
1262 1 root S 12224 4.7 0 0.0 ./cliMaster
1240 1 root S < 10440 4.0 0 0.0 ./hal
1333 1 root S 5760 2.2 0 0.0 ./netTools
1342 1 root S 5296 2.0 0 0.0 ./etmon
1269 1 root S 4816 1.8 0 0.0 ./snmpMaster
1358 1 root S 4688 1.8 0 0.0 ./xmld
1271 1 root S 4100 1.6 0 0.0 ./snmpSubagent
1371 1 root S 4064 1.5 0 0.0 ./idMgr
1282 1 root S 3964 1.5 0 0.0 ./aaa -t random
1320 1 root S 3800 1.4 0 0.0 ./rtmgr update
1236 1 root S 3416 1.3 0 0.0 ./emsServer
1326 1 root S 3376 1.3 0 0.0 ./mcmgr
1286 1 root S 3332 1.3 0 0.0 ./vlan
1288 1 root S 3060 1.2 0 0.0 ./fdb
1330 1 root S 2964 1.1 0 0.0 ./acl
1344 1 root S 2720 1.0 0 0.0 ./thttpd
1264 1 root S 2560 1.0 0 0.0 ./cfgmgr
1310 1 root S 2532 0.9 0 0.0 ./stp
1376 1 root S 2528 0.9 0 0.0 ./erps
1366 1 root S 2496 0.9 0 0.0 ./dot1ag
1383 1 root S 2420 0.9 0 0.0 ./exsshd
1303 1 root S 2400 0.9 0 0.0 ./eaps
947 1 root S 2384 0.9 0 0.0 /exos/bin/epm -t 40 -f /exos/config/epmrc.L2 -d /exos/config/epmdprc
1335 1 root S 2368 0.9 0 0.0 ./netLogin
1360 1 root S 2316 0.9 0 0.0 ./xmlc
1305 1 root S 2284 0.8 0 0.0 ./esrp
1339 1 root S 2244 0.8 0 0.0 ./telnetd -e
1378 1 root S < 2240 0.8 0 0.0 ./mrp
1299 1 root S 2216 0.8 0 0.0 ./lldp
1362 1 root S 2212 0.8 0 0.0 ./ipSecurity
1337 1 root S 2200 0.8 0 0.0 ./techSupport
1364 1 root S 2168 0.8 0 0.0 ./upm
1352 1 root S 2132 0.8 0 0.0 ./poe
1297 1 root S 2124 0.8 0 0.0 ./edp
1290 1 root S 2116 0.8 0 0.0 ./elrp
1301 1 root S 2112 0.8 0 0.0 ./lacp
1368 1 root S 2056 0.8 0 0.0 ./hclag
1292 1 root R 2000 0.7 0 0.0 ./elsm
1238 1 root S 1948 0.7 0 0.0 ./devmgr
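To spot this pattern quickly in saved top output, a small parser can help. The sketch below is a minimal Python example, assuming the BusyBox-style layout shown above with each row on its own line (a `CPU:` summary line followed by `PID PPID USER STAT RSS %MEM CPU %CPU COMMAND` rows). The function names and the 10% idle threshold are hypothetical choices, not anything provided by ExtremeXOS.

```python
import re
from typing import NamedTuple

# "CPU:" summary fields look like "80.0% sirq"; capture (percent, name) pairs.
CPU_FIELD = re.compile(r"([\d.]+)%\s+(\w+)")

# Process rows: PID PPID USER STAT RSS %MEM CPU %CPU COMMAND.
# STAT may contain a space before a priority flag (e.g. "S <"), so it is matched explicitly.
PROC_ROW = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(\S+)\s+([A-Z]+(?: ?[<N])?)\s+(\d+)\s+([\d.]+)\s+(\d+)\s+([\d.]+)\s+(.+)$"
)


class Proc(NamedTuple):
    pid: int
    stat: str
    cpu_pct: float
    command: str


def parse_cpu_summary(text: str) -> dict[str, float]:
    """Return e.g. {"usr": 0.0, ..., "idle": 0.0, "sirq": 80.0} from the CPU: line."""
    for line in text.splitlines():
        if line.lstrip().startswith("CPU:"):
            return {name: float(pct) for pct, name in CPU_FIELD.findall(line)}
    raise ValueError("no 'CPU:' line found")


def parse_processes(text: str) -> list[Proc]:
    """Collect every line that looks like a process row."""
    procs = []
    for line in text.splitlines():
        m = PROC_ROW.match(line)
        if m:
            procs.append(Proc(int(m[1]), m[4], float(m[8]), m[9].strip()))
    return procs


def diagnose(text: str, idle_threshold: float = 10.0, top_n: int = 3):
    """Flag low idle time and list the busiest processes (hypothetical thresholds)."""
    cpu = parse_cpu_summary(text)
    busiest = sorted(parse_processes(text), key=lambda p: p.cpu_pct, reverse=True)[:top_n]
    return cpu["idle"] < idle_threshold, cpu, busiest


# Excerpt of the top output above.
SAMPLE = """\
CPU: 0.0% usr 20.0% sys 0.0% nic 0.0% idle 0.0% io 0.0% irq 80.0% sirq
Load average: 7.42 7.58 7.38 4/152 1532
PID PPID USER STAT RSS %MEM CPU %CPU COMMAND
1110 2 root RW 0 0.0 0 42.1 [bcmRX]
1107 2 root DW< 0 0.0 0 42.1 [tbcm_msm_tx1]
1532 1531 root R 648 0.2 0 10.5 top -d 3
1105 2 root DW< 0 0.0 0 5.2 [tbcm_msm_tx0]
1240 1 root S < 10440 4.0 0 0.0 ./hal
"""

if __name__ == "__main__":
    congested, cpu, busiest = diagnose(SAMPLE)
    print("congested:", congested, "| idle:", cpu["idle"], "| sirq:", cpu["sirq"])
    for p in busiest:
        print(f"{p.pid:>5} {p.cpu_pct:5.1f}% {p.command}")
```

Sorting is stable, so the two threads tied at 42.1% keep their original order.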
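I checked the Extreme knowledge-base article below about high CPU utilization by tbcm_msm_tx1 and bcmRX, but concluded that it was not related to this case. The article describes that problem together with CPU congestion on BD8800 I/O modules.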
https://gtacknowledge.extremenetworks.com/articles/Solution/High-CPU-utilization-for-the-tbcm-msm-tx1-and-bcmRX-coupled-with-CPU-congestion-on-BD8800-IO-modules
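A document I came across noted that 0.0% CPU idle can indicate a network loop, so I checked per-port utilization. Two ports looked suspicious: port 27 had a peak Tx of 98.81%, and port 30 had a peak Rx of 100.00% even though its link state was R (Ready).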
* Switch.2 # sh ports util band
Port  Link   Link   Rx          Peak Rx     Tx          Peak Tx
      State  Speed  % bandwidth % bandwidth % bandwidth % bandwidth
================================================================================
1     R      0      0.00        0.00        0.00        0.00
2     A      100    0.00        0.18        0.00        3.33
3     A      100    0.00        0.49        0.00        2.27
4     A      100    0.00        0.12        0.00        2.33
5     R      0      0.00        0.00        0.00        0.00
6     A      100    0.01        0.03        0.02        0.95
7     R      0      0.00        0.00        0.00        0.00
8     A      100    0.03        0.04        1.28        57.47
9     A      100    0.00        0.15        0.00        1.42
10    A      100    0.00        0.00        0.00        0.57
11    R      0      0.00        0.00        0.00        0.00
12    A      100    0.00        0.00        0.00        0.57
13    R      0      0.00        0.00        0.00        0.00
14    A      100    0.03        0.04        2.35        3.63
15    A      100    0.00        0.02        0.00        0.57
16    A      100    0.02        0.14        0.33        8.06
17    A      100    0.00        0.02        0.00        1.74
18    A      100    0.00        0.03        0.00        0.59
19    R      0      0.00        0.00        0.00        0.00
20    A      100    0.00        0.02        0.00        0.57
21    A      100    0.00        0.00        0.00        0.57
22    R      0      0.00        0.00        0.00        0.00
23    R      0      0.00        0.00        0.00        0.00
24    R      0      0.00        0.00        0.00        0.00
25    A      1000   0.00        0.00        0.00        0.06
26    A      100    0.00        0.13        0.00        4.78
27    A      100    0.59        21.69       0.08        98.81
28    A      100    0.00        0.06        0.00        1.07
29    R      0      0.00        0.00        0.00        0.00
30    R      0      0.00        100.00      0.00        2.84
31    A      100    0.00        0.00        0.00        0.57
32    R      0      0.00        0.00        0.00        0.00
33    A      100    0.00        0.00        0.00        0.57
34    R      0      0.00        0.00        0.00        0.00
35    A      100    0.00        0.03        0.00        1.55
36    A      100    0.00        0.01        0.00        0.57
37    A      100    0.00        0.04        0.01        3.67
38    R      0      0.00        0.00        0.00        0.00
39    A      100    0.00        0.00        0.00        1.49
40    A      100    0.00        0.01        0.00        2.49
41    R      0      0.00        0.00        0.00        0.00
42    R      0      0.00        0.00        0.00        0.00
43    R      0      0.00        0.00        0.00        0.00
44    R      0      0.00        0.00        0.00        0.00
45    R      0      0.00        0.00        0.00        0.00
46    R      0      0.00        0.00        0.00        0.00
47    R      0      0.00        0.00        0.00        0.00
48    R      0      0.00        0.00        0.00        0.00
49    R      0      0.00        0.00        0.00        0.00
50    R      0      0.00        0.00        0.00        0.00
51    R      0      0.00        0.00        0.00        0.00
52    A      1000   0.40        4.22        0.07        7.44
================================================================================
> indicates Port Display Name truncated past 8 characters
Link State: A-Active, R-Ready, NP-Port Not Present, L-Loopback
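The same kind of check can be applied to the port utilization table. The following is a minimal Python sketch, assuming the column layout shown above with one port per line. The 90% threshold and the rule of flagging any recorded peak on a port in R (Ready) state are illustrative heuristics that mirror why ports 27 and 30 stood out, not Extreme-recommended values. Keep in mind that the Peak columns may reflect traffic from earlier in the counter period rather than the current rate.

```python
import re
from typing import NamedTuple

# One data row: Port, Link State, Link Speed, Rx %, Peak Rx %, Tx %, Peak Tx %.
ROW = re.compile(
    r"^\s*(\d+)\s+([A-Z]{1,2})\s+(\d+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s*$"
)


class PortUtil(NamedTuple):
    port: int
    state: str
    speed: int
    rx: float
    peak_rx: float
    tx: float
    peak_tx: float


def parse_port_util(text: str) -> list[PortUtil]:
    """Parse data rows; headers, separators, and the legend are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rx, peak_rx, tx, peak_tx = (float(v) for v in m.groups()[3:])
            rows.append(PortUtil(int(m[1]), m[2], int(m[3]), rx, peak_rx, tx, peak_tx))
    return rows


def suspicious_ports(rows: list[PortUtil], peak_threshold: float = 90.0):
    """Return (port, reasons) for ports matching the illustrative heuristics."""
    flagged = []
    for r in rows:
        reasons = []
        peak = max(r.peak_rx, r.peak_tx)
        if peak >= peak_threshold:
            reasons.append(f"peak {peak:.2f}% >= {peak_threshold:.0f}%")
        if r.state == "R" and peak > 0:
            reasons.append("peak traffic recorded on a port in R (Ready) state")
        if reasons:
            flagged.append((r.port, reasons))
    return flagged


# Excerpt of the table above.
SAMPLE = """\
8     A      100    0.03        0.04        1.28        57.47
27    A      100    0.59        21.69       0.08        98.81
29    R      0      0.00        0.00        0.00        0.00
30    R      0      0.00        100.00      0.00        2.84
52    A      1000   0.40        4.22        0.07        7.44
"""

if __name__ == "__main__":
    for port, reasons in suspicious_ports(parse_port_util(SAMPLE)):
        print(port, "; ".join(reasons))
```

With these heuristics, port 8 (57.47% peak Tx) is not flagged, which matches the author's focus on ports 27 and 30.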
I disabled port 27 first, but the symptom did not improve, so I disabled port 30 next.
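The one-port-at-a-time approach can be written down as a printable checklist (not a script to paste into the CLI). In the Python sketch below, `isolation_plan` is a hypothetical helper that only builds text. It assumes each suspect port is re-enabled before the next one is tried; the write-up does not state this explicitly, although port 27 shows as Active in the later utilization output. `disable ports` and `enable ports` are standard ExtremeXOS commands.

```python
def isolation_plan(suspects: list[int]) -> list[str]:
    """Hypothetical helper: build a one-port-at-a-time isolation checklist.

    For each suspect: disable it, re-check CPU idle with top, and (by assumption)
    re-enable it before moving on if the symptom persists.
    """
    steps = []
    for port in suspects:
        steps.append(f"disable ports {port}")
        steps.append("top  (check whether CPU idle recovers)")
        steps.append(f"if not: enable ports {port} and move to the next suspect")
    return steps


if __name__ == "__main__":
    # Order used in this case: port 27 first, then port 30.
    print("\n".join(isolation_plan([27, 30])))
```

Ordering the suspects by how extreme their peaks are is one reasonable choice; the checklist itself does not depend on it.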
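After disabling port 30, CPU idle recovered to 81.5% and the symptom went away. The two processes that had been consuming the CPU dropped so low that they no longer appear near the top of the listing, and softirq time fell from 80.0% to 0.6%.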
Mem: 200696K used, 54320K free, 0K shrd, 0K buff, 79864K cached
CPU: 2.9% usr 12.1% sys 0.0% nic 81.5% idle 0.0% io 2.6% irq 0.6% sirq
Load average: 4.46 5.15 6.49 4/152 1562
PID PPID USER STAT RSS %MEM CPU %CPU COMMAND
1072 2 root SW< 0 0.0 0 3.9 [bcmLINK.1]
1070 2 root SW< 0 0.0 0 3.6 [bcmLINK.0]
1288 1 root S 3128 1.2 0 1.3 ./fdb
1282 1 root S 3992 1.5 0 0.9 ./aaa -t random
1290 1 root S 2120 0.8 0 0.9 ./elrp
1562 1561 root R 800 0.3 0 0.9 top -d 3
1526 2 root SW< 0 0.0 0 0.9 [bcmCNTR.1]
1525 2 root SW< 0 0.0 0 0.9 [bcmCNTR.0]
1240 1 root S < 10492 4.1 0 0.6 ./hal
1330 1 root S 3036 1.1 0 0.3 ./acl
1360 1 root R 2356 0.9 0 0.3 ./xmlc
946 945 root S 1768 0.6 0 0.3 /exos/bin/exsh
1242 1 root S 1620 0.6 0 0.3 ./nodemgr
1262 1 root S 17848 6.9 0 0.0 ./cliMaster
1333 1 root S 5880 2.3 0 0.0 ./netTools
1342 1 root S 5324 2.0 0 0.0 ./etmon
1269 1 root S 4872 1.9 0 0.0 ./snmpMaster
1358 1 root S 4688 1.8 0 0.0 ./xmld
1371 1 root S 4112 1.6 0 0.0 ./idMgr
1271 1 root S 4100 1.6 0 0.0 ./snmpSubagent
1320 1 root S 3856 1.5 0 0.0 ./rtmgr update
1286 1 root S 3604 1.4 0 0.0 ./vlan
1236 1 root S 3420 1.3 0 0.0 ./emsServer
1326 1 root S 3404 1.3 0 0.0 ./mcmgr
1264 1 root S 2852 1.1 0 0.0 ./cfgmgr
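To quantify the recovery, the two `CPU:` summary lines can be compared field by field. A minimal Python sketch, assuming the `CPU:` line format shown in the two top outputs; `cpu_fields` and `cpu_delta` are hypothetical helper names:

```python
import re


def cpu_fields(line: str) -> dict[str, float]:
    """Parse 'CPU: 2.9% usr 12.1% sys ...' into {'usr': 2.9, 'sys': 12.1, ...}."""
    return {name: float(pct) for pct, name in re.findall(r"([\d.]+)%\s+(\w+)", line)}


def cpu_delta(before: str, after: str) -> dict[str, float]:
    """Percentage-point change per field (after minus before), rounded to 0.1."""
    b, a = cpu_fields(before), cpu_fields(after)
    return {k: round(a[k] - b[k], 1) for k in b}


# CPU: lines copied from the top output before and after disabling port 30.
BEFORE = "CPU: 0.0% usr 20.0% sys 0.0% nic 0.0% idle 0.0% io 0.0% irq 80.0% sirq"
AFTER = "CPU: 2.9% usr 12.1% sys 0.0% nic 81.5% idle 0.0% io 2.6% irq 0.6% sirq"

if __name__ == "__main__":
    for field, change in cpu_delta(BEFORE, AFTER).items():
        print(f"{field:>4}: {change:+.1f} points")
```

Nearly all of the recovered idle time comes out of softirq (-79.4 points). That fits the idea that the kernel was spending its time handling a flood of packets, although that reading is an inference rather than something the switch reports directly.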
Next, I cleared the switch's port counters and checked port utilization again.
* Switch.4 # clear counters ports
* Switch.5 # sh ports utilization bandwidth
Port  Link   Link   Rx          Peak Rx     Tx          Peak Tx
      State  Speed  % bandwidth % bandwidth % bandwidth % bandwidth
================================================================================
1     R      0      0.00        0.00        0.00        0.00
2     A      100    0.00        0.00        0.00        0.00
3     A      100    0.01        0.01        0.01        0.01
4     A      100    0.00        0.00        0.00        0.00
5     R      0      0.00        0.00        0.00        0.00
6     A      100    0.00        0.00        0.00        0.00
7     R      0      0.00        0.00        0.00        0.00
8     A      100    0.02        0.04        1.89        2.30
9     A      100    0.03        0.04        0.09        0.13
10    A      100    0.00        0.00        0.00        0.00
11    R      0      0.00        0.00        0.00        0.00
12    A      100    0.00        0.00        0.00        0.00
13    R      0      0.00        0.00        0.00        0.00
14    A      100    0.06        0.11        6.24        7.35
15    A      100    0.00        0.00        0.00        0.00
16    A      100    0.00        0.00        0.01        0.01
17    A      100    0.00        0.00        0.00        0.00
18    A      100    0.00        0.00        0.00        0.00
19    R      0      0.00        0.00        0.00        0.00
20    A      100    0.14        0.14        5.23        5.23
21    A      100    0.00        0.00        0.00        0.00
22    R      0      0.00        0.00        0.00        0.00
23    R      0      0.00        0.00        0.00        0.00
24    R      0      0.00        0.00        0.00        0.00
25    A      1000   0.00        0.00        0.00        0.00
26    A      100    0.00        0.00        0.00        0.00
27    A      100    0.04        0.14        3.78        8.48
28    A      100    0.00        0.00        0.00        0.00
29    R      0      0.00        0.00        0.00        0.00
30    R      0      0.00        0.00        0.00        0.00
31    A      100    0.00        0.00        0.00        0.00
32    R      0      0.00        0.00        0.00        0.00
33    A      100    0.00        0.00        0.00        0.00
34    R      0      0.00        0.00        0.00        0.00
35    A      100    0.00        0.00        0.00        0.00
36    A      100    0.00        0.00        0.00        0.00
37    A      100    0.33        0.47        4.81        27.71
38    R      0      0.00        0.00        0.00        0.00
39    A      100    0.00        0.00        0.00        0.00
40    A      100    0.00        0.00        0.00        0.01
41    R      0      0.00        0.00        0.00        0.00
42    R      0      0.00        0.00        0.00        0.00
43    R      0      0.00        0.00        0.00        0.00
44    R      0      0.00        0.00        0.00        0.00
45    R      0      0.00        0.00        0.00        0.00
46    R      0      0.00        0.00        0.00        0.00
47    R      0      0.00        0.00        0.00        0.00
48    R      0      0.00        0.00        0.00        0.00
49    R      0      0.00        0.00        0.00        0.00
50    R      0      0.00        0.00        0.00        0.00
51    R      0      0.00        0.00        0.00        0.00
52    A      1000   1.55        4.63        0.05        0.08
================================================================================
> indicates Port Display Name truncated past 8 characters
Link State: A-Active, R-Ready, NP-Port Not Present, L-Loopback
* Switch.9 #
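Because the counters were cleared in between, the two utilization snapshots cover different periods, and comparing their peak columns shows which ports went quiet and which are now the busiest. A minimal Python sketch, assuming the table layout shown in both snapshots; `peaks`, `busiest`, and `changed` are hypothetical helper names, and the samples are excerpts of the two tables:

```python
import re

# Data row: Port, State, Speed, Rx %, Peak Rx %, Tx %, Peak Tx %; capture the two peaks.
ROW = re.compile(
    r"^\s*(\d+)\s+[A-Z]{1,2}\s+\d+\s+[\d.]+\s+([\d.]+)\s+[\d.]+\s+([\d.]+)\s*$"
)


def peaks(text: str) -> dict[int, tuple[float, float]]:
    """Map port number -> (peak Rx %, peak Tx %)."""
    result = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            result[int(m[1])] = (float(m[2]), float(m[3]))
    return result


def busiest(snapshot: dict[int, tuple[float, float]], n: int = 3) -> list[int]:
    """Ports ordered by their higher peak (Rx or Tx), largest first."""
    return sorted(snapshot, key=lambda p: max(snapshot[p]), reverse=True)[:n]


def changed(before: dict, after: dict) -> dict:
    """Ports present in both snapshots whose peak values differ."""
    return {
        p: (before[p], after[p])
        for p in sorted(before.keys() & after.keys())
        if before[p] != after[p]
    }


BEFORE = """\
8     A      100    0.03        0.04        1.28        57.47
27    A      100    0.59        21.69       0.08        98.81
30    R      0      0.00        100.00      0.00        2.84
37    A      100    0.00        0.04        0.01        3.67
"""
AFTER = """\
8     A      100    0.02        0.04        1.89        2.30
27    A      100    0.04        0.14        3.78        8.48
30    R      0      0.00        0.00        0.00        0.00
37    A      100    0.33        0.47        4.81        27.71
"""

if __name__ == "__main__":
    b, a = peaks(BEFORE), peaks(AFTER)
    print("busiest before:", busiest(b, 2), "after:", busiest(a, 2))
    for port, (old, new) in changed(b, a).items():
        print(f"port {port}: peak Rx/Tx {old} -> {new}")
```

In this excerpt, port 37 tops the list after the clear at 27.71% peak Tx, well below the 98.81% and 100.00% peaks seen on ports 27 and 30 before.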