[Test] 3950X 2080Ti 9980XE: a 150k-class lab compute workstation

OP: fo40225   2020-05-08 18:58:59
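Last year's budget had been set aside to buy a 3950X in September.
Then AMD slipped the launch, and the 3900X was out of stock too,
so I had to buy one 60k 9980XE plus some 9900Ks.
(The 3950X only showed up in November; even then, procurement could not have finished before the year-end book closing.)
Now let's test whether this year we should keep buying the 25k 3950X or go for the 35k 10980XE.
Intel probably can't squeeze out anything new in the short term,
so this post should stay relevant for a while.
For details on the test software, see #1UjJiMol (PC_Shopping)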
===
Test hardware
AMD Ryzen 9 3950X
Thermalright Silver Arrow IB-E Extreme
ASUS Pro WS X570-ACE
4x Kingston KVR32N22D8/32
2x GIGABYTE RTX2080Ti TURBO 11G (rev. 2.0)
MSI NVLink GPU Bridge 3-Slots
XPG SX8200Pro 1TB
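FSP CANNON 2000W
FSP CMT230 (炫战士)
(The two front case fans were moved up.
Originally the two fans blew at the drive/PSU bay and the GPUs;
now the lower one blows at the GPUs and the upper one is aimed at the M.2.)
(Only on delivery did I find out the Gigabyte cards are rev. 2.0.
The difference between 1.0 and 2.0 is the power connector position.
On 1.0 it is on the side, which is easier to install in a desktop,
though the power cables may foul if the case is not wide enough.
On 2.0 it is at the rear, which should fit rack-mount chassis better,
but it is also hard to install if the case is not deep enough.)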
Intel Core i9-9980XE
Thermalright Silver Arrow IB-E Extreme
ASUS WS X299 PRO
8x A-DATA AD4U2666732G19-RGN
2x ASUS TURBO-RTX2080TI-11G
Quadro RTX 6000/8000 NVLink HB Bridge 2-Slot
ASUS HYPER M.2 X4 MINI CARD
└XPG SX8200Pro 1TB
FSP CANNON 2000W
MSI MPG GUNGNIR 100
(This case does not have much cable-routing space behind the tray,
and the front fans do not move much air.)
BIOS versions and settings
ASUS Pro WS X570-ACE 1302
PBO manual
PPT 1000W
TDC 1000A
EDC 1000A
Everything else at defaults
DDR4-3200 (22-22-22) 1.2V
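(I suspect PBO is broken in this BIOS version,
so treat the results as reference only.
Defaults, p95 all-core: sse2 about 3.8GHz, avx2 about 3.4GHz
PBO Enable: p95 avx2 goes to a black screen instantly
1000/1000/1000: sse2 about 3.8GHz, avx2 about 3.8GHz
Manually tuned to 200~300A: sse2 about 4.0GHz, avx2 about 3.9GHz
Setting Max CPU Boost Clock Override to 200MHz leaves a bunch of cores locked at 500MHz)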
ASUS WS X299 PRO 2002
Long Duration Package Power Limit 4095W
Package Power Time Window 127s
Short Duration Package Power Limit 4095W
CPU Integrated VR Current Limit 1023.875A
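Front upper fan (1), temperature source: VRM
Front lower fans (2, 3), temperature source: PCH
Rear fan, temperature source: PCH
20°C 20%, 65°C 70%, 70°C 100%
Everything else at defaults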
DDR4-2666 (19-19-19) 1.2V
Additionally, ran
nvidia-smi -pm 1
nvidia-smi -pl 280
to raise the 2080 Ti power limit to 280W
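The two commands above act on every GPU at once. A minimal Python sketch of the same step applied per GPU follows. The helper names `power_limit_commands` and `apply` and the GPU indices are hypothetical; only the `nvidia-smi` flags `-pm`, `-pl`, and `-i` come from the tool itself. Actually running the commands needs root and a card whose firmware accepts a 280W limit.

```python
import subprocess

def power_limit_commands(watts, gpu_indices):
    """Build the nvidia-smi invocations for persistence mode plus a power limit.

    Hypothetical helper: it only builds argument lists and does not run them.
    """
    cmds = [["nvidia-smi", "-pm", "1"]]  # persistence mode on, as in the post
    for idx in gpu_indices:
        # -i selects one GPU, -pl sets its power limit in watts
        cmds.append(["nvidia-smi", "-i", str(idx), "-pl", str(watts)])
    return cmds

def apply(cmds):
    """Run each command; raises CalledProcessError if nvidia-smi refuses."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)

if __name__ == "__main__":
    # Print the commands instead of running them, so this is safe anywhere.
    for cmd in power_limit_commands(280, [0, 1]):
        print(" ".join(cmd))
```

Keeping command construction separate from execution makes it easy to review what will be run before touching the cards.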
OS
Ubuntu Server 20.04 LTS kernel 5.4.0-26
CUDA driver 440.64
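Frequency, temperature, and power
3950x
sensors for temperature
turbostat for frequency and wattage
9980xe
turbostat for temperature, frequency, and wattage
2080ti
nvidia-smi for temperature, frequency, and wattage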
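For the GPU side, one possible way to collect these three readings in a script is nvidia-smi's CSV query mode. The sketch below is not necessarily how the readings in this post were taken; the function names are made up, and the sample row simply reuses the 1-second 2080 Ti figures reported later in this post.

```python
import subprocess

# Query fields: SM clock (MHz), GPU temperature (C), board power draw (W)
QUERY = "clocks.sm,temperature.gpu,power.draw"

def parse_gpu_line(line):
    """Parse one CSV row into (MHz, degrees C, watts)."""
    mhz, temp_c, watts = (field.strip() for field in line.split(","))
    return int(mhz), int(temp_c), float(watts)

def read_gpus():
    """Query all GPUs once; requires nvidia-smi on PATH."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [parse_gpu_line(row) for row in out.splitlines() if row.strip()]

# Illustrative row only, not captured output.
print(parse_gpu_line("1830, 48, 283.00"))
```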
Idle
3950x+2x2080ti
CPU 2200MHz 32°C 20W
GPU 300MHz 32°C 13W
Power strip 95W
9980xe+2x2080ti
CPU 1200MHz 34°C 12W
GPU 300MHz 35°C 10W
Power strip 95W
Prime95 Version 29.8 build 6
Small FFTs(L1/L2/L3)
3950x sse2
1 second
CPU 3826MHz 54.5°C 131W
Power strip 227W
1 minute
CPU 3768MHz 62.5°C 125W
Power strip 218W
https://youtu.be/kDgSxc9guZc
3950x fma3
1 second
CPU 3775MHz 60.3°C 156W
Power strip 263W
1 minute
CPU 3753MHz 72.5°C 161W
Power strip 271W
https://youtu.be/fZ3C3hk8TCk
9980xe sse2
1 second
CPU 3800MHz 66°C 257W
Power strip 418W
1 minute
CPU 3800MHz 87°C 265W
Power strip 430W
https://youtu.be/WZj_AQrFpME
9980xe fma3
1 second
CPU 3300MHz 61°C 241W
Power strip 388W
1 minute
CPU 3300MHz 80°C 243W
Power strip 395W
https://youtu.be/i1VyFFrVi0U
9980xe avx512
1 second
CPU 2800MHz 59°C 210W
Power strip 344W
1 minute
CPU 2800MHz 74°C 208W
Power strip 343W
https://youtu.be/vKs5G91rL7c
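One way to read the Prime95 numbers is to subtract the idle readings from each 1-minute reading, for both the CPU package and the power strip. The sketch below does only that arithmetic on figures copied from this post; the dictionary and function names are made up for illustration.

```python
# Idle readings from this post: (CPU package W, power strip W)
IDLE = {"3950x": (20, 95), "9980xe": (12, 95)}

# 1-minute Prime95 readings from this post: (CPU package W, power strip W)
P95_1MIN = {
    ("3950x", "sse2"): (125, 218),
    ("3950x", "fma3"): (161, 271),
    ("9980xe", "sse2"): (265, 430),
    ("9980xe", "fma3"): (243, 395),
    ("9980xe", "avx512"): (208, 343),
}

def deltas_over_idle(idle, loaded):
    """Return {run: (package increase W, wall increase W)} relative to idle."""
    result = {}
    for (cpu, mode), (pkg, wall) in loaded.items():
        idle_pkg, idle_wall = idle[cpu]
        result[(cpu, mode)] = (pkg - idle_pkg, wall - idle_wall)
    return result

for (cpu, mode), (d_pkg, d_wall) in deltas_over_idle(IDLE, P95_1MIN).items():
    print(f"{cpu:7s} {mode:7s} package +{d_pkg}W  power strip +{d_wall}W")
```

In every run the increase at the power strip is larger than the package increase, presumably because the strip also sees PSU and VRM losses, fans, and memory.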
1xGPU tensorflow resnet50 training fp16 batch128
1x2080ti on 3950x
1 second
GPU 1830MHz 48°C 283W
Power strip 416W
1 minute
GPU 1815MHz 68°C 277W
Power strip 369W
https://youtu.be/T2XE2HlIeLg
1x2080ti on 9980xe
1 second
GPU 1875MHz 52°C 274W
Power strip 428W
1 minute
GPU 1815MHz 78°C 262W
Power strip 388W
https://youtu.be/pGnW6Am8jaA
p95+2GPU tensorflow
3950x avx2 + 2x2080ti
Power strip 796W
https://youtu.be/MzaYkBRSAX0
9980xe sse2 + 2x2080ti
Power strip 946W
https://youtu.be/EAauv9QAHkQ
CPU theoretical performance test
./2006-Core2 // uses SSE2; stands in for ordinary/common/legacy/ancient-relic applications
./2013-Haswell // uses AVX/FMA3; stands in for highly optimized modern applications
./2017-SkylakePurley // uses AVX512; Intel's bonus question
       | 128-bit SSE2      | 256-bit AVX       | 256-bit FMA3
       | Multiply + Add    | Multiply + Add    | Fused Multiply Add
       | 1T     | nT       | 1T     | nT       | 1T      | nT
3950x  | 44.928 | 995.664  | 78.912 | 1552.99  | 123.072 | 1791.36
9980xe | 35.04  | 546.144  | 62.016 | 948.384  | 123.264 | 1882.37

       | 512-bit AVX512
       | Fused Multiply Add
       | 1T      | nT
9980xe | 235.008 | 3227.14
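As a rough cross-check of the table above, the 9980XE multi-thread numbers can be converted into an implied sustained clock. The sketch assumes the figures are single-precision GFLOPS and that each core retires 8, 16, 32, and 64 FLOPs per cycle for the four instruction mixes. Neither assumption is stated in the post, and the function name is made up for illustration.

```python
CORES_9980XE = 18

# Assumed single-precision FLOPs per cycle per core (two vector ports):
# SSE2 mul+add 2x4, AVX mul+add 2x8, FMA3 2x8x2, AVX512 FMA 2x16x2.
FLOPS_PER_CYCLE = {"sse2": 8, "avx": 16, "fma3": 32, "avx512": 64}

# nT results for the 9980XE, copied from the table
MEASURED_NT = {"sse2": 546.144, "avx": 948.384, "fma3": 1882.37, "avx512": 3227.14}

def implied_ghz(gflops, cores, flops_per_cycle):
    """Clock (GHz) at which the assumed per-cycle peak would give this result."""
    return gflops / (cores * flops_per_cycle)

for mix, gflops in MEASURED_NT.items():
    ghz = implied_ghz(gflops, CORES_9980XE, FLOPS_PER_CYCLE[mix])
    print(f"{mix:7s} {gflops:9.3f} -> {ghz:.2f} GHz implied")
```

Under these assumptions the implied clocks come out near 3.8, 3.3, 3.3, and 2.8 GHz, close to the 3800/3300/2800MHz all-core clocks turbostat reported during Prime95, and the AVX512 result is reached at a lower clock because each cycle does twice the work of FMA3.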
CPU compute performance test
           | Cholesky | Det     | Dot     | Fft  | Inv    | Lu     | Qr     | Svd
3950x pip  | 511.02   | 639.38  | 648.55  | 5.17 | 433.32 | 575.97 | 122.69 | 7.22
3950x mkl  | 585.48   | 624.04  | 247.31  | 5.29 | 285.64 | 536.54 | 333.59 | 11.15
debug mkl  | 561.61   | 519.77  | 626.46  | 6.40 | 479.98 | 454.04 | 376.93 | 12.73
9980xe pip | 597.74   | 699.82  | 766.01  | 3.91 | 483.11 | 573.80 | 160.14 | 11.59
9980xe mkl | 820.29   | 1086.11 | 1355.97 | 3.74 | 712.80 | 749.14 | 366.21 | 14.17
IO test
           | 3950x            | 9980xe
1MSeqQ8T1r | 2784MB/s         | 2387MB/s
1MSeqQ8T1w | 2867MB/s         | 2324MB/s
1MSeqQ1T1r | 2779MB/s         | 2405MB/s
1MSeqQ1T1w | 2834MB/s         | 2283MB/s
4kQ32T16r  | 697MB/s (170k)   | 655MB/s (160k)
4kQ32T16w  | 1498MB/s (366k)  | 1492MB/s (364k)
4kQ1T1r    | 79.7MB/s (19.5k) | 65.9MB/s (16.1k)
4kQ1T1w    | 234MB/s (57.1k)  | 230MB/s (56.2k)
(Both SSDs are new and both are attached directly to the CPU,
so the gap is probably the effect of the Intel vulnerabilities.)
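The figures in parentheses on the 4k rows line up with I/O operations per second. A small sketch of the conversion, assuming decimal megabytes (1 MB = 10^6 bytes) and 4096-byte blocks; both are assumptions about the benchmark tool, and the function name is hypothetical.

```python
BLOCK_BYTES = 4096  # assumed size of one "4k" request

def iops_from_throughput(mb_per_s, block_bytes=BLOCK_BYTES):
    """Convert throughput in MB/s (decimal MB assumed) to operations per second."""
    return mb_per_s * 1_000_000 / block_bytes

# 4k rows from the table: (3950x MB/s, 9980xe MB/s)
ROWS = {
    "4kQ32T16r": (697, 655),
    "4kQ32T16w": (1498, 1492),
    "4kQ1T1r": (79.7, 65.9),
    "4kQ1T1w": (234, 230),
}

for name, (amd, intel) in ROWS.items():
    print(f"{name:10s} {iops_from_throughput(amd) / 1000:6.1f}k  "
          f"{iops_from_throughput(intel) / 1000:6.1f}k")
```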
nvidia-smi topo -m
3950x
        GPU0    GPU1    CPU Affinity
GPU0     X      NV2     0-31
GPU1    NV2      X      0-31
9980xe
        GPU0    GPU1    CPU Affinity
GPU0     X      NV2     0-35
GPU1    NV2      X      0-35
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between
NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe
Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically
the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the
PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
nvidia-smi topo -mp
3950x
        GPU0    GPU1    CPU Affinity
GPU0     X      PHB     0-31
GPU1    PHB      X      0-31
9980xe
        GPU0    GPU1    CPU Affinity
GPU0     X      SYS     0-35
GPU1    SYS      X      0-35
p2pBandwidthLatencyTest
3950x
Unidirectional P2P=Disabled Bandwidth Matrix (GB/s)
   D\D      0      1
     0 529.77   6.24
     1   6.25 531.67
Unidirectional P2P=Enabled Bandwidth (P2P Writes) Matrix (GB/s)
   D\D      0      1
     0 530.74  46.92
     1  46.93 531.33
Bidirectional P2P=Disabled Bandwidth Matrix (GB/s)
   D\D      0      1
     0 533.64  11.11
     1  11.10 535.07
Bidirectional P2P=Enabled Bandwidth Matrix (GB/s)
   D\D      0      1
     0 533.64  93.47
     1  93.68 532.94
P2P=Disabled Latency Matrix (us)
   GPU      0      1
     0   1.90  15.96
     1  12.55   1.93
   CPU      0      1
     0   2.82   7.58
     1   7.61   3.00
P2P=Enabled Latency (P2P Writes) Matrix (us)
   GPU      0      1
     0   1.90   2.04
     1   2.06   1.94
   CPU      0      1
     0   3.07   2.50
     1   2.51   3.06
9980xe
Unidirectional P2P=Disabled Bandwidth Matrix (GB/s)
   D\D      0      1
     0 528.38  11.23
     1  11.24 531.12
Unidirectional P2P=Enabled Bandwidth (P2P Writes) Matrix (GB/s)
   D\D      0      1
     0 530.90  46.94
     1  46.97 531.39
Bidirectional P2P=Disabled Bandwidth Matrix (GB/s)
   D\D      0      1
     0 535.18  20.01
     1  20.07 534.61
Bidirectional P2P=Enabled Bandwidth Matrix (GB/s)
   D\D      0      1
     0 533.96  93.68
     1  93.53 532.69
P2P=Disabled Latency Matrix (us)
   GPU      0      1
     0   1.88  15.22
     1  13.27   1.83
   CPU      0      1
     0   2.59   6.90
     1   6.93   2.51
P2P=Enabled Latency (P2P Writes) Matrix (us)
   GPU      0      1
     0   1.88   1.77
     1   1.75   1.84
   CPU      0      1
     0   2.73   1.93
     1   1.92   2.56
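To put the bandwidth matrices in one number per platform, the sketch below divides the P2P-enabled GPU0 to GPU1 unidirectional bandwidth by the P2P-disabled one. The values are copied from the matrices above; the dictionary layout and function name are made up for illustration.

```python
# Unidirectional GPU0 -> GPU1 bandwidth in GB/s, copied from the matrices
UNI = {
    "3950x": {"p2p_off": 6.24, "p2p_on": 46.92},
    "9980xe": {"p2p_off": 11.23, "p2p_on": 46.94},
}

def p2p_speedup(entry):
    """Ratio of P2P-enabled to P2P-disabled bandwidth."""
    return entry["p2p_on"] / entry["p2p_off"]

for platform, entry in UNI.items():
    print(f"{platform:7s} {entry['p2p_off']:6.2f} -> {entry['p2p_on']:6.2f} GB/s "
          f"({p2p_speedup(entry):.1f}x)")
```

The P2P-enabled figures are nearly identical on both machines, so with NVLink in use the difference between the two host paths mostly disappears for GPU-to-GPU copies.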
Tensorflow test: resnet50
1x2080Ti
       | fp32 batch64 | fp32 batch128 | fp16 batch64 | fp16 batch128 | fp16 batch256
3950x  | 266.86       | 240.54        | 669.76       | 683.40        | 566.51
9980xe | 269.42       | 264.58        | 672.30       | 685.81        | 640.76
2x2080ti fp32
       | batch32  | batch64   | batch128
       | global64 | global128 | global256
3950x  | 540.22   | 592.19    | 387.67
9980xe | 541.25   | 597.76    | 486.02
2x2080ti fp16
       | batch32  | batch64   | batch128  | batch256
       | global64 | global128 | global256 | global512
3950x  | 1103.82  | 1333.90   | 1479.71   | 1180.18
9980xe | 1078.67  | 1288.72   | 1400.09   | 1333.67
Pytorch and AMP (Apex) test
bert            | fp32     | fp16
3950x 2x2080ti  | 00:26.38 | 00:26.22
9980xe 2x2080ti | 00:29.67 | 00:34.92
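A small sketch for comparing the two bert rows, assuming the values are elapsed times in mm:ss.ss where lower is better. The post does not state the format or what exactly is timed, so treat both as assumptions; the names are hypothetical.

```python
def to_seconds(stamp):
    """Convert 'mm:ss.ss' to seconds (the format is an assumption)."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)

# Times copied from the table
BERT = {
    "3950x": {"fp32": "00:26.38", "fp16": "00:26.22"},
    "9980xe": {"fp32": "00:29.67", "fp16": "00:34.92"},
}

def slowdown(precision):
    """How many times longer the 9980xe run took than the 3950x run."""
    return to_seconds(BERT["9980xe"][precision]) / to_seconds(BERT["3950x"][precision])

for precision in ("fp32", "fp16"):
    print(f"{precision}: 9980xe takes {slowdown(precision):.2f}x the 3950x time")
```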
===
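So at this price point (100k~200K), if the budget allows
and you need multi-core CPU math performance or a large amount of RAM, buy the 10980XE.
Quad-channel memory, double the throughput with AVX512, and MKL optimization are no joke.
Twice the RAM as well (256GB vs 128GB).
With an ASUS WS X299 PRO/SE motherboard you also get onboard graphics + IPMI.
If the budget is tight, buying a 3900X is probably more reasonable.
For a dual-GPU machine used purely for DL,
a 3600X on an x8/x8 board plus two used 1080 Tis is probably the best value-for-money combination.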
Author: windrain0317 (你在大声啥)   2020-05-08 19:52:00
Adding another upvote for this post
