The goal is to study the resistive limitation of the inverse MHD cascade both with and without net magnetic helicity. The case without net magnetic helicity was studied previously (see Brandenburg et al. 2024a,b) at resolutions up to 2048^3 meshpoints. It was found that the quantity t v_A(t)/ξ_M(t) approaches a constant during the turbulent decay, provided the decay remains nearly self-similar. Surprisingly, the asymptotic value of t v_A(t)/ξ_M(t) was found to increase with increasing Lundquist number. We found similar behavior in 2-D and in 3-D, which may be suspect. In 2-D, we found that the Lundquist number dependence saturates near 10^4. To verify the Lundquist number dependence and to examine the possibility of saturation in 3-D, we require a resolution of up to 8192^3 meshpoints.
A problem right now is that the GPU and CPU runs yield different results. The error might well be very trivial! Below are the relevant test directories, which use 128^3 meshpoints.
git clone -b gputestv6 --recurse-submodules https://AxelBrandenburg@pencil-code.org/git/ pencil-code
cd pencil-code/src/astaroth/submodule
git checkout PCinterface_2019-8-12
ml rocm
ml cmake
pc_build -f compilers/Cray_MPI FFLAGS+=" -g -O0" LDFLAGS+='-Wl,--no-relax -L /opt/cray/pe/lib64 -lmpi_gtl_hsa'
This is because the compiler has a bug, so we need the -O0 optimization. Later, we could use
pc_build -f compilers/Cray_MPI FFLAGS+=" -g" LDFLAGS+='-Wl,--no-relax'
or
pc_build -f compilers/Cray_MPI LDFLAGS+='-Wl,--no-relax'
The linker part is needed.
module swap PrgEnv-cray PrgEnv-gnu
To compile, we must suppress the "FSTD_95=-std=f95" option. (But how?) Next, we would use
pc_build -f compilers/GNU-GCC_MPI FFLAGS+=" -g -mcmodel=large" LDFLAGS+='-L /opt/cray/pe/lib64 -lmpi_gtl_hsa'
The plan is to work in:
/cfs/klemming/home/b/brandenb/data/GPU/axel/decay/reconnection/k60_nu5em6_k4_Pm5_128b_gnu
The figure pcomp_GPU.pdf above shows that B_rms starts at a value of around 0.003, while u_rms is driven by the Lorentz force and quickly reaches comparable values. The GPU runs reproduce the CPU runs qualitatively and the initial condition exactly.
The two runs now agree.
GPU:
Wall clock time [hours] = 0.850 (+/- 8.3333E-12)
Wall clock time/timestep/meshpoint [microsec] = 0.1620545
Maximum used memory per cpu [MBytes] = 1904.918
Maximum used memory [GBytes] = 14.003
Compared to:
CPU:
Wall clock time [hours] = 5.454E-02 (+/- 5.5556E-12)
Wall clock time/timestep/meshpoint [microsec] = 1.0402362E-02
Maximum used memory per cpu [MBytes] = 43.348
Maximum used memory [GBytes] = 4.776
Wall clock time [hours] = 0.624 (+/- 5.5556E-12)
Wall clock time/timestep/meshpoint [microsec] = 0.1190291
Maximum used memory per cpu [MBytes] = 1519.820
Maximum used memory [GBytes] = 11.492
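To compare the timing reports quoted above, the per-meshpoint cost can be extracted and divided. The following is a minimal sketch, assuming the report lines have exactly the wording shown above; the helper name `per_meshpoint_cost` is hypothetical and not part of the Pencil Code.

```python
import re

# Hypothetical helper: pull the cost per timestep and meshpoint (in
# microseconds) out of a timing report with the wording quoted above.
def per_meshpoint_cost(report: str) -> float:
    pattern = r"Wall clock time/timestep/meshpoint \[microsec\]\s*=\s*([0-9.Ee+-]+)"
    match = re.search(pattern, report)
    if match is None:
        raise ValueError("no per-meshpoint timing line found")
    return float(match.group(1))

# Values copied from the GPU and CPU reports in the text.
gpu_report = (
    "Wall clock time [hours] = 0.850 (+/- 8.3333E-12) "
    "Wall clock time/timestep/meshpoint [microsec] = 0.1620545 "
    "Maximum used memory per cpu [MBytes] = 1904.918"
)
cpu_report = (
    "Wall clock time [hours] = 5.454E-02 (+/- 5.5556E-12) "
    "Wall clock time/timestep/meshpoint [microsec] = 1.0402362E-02 "
    "Maximum used memory per cpu [MBytes] = 43.348"
)

ratio = per_meshpoint_cost(gpu_report) / per_meshpoint_cost(cpu_report)
print(f"GPU/CPU cost per timestep and meshpoint: {ratio:.1f}")
```

The ratio only compares the two quoted numbers; it says nothing about how many processes or devices each run used, which the reports above do not state.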
The difference was caused by upwinding of the vector potential, which was used in the CPU version but is not yet coded in the GPU version; see
src/astaroth/DSL/magnetic/induction.h
To confirm this, we ran the CPU version without upwinding of the vector potential; see the dashed orange line in the spectra above.
brandenb@login1:/cfs/klemming/home/b/brandenb/data/GPU/axel/decay/reconnection/k60_nu2em5_k4_Pm5_128c_dt_GPU_lspec> head -20 ../k60_nu2em5_k4_Pm5_128c_dt_GPU/data/power_kin.dat
3.36215353875403758E-2
0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
1.0323241484733396
1.05E-09 1.03E-06 8.25E-06 1.30E-05 2.44E-05 2.36E-05 2.42E-05 2.37E-05
2.17E-05 2.32E-05 1.87E-05 1.70E-05 1.63E-05 1.57E-05 1.45E-05 1.22E-05
1.23E-05 1.12E-05 1.01E-05 8.78E-06 8.73E-06 8.30E-06 7.53E-06 7.00E-06
6.58E-06 6.55E-06 6.02E-06 5.59E-06 5.44E-06 5.10E-06 4.90E-06 4.68E-06
4.29E-06 4.34E-06 4.09E-06 3.88E-06 3.80E-06 3.55E-06 3.51E-06 3.26E-06
3.22E-06 3.17E-06 2.97E-06 2.90E-06 2.80E-06 2.79E-06 2.63E-06 2.57E-06
2.49E-06 2.45E-06 2.41E-06 2.32E-06 2.30E-06 2.25E-06 2.18E-06 2.14E-06
2.11E-06 2.12E-06 2.05E-06 2.01E-06 2.02E-06 1.99E-06 1.97E-06 1.89E-06
2.0274802674034622
5.33E-09 1.41E-06 8.09E-06 1.06E-05 1.71E-05 1.69E-05 1.38E-05 1.30E-05
brandenb@login1:/cfs/klemming/home/b/brandenb/data/GPU/axel/decay/reconnection/k60_nu2em5_k4_Pm5_128c_dt_GPU_lspec> head -20 ../k60_nu2em5_k4_Pm5_128c_dt/data/power_kin.dat
0.0000000000000000
0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
2.3946539533068810E-003
1.87E-14 3.46E-10 3.50E-09 1.13E-08 3.50E-08 8.70E-08 1.42E-07 2.20E-07
3.32E-07 5.47E-07 6.85E-07 7.50E-07 9.60E-07 1.11E-06 1.23E-06 1.21E-06
1.47E-06 1.55E-06 1.62E-06 1.56E-06 1.74E-06 1.83E-06 1.80E-06 1.83E-06
1.88E-06 2.00E-06 2.00E-06 2.01E-06 2.01E-06 2.01E-06 2.03E-06 2.09E-06
2.02E-06 2.13E-06 2.15E-06 2.12E-06 2.14E-06 2.12E-06 2.13E-06 2.05E-06
2.12E-06 2.12E-06 2.07E-06 2.10E-06 2.04E-06 2.08E-06 2.02E-06 1.99E-06
2.01E-06 2.01E-06 1.98E-06 1.91E-06 1.96E-06 1.92E-06 1.90E-06 1.87E-06
1.86E-06 1.86E-06 1.82E-06 1.80E-06 1.81E-06 1.78E-06 1.79E-06 1.74E-06
1.0028751054320695
4.45E-09 1.78E-06 1.84E-05 1.95E-05 4.41E-05 4.40E-05 3.87E-05 3.47E-05
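The two power_kin.dat listings above can be compared more systematically than by eye. The following is a minimal sketch, assuming the layout suggested by the head output: each record is one time stamp followed by nk spectral values (nk = 64 for the 128^3 runs). The function names and the tiny nk = 4 sample data are hypothetical and serve only as illustration.

```python
def read_power(text, nk):
    """Split whitespace-separated numbers into records of 1 time + nk values.

    A trailing incomplete record (as produced by `head`) is dropped.
    """
    tokens = [float(tok) for tok in text.split()]
    record = nk + 1
    times, spectra = [], []
    for i in range(len(tokens) // record):
        chunk = tokens[i * record:(i + 1) * record]
        times.append(chunk[0])
        spectra.append(chunk[1:])
    return times, spectra

def nearest_index(times, t):
    """Index of the stored time closest to t."""
    return min(range(len(times)), key=lambda i: abs(times[i] - t))

def max_relative_difference(spec_a, spec_b, floor=1e-30):
    """Largest relative difference between two spectra, guarding zeros."""
    return max(abs(a - b) / max(abs(a), abs(b), floor)
               for a, b in zip(spec_a, spec_b))

# Made-up sample data with nk = 4, mimicking the file layout.
gpu_text = """0.0
 0.00E+00 0.00E+00 0.00E+00 0.00E+00
 1.0
 1.0E-06 2.0E-06 3.0E-06 4.0E-06
 2.0
 5.0E-06"""
cpu_text = """0.0
 0.00E+00 0.00E+00 0.00E+00 0.00E+00
 1.0028751054320695
 1.0E-06 2.0E-06 3.0E-06 8.0E-06"""

t_gpu, s_gpu = read_power(gpu_text, nk=4)
t_cpu, s_cpu = read_power(cpu_text, nk=4)
i_gpu = nearest_index(t_gpu, 1.0)
i_cpu = nearest_index(t_cpu, 1.0)
diff = max_relative_difference(s_gpu[i_gpu], s_cpu[i_cpu])
print(f"t_gpu={t_gpu[i_gpu]}, t_cpu={t_cpu[i_cpu]}, max rel. diff={diff:.2f}")
```

For the real files one would pass the file contents and nk=64. Since the GPU and CPU runs above write spectra at slightly different times, matching on the nearest time is only an approximate comparison.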
Presentation by Matthias Rheinhardt (Aalto University) at the Pencil Code User Meeting 2024 in Barcelona about the GPU acceleration in the Pencil Code using Astaroth: Introduction to PC-A [pptx] (25 Sep 2024)
Brandenburg, A., Neronov, A., & Vazza, F.: 2024a, ``Resistively controlled primordial magnetic turbulence decay,'' Astron. Astrophys., in press (arXiv:2401.08569, ADS, HTML, PDF)
Brandenburg, A., Neronov, A., & Vazza, F.: 2024b, Datasets for ``Resistively controlled primordial magnetic turbulence decay'' v2024.01.18. Zenodo, DOI:10.5281/zenodo.10527437 (HTML, DOI)
brandenb@login1:/cfs/klemming/home/b/brandenb/data/GPU/axel/decay/reconnection/k60_nu5em6_k4_Pm5_128a> ls -l src/astaroth/submodule/acc-runtime/
total 8
drwxr-sr-x 2 brandenb pg_snic2020-4-12 4096 dec 30 02:03 built-in
drwxr-sr-x 5 brandenb pg_snic2020-4-12 4096 dec 30 02:03 samples
while in the old directory we have
brandenb@login1:/cfs/klemming/home/b/brandenb/data/GPU/axel/decay/reconnection/k600_nu5em7_k4_Pm5_128a> ls -l src/astaroth/submodule/acc-runtime/
total 24
lrwxrwxrwx 1 brandenb pg_snic2020-4-12 89 dec 28 12:15 acc -> /cfs/klemming/home/b/brandenb/data/GPU/pencil-code/src/astaroth/submodule/acc-runtime/acc
lrwxrwxrwx 1 brandenb pg_snic2020-4-12 89 dec 28 12:15 api -> /cfs/klemming/home/b/brandenb/data/GPU/pencil-code/src/astaroth/submodule/acc-runtime/api
drwxr-sr-x 2 brandenb pg_snic2020-4-12 4096 dec 28 12:15 built-in
lrwxrwxrwx 1 brandenb pg_snic2020-4-12 100 dec 28 12:15 CMakeLists.txt -> /cfs/klemming/home/b/brandenb/data/GPU/pencil-code/src/astaroth/submodule/acc-runtime/CMakeLists.txt
drwxr-sr-x 2 brandenb pg_snic2020-4-12 4096 dec 28 12:15 dynamic
drwxr-sr-x 5 brandenb pg_snic2020-4-12 4096 dec 28 11:23 samples
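The difference between the two listings can also be found programmatically. The following is a minimal sketch using only the Python standard library; the function name `missing_entries` is hypothetical, and the demonstration uses temporary directories populated with the entry names from the listings above rather than the real run directories.

```python
import os
import tempfile

def missing_entries(new_dir, old_dir):
    """Names present in old_dir but absent from new_dir, sorted."""
    return sorted(set(os.listdir(old_dir)) - set(os.listdir(new_dir)))

# Demonstration on throwaway directories that mimic the two listings.
with tempfile.TemporaryDirectory() as root:
    new_dir = os.path.join(root, "new")
    old_dir = os.path.join(root, "old")
    os.mkdir(new_dir)
    os.mkdir(old_dir)
    for name in ["built-in", "samples"]:
        os.mkdir(os.path.join(new_dir, name))
    for name in ["acc", "api", "built-in", "CMakeLists.txt",
                 "dynamic", "samples"]:
        os.mkdir(os.path.join(old_dir, name))
    missing = missing_entries(new_dir, old_dir)

print(missing)
```

In the listings above, the entries that exist only in the old directory are the symbolic links acc, api, and CMakeLists.txt, plus the directory dynamic.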