2026-05-12 01:43:15 +00:00
docs @fastmath on f32 force loop: 7.267 -> 7.897 G/s (+8.7%) 2026-05-12 00:11:16 +00:00
src Pack tile sin/cos: 5->4 SHMEM slots, 41KB->33KB 2026-05-12 01:43:15 +00:00
src2 Fix: reinterpret -> Core.bitcast for packed sin/cos (GPU kernel compat) 2026-05-12 01:02:45 +00:00
test Tile BAOAB mixed precision: f32 force+BM, 2x speedup (0.49->0.97 G/s) 2026-05-12 01:32:36 +00:00
test2 Pack f32 sin+cos into one f64 SHMEM slot: 8.20 -> 8.53 G/s 2026-05-12 00:36:44 +00:00
test3 Add configurable rng_seed: UInt64 parameter to kernel and launch functions 2026-05-11 14:34:59 +00:00
AGENTS.md Mixed precision 10M benchmark: 212.2s, 8.107 G/s, <Ek>=0.1501 2026-05-12 01:16:01 +00:00
Manifest.toml BAOAB Langevin kernel + Option A noise pre-generation 2026-05-10 22:55:54 +00:00
Project.toml BAOAB Langevin kernel + Option A noise pre-generation 2026-05-10 22:55:54 +00:00