taohonker/2dlattice-cuda
No description
54 commits, 7 branches, 0 tags, 230 KiB
Language: Julia 100%
Branch: main
Latest commit: d491d2bddc by tao, 2026-05-12 01:43:15 +00:00
Pack tile sin/cos: 5->4 SHMEM slots, 41KB->33KB
Name | Last commit | Date
docs | @fastmath on f32 force loop: 7.267 -> 7.897 G/s (+8.7%) | 2026-05-12 00:11:16 +00:00
src | Pack tile sin/cos: 5->4 SHMEM slots, 41KB->33KB | 2026-05-12 01:43:15 +00:00
src2 | Fix: reinterpret -> Core.bitcast for packed sin/cos (GPU kernel compat) | 2026-05-12 01:02:45 +00:00
test | Tile BAOAB mixed precision: f32 force+BM, 2x speedup (0.49->0.97 G/s) | 2026-05-12 01:32:36 +00:00
test2 | Pack f32 sin+cos into one f64 SHMEM slot: 8.20 -> 8.53 G/s | 2026-05-12 00:36:44 +00:00
test3 | Add configurable rng_seed: UInt64 parameter to kernel and launch functions | 2026-05-11 14:34:59 +00:00
AGENTS.md | Mixed precision 10M benchmark: 212.2s, 8.107 G/s, <Ek>=0.1501 | 2026-05-12 01:16:01 +00:00
Manifest.toml | BAOAB Langevin kernel + Option A noise pre-generation | 2026-05-10 22:55:54 +00:00
Project.toml | BAOAB Langevin kernel + Option A noise pre-generation | 2026-05-10 22:55:54 +00:00