From 641dd9a5a19c27f55fb92b497c82c59e7a0b40c1 Mon Sep 17 00:00:00 2001
From: Colin
Date: Wed, 15 Jan 2025 01:17:05 +0000
Subject: [PATCH] README: fix typos, update code paths

---
 README.md | 45 +++++++++++++++++++++++----------------------
 1 file changed, 23 insertions(+), 22 deletions(-)

diff --git a/README.md b/README.md
index 9220ddb..b5276d7 100644
--- a/README.md
+++ b/README.md
@@ -80,7 +80,7 @@ presently, this requires *specific* versions of rust-nightly to work.
 the feature is toggled at runtime, but compiled unconditionally. set up the toolchain according to [rust-toolchain.toml](rust-toolchain.toml):
 
 ```
-$ rustup default nightly-2022-08-29
+$ rustup default nightly-2023-01-21
 $ rustup component add rust-src rustc-dev llvm-tools-preview
 ```
 
@@ -93,7 +93,7 @@ now you can swap out the `CpuDriver` with a `SpirvDriver` and you're set:
 re-run it as before and you should see the same results:
 
 ```
-$ cargo run --release --example wavefront
+$ cargo run --release --bin wavefront
 ```
 
 see the "Processing Loop" section below to understand what GPU acceleration entails.
@@ -102,10 +102,10 @@ see the "Processing Loop" section below to understand what GPU acceleration enta
 
 the [sr\_latch](crates/applications/sr_latch/src/main.rs) example explores a more interesting feature set.
 first, it "measures" a bunch of parameters over different regions of the simulation
-(peak inside [`crates/coremem/src/meas.rs`](crates/coremem/src/meas.rs) to see how these each work):
+(peek inside [`meas.rs`](crates/coremem/src/meas.rs) to see how these each work):
 
 ```rust
-// measure a bunch of items of interest throughout the whole simulation duration:
+// measure some items of interest throughout the whole simulation duration:
 driver.add_measurement(meas::CurrentLoop::new("coupling", coupling_region.clone()));
 driver.add_measurement(meas::Current::new("coupling", coupling_region.clone()));
 driver.add_measurement(meas::CurrentLoop::new("sense", sense_region.clone()));
@@ -140,7 +140,7 @@ driver.add_serializer_renderer(&*format!("{}frame-", prefix), 36000, None);
 run this, after having setup the GPU pre-requisites:
 
 ```
-$ cargo run --release --example sr_latch
+$ cargo run --release --bin sr_latch
 ```
 
 and then investigate the results with
@@ -161,7 +161,7 @@ what we see here is that both ferrites (the two large circles in the above image
 
 we can see the "reset" pulse has polarized both ferrites in the counter-clockwise orientation this time. the E field is less pronounced because we gave the system 22ns instead of 3ns to settle this time.
 
-the graphical viewer is helpful for debugging geometries, but the CSV measurements are useful for viewing numeric system performance. peak inside "out/applications/sr-latch/meas.csv" to see a bunch of measurements over time. you can use a tool like Excel or [visidata](https://www.visidata.org/) to plot the interesting ones.
+the graphical viewer is helpful for debugging geometries, but the CSV measurements are useful for viewing numeric system performance. peek inside "out/applications/sr-latch/meas.csv" to see a bunch of measurements over time. you can use a tool like Excel or [visidata](https://www.visidata.org/) to plot the interesting ones.
 
 here's a plot of `M(mem2)` over time from the SR latch simulation. we're measuring, over the torus volume corresponding to the ferrite on the right in the images above, the (average) M component normal to each given cross section of the torus. the notable bumps correspond to these pulses: "set", "reset", "set", "reset", "set+reset applied simultaneously", "set", "set".
 
@@ -171,14 +171,14 @@ here's a plot of `M(mem2)` over time from the SR latch simulation. we're measuri
 
 ## Processing Loop (and how GPU acceleration works)
 
-the processing loop for a simulation is roughly as follows ([`crates/coremem/src/driver.rs:step_until`](crates/coremem/src/driver.rs) drives this loop):
-1. evaluate all stimuli at the present moment in time; these produce an "externally applied" E and H field
-   across the entire volume.
+the processing loop for a simulation is roughly as follows ([`driver.rs:step_until`](crates/coremem/src/driver.rs) drives this loop):
+1. evaluate all stimuli at the present moment in time;
+   these produce an "externally applied" E and H field across the entire volume.
 2. apply the FDTD update equations to "step" the E field, and then "step" the H field. these equations take the external stimulus from step 1 into account.
 3. evaluate all the measurement functions over the current state; write these to disk.
 4. serialize the current state to disk so that we can resume from this point later if we choose.
 
-within each step above, the logic is multi-threaded and the rendeveous points lie at the step boundaries.
+within each step above, the logic is multi-threaded and the rendezvous points lie at the step boundaries.
 
 it turns out that the Courant rules force us to evaluate FDTD updates (step 2) on a _far_ smaller time scale than the other steps are sensitive to. so to tune for performance, we apply some optimizations:
 - stimuli (step 1) are evaluated only once every N frames. we still *apply* them on each frame individually. the waveform resembles that of a Sample & Hold circuit.
@@ -202,12 +202,12 @@ this library takes effort to separate the following from the core/math-heavy "si
 
 the simulation only interacts with these things through a trait interface, such that they're each swappable.
 
-common stimuli type live in [crates/coremem/src/stim/](crates/coremem/src/stim/).
-common measurements live in [crates/coremem/src/meas.rs](crates/coremem/src/meas.rs).
-common render targets live in [crates/coremem/src/render.rs](crates/coremem/src/render.rs). these change infrequently enough that [crates/coremem/src/driver.rs](crates/coremem/src/driver.rs) has some specialized helpers for each render backend.
-common materials are spread throughout [crates/cross/src/mat/](crates/cross/src/mat/).
-different float implementations live in [crates/cross/src/real.rs](crates/cross/src/real.rs).
-if you're getting NaNs, you can run the entire simulation on a checked `R64` (CPU-only) or `R32` (any backend) type in order to pinpoint the moment those are introduced.
+common stimuli type live in [stim/mod.rs](crates/coremem/src/stim/mod.rs).
+common measurements live in [meas.rs](crates/coremem/src/meas.rs).
+common render targets live in [render.rs](crates/coremem/src/render.rs). these change infrequently enough that [driver.rs](crates/coremem/src/driver.rs) has some specialized helpers for each render backend.
+common materials are spread throughout [mat/mod.rs](crates/cross/src/mat/mod.rs).
+different float implementations live in [real.rs](crates/cross/src/real.rs).
+if you're getting NaNs, you can run the entire simulation on a checked `R64` type in order to pinpoint the moment those are introduced.
 
 ## Materials
 
@@ -237,17 +237,17 @@ this library ships with the following materials:
 - `MHPgram` specifies the `M(H)` function as a parallelogram.
 - `MBPgram` specifies the `M(B)` function as a parallelogram.
 
-measurements include ([crates/coremem/src/meas.rs](crates/coremem/src/meas.rs)):
+measurements include ([meas.rs](crates/coremem/src/meas.rs)):
 - E, B or H field (mean vector over some region)
 - energy, power (net over some region)
 - current (mean vector over some region)
 - mean current magnitude along a closed loop (toroidal loops only)
 - mean magnetic polarization magnitude along a closed loop (toroidal loops only)
 
-output targets include ([crates/coremem/src/render.rs](crates/coremem/src/render.rs)):
+output targets include ([render.rs](crates/coremem/src/render.rs)):
 - `ColorTermRenderer`: renders 2d-slices in real-time to the terminal.
 - `Y4MRenderer`: outputs 2d-slices to an uncompressed `y4m` video file.
-- `SerializerRenderer`: dumps the full 3d simulation state to disk. parseable after the fact with [crates/post/src/bin/viewer.rs](crates/post/src/bin/viewer.rs).
+- `SerializerRenderer`: dumps the full 3d simulation state to disk. parseable after the fact with [viewer.rs](crates/post/src/bin/viewer.rs).
 - `CsvRenderer`: dumps the output of all measurements into a `csv` file.
 
 historically there was also a plotly renderer, but that effort was redirected into developing the viewer tool better.
@@ -266,8 +266,8 @@ contrast that to the CPU-only implementation which achieves 24.6M grid cell step
 
 # Support
 
-the author can be reached on Matrix <@colin:uninsane.org>, email or Activity Pub <@colin@fed.uninsane.org>. i poured a lot of time into making
-this: i'm happy to spend the marginal extra time to help curious people make use of what i've made, so don't hesitate to reach out.
+the author can be reached on Matrix <@colin:uninsane.org>, email or Activity Pub <@colin@fed.uninsane.org>.
+i'd love for this project to be useful to people besides just myself, so don't hesitate to reach out.
 
 ## Additional Resources
 
@@ -288,4 +288,5 @@ David Bennion and Hewitt Crane documented their approach for transforming Diode-
 although i decided not to use PML, i found Steven Johnson's (of FFTW fame) notes to be the best explainer of PML:
 - [Steven Johnson: Notes on Perfectly Matched Layers (PMLs)](https://math.mit.edu/~stevenj/18.369/spring07/pml.pdf)
 
-a huge thanks to everyone above for sharing the fruits of their studies. though my work here is of a lesser caliber, i hope that someone, likewise, may someday find it of use.
+a huge thanks to everyone above for sharing the fruits of their studies.
+this project would not have happened if not for literature like the above from which to draw.