![screencapture of Viewer for SR latch at t=2.8ns. it shows two rings spaced horizontally, with arrows circulating them](readme_images/sr_latch_EzBxy_2800ps.png "SR latch at t=2.8ns")
the viewer shows us a single xy cross-section of the simulation at a moment in time.
it uses red-tipped arrows to show the x-y components of the B field at every point,
and the z component of the E field is illustrated with color (bright green for positive polarization and dark blue for negative).
the light blue splotches depict the conductors (in the center, the wire coupling loops; on the edge, our energy-dissipating boundary).
what we see here is that both ferrites (the two large circles in the image above) have a clockwise-polarized B field. this is in the middle of a transition, so the E fields look a bit chaotic. advance to t=46ns: the "reset" pulse was applied at t=24ns and had 22ns to settle:
![screencapture of Viewer for SR latch at t=45.7ns. similar to above but with the B field polarized counter-clockwise](readme_images/sr_latch_EzBxy_45700ps.png "SR latch at t=45.7ns")
we can see that the "reset" pulse has polarized both ferrites counter-clockwise this time. the E field is less pronounced because we gave the system 22ns instead of 3ns to settle.
the graphical viewer is helpful for debugging geometries, but the CSV measurements are more useful for viewing numeric system performance. peek inside "out/applications/sr-latch/meas.csv" to see a bunch of measurements over time. you can use a tool like Excel or [visidata](https://www.visidata.org/) (e.g. `vd out/applications/sr-latch/meas.csv`) to plot the interesting ones.
here's a plot of `M(mem2)` over time from the SR latch simulation. it measures, over the torus volume corresponding to the right-hand ferrite in the images above, the average M component normal to each cross section of the torus. the notable bumps correspond to these pulses: "set", "reset", "set", "reset", "set+reset applied simultaneously", "set", "set".
the processing loop for a simulation is roughly as follows ([`crates/coremem/src/driver.rs:step_until`](crates/coremem/src/driver.rs) drives this loop):
1. evaluate the external stimulus for this instant in time.
2. apply the FDTD update equations to "step" the E field, and then "step" the H field. these equations take the external stimulus from step 1 into account.
3. apply any user-specified measurements and render targets to the new state.
it turns out that the Courant rules force us to evaluate FDTD updates (step 2) on a _far_ smaller time scale than the other steps are sensitive to. so to tune for performance, we apply some optimizations:
- stimuli (step 1) are evaluated only once every N frames. we still *apply* them on each frame individually. the resulting waveform resembles that of a Sample & Hold circuit.
- measurements and renders (step 3) are decimated similarly: they run only once every M and Z frames, respectively.
although steps 1 and 3 vary heavily based on the user configuration of the simulation, step 2 can be defined pretty narrowly in code (no user-callbacks/dynamic function calls/etc). this lets us offload the processing of step 2 to a dedicated GPU. by tuning N/M/Z, step 2 becomes the dominant cost in our simulations and GPU offloading can easily boost performance by more than an order of magnitude on even a mid-range consumer GPU.
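putting those pieces together, the loop looks roughly like this. this is a simplified, self-contained sketch of the scheme described above, not the real `step_until` body; the `Sim` type and its methods here are illustrative stand-ins:

```rust
// stand-in sim type, just so this sketch is self-contained; the real types
// live in crates/coremem.
struct Sim { e: f32, t: u64 }
impl Sim {
    fn eval_stimulus(&self) -> f32 { (self.t as f32).sin() }      // step 1 (stand-in)
    fn step(&mut self, stim: f32) { self.e += stim; self.t += 1 } // step 2 (stand-in)
    fn measure(&self) { println!("t={} e={}", self.t, self.e) }   // step 3 (stand-in)
    fn render(&self) { /* write a frame to disk/screen */ }       // step 3 (stand-in)
}

fn main() {
    let (n, m, z) = (10, 100, 1000); // example decimation factors N, M, Z
    let mut sim = Sim { e: 0.0, t: 0 };
    let mut stim = sim.eval_stimulus();
    for frame in 0..10_000u64 {
        // step 1: re-evaluate the stimulus only once every N frames...
        if frame % n == 0 { stim = sim.eval_stimulus(); }
        // ...but *apply* it on every frame (hence the Sample & Hold waveform):
        sim.step(stim); // step 2: the hot loop -- this is what gets GPU-offloaded
        if frame % m == 0 { sim.measure(); } // step 3: measure every M frames
        if frame % z == 0 { sim.render(); }  //         render every Z frames
    }
}
```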
common stimuli types live in [crates/coremem/src/stim/](crates/coremem/src/stim/).
common measurements live in [crates/coremem/src/meas.rs](crates/coremem/src/meas.rs).
common render targets live in [crates/coremem/src/render.rs](crates/coremem/src/render.rs). these change infrequently enough that [crates/coremem/src/driver.rs](crates/coremem/src/driver.rs) has some specialized helpers for each render backend.
common materials are spread throughout [crates/cross/src/mat/](crates/cross/src/mat/).
different float implementations live in [crates/cross/src/real.rs](crates/cross/src/real.rs).
if you're getting NaNs, you can run the entire simulation on a checked `R64` (CPU-only) or `R32` (any backend) type in order to pinpoint the moment they're introduced.
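for example, something along these lines (a hypothetical sketch: the module paths and the exact `SpirvSim` type parameters are assumptions, not the crate's verbatim API):

```rust
// hypothetical: the paths and type-parameter order here are assumptions; check
// crates/cross/src/real.rs and the SpirvSim definition for the real signatures.
use coremem::sim::spirv::{CpuBackend, SpirvSim};
use coremem::cross::mat::GenericMaterial;
use coremem::cross::real::R64;

// checked R64 is CPU-only, so pair it with CpuBackend. build and step this
// exactly as you would the f32 sim; the checked type flags the first frame
// in which a NaN appears.
type DebugSim = SpirvSim<R64, GenericMaterial<R64>, CpuBackend>;
```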
- for `CpuBackend` simulations: just implement the `Material` trait on your own type and instantiate a `SpirvSim` specialized over that material instead of `GenericMaterial`.
as can be seen, the Material trait is fairly restrictive. its methods are immutable, and it doesn't even have access to the entire cell state (only the cell's M value, during `move_b_vec`). i'd be receptive to a PR or request that exposes more cell state or mutability: this is just an artifact of me tailoring this specifically to the class of materials i intended to use it for.
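for orientation, the trait's shape is roughly as follows. this is a sketch reconstructed from the description above, with stand-ins for the crate's `Real`/`Vec3` types; the real definitions (and exact signatures) live in [crates/cross/src/mat/](crates/cross/src/mat/):

```rust
// stand-ins for the crate's Real/Vec3 types, just so this sketch compiles:
pub trait Real: Copy {}
impl Real for f32 {}
#[derive(Clone, Copy, Default)]
pub struct Vec3<R> { pub x: R, pub y: R, pub z: R }

// rough shape of the Material trait as described above: methods are immutable
// (`&self`), and `move_b_vec` sees only the cell's M value, no other state.
// everything beyond the `move_b_vec` name is an assumption.
pub trait Material<R: Real> {
    /// conductivity of the cell, consulted during the E-field update.
    fn conductivity(&self) -> Vec3<R>;
    /// given the cell's current M and the B it's being driven toward,
    /// return the updated M.
    fn move_b_vec(&self, m: Vec3<R>, target_b: Vec3<R>) -> Vec3<R>;
}
```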
- `SerializerRenderer`: dumps the full 3d simulation state to disk. parseable after the fact with [crates/post/src/bin/viewer.rs](crates/post/src/bin/viewer.rs).
with my Radeon RX 5700XT, the sr\_latch example takes 125 minutes to complete 150ns of simulation time (3896500 simulation steps). that's on a grid of size 163x126x80 where the cell dimension is 20um.
in an FDTD simulation, as we shrink the cell size the time step has to shrink too (the Courant condition makes this a proportional relationship). so the scale-invariant performance metric is "grid cell steps per second" (`(163*126*80)*3896500 / (125*60)`): we get 850M.
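spelling out that arithmetic:

$$\frac{(163 \cdot 126 \cdot 80) \cdot 3896500}{125 \cdot 60\,\text{s}} = \frac{1643040 \cdot 3896500}{7500\,\text{s}} \approx 8.5 \times 10^8\ \text{cell-steps/s}$$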
this is the "default" optimized version. you could introduce a new material to the simulation, and performance would remain constant. as you finalize your simulation, you can specialize it a bit and compile the GPU code to optimize for your specific material. this can squeeze another factor-of-2 gain: view [buffer\_proto5](crates/applications/buffer_proto5/src/main.rs) to see how that's done.
the author can be reached on Matrix <@colin:uninsane.org>, email <colin@uninsane.org>, or ActivityPub <@colin@fed.uninsane.org>. i poured a lot of time into making this: don't hesitate to reach out if you have questions or find it useful.
this whole library is really just an implementation of the Finite-Difference Time-Domain (FDTD) method, with abstractions atop that to make it useful.
John B. Schneider has an extremely detailed beginner's guide for implementing FDTD from start to finish:
- [John B. Schneider: Understanding the Finite-Difference Time-Domain Method](https://eecs.wsu.edu/~schneidj/ufdtd/ufdtd.pdf)
Gregory Werner and John Cary provide a more rigorous approach and include error measurements which are useful for validation:
- Gregory Werner, John Cary: A stable FDTD algorithm for non-diagonal, anisotropic dielectrics (2007)
[MEEP](https://meep.readthedocs.io/) is another open-source FDTD project. it has quality docs which i used as a starting point in my research.
David Bennion and Hewitt Crane documented their approach for transforming Diode-Transistor Logic circuits into magnetic core circuits:
- David Bennion, Hewitt Crane: All-Magnetic Circuit Techniques (1964)
although i decided not to use PML, i found Steven Johnson's (of FFTW fame) notes to be the best explainer of PML:
- [Steven Johnson: Notes on Perfectly Matched Layers (PMLs)](https://math.mit.edu/~stevenj/18.369/spring07/pml.pdf)
a huge thanks to everyone above for sharing the fruits of their studies. though my work here is of a lesser caliber, i hope that someone, likewise, may someday find it of use.