# Finite Difference Time Domain Electromagnetics Simulation

this repository uses the [FDTD method](https://en.wikipedia.org/wiki/Finite-difference_time-domain_method)
to model the evolution of some 3d (or 2d) grid-volume of space over time. simulations are seeded with:

- some material at each position in the grid
- a set of stimuli to apply at specific regions in the volume over time
- a set of "measurements" to evaluate and record as the simulation evolves
- an optional state file to allow pausing/resumption of long-running simulations

after this, the simulation is advanced in steps up to some user-specified moment in time.
measurements are streamed to disk as the simulation runs, so the user can analyze them
while the simulation is still in progress or after it completes.

## Example Usage

simulations are defined by creating a binary crate which links against the `coremem` library.
examples are in the [crates/applications/](crates/applications/) directory.
here's an excerpt from the [wavefront](crates/applications/wavefront/src/main.rs) example:

```rust
// Create the simulation "driver" which uses the CPU as backend.
let mut driver: driver::CpuDriver = driver::Driver::new(size, feature_size);

// create a conductor on the left side.
let conductor = Cube::new(
    Index::new(0, 0, 0).to_meters(feature_size),
    Index::new(width/10, height, 1).to_meters(feature_size),
);
driver.fill_region(&conductor, mat::IsomorphicConductor::new(200f32));

// create a vertical strip in the center of the simulation which emits a wave.
let center_region = Cube::new(
    Index::new(200, height/4, 0).to_meters(feature_size),
    Index::new(201, height*3/4, 1).to_meters(feature_size),
);
// emit a constant E/H delta over this region for 100 femtoseconds
let stim = Stimulus::new(
    center_region,
    UniformStimulus::new(
        Vec3::new(2e19, 0.0, 0.0),         // E field (per second)
        Vec3::new(0.0, 0.0, 2e19/376.730), // H field (per second)
    ).gated(0.0, 100e-15),
);
driver.add_stimulus(stim);

// finally, run the simulation:
driver.step_until(Seconds(100e-12));
```

you can run the full example with:

```
$ cargo run --release --bin wavefront
```

graphical simulation output is displayed directly to the terminal, and also rendered (in higher resolution) to
a video file in `out/applications/wavefront/`.

this example runs on the CPU. the moment you do anything more complex than this, you'll want to leverage the GPU,
which can easily be 30-100x faster:

## GPU Acceleration

we use rust-gpu for gpu acceleration. presently, this requires *specific* versions of rust-nightly to work.
the feature is toggled at runtime, but compiled unconditionally. set up the toolchain according to [rust-toolchain.toml](rust-toolchain.toml):

```
$ rustup default nightly-2022-04-11
$ rustup component add rust-src rustc-dev llvm-tools-preview
```

(it's possible to work with older nightlies like `nightly-2022-01-13` or `nightly-2021-06-08` if you enable the 2020 feature and downgrade whichever packages rustc complains about.)

now you can swap out the `CpuDriver` for a `SpirvDriver` and you're set:

```diff
- let mut driver: driver::CpuDriver = driver::Driver::new(size, feature_size);
+ let mut driver: driver::SpirvDriver = driver::Driver::new_spirv(size, feature_size);
```

re-run it as before and you should see the same results:

```
$ cargo run --release --bin wavefront
```

see the "Processing Loop" section below to understand what GPU acceleration entails.

## Advanced Usage: Viewers, Measurements

the [sr\_latch](crates/applications/sr_latch/src/main.rs) example explores a more interesting feature set.
first, it "measures" a bunch of parameters over different regions of the simulation
(peek inside [`src/meas.rs`](crates/coremem/src/meas.rs) to see how each of these works):

```rust
// measure a bunch of items of interest throughout the whole simulation duration:
driver.add_measurement(meas::CurrentLoop::new("coupling", coupling_region.clone()));
driver.add_measurement(meas::Current::new("coupling", coupling_region.clone()));
driver.add_measurement(meas::CurrentLoop::new("sense", sense_region.clone()));
driver.add_measurement(meas::Current::new("sense", sense_region.clone()));
driver.add_measurement(meas::MagneticLoop::new("mem1", ferro1_region.clone()));
driver.add_measurement(meas::Magnetization::new("mem1", ferro1_region.clone()));
driver.add_measurement(meas::MagneticFlux::new("mem1", ferro1_region.clone()));
driver.add_measurement(meas::MagneticLoop::new("mem2", ferro2_region.clone()));
driver.add_measurement(meas::CurrentLoop::new("set", set_region.clone()));
driver.add_measurement(meas::Current::new("set", set_region.clone()));
driver.add_measurement(meas::Power::new("set", set_region.clone()));
driver.add_measurement(meas::CurrentLoop::new("reset", reset_region.clone()));
```

during the simulation, it dumps these measurements into CSV files:

```rust
// render a couple CSV files: one very detailed and the other more sparsely detailed
driver.add_csv_renderer(&*format!("{}meas.csv", prefix), 200, None);
driver.add_csv_renderer(&*format!("{}meas-sparse.csv", prefix), 1600, None);
```

furthermore, it serializes point-in-time snapshots of the simulation while it's running,
allowing you to dig further into the simulation in an _interactive_ way (versus the raw video
renderer used in the `wavefront` example):

```rust
// serialize frames for later viewing with `cargo run -p coremem_post --release --bin viewer`
driver.add_serializer_renderer(&*format!("{}frame-", prefix), 36000, None);
```

run this, after having set up the GPU prerequisites:

```
$ cargo run --release --bin sr_latch
```

and then investigate the results with:

```
$ cargo run -p coremem_post --bin viewer ./out/applications/sr_latch
```



the viewer shows a single xy cross-section of the simulation at a moment in time.
red-tipped arrows show the x-y components of the B field at every point,
and the z component of the E field is illustrated with color (bright green for positive polarization and dark blue for negative).
the light blue splotches depict the conductors (in the center, the wire coupling loops; on the edge, our energy-dissipating boundary).

what we see here is that both ferrites (the two large circles in the image above) have a clockwise-polarized B field. this is the middle of a transition, so the E fields look a bit chaotic. advance to t=46ns: the "reset" pulse was applied at t=24ns and has had 22ns to settle:



we can see the "reset" pulse has polarized both ferrites in the CCW orientation this time. the E field is less pronounced because we gave the system 22ns instead of 3ns to settle.

the graphical viewer is helpful for debugging geometries, but the CSV measurements are useful for viewing numeric system performance. peek inside `out/applications/sr_latch/meas.csv` to see a bunch of measurements over time. you can use a tool like Excel or [visidata](https://www.visidata.org/) to plot the interesting ones.

here's a plot of `M(mem2)` over time from the SR latch simulation. we're measuring the (average) M value along the major tangent to the torus corresponding to the ferrite on the right in the images above. the notable bumps correspond to these pulses: "set", "reset", "set", "reset", "set+reset applied simultaneously", "set", "set".



## Processing Loop (and how GPU acceleration works)

the processing loop for a simulation is roughly as follows ([`src/driver.rs:step_until`](crates/coremem/src/driver.rs) drives this loop):

1. evaluate all stimuli at the present moment in time; these produce an "externally applied" E and H field across the entire volume.
2. apply the FDTD update equations to "step" the E field, and then "step" the H field. these equations take the external stimulus from step 1 into account.
3. evaluate all the measurement functions over the current state; write these to disk.
4. serialize the current state to disk so that we can resume from this point later if we choose.

within each step above, the logic is multi-threaded, and the rendezvous points lie at the step boundaries.

it turns out that the Courant stability condition forces us to evaluate FDTD updates (step 2) on a _far_ smaller time scale than the other steps are sensitive to. so to tune for performance, we apply some optimizations here universally:

- stimuli (step 1) are evaluated only once every N frames (tunable). we still *apply* them on each frame individually. the waveform resembles that of a Sample & Hold circuit.
- measurement functions (step 3) are triggered only once every M frames.
- the state is serialized (step 4) only once every Z frames.

as a result, step 2 is able to apply the FDTD update functions not just once but up to `min(N, M, Z)` times in a row between rendezvous points.
although steps 1 and 3 vary heavily based on the user configuration of the simulation, step 2 can be defined pretty narrowly in code (no user callbacks/dynamic function calls/etc). this lets us offload the processing of step 2 to a dedicated GPU. by tuning N/M/Z, step 2 becomes the dominant cost in our simulations, and GPU offloading can trivially boost performance by more than an order of magnitude on even a mid-range consumer GPU.
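
to make that loop concrete, here's a minimal, self-contained sketch of the batching logic. every name in it (`eval_stimuli`, `fdtd_step`, and so on) is an illustrative stand-in, not coremem's actual API:

```rust
// hypothetical stand-in for the loop that src/driver.rs:step_until drives;
// names and types are illustrative, not coremem's real API.
fn run_sim(total_frames: u64, n: u64, m: u64, z: u64) {
    let mut cached_stimulus = 0.0_f32; // stands in for the cached E/H stimulus fields
    for frame in 0..total_frames {
        // step 1: re-evaluate stimuli only once every N frames (sample & hold)...
        if frame % n == 0 {
            cached_stimulus = eval_stimuli(frame);
        }
        // step 2: ...but apply them, and the E-then-H FDTD update, on every frame.
        // this narrow, callback-free kernel is the part that can live on the GPU;
        // it runs up to min(N, M, Z) times between rendezvous points.
        fdtd_step(cached_stimulus);
        // step 3: measurements only once every M frames.
        if frame % m == 0 {
            record_measurements(frame);
        }
        // step 4: serialize state only once every Z frames.
        if frame % z == 0 {
            serialize_state(frame);
        }
    }
}

fn eval_stimuli(frame: u64) -> f32 { (frame as f32 * 1e-3).sin() } // toy stimulus
fn fdtd_step(_stimulus: f32) { /* step the E field, then the H field */ }
fn record_measurements(_frame: u64) { /* evaluate meas::* and stream to disk */ }
fn serialize_state(_frame: u64) { /* snapshot for later pause/resume */ }

fn main() { run_sim(10_000, 8, 16, 800); }
```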
# Features

this library takes effort to separate the following from the core/math-heavy "simulation" logic:

- Stimuli
- Measurements
- Render targets (video, CSV, etc)
- Materials (conductors, non-linear ferromagnets)
- Float implementation (for CPU simulations only)

the simulation only interacts with these things through a trait interface, such that they're each swappable.

common stimuli types live in [src/stim.rs](crates/coremem/src/stim.rs).
common measurements live in [src/meas.rs](crates/coremem/src/meas.rs).
common render targets live in [src/render.rs](crates/coremem/src/render.rs). these change infrequently enough that [src/driver.rs](crates/coremem/src/driver.rs) has some specialized helpers for each render backend.
common materials are spread throughout [src/mat](crates/coremem/src/mat/mod.rs).
different float implementations live in [src/real.rs](crates/coremem/src/real.rs).
if you're getting NaNs, you can run the entire simulation on a checked `R64` type in order to pinpoint the moment they're introduced.
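
as an illustration of what a checked type buys you, here's a toy version of the idea (not the actual `R64` from src/real.rs): every arithmetic op asserts that its result is finite, so the first NaN panics with a backtrace at its source instead of silently propagating through the grid:

```rust
use std::ops::Add;

// toy stand-in for a checked real type; the real `R64` lives in src/real.rs.
#[derive(Copy, Clone, Debug, PartialEq)]
struct Checked(f64);

impl Add for Checked {
    type Output = Checked;
    fn add(self, rhs: Checked) -> Checked {
        let out = self.0 + rhs.0;
        // panic at the first non-finite result rather than letting it spread:
        assert!(out.is_finite(), "non-finite result: {} + {}", self.0, rhs.0);
        Checked(out)
    }
}

fn main() {
    println!("{:?}", Checked(1.0) + Checked(2.0));  // fine
    let _ = Checked(f64::MAX) + Checked(f64::MAX);  // panics here, at the source
}
```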
## Materials

of these, the materials have the most "gotchas".
each cell owns an associated material instance.
in the original CPU implementation of this library, each cell had an `E` and `H` component,
and any additional state had to be held in the material. so a conductor material
might hold only some immutable `conductivity` parameter, while a ferromagnetic material
might hold similar immutable material parameters _and also a mutable `M` field_.

spirv/rust-gpu requires stronger separation of state, and so this `M` field had to be lifted
completely out of the material. as a result, the material API differs slightly between the CPU
and spirv backends. as you saw in the examples, that difference doesn't have to appear at the user
level, but you will see it if you're adding new materials.

### Spirv Materials

all the materials usable in the spirv backend live in [`crates/spirv_backend/src/mat.rs`](crates/spirv_backend/src/mat.rs).
to add a new one, implement the `Material` trait in that file on some new type, which must also
live in that file.

next, add an analog type somewhere in the main library, like [`src/mat/mh_ferromagnet.rs`](crates/coremem/src/mat/mh_ferromagnet.rs). this will
be the user-facing material.
now implement the `IntoFfi` and `IntoLib` traits for this new material inside [`src/sim/spirv/bindings.rs`](crates/coremem/src/sim/spirv/bindings.rs)
so that the spirv backend can translate between its GPU-side material and your CPU-side/user-facing material.

finally, because the cpu-side `SpirvSim<M>` is parameterized over a material, but the underlying spirv library
is compiled separately, the spirv library needs specialized dispatch logic for each value of `M` you might want
to use. add this to [`crates/spirv_backend/src/lib.rs`](crates/spirv_backend/src/lib.rs) (it's about five lines: follow the example of `Iso3R1`).
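
putting those pieces together, a hypothetical new conductor material might be wired up like this. this is a single-file sketch: in the real codebase the pieces live in the files named above, and the `IntoFfi`/`IntoLib` signatures below are assumptions, not coremem's actual definitions:

```rust
// stand-ins for the real conversion traits in src/sim/spirv/bindings.rs:
pub trait IntoFfi { type Ffi; fn into_ffi(self) -> Self::Ffi; }
pub trait IntoLib { type Lib; fn into_lib(self) -> Self::Lib; }

// 1. GPU-side type: would live in crates/spirv_backend/src/mat.rs and
//    implement that file's `Material` trait (elided here).
#[derive(Copy, Clone, Default, Debug)]
pub struct MyConductor { pub sigma: f32 }

// 2. CPU-side, user-facing analog: would live under src/mat/.
#[derive(Clone, Debug)]
pub struct LibMyConductor { pub sigma: f32 }

// 3. translation between the two representations:
impl IntoFfi for LibMyConductor {
    type Ffi = MyConductor;
    fn into_ffi(self) -> MyConductor { MyConductor { sigma: self.sigma } }
}
impl IntoLib for MyConductor {
    type Lib = LibMyConductor;
    fn into_lib(self) -> LibMyConductor { LibMyConductor { sigma: self.sigma } }
}

fn main() {
    let gpu_side = LibMyConductor { sigma: 200.0 }.into_ffi();
    println!("{:?}", gpu_side.into_lib());
}
```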
### CPU Materials

adding a CPU material is "simpler". just implement the `Material` trait in [`src/mat/mod.rs`](crates/coremem/src/mat/mod.rs).
either link that material into the `GenericMaterial` type in the same file (if you want to easily
mix materials within the same simulation), or, if that material can handle every cell in your
simulation, instantiate a `SimState<M>` object which is directly parameterized over your material.
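
here's a toy sketch of that second option. the `Material` trait below is a stand-in with a hypothetical method, just to make the shape concrete; the real trait and its method set live in src/mat/mod.rs:

```rust
// stand-in trait; the real `Material` is defined in src/mat/mod.rs.
pub trait Material: Clone + Default {
    fn conductivity(&self) -> f32 { 0.0 } // hypothetical method
}

#[derive(Clone, Default)]
pub struct MyLinearMagnet { pub mu_r: f32 }
impl Material for MyLinearMagnet {}

// mirrors the `SimState<M>` pattern: every cell holds the same material type,
// so the inner update loop needs no per-cell dynamic dispatch.
pub struct SimStateToy<M: Material> { cells: Vec<M> }

fn main() {
    let state = SimStateToy { cells: vec![MyLinearMagnet { mu_r: 500.0 }; 8] };
    assert_eq!(state.cells.len(), 8);
}
```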
## What's in the Box

this library ships with the following materials:

- conductors (Isomorphic or Anisomorphic). supports CPU or GPU.
- linear magnets (defined by their relative permeability, mu\_r). supports CPU only.
- a handful of ferromagnet implementations:
  - `MHPgram` specifies the `M(H)` function as a parallelogram. supports CPU or GPU.
  - `MBPgram` specifies the `M(B)` function as a parallelogram. supports CPU or GPU.
  - `MHCurve` specifies the `M(H)` function as an arbitrary polygon. requires a new type for each curve for memory reasons (see `Ferroxcube3R1`). supports CPU only.

measurements include ([src/meas.rs](crates/coremem/src/meas.rs)):

- E, B, or H field (mean vector over some region)
- energy, power (net over some region)
- current (mean vector over some region)
- mean current magnitude along a closed loop (toroidal loops only)
- mean magnetic polarization magnitude along a closed loop (toroidal loops only)

output targets include ([src/render.rs](crates/coremem/src/render.rs)):

- `ColorTermRenderer`: renders 2d-slices in real-time to the terminal.
- `Y4MRenderer`: outputs 2d-slices to an uncompressed `y4m` video file.
- `SerializerRenderer`: dumps the full 3d simulation state to disk. parseable after the fact with [src/bin/viewer.rs](crates/post/src/bin/viewer.rs).
- `CsvRenderer`: dumps the output of all measurements into a `csv` file.

historically there was also a plotly renderer, but that effort was redirected into improving the viewer tool.

# Performance

with my Radeon RX 5700 XT, the sr\_latch example takes 125 minutes to complete 150ns of simulation time (3896500 simulation steps). that's on a grid of size 163x126x80 where the cell dimension is 20um.

in an FDTD simulation, as we shrink the cell size the time step has to shrink with it (the Courant condition ties them linearly). so the scale-invariant performance metric is "grid cell steps per second" (`(163*126*80)*3896500 / (125*60)`): we get 850M.
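
those numbers are internally consistent with the standard 3d FDTD Courant limit, `dt <= dx / (c*sqrt(3))`; the sketch below just re-derives them (the exact constants coremem uses may differ slightly):

```rust
fn main() {
    // throughput quoted above: ~850M grid cell steps per second.
    let cells = 163.0_f64 * 126.0 * 80.0;
    let steps = 3_896_500.0_f64;
    let wall_seconds = 125.0 * 60.0;
    println!("{:.0} cell-steps/sec", cells * steps / wall_seconds);

    // Courant limit for 20um cubic cells: dt <= dx / (c*sqrt(3)) ~= 38.5fs,
    // which matches 150ns / 3896500 steps ~= 3.85e-14s per step.
    let dx = 20e-6_f64;
    let c = 299_792_458.0_f64;
    println!("max stable dt: {:.3e} s", dx / (c * 3.0_f64.sqrt()));
}
```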

this is the "default" optimized version. you could introduce a new material to the simulation, and performance would remain constant. as you finalize your simulation, you can specialize it a bit and compile the GPU code to optimize for your specific material. this can squeeze out another factor-of-2 gain: view [buffer\_proto5](crates/applications/buffer_proto5/src/main.rs) to see how that's done.

contrast that with the CPU-only implementation, which achieves 24.6M grid cell steps per second: the GPU delivers about a 34x gain.

# Support

the author can be reached on Matrix <@colin:uninsane.org> or ActivityPub <@colin@fed.uninsane.org>. i poured a lot of time into making
this: i'm happy to spend the marginal extra time to help curious people make use of what i've made, so don't hesitate to reach out.

## Additional Resources

this whole library is really just an implementation of the Finite Difference Time Domain method with abstractions atop that to make it useful.

John B. Schneider has an extremely detailed beginner's guide for implementing FDTD from start to finish:
- [John B. Schneider: Understanding the Finite-Difference Time-Domain Method](https://eecs.wsu.edu/~schneidj/ufdtd/ufdtd.pdf)

Gregory Werner and John Cary provide a more rigorous approach and include error measurements which are useful for validation:
- Gregory Werner, John Cary: A stable FDTD algorithm for non-diagonal, anisotropic dielectrics (2007)

[MEEP](https://meep.readthedocs.io/) is another open-source FDTD project. it has quality docs which i used as a starting point in my research.

David Bennion and Hewitt Crane documented their approach for transforming Diode-Transistor Logic circuits into magnetic core circuits:
- David Bennion, Hewitt Crane: All-Magnetic Circuit Techniques (1964)

although i decided not to use PML, i found Steven Johnson's (of FFTW fame) notes to be the best explainer of PML:
- [Steven Johnson: Notes on Perfectly Matched Layers (PMLs)](https://math.mit.edu/~stevenj/18.369/spring07/pml.pdf)

a huge thanks to everyone above for sharing the fruits of their studies. though my work here is of a lesser caliber, i hope that someone, likewise, may someday find it of use.