
Finite Difference Time Domain Electromagnetics Simulation

this repository uses the FDTD method to model the evolution of some 3d (or 2d) grid-volume of space over time. simulations are seeded with:

  • some material at each position in the grid
  • a set of stimuli to apply at specific regions in the volume over time
  • a set of "measurements" to evaluate and record as the simulation evolves.
  • an optional state file to allow pausing/resumption of long-run simulations

after this, the simulation is advanced in steps up to some user-specified moment in time. measurements are streamed to disk as the simulation runs, so the user can analyze them while the simulation is still in progress or after it completes.

Example Usage

simulations are defined either by creating a binary crate that links against the coremem library, or by simply dropping a top-level file into the examples/ directory in this repo and running it. here's an excerpt from the sr_latch example shipped in this library:

    // create some `Region` instances which will define our geometry
    let ferro1_region = Torus::new_xy(Meters::new(ferro1_center, ferro_center_y, half_depth), ferro_major, ferro_minor);
    let ferro2_region = Torus::new_xy(Meters::new(ferro2_center, ferro_center_y, half_depth), ferro_major, ferro_minor);
    let set_region = Torus::new_yz(Meters::new(ferro1_center, ferro_center_y - ferro_major, half_depth), wire_major, wire_minor);
    let reset_region = Torus::new_yz(Meters::new(ferro1_center, ferro_center_y + ferro_major, half_depth), wire_major, wire_minor);
    let coupling_region = Torus::new_xz(Meters::new(0.5*(ferro1_center + ferro2_center), ferro_center_y, half_depth), wire_coupling_major, wire_minor);

    // create the interface through which we'll "drive" a simulation to completion:
    let mut driver: SpirvDriver<spirv::FullyGenericMaterial> = Driver::new_spirv(Meters::new(width, height, depth), feat_size);

    // fill each region within the simulation with the appropriate material
    driver.fill_region(&ferro1_region, mat::MHPgram::new(25.0, 881.33, 44000.0));
    driver.fill_region(&ferro2_region, mat::MHPgram::new(25.0, 881.33, 44000.0));
    driver.fill_region(&set_region, mat::IsomorphicConductor::new(drive_conductivity));
    driver.fill_region(&reset_region, mat::IsomorphicConductor::new(drive_conductivity));
    driver.fill_region(&coupling_region, mat::IsomorphicConductor::new(drive_conductivity));

    // populate our stimuli.
    // here we pulse the E field with amplitude defined by a half-sine wave w.r.t. time.
    // we apply this to the "set" wire loop, everywhere directed tangential to the loop.
    let wave = Sinusoid1::from_wavelength(peak_set_field, set_pulse_duration * 2.0)
        .half_cycle()
        .shifted(start);
    driver.add_stimulus(CurlStimulus::new(
        set_region.clone(),
        wave,
        set_region.center(),
        set_region.axis()
    ));

    // every 36000 "steps", render the simulation state to disk.
    driver.add_serializer_renderer("out/examples/sr_latch/frame-", 36000, None);
    driver.step_until(Seconds(3.0*set_pulse_duration));

you can run the full example with:

$ cargo run --release --example sr_latch

TODO: switch between CPU and GPU accel in this demo.

and then investigate the results with

$ cargo run --bin viewer out/examples/sr_latch

[screencapture of the viewer for the SR latch at t=2.8ns: two rings spaced horizontally, with arrows circulating them]

the viewer shows a single xy cross-section of the simulation at a moment in time. red-tipped arrows show the x-y components of the B field at every point, and color shows the z component of the E field (bright green for positive polarization, dark blue for negative). the light blue splotches depict the conductors (in the center, the wire coupling loops; at the edge, our energy-dissipating boundary).

what we see here is that both ferrites (the two large circles in the image above) have a clockwise-polarized B field. this is in the middle of a transition, so the E fields look a bit chaotic. advance to t=46ns: the "reset" pulse was applied at t=24ns and has had 22ns to settle:

[screencapture of the viewer for the SR latch at t=45.7ns: similar to above, but with the B field polarized CCW]

we can see the "reset" pulse has polarized both ferrites in the CCW orientation. the E field is less pronounced because this time we gave the system 22ns to settle instead of 3ns.

the graphical viewer is helpful for debugging geometries, but the CSV measurements are more useful for evaluating numeric system performance. peek inside "out/examples/sr_latch/meas.csv" to see a bunch of measurements over time. you can use a tool like Excel or visidata to plot the interesting ones.
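if you'd rather poke at the measurements from Rust instead, here's a minimal sketch using the csv crate (not a coremem dependency; you'd add it to a scratch project). the actual column names depend on which measurements your simulation registered, so this just dumps the header row and the first few records:

    // minimal sketch, not part of coremem: reads meas.csv with the `csv` crate.
    // column names depend on the measurements you configured, so we just dump
    // the header row and the first few records.
    use std::error::Error;

    fn main() -> Result<(), Box<dyn Error>> {
        let mut reader = csv::Reader::from_path("out/examples/sr_latch/meas.csv")?;
        // which measurements were recorded
        println!("{:?}", reader.headers()?);
        // a peek at the first few rows
        for record in reader.records().take(5) {
            println!("{:?}", record?);
        }
        Ok(())
    }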

here's a plot of M(mem2) over time from the SR latch simulation. we're measuring the (average) M value along the major tangent to the torus corresponding to the ferrite on the right in the images above. the notable bumps correspond to these pulses: "set", "reset", "set", "reset", "set+reset applied simultaneously", "reset", "reset".

[plot of M(mem2) over time]

Processing Loop

the processing loop for a simulation is roughly as follows (src/driver.rs:step_until drives this loop):

  1. evaluate all stimuli at the present moment in time; these produce an "externally applied" E and B field across the entire volume.
  2. apply the FDTD update equations to "step" the B field, and then "step" the E field (TODO: is this the right order?). these equations take the external stimulus from step 1 into account.
  3. evaluate all the measurement functions over the current state; write these to disk.
  4. serialize the current state to disk so that we can resume from this point later if we choose.

within each step above, the logic is multi-threaded, and the rendezvous points lie at the step boundaries.

it turns out that the Courant stability condition forces us to evaluate the FDTD updates (step 2) on a far smaller time scale than the other steps are sensitive to. so, to tune for performance, we universally apply a few optimizations:

  • stimuli (step 1) are evaluated only once every N frames (tunable). we still apply them on every frame, but the value is held constant between evaluations, so the effective waveform resembles the output of a sample & hold circuit.
  • measurement functions (step 3) are triggered only once every M frames.
  • the state is serialized (step 4) only once every Z frames.

as a result, step 2 is able to apply the FDTD update functions not just once but up to min(N, M, Z) times back-to-back between rendezvous points. although steps 1 and 3 vary heavily based on the user's configuration of the simulation, step 2 can be defined pretty narrowly in code (no user callbacks, dynamic function calls, etc). this lets us offload the processing of step 2 to a dedicated GPU. by tuning N/M/Z, step 2 becomes the dominant cost in our simulations, and GPU offloading can trivially boost performance by more than an order of magnitude on even a mid-range consumer GPU.
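to make that concrete, here's a rough sketch of the loop structure in Rust. the names here (Sim, eval_stimuli, step_b, step_e, and so on) are illustrative stand-ins, not the actual coremem API; the real loop lives in src/driver.rs:step_until.

    // illustrative sketch only; these names are not the real coremem API.
    // n, m, z are the tunable frame intervals described above; for simplicity
    // this sketch assumes they're all multiples of min(n, m, z).
    struct ExternalFields; // placeholder for the externally-applied E/B fields

    trait Sim {
        fn time(&self) -> f64;
        fn eval_stimuli(&self, t: f64) -> ExternalFields;   // step 1
        fn step_b(&mut self, ext: &ExternalFields);         // step 2 (B half)
        fn step_e(&mut self, ext: &ExternalFields);         // step 2 (E half; assumed to advance time() by one dt)
        fn write_measurements(&mut self);                   // step 3
        fn serialize_state(&mut self);                      // step 4
    }

    fn step_until(sim: &mut dyn Sim, end_time: f64, n: u64, m: u64, z: u64) {
        // how many raw FDTD frames can run back-to-back before any other step
        // needs to run again; this inner run is what gets offloaded to the GPU.
        let chunk = n.min(m).min(z);
        let mut frame: u64 = 0;
        // step 1 (initial): sample the stimuli at t=0, then hold ("sample & hold")
        let mut external = sim.eval_stimuli(sim.time());
        while sim.time() < end_time {
            // step 1: re-sample the stimuli only once every `n` frames
            if frame > 0 && frame % n == 0 {
                external = sim.eval_stimuli(sim.time());
            }
            // step 2: the tight FDTD update (B then E), `chunk` frames in a row
            for _ in 0..chunk {
                sim.step_b(&external);
                sim.step_e(&external);
            }
            frame += chunk;
            // step 3: evaluate + record measurements once every `m` frames
            if frame % m == 0 {
                sim.write_measurements();
            }
            // step 4: checkpoint the full state once every `z` frames
            if frame % z == 0 {
                sim.serialize_state();
            }
        }
    }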

GPU Acceleration

we use rust-gpu for gpu acceleration. presently, this requires specific versions of rust-nightly to work:

$ rustup default nightly-2022-01-13
$ rustup component add rust-src rustc-dev llvm-tools-preview

(it's possible to work with older nightlies like nightly-2021-06-08 if you enable the 2020 feature.)

TODO: show how to enable gpu accel.

Features

TODO: document Material options, Stimulus options, Measurement options, Rendering options.

Performance

with my Radeon RX 5700XT, the sr_latch example takes 125 minutes to complete 150ns of simulation time (3896500 simulation steps). that's on a grid of size 163x126x80 where the cell dimension is 20um.

in an FDTD simulation, shrinking the cell size forces the time step to shrink proportionally (per the Courant condition), so a finer grid needs both more cells and more steps to cover the same simulated time. the scale-invariant performance metric is therefore "grid cell steps per second": (163*126*80 cells) * 3896500 steps / (125*60 seconds) ≈ 850M.
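spelled out, that back-of-the-envelope arithmetic (using only the numbers from the run above) is just:

    // cell-steps-per-second for the sr_latch run quoted above
    fn main() {
        let cells: u64 = 163 * 126 * 80;   // grid size
        let steps: u64 = 3_896_500;        // simulation steps to cover 150ns
        let wall_secs: u64 = 125 * 60;     // 125 minutes of wall-clock time
        println!("{} cell-steps/sec", cells * steps / wall_secs); // ~854M, i.e. the ~850M figure above
    }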

this is the "default" optimized version. you could introduce a new material to the simulation, and performance would remain constant. as you finalize your simulation, you can specialize it a bit and compile the GPU code to optimize for your specific material; this can squeeze out another factor-of-2 gain. look at examples/buffer_proto5.rs to see how that's done.

Support

the author can be reached on Matrix <@colin:uninsane.org> or ActivityPub <@colin@fed.uninsane.org>. i poured a lot of time into making this: i'm happy to spend the marginal extra time to help curious people make use of what i've made, so don't hesitate to reach out.

Additional Resources

TODO: cite the works which were useful in getting this off the ground.

TODO: consult the licenses of my dependencies.

License

i'm not a lawyer, and i don't want to be. by nature of your reading this, my computer has freely shared these bits with yours. at this point, it's foolish to think i could do anything to restrict your actions with them, and even more foolish to believe that i have any sort of "right" to do so.

however, if you somehow believe IP laws are legitimate, then:

  • i claim whatever minimal copyright is necessary for my own use of this code (and future modifications made by me/shared to this repository) to continue unencumbered.
  • i license these works to you according to that same condition and the additional condition that your use of these works does not force me into any additional interactions with legal systems which i would not have made were these works not made available to you (e.g. your license to these works is conditional upon your not filing any lawsuits/patent claims/etc against me).