fix[autograd]: sample fields along slab height and polygon edges #2418
base: develop
Conversation
Force-pushed from f37734b to 00b99de
Force-pushed from d73a485 to d188b31
looks like some tests are failing due to the previously unhandled edge cases. have to double check to be sure though.
Thanks @yaugenst-flex. This is pretty complex, I think it will require a lot of testing for both performance and gradient accuracy. Probably in all of the notebooks. I'd say maybe best left to 2.9 to be safe. how much testing in notebooks have you done already? either way will be great to have more accurate gradients for PolySlab!
We test ``PolySlab.compute_derivatives`` by feeding it a dummy ``DerivativeInfo``
whose ``grad_surfaces()`` method returns the analytically chosen surface integrand

    k(x, y) = ax + by + c
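As a rough sketch of why a linear integrand makes this test convenient: for k(x, y) = ax + by + c the surface integral has a closed form, so a sampled estimate can be checked exactly. Everything below is illustrative; none of the names come from the PR's actual test code.

```python
import numpy as np

# hypothetical stand-in for the analytically chosen integrand k(x, y) = a*x + b*y + c
a, b, c = 0.3, -0.7, 1.2

def k(x, y):
    return a * x + b * y + c

def exact_integral(x0, x1, y0, y1):
    # closed form of the surface integral of a*x + b*y + c over the rectangle
    dx, dy = x1 - x0, y1 - y0
    return 0.5 * a * (x1**2 - x0**2) * dy + 0.5 * b * (y1**2 - y0**2) * dx + c * dx * dy

# midpoint-rule sampling standing in for the discretized surface integral;
# for a linear integrand the midpoint rule is exact, so the check is tight
xs = (np.arange(10) + 0.5) / 10  # midpoints of 10 cells on [0, 1]
ys = (np.arange(10) + 0.5) / 10
X, Y = np.meshgrid(xs, ys)
approx = k(X, Y).mean() * 1.0  # mean sample value times the face area (1.0)

assert np.isclose(approx, exact_integral(0.0, 1.0, 0.0, 1.0))
```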
nice, this is smart
how do we test z / slab_bounds?
the integrand depends only on the in-plane coordinates, so the derivative w.r.t. ``slab_bounds`` is just the surface integral over k. we test the ``slab_bounds`` gradients for the three edge cases of fully inside, partially inside, and fully outside faces, where a face that lies outside of the simulation (i.e. the ``slab_bounds`` entry for that index) gets a derivative of zero for that side.
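A minimal sketch of that clipping behavior, assuming rectangular faces for simplicity; ``clipped_face_integral`` and its signature are made up for illustration, not taken from the PR:

```python
import numpy as np

def clipped_face_integral(face, sim, k, n=201):
    """Integrate k(x, y) over the part of a rectangular face inside the simulation.

    Covers the three cases above: fully inside, partially inside (clipped),
    and fully outside, where the derivative for that slab bound is zero.
    """
    (x0, x1), (y0, y1) = face
    (sx0, sx1), (sy0, sy1) = sim
    cx0, cx1 = max(x0, sx0), min(x1, sx1)  # intersect with the in-plane extent
    cy0, cy1 = max(y0, sy0), min(y1, sy1)
    if cx0 >= cx1 or cy0 >= cy1:
        return 0.0  # face fully outside the simulation: zero gradient
    xs = np.linspace(cx0, cx1, n)
    ys = np.linspace(cy0, cy1, n)
    X, Y = np.meshgrid(xs, ys)
    return np.trapz(np.trapz(k(X, Y), ys, axis=0), xs)

k = lambda x, y: 0.3 * x - 0.7 * y + 1.2
print(clipped_face_integral(((0, 1), (0, 1)), ((-2, 2), (-2, 2)), k))  # fully inside
print(clipped_face_integral(((0, 1), (0, 1)), ((0.5, 2), (-2, 2)), k))  # clipped
print(clipped_face_integral(((0, 1), (0, 1)), ((3, 4), (3, 4)), k))    # outside -> 0.0
```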
or do you mean testing the integration along z by giving a variable z? that I should probably add yeah
> or do you mean testing the integration along z by giving a variable z? that I should probably add yeah
yea this
done. the analytical tests got kind of out of hand but I think everything works...
```
@@ -47,8 +47,8 @@
_MIN_POLYGON_AREA = fp_eps

# number of points per dimension when discretizing polyslab faces for slab bounds gradients
_NUM_PTS_DIM_SLAB_BOUNDS = 30
# spacing between points when discretizing polyslab faces and edges for gradients
```
probably just a points-per-wavelength criterion would make most sense generally. For example, this could easily become an issue with RF wavelengths?
so my thinking here was that this is much better than before, and checking for the wavelength in the material adds some additional complexity because we can't only depend on ``DerivativeInfo.frequency``, or at least it's not clear to me how to make this very robust. in principle the best solution would be adaptive integration, which was actually the first implementation I had, but it was pretty slow because it ended up needing a lot of sampling points to actually converge. so basically what I'm trying to say is I'd leave this as is for now and do the adaptive sampling in a separate PR if it ends up being a problem...
what we did in adjoint was pass the min wavelength in the Simulation.sources and divide by the minimum refractive index in the Simulation.
tidy3d/tidy3d/plugins/adjoint/components/structure.py, lines 101 to 106 in 1000bda:

```python
freq_max = float(max(grad_data_eps.eps_xx.f))
eps_in = self.medium.eps_model(frequency=freq_max)
ref_ind = np.sqrt(np.max(np.real(eps_in)))
ref_ind = max([1.0, abs(ref_ind)])
wvl_free_space = C_0 / freq_max
wvl_mat = wvl_free_space / ref_ind
```
does this work?
interesting approach 😄 this is a bit at odds with the performance concern though, isn't it? because it can end up severely oversampling the polygon. what happens if there are metals in the simulation?
for metals, the ref_ind would evaluate to 1. I think we can effectively cap the number of points, but this at least scales the sampling to the amount of detail in the fields, so I think that's the proper way to do it.
I'll second that I think this spacing should scale with wavelength as well so that gradient accuracy is not dependent on wavelength (and agreed that in the case of RF, we will end up with really fine sampling otherwise)
my suspicion for metals is that the vjp sampling might need to be based on the skin depth, i.e. I think we might actually need really fine sampling. I doubt we'd get away with 1/um for metals... anyhow. I added wavelength-adaptive sampling to the ``DerivativeInfo`` and also implemented the skin-depth based approach. however, I'm setting the minimum allowable sampling distance to 10 nm currently, so the skin depth will actually never be resolved. once we figure this out you'll want to take another look at this @groberts-flex
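A rough sketch of what such a spacing heuristic could look like; the function, the points-per-scale factor, and the clamping rule are assumptions for illustration (units follow tidy3d's micron convention), not the PR's implementation:

```python
import numpy as np

C_0 = 2.99792458e14  # speed of light in um/s, as in tidy3d.constants

def sample_spacing(freq, eps_complex, pts_per_scale=10.0, min_spacing=0.01):
    """Sampling distance from the shortest relevant length scale, floored at 10 nm."""
    n_complex = np.sqrt(eps_complex)
    n, kappa = n_complex.real, n_complex.imag
    wvl0 = C_0 / freq
    scale = wvl0 / max(1.0, abs(n))  # wavelength in the material
    if kappa > 0:
        # field decay length in a lossy medium: delta = lambda_0 / (2 pi kappa)
        scale = min(scale, wvl0 / (2 * np.pi * kappa))
    # the 10 nm (0.01 um) floor means metal skin depths are typically unresolved
    return max(scale / pts_per_scale, min_spacing)

# dielectric at 200 THz (~1.5 um): spacing set by the wavelength in the material
print(sample_spacing(2e14, 12.0 + 0j))
# metal-like permittivity: the skin depth would ask for finer sampling, floor kicks in
print(sample_spacing(2e14, -100.0 + 10j))
```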
that sounds great to me! will definitely do some testing with the pec/metal version to make sure the sampling is right
```python
ax_val = ax_min if min_max_index == 0 else ax_max
accum = 0.0

# intentionally not vectorized to limit memory use
```
have you tested this for speed?
I have, yes. I originally had a fully vectorized implementation that ran basically as fast as the single sample point, but it'd eat up way too much memory for large polygons, so I decided to go back to the non-vectorized version. in the limit where the vertices are already sampled finely (like in the shape-optimized taper notebook), the speed is pretty much identical to before because it does the same thing (takes just an edge-center sample). it basically only kicks in when the edges are longer than the sampling distance, in which case it is slower than previously, but correct 😄 for thin slabs it's basically the same thing as the "hack" of subdividing the vertices.
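A minimal sketch of that per-edge scheme, with a placeholder spacing and integrand (the helper is hypothetical, not the PR's code):

```python
import numpy as np

def edge_samples(v0, v1, spacing):
    """Midpoints of ceil(length / spacing) segments; a short edge yields a single
    edge-center sample, recovering the previous behavior."""
    v0, v1 = np.asarray(v0, float), np.asarray(v1, float)
    length = float(np.linalg.norm(v1 - v0))
    n = max(1, int(np.ceil(length / spacing)))
    t = (np.arange(n) + 0.5) / n
    return v0 + t[:, None] * (v1 - v0), length

# loop edge by edge instead of building one big array of all sample points,
# so memory stays bounded no matter how large the polygon is
vertices = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 1.0], [0.0, 1.0]])
integrand = lambda pts: pts[:, 0] + pts[:, 1]  # stand-in for sampled field data

accum = 0.0
for v0, v1 in zip(vertices, np.roll(vertices, -1, axis=0)):
    pts, length = edge_samples(v0, v1, spacing=0.05)
    accum += integrand(pts).mean() * length  # mean value times edge length
```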
```python
if not mask.any():
    continue

mesh = DerivativeSurfaceMesh(
```
I worry this approach will be quite slow. it's the same as the adjoint plugin implementation (also vectorized over each edge xy only, but looped over z points and edges). We had complaints about speed back then. have you tested this on some problems, such as the taper?
it depends on the problem. but yeah it's ~n times slower than without the z sampling, where n is the number of z samples. for a typical SOI device it shouldn't be making a huge difference I think. for a tall slab it will be significantly slower, but especially in that case it's necessary.
just curious to understand the speed of things like this versus the simulation run time? are they approaching the length of the simulations, or are they just relatively slow compared to other parts of the gradient calculation? This will be dependent on the exact simulation settings, but curious for some average 3D case.
This is really exciting and will be a huge improvement. Is it possible to run some of the numerical testing, and possibly modify the polyslab numerical test file as well, so that we can keep testing the surface integration going forward (and keep testing the cases where it adds the most improvement)?
updates look good to me!
```
@@ -47,9 +47,6 @@
_MIN_POLYGON_AREA = fp_eps

# number of points per dimension when discretizing polyslab faces for slab bounds gradients
_NUM_PTS_DIM_SLAB_BOUNDS = 30


class PolySlab(base.Planar):
```
I think we've talked already before about having configuration options exposed to the user. Would we want to expose config options for how the integration is performed on the level of the geometric objects themselves? (i.e. each polyslab could potentially have its own config, or if not specified would use a default config). It isn't something that would go into this PR anyway, just curious if ultimately it will be useful for users to be able to configure these integrations for each object in the simulation, or if eventually we would just let them specify some more global options for controlling integrations?
I would have only considered a global option, but I guess you can come up with a setup where it would be beneficial to control this explicitly. Seems like a pretty advanced use case though. But yeah it might make sense to have this machinery.
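Purely as a sketch of the per-object override idea (nothing like this exists in the PR; the class and defaults are made up, and it assumes pydantic v2's v1 compatibility layer as used by tidy3d):

```python
from typing import Optional

import pydantic.v1 as pd

class AdjointIntegrationConfig(pd.BaseModel):
    """Made-up per-object knobs for the surface/edge integration."""
    sample_spacing: float = 0.05      # um between sample points on faces and edges
    max_samples_per_edge: int = 1000  # hard cap to bound cost for huge polygons

GLOBAL_INTEGRATION_CONFIG = AdjointIntegrationConfig()

def resolve_config(local: Optional[AdjointIntegrationConfig]) -> AdjointIntegrationConfig:
    # a polyslab-level config would win; otherwise fall back to the global default
    return local if local is not None else GLOBAL_INTEGRATION_CONFIG
```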
Force-pushed from 332d8f4 to c0ab734
Was in the structure.py file and noticed that in ``make_adjoint_monitors`` (tidy3d/tidy3d/components/structure.py, line 258 in 6b0c432) […] of only at center for VJP calculation
Force-pushed from c0ab734 to 95c1619
Previously, face gradients came from a fixed 30 × 30 grid, so the accuracy depended on the size of the polygon. Edge gradients were evaluated only at the center of each face, so long edges and tall slabs produced incorrect adjoint results depending on the simulation.
Changed:
I also added some analytic tests for the face and edge integration.
The sample spacing is currently set to 50 nm, which I think should be reasonable in most cases, but is another one of those things we might either want to make adaptive later or expose in a config. For vertices spaced <50 nm apart, the results are identical to before (tested on the shape optimization notebook).
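To make the contrast with the old fixed grid concrete (the 30-point grid and 50 nm spacing come from the description above; the helpers are illustrative):

```python
import numpy as np

def n_samples_fixed_grid(extent_um):
    # old behavior: 30 points per dimension no matter how big the face is
    return 30

def n_samples_fixed_spacing(extent_um, spacing_um=0.05):
    # new behavior: resolution tied to a 50 nm spacing, independent of face size
    return max(1, int(np.ceil(extent_um / spacing_um)))

for extent in (0.5, 5.0, 50.0):  # face extent in um
    print(extent, n_samples_fixed_grid(extent), n_samples_fixed_spacing(extent))
```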
@bzhangflex fyi