-
-
Notifications
You must be signed in to change notification settings - Fork 611
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Correct counting of shared parameters in Base.show
.
#2335
base: master
Are you sure you want to change the base?
Conversation
If this looks good, I'll go ahead and add some tests and add an example in the documentation as well. |
The documentation CI is failing for an |
That's julia> Flux.trainable(RNN(2 => 5))
(cell = RNNCell(2 => 5, tanh),) |
I see, yes, that makes sense. I think I understand now why it's not showing any non-trainable parameters for
|
Yes, tied parameters are tricky as we found out while working on Optimisers.jl. Sometimes it feels like a philosophical question. Do we consider array wrappers like |
Taking inspiration from |
I believe that'd run into the same problem with shared params across nominally different layers. Maybe one idea would be to separately count the number of shared params and report that? |
Can we farm more of this out to Functors / Optimisers? Instead of building an IdSet by hand, let Functors cache things. Then this will inherit its understanding of Adjoint etc. (I believe Optimisers.jl has a trainable-only walk definition, since it owns that concept.) |
Hi @ToucheSir, could you explain the "nominally different layers" part? I didn't quite follow it. Maybe an example?
Sure; I'll take a look at both Functors and Optimisers more closely. |
Something like this: d1 = Dense(3 => 4)
d2 = Dense(d1.weight)
d1.weight === d2.weight # tied
d1 !== d2 # but pushing the whole layer won't capture that |
I see, yes, that makes sense. I think, any solution which counts distinct (or shared) parameters in a model must use some form of unique ID associated to that parameter (I can't think of other ways atm, maybe there are more clever ways). Can we somehow associate such an ID to every parameter in a Flux model? Or more generally, associate some metadata to each leaf of a struct? |
Functors uses a cache which should detect such sharing. It's a little smarter than just using julia> using Functors, Flux
julia> let mat = rand(2,2)
model = Chain(Dense(mat), Dense(mat')) # separate bias vectors
cnt = Ref(0)
fmapstructure(model; exclude=x->x isa Array) do x
cnt[] += 1
end
end
(layers = ((weight = 1, bias = 2, σ = ()), (weight = (parent = 1,), bias = 3, σ = ())),)
julia> using StaticArrays
julia> [1,2] === [1,2] # different arrays with same value, not shared
false
julia> SA[1,2] === SA[1,2] # here the value is the only identity... still not shared.
true
julia> let mat = @SMatrix rand(2,2)
model = Chain(Dense(mat), Dense(mat')) # still has mat === mat
cnt = Ref(0)
fmapstructure(model; exclude=x->x isa AbstractArray) do x
cnt[] += 1
end
end
(layers = ((weight = 1, bias = 2, σ = ()), (weight = 3, bias = 4, σ = ())),) I think |
This PR fixes the
show
function by correctly counting non-trainable parameters. The earlier counting code duplicated shared parameters in it's count (Flux.jl/src/layers/show.jl
Line 96 in ebcbbe4
_big_finale
function, where, instead of duplicating the counts, we use anIdSet
to keep track of which parameters have been counted (and don't count a parameter twice).As an example, now the following code shows the correct output:
TODO:
Closes #2321.