Scale the residual for better performance/robustness

It may make sense to scale the residual in the MINPACK solve so that all components are approximately 1 in magnitude. The scaling would be derived once, before the root solve, and then applied to the residual on every evaluation. This should make the root solve treat each layer more evenly, and it may converge in fewer iterations and more reliably. This would require a lot of testing.
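
A minimal sketch of the idea, assuming the solve goes through SciPy's `scipy.optimize.root` with `method="hybr"` (the MINPACK hybrid solver). The `residual` function here is a hypothetical stand-in with three "layers" of very different magnitude, not the project's actual residual, and deriving the scale from the residual at the initial guess is just one possible choice:

```python
import numpy as np
from scipy.optimize import root


def residual(x):
    # Hypothetical stand-in for the real per-layer residual: each component
    # has a very different magnitude (1, 1e3, 1e6).
    weights = np.array([1.0, 1.0e3, 1.0e6])
    targets = np.array([1.0, 2.0, 3.0])
    return weights * (x**3 + x - targets)


def residual_scale(x0, floor=1.0e-12):
    # Assumption: use |residual(x0)| as the scale, with a floor so that a
    # component that is already near zero does not cause division by ~0.
    return np.maximum(np.abs(residual(x0)), floor)


x0 = np.zeros(3)
scale = residual_scale(x0)  # derived once, before the root solve


def scaled_residual(x):
    # Every component is O(1) at the initial guess after scaling.
    return residual(x) / scale


sol = root(scaled_residual, x0, method="hybr")
print(sol.success, sol.nfev, sol.x)
```

Because the scale is a fixed positive vector, the scaled and unscaled residuals have the same roots; only the relative weighting of the components seen by the solver changes. Whether this actually reduces `nfev` or improves reliability would need to be measured against the unscaled solve on real cases. Note that the `diag` option of `hybr` scales the variables, not the residual, so it is a separate knob.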