Activation functions #85
base: main
Conversation
@yonatankarni could you add a short description of what the contents of this PR are (i.e. an overview of sorts, as there are many changes)?
@SkBlaz Done (not sure why I can't simply reply, had to edit the comment :| )
@yonatankarni the deep branch was already merged to main, so it would be better to close this PR and open one that merges straight to main.
@adischw thanks for the heads up (not sure why I can't simply reply, had to edit the comment :| )
no need to open a new PR - I simply updated this one by changing the base and force-pushing the updated revision
done (replying again, this time hopefully the right way)
@yonatankarni it seems there is an issue with
@SkBlaz yes, this is due to a merge with new incoming changes from main, I will fix it shortly.
    input: graph::BlockPtrOutput,
) -> Result<graph::BlockPtrOutput, Box<dyn Error>> {
    let num_inputs = bg.get_num_output_values(vec![&input]);
    assert!(num_inputs != 0);
Should this be debug_assert!?
frankly I just copy-pasted this from block_relu; that's why I said we might want to consider eliminating code repetition.
as to whether this should be an assert or a debug_assert - since we run FW from release builds, and it seems @andraztori intended for this to fail in case of bad wiring, I think we should keep it a regular assert for now.
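For reference, a minimal sketch of the trade-off being discussed; the function name and message below are made up, not the PR's code:

fn check_wiring(num_inputs: usize) {
    // assert! is always compiled in, so bad wiring still aborts in --release builds
    assert!(num_inputs != 0, "block wired with zero inputs");

    // debug_assert! is compiled out when debug assertions are disabled (the default
    // for release builds), so this check would silently disappear in production
    debug_assert!(num_inputs != 0);
}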
unsafe {
    for i in 0..self.num_inputs as usize {
        let x = *pb.tape.get_unchecked_mut(self.input_offset + i);
        if x < 0.0 {
The < and <= are intentional, right? (one strict)
the first section is for the value of leakyRELU, the second is for the derivative.
RELU/leakyRELU(x) is defined to equal 0 at x=0, so I use "<" to avoid a redundant multiplication.
however, the derivative of RELU and leakyRELU isn't defined at 0, and I read somewhere (see the link below) that the convention is to set it to 1 for x=0 - that's why there is a difference (https://stats.stackexchange.com/questions/333394/what-is-the-derivative-of-the-relu-activation-function)
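A minimal sketch of the two definitions being discussed (value vs. derivative at x = 0); the names, the ALPHA constant and the exact branch layout are illustrative, not the PR's actual code:

const ALPHA: f32 = 0.01; // illustrative negative-side slope

fn leaky_relu(x: f32) -> f32 {
    // value: leaky_relu(0) = 0 either way, so a strict "<" skips a redundant multiplication at x = 0
    if x < 0.0 { ALPHA * x } else { x }
}

fn leaky_relu_derivative(x: f32) -> f32 {
    // derivative: not defined at x = 0; following the convention in the linked answer it is set to 1 there
    if x < 0.0 { ALPHA } else { 1.0 }
}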
}

fn get_num_output_slots(&self) -> usize {
    1
This method's name does not reflect the contents -> why is this not a constant of the object?
this method is an implementation of the abstract method from the BlockTrait trait
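Roughly, the relationship looks like this (a sketch - the real BlockTrait has more methods, and the struct name is a stand-in):

trait BlockTrait {
    fn get_num_output_slots(&self) -> usize;
}

struct BlockLeakyRelu;

impl BlockTrait for BlockLeakyRelu {
    // the value is constant, but exposing it through the trait method lets callers
    // treat every block uniformly as a dyn BlockTrait
    fn get_num_output_slots(&self) -> usize {
        1
    }
}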
    input: graph::BlockPtrOutput,
) -> Result<graph::BlockPtrOutput, Box<dyn Error>> {
    let num_inputs = bg.get_num_output_values(vec![&input]);
    assert!(num_inputs != 0);
debug_assert!?
see my reply to the same comment on block_leaky_relu
fn allocate_and_init_weights(&mut self, _mi: &model_instance::ModelInstance) {}

fn get_num_output_slots(&self) -> usize {
    1
The constant function once more (same comment as in the previous example)
see my reply to the previous comment on this function
}

fn set_input_offset(&mut self, input: graph::InputSlot, offset: usize) {
    assert!(input.get_input_index() == 0);
The assert here and above - are they necessary?
see first reply (copy-pasted from block_relu, and my bet is that it's best to have them there to catch issues early)
for i in 0..self.num_inputs as usize {
    let x = *pb.tape.get_unchecked_mut(self.input_offset + i);

// for now using libm tanh computation. once we establish a baseline,
This is a good idea - having a full dependency just for computing tanh seems like a lot? (might have missed other uses)
no, I added it just for that. the package seems pretty small though (~43K), but I can check if fast approximations perform well enough (https://math.stackexchange.com/questions/107292/rapid-approximation-of-tanhx)
Ok. In case we'd want to roll our own, this one should do a pretty decent trick perhaps:
// rational (continued-fraction style) approximation of tanh(x); accurate near zero,
// but the output should be clamped to [-1, 1] for large |x|
float fast_tanh(float x) {
    float x2 = x * x;
    float a = x * (135135.0f + x2 * (17325.0f + x2 * (378.0f + x2)));
    float b = 135135.0f + x2 * (62370.0f + x2 * (3150.0f + x2 * 28.0f));
    return a / b;
}
cool! I hope I will get to test that as well.
I speculate @andraztori's approach is to replace those with lookup tables (which don't involve any computation...), but I think we'll get there only after we see the value - so I will proceed with testing
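For what it's worth, a sketch of what such a lookup table could look like; the table size, input range and names here are assumptions, not the project's code:

const TABLE_SIZE: usize = 4096;
const RANGE: f32 = 8.0; // tanh is effectively saturated outside [-RANGE, RANGE]

struct TanhTable {
    values: Vec<f32>,
}

impl TanhTable {
    fn new() -> Self {
        // precompute tanh over [-RANGE, RANGE] once, at table construction time
        let values = (0..TABLE_SIZE)
            .map(|i| (-RANGE + 2.0 * RANGE * i as f32 / (TABLE_SIZE - 1) as f32).tanh())
            .collect();
        TanhTable { values }
    }

    fn lookup(&self, x: f32) -> f32 {
        // clamp into the table range and index - no per-call transcendental computation
        let t = ((x + RANGE) / (2.0 * RANGE)).clamp(0.0, 1.0);
        self.values[(t * (TABLE_SIZE - 1) as f32) as usize]
    }
}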
}
if layernorm == NNLayerNorm::AfterRelu {

if layernorm == NNLayerNorm::AfterActivation {
Nice work! Btw, worth seeing a speed comparison with just relu in this case - probably not critical, but would be interesting to see
sure, I will test it for a regular sequential training scenario
assert_epsilon!(slearn2(&mut bg, &fb, &mut pb, true), 2.0); // leaky_relu doesn't learn
}

fn test_simple_negative() {
here and in other files, I believe you meant to add a #[test] for this test case
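For illustration, the attribute being suggested (the body below is a placeholder, not the actual test):

#[test]
fn test_simple_negative() {
    // placeholder body - the point is only that without #[test], `cargo test` never runs this function
    assert_eq!(2 + 2, 4);
}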
use block_helpers::slearn2;
use block_misc::Observe;

fn fb_vec() -> feature_buffer::FeatureBuffer {
here and in all other files where this applies - if you have a function that's being used only in the unit tests, please add a #[cfg(test)] above it so cargo doesn't flag it as unused
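For illustration (the helper name and body below are stand-ins; the same would apply to e.g. fb_vec above):

#[cfg(test)]
fn make_test_input() -> Vec<f32> {
    // compiled only for `cargo test`, so regular builds don't warn about an unused function
    vec![0.0; 4]
}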
As implemented in the "deep" branch, the deep layers can use either RELU activation or none (no activation function).
Conveniently, the activation function type ("relu"/"none") is already governed by a command line argument; for instance, for a 3rd layer with width 25 and RELU activation we add the command line args:
--nn 2:width:025 --nn 2:activation:relu
In this PR I add additional activation functions for the deep layers - "leaky_relu", "tanh" and "sigmoid" - which can be controlled in the same manner (e.g. --nn 2:activation:tanh).
Now that we have 4 activation functions, it seems to me we can do better in terms of code re-use / eliminating repetition between them, but I'm not sure which approach to take, so if you have concrete suggestions this is a good time and place to bring them up.
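One possible direction for the code re-use question, purely as a sketch - the Activation enum and its methods below are hypothetical and not part of this PR:

#[derive(Clone, Copy)]
enum Activation {
    Relu,
    LeakyRelu { alpha: f32 },
    Tanh,
    Sigmoid,
}

impl Activation {
    // value of the activation at x
    fn forward(self, x: f32) -> f32 {
        match self {
            Activation::Relu => if x < 0.0 { 0.0 } else { x },
            Activation::LeakyRelu { alpha } => if x < 0.0 { alpha * x } else { x },
            Activation::Tanh => x.tanh(),
            Activation::Sigmoid => 1.0 / (1.0 + (-x).exp()),
        }
    }

    // derivative of the activation at x (relu/leaky_relu use the x = 0 convention discussed above)
    fn derivative(self, x: f32) -> f32 {
        match self {
            Activation::Relu => if x < 0.0 { 0.0 } else { 1.0 },
            Activation::LeakyRelu { alpha } => if x < 0.0 { alpha } else { 1.0 },
            Activation::Tanh => 1.0 - x.tanh() * x.tanh(),
            Activation::Sigmoid => {
                let s = 1.0 / (1.0 + (-x).exp());
                s * (1.0 - s)
            }
        }
    }
}

A single block holding an Activation value could then share the forward/backward loops instead of repeating them once per activation function.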