Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing error logs when application crashes on OTP-26 #9163

Open
Maciej-Rosa opened this issue Dec 9, 2024 · 4 comments
Open

Missing error logs when application crashes on OTP-26 #9163

Maciej-Rosa opened this issue Dec 9, 2024 · 4 comments
Assignees
Labels
bug Issue is reported as a bug team:VM Assigned to OTP team VM

Comments

@Maciej-Rosa
Copy link

Maciej-Rosa commented Dec 9, 2024

Describe the bug
If application returns error, entire system terminates, but there is no info about crashed application in logs.

To Reproduce

I tried it on docker image erlang:26, with newest version of rebar3 installed:

root@eaefdec2d763:/rebar3# ./rebar3 --version
rebar 3.24.0 on Erlang/OTP 26 Erts 14.2.5.2

Create umbrella using rebar3:

root@eaefdec2d763:/# ./rebar3/rebar3 new umbrella
===> Writing myapp/apps/myapp/src/myapp_app.erl
===> Writing myapp/apps/myapp/src/myapp_sup.erl
===> Writing myapp/apps/myapp/src/myapp.app.src
===> Writing myapp/rebar.config
===> Writing myapp/config/sys.config
===> Writing myapp/config/vm.args
===> Writing myapp/.gitignore
===> Writing myapp/LICENSE.md
===> Writing myapp/README.md

Edit "myapp_app.erl" so it returns error. Add some logger:errors to file e.g.:

%%%-------------------------------------------------------------------
%% @doc myapp public API
%% @end
%%%-------------------------------------------------------------------

-module(myapp_app).

-behaviour(application).

-export([start/2, stop/1]).

start(_StartType, _StartArgs) ->
    logger:error("Here is error!"),
    logger:error("Here is error!"),
    logger:error("Here is error!"),
    logger:error("Here is error!"),
    logger:error("Here is error!"),
    logger:error("Here is error!"),
    logger:error("Here is error!"),
    timer:sleep(1000),
    logger:error("Here is error!"),
    logger:error("Here is error!"),
    logger:error("Here is error!"),
    logger:error("Here is error!"),
    logger:error("Here is error!"),
    logger:error("Here is error!"),
    logger:error("Here is error!"),
    logger:error("Here is error!"),
    io:format("Hello world!"),
    {error, my_error}.


stop(_State) ->
    ok.

%% internal functions

I added timer:sleep(1000) so it can be seen clearly that only some logs are present in log file.

Setup logger to log to external file:

[
  {myapp, []},
    {kernel, [
    {logger, [
      {handler, default, logger_disk_log_h, #{
        config => #{file => "log/default.log", max_no_files => 4},
        formatter =>
        {logger_formatter, #{
          legacy_header => false,
          max_size => 4096,
          single_line => false
        }}
      }}
    ]}]}  
].

Release application:

root@eaefdec2d763:/myapp# ../rebar3/rebar3 release
===> Verifying dependencies...
===> Analyzing applications...
===> Compiling myapp
===> Assembling release myapp-0.1.0...
===> Release successfully assembled: _build/default/rel/myapp

Run released application as deamon.

root@eaefdec2d763:/myapp/_build/default/rel/myapp# ./bin/myapp daemon 
root@eaefdec2d763:/myapp/_build/default/rel/myapp# run_erl[826]: Erlang closed the connection.

Check generated default.log1 and erlang.log.1 file:

root@eaefdec2d763:/myapp/_build/default/rel/myapp# cat log/default.log.1 
2024-12-09T11:13:11.401703+00:00 error:
Here is error!
2024-12-09T11:13:11.401742+00:00 error:
Here is error!
2024-12-09T11:13:11.401783+00:00 error:
Here is error!
2024-12-09T11:13:11.401806+00:00 error:
Here is error!
2024-12-09T11:13:11.401824+00:00 error:
Here is error!
2024-12-09T11:13:11.401843+00:00 error:
Here is error!
2024-12-09T11:13:11.401863+00:00 error:
Here is error!


root@eaefdec2d763:/myapp/_build/default/rel/myapp# cat log/erlang.log.1 

=====
===== LOGGING STARTED Mon Dec  9 11:13:10 UTC 2024
=====
Exec: /usr/local/lib/erlang/erts-14.2.5.2/bin/erlexec -boot /myapp/_build/default/rel/myapp/releases/0.1.0/start -mode embedded -boot_var SYSTEM_LIB_DIR /usr/local/lib/erlang/lib -config /myapp/_build/default/rel/myapp/releases/0.1.0/sys.config -args_file /myapp/_build/default/rel/myapp/releases/0.1.0/vm.args -- console  --relx-disable-hooks
Root: /myapp/_build/default/rel/myapp
/myapp/_build/default/rel/myapp
Erlang/OTP 26 [erts-14.2.5.2] [source] [64-bit] [smp:16:16] [ds:16:16:10] [async-threads:30] [jit:ns]

Hello world!Runtime terminating during boot (terminating)

Crash dump is being written to: erl_crash.dump...done

Expected behavior
All given logs are present before erlang crashes due to error. This error is not present in OTP-25 afaik, tried the same method on erlang:25 and it shows not only entirety of log given by application, but also shows which application has crashed inside erlang.log.1:

root@c8877030ec86:/myapp/_build/default/rel/myapp# cat log/erlang.log.1 

=====
===== LOGGING STARTED Mon Dec  9 11:28:38 UTC 2024
=====
Exec: /usr/local/lib/erlang/erts-13.2.2.10/bin/erlexec -boot /myapp/_build/default/rel/myapp/releases/0.1.0/start -mode embedded -boot_var SYSTEM_LIB_DIR /usr/local/lib/erlang/lib -config /myapp/_build/default/rel/myapp/releases/0.1.0/sys.config -args_file /myapp/_build/default/rel/myapp/releases/0.1.0/vm.args -- console  --relx-disable-hooks
Root: /myapp/_build/default/rel/myapp
/myapp/_build/default/rel/myapp
Erlang/OTP 25 [erts-13.2.2.10] [source] [64-bit] [smp:16:16] [ds:16:16:10] [async-threads:30] [jit:ns]

Hello world!{"Kernel pid terminated",application_controller,"{application_start_failure,myapp,{my_error,{myapp_app,start,[normal,[]]}}}"}
Kernel pid terminated (application_controller) ({application_start_failure,myapp,{my_error,{myapp_app,start,[normal,[]]}}})

Crash dump is being written to: erl_crash.dump...done

root@c8877030ec86:/myapp/_build/default/rel/myapp# cat log/default.log.1 
2024-12-09T11:28:39.640272+00:00 error:
Here is error!
2024-12-09T11:28:39.640339+00:00 error:
Here is error!
2024-12-09T11:28:39.640359+00:00 error:
Here is error!
2024-12-09T11:28:39.640375+00:00 error:
Here is error!
2024-12-09T11:28:39.640454+00:00 error:
Here is error!
2024-12-09T11:28:39.640493+00:00 error:
Here is error!
2024-12-09T11:28:39.640527+00:00 error:
Here is error!
2024-12-09T11:28:40.641450+00:00 error:
Here is error!
2024-12-09T11:28:40.641633+00:00 error:
Here is error!
2024-12-09T11:28:40.641784+00:00 error:
Here is error!
2024-12-09T11:28:40.641836+00:00 error:
Here is error!
2024-12-09T11:28:40.641877+00:00 error:
Here is error!
2024-12-09T11:28:40.641950+00:00 error:
Here is error!
2024-12-09T11:28:40.642003+00:00 error:
Here is error!
2024-12-09T11:28:40.642041+00:00 error:
Here is error!
2024-12-09T11:28:40.642300+00:00 error:
  crasher:
    initial call: application_master:init/4
    pid: <0.267.0>
    registered_name: []
    exception exit: {my_error,{myapp_app,start,[normal,[]]}}
      in function  application_master:init/4 (application_master.erl, line 142)
    ancestors: [<0.266.0>]
    message_queue_len: 1
    messages: [{'EXIT',<0.268.0>,normal}]
    links: [<0.266.0>,<0.221.0>]
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 376
    stack_size: 28
    reductions: 172
  neighbours:

2024-12-09T11:28:40.642808+00:00 notice:
    application: myapp
    exited: {my_error,{myapp_app,start,[normal,[]]}}
    type: permanent


Affected versions
OTP-26 (26.2.5.2), OTP-27(27.1.2)

Additional context

@Maciej-Rosa Maciej-Rosa added the bug Issue is reported as a bug label Dec 9, 2024
@Maciej-Rosa
Copy link
Author

I suspect those problems may be linked to changes in this commit:
ec8e440#diff-2e7a405caf2dc0b5e567c6f01d62ad4b37ef9e517976ffc02c716900f2c87a9fR1248
Because of this line in erlang.log.1:

Runtime terminating during boot (terminating)

@garazdawi garazdawi self-assigned this Dec 9, 2024
@garazdawi
Copy link
Contributor

Thanks for the report. It is easy enough to reproduce, so should be relatively easy to figure out what is going on. I did some
preliminary digging and will continue tomorrow, adding some notes about the behaviours I found:

  1. If you in the example add sasl as a dependency to myapp in myapp.app.src the logs look again the same in 25 and 26.
  2. The same if you remove sasl in the rebar.conf file.
  3. When running the test repeatedly, it seems like 26 and 27 can show the same error as 25, though it is very rare.

@Maciej-Rosa
Copy link
Author

I looked into this issue a little bit and it seems to be related to changes I've linked before. Module init.erl executes start.boot script, which has loading commands at the end, like this:

     {apply,{application,start_boot,[kernel,permanent]}},
     {apply,{application,start_boot,[stdlib,permanent]}},
     {apply,{application,start_boot,[myapp,permanent]}},
     {apply,{application,start_boot,[sasl,permanent]}},

Because of application_controller failed during load of myapp, it started terminating mode so it's sending terminating reply to every process that calls it, including boot process which executes this part of code:
ec8e440#diff-2e7a405caf2dc0b5e567c6f01d62ad4b37ef9e517976ffc02c716900f2c87a9fR498
Which causes exit o be sent to init:boot_loop process, which halts entire program. It's worth noticing that the problem isn't present on application that caused crash but on application which loads after that, which explains this behaviour:

If you in the example add sasl as a dependency to myapp in myapp.app.src the logs look again the same in 25 and 26.

Because adding sasl to dependency causes application booting to be reordered, so failing application is loaded as last in this scenario.

@garazdawi
Copy link
Contributor

Thanks for the help with the analysis. This patch seems to fix the problem:

diff --git a/erts/preloaded/src/init.erl b/erts/preloaded/src/init.erl
index 7e5e8bb152..5f8480000e 100644
--- a/erts/preloaded/src/init.erl
+++ b/erts/preloaded/src/init.erl
@@ -697,6 +697,11 @@ boot_loop(BootPid, State) ->
            notify(State#state.subscribed),
            loop(State#state{status = {started,PS},
                             subscribed = []});
+        {'EXIT',BootPid,terminating} ->
+            %% If BootPid exited with terminating, then AC is about to
+            %% terminate and thus the entire system. We ignore this here
+            %% in order to give kernel a chance to flush logs to disk.
+            boot_loop(BootPid, State);
        {'EXIT',BootPid,Reason} ->
            % erlang:display({"init terminating in do_boot",Reason}),
            crash("Runtime terminating during boot", [Reason]);

Now I just need to figure out if that is the correct thing to do, or if it will break something else...

@IngelaAndin IngelaAndin added the team:VM Assigned to OTP team VM label Dec 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Issue is reported as a bug team:VM Assigned to OTP team VM
Projects
None yet
Development

No branches or pull requests

3 participants