Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash with OTP 27.0 #788

Closed
dvic opened this issue Jul 8, 2024 · 17 comments · Fixed by #790 or #796
Closed

Crash with OTP 27.0 #788

dvic opened this issue Jul 8, 2024 · 17 comments · Fixed by #790 or #796

Comments

@dvic
Copy link

dvic commented Jul 8, 2024

Not sure if support for Elixir 1.17 is not finished yet (though #760 is merged), but I'm encountering this on one of our projects:

20:07:07.305 [info] Lexical Initialized
20:07:07.307 [info] opened file:///Users/dvic/Development/pluto/pluto/lib/pluto/store.ex
20:07:19.931 [info] Compiled pluto in 12.6 seconds
20:07:36.014 [warn] The node has died. Restarting after deleting the build directory
20:07:36.016 [error] GenServer :"pluto::node_process" terminating
** (FunctionClauseError) no function clause matching in :gen.reply/2
    (stdlib 6.0) gen.erl:547: :gen.reply(nil, :ok)
    (lx_remote_control 0.5.0) lib/lexical/remote_control/project_node.ex:72: LXical.RemoteControl.ProjectNode.State.on_nodedown/2
    (lx_remote_control 0.5.0) lib/lexical/remote_control/project_node.ex:183: LXical.RemoteControl.ProjectNode.handle_info/2
    (stdlib 6.0) gen_server.erl:2173: :gen_server.try_handle_info/3
    (stdlib 6.0) gen_server.erl:2261: :gen_server.handle_msg/6
    (stdlib 6.0) proc_lib.erl:329: :proc_lib.init_p_do_apply/3
Last message: {:nodedown, :"[email protected]", [node_type: :visible]}
State: %LXical.RemoteControl.ProjectNode.State{project: %LXical.Project{root_uri: "file:///Users/dvic/Development/pluto/pluto", mix_exs_uri: "file:///Users/dvic/Development/pluto/pluto/mix.exs", mix_project?: true, mix_env: nil, mix_target: nil, env_variables: %{}, project_module: nil, entropy: 25904}, port: #Port<0.8>, cookie: :lexical, stopped_by: nil, stop_timeout: nil, started_by: {#PID<0.1544.0>, [:alias | #Reference<0.0.197635.4109144966.1078788099.69902>]}, status: :started}
20:07:36.021 [error] Process :"pluto::node_process" (#PID<0.1551.0>) terminating
** (FunctionClauseError) no function clause matching in :gen.reply/2
    (stdlib 6.0) gen.erl:547: :gen.reply(nil, :ok)
    (lx_remote_control 0.5.0) lib/lexical/remote_control/project_node.ex:72: LXical.RemoteControl.ProjectNode.State.on_nodedown/2
    (lx_remote_control 0.5.0) lib/lexical/remote_control/project_node.ex:183: LXical.RemoteControl.ProjectNode.handle_info/2
    (stdlib 6.0) gen_server.erl:2173: :gen_server.try_handle_info/3
    (stdlib 6.0) gen_server.erl:2261: :gen_server.handle_msg/6
    (stdlib 6.0) proc_lib.erl:329: :proc_lib.init_p_do_apply/3
Initial Call: LXical.RemoteControl.ProjectNode.init/1
Ancestors: [LXical.RemoteControl.ProjectNodeSupervisor, :"pluto::supervisor", LXical.Server.ProjectSupervisor, LXical.Server.Supervisor, #PID<0.1533.0>]
Message Queue Length: 1
Messages: [{:EXIT, #Port<0.8>, :normal}]
Links: [#PID<0.1543.0>]
Dictionary: []
Trapping Exits: true
Status: :running
Heap Size: 10958
Stack Size: 29
Reductions: 78453
20:07:36.021 [error] Child :undefined of Supervisor LXical.RemoteControl.ProjectNodeSupervisor terminated
** (exit) an exception was raised:
    ** (FunctionClauseError) no function clause matching in :gen.reply/2
        (stdlib 6.0) gen.erl:547: :gen.reply(nil, :ok)
        (lx_remote_control 0.5.0) lib/lexical/remote_control/project_node.ex:72: LXical.RemoteControl.ProjectNode.State.on_nodedown/2
        (lx_remote_control 0.5.0) lib/lexical/remote_control/project_node.ex:183: LXical.RemoteControl.ProjectNode.handle_info/2
        (stdlib 6.0) gen_server.erl:2173: :gen_server.try_handle_info/3
        (stdlib 6.0) gen_server.erl:2261: :gen_server.handle_msg/6
        (stdlib 6.0) proc_lib.erl:329: :proc_lib.init_p_do_apply/3
Pid: #PID<0.1551.0>
Start Call: LXical.RemoteControl.ProjectNode.start_link(%LXical.Project{root_uri: "file:///Users/dvic/Development/pluto/pluto", mix_exs_uri: "file:///Users/dvic/Development/pluto/pluto/mix.exs", mix_project?: true, mix_env: nil, mix_target: nil, env_variables: %{}, project_module: nil, entropy: 25904})
Restart: :transient
Shutdown: 5000
Type: :worker
20:07:36.380 [error] GenServer :"pluto::node" terminating
** (stop) {:error, {:already_started, #PID<0.1559.0>}}
Last message: {:nodedown, :"[email protected]"}
State: %LXical.Server.Project.Node.State{project: %LXical.Project{root_uri: "file:///Users/dvic/Development/pluto/pluto", mix_exs_uri: "file:///Users/dvic/Development/pluto/pluto/mix.exs", mix_project?: true, mix_env: nil, mix_target: nil, env_variables: %{}, project_module: nil, entropy: 25904}, node: :"[email protected]", supervisor_pid: #PID<0.1551.0>}
20:07:36.381 [error] Process :"pluto::node" (#PID<0.1544.0>) terminating
** (exit) {:error, {:already_started, #PID<0.1559.0>}}
    (stdlib 6.0) gen_server.erl:2304: :gen_server.handle_common_reply/8
    (stdlib 6.0) proc_lib.erl:329: :proc_lib.init_p_do_apply/3
Initial Call: LXical.Server.Project.Node.init/1
Ancestors: [:"pluto::supervisor", LXical.Server.ProjectSupervisor, LXical.Server.Supervisor, #PID<0.1533.0>]
Message Queue Length: 0
Messages: []
Links: [#PID<0.1541.0>]
Dictionary: []
Trapping Exits: false
Status: :running
Heap Size: 6772
Stack Size: 29
Reductions: 849071
20:07:36.381 [error] Child {LXical.Server.Project.Node, "pluto"} of Supervisor :"pluto::supervisor" terminated
** (exit) {:error, {:already_started, #PID<0.1559.0>}}
Pid: #PID<0.1544.0>
Start Call: LXical.Server.Project.Node.start_link(%LXical.Project{root_uri: "file:///Users/dvic/Development/pluto/pluto", mix_exs_uri: "file:///Users/dvic/Development/pluto/pluto/mix.exs", mix_project?: true, mix_env: nil, mix_target: nil, env_variables: %{}, project_module: nil, entropy: 25904})
Restart: :permanent
Shutdown: 5000
Type: :worker
20:07:36.388 [error] Process #PID<0.1560.0> terminating
** (exit) {:error, {:already_started, #PID<0.1559.0>}}
    (stdlib 6.0) gen_server.erl:2038: :gen_server.init_it/6
    (stdlib 6.0) proc_lib.erl:329: :proc_lib.init_p_do_apply/3
Initial Call: LXical.Server.Project.Node.init/1
Ancestors: [:"pluto::supervisor", LXical.Server.ProjectSupervisor, LXical.Server.Supervisor, #PID<0.1533.0>]
Message Queue Length: 0
Messages: []
Links: [#PID<0.1541.0>]
Dictionary: []
Trapping Exits: false
Status: :running
Heap Size: 6772
Stack Size: 29
Reductions: 225484
20:07:36.389 [error] Child {LXical.Server.Project.Node, "pluto"} of Supervisor :"pluto::supervisor" failed to start
** (exit) {:error, {:already_started, #PID<0.1559.0>}}
Pid: #PID<0.1544.0>
Start Call: LXical.Server.Project.Node.start_link(%LXical.Project{root_uri: "file:///Users/dvic/Development/pluto/pluto", mix_exs_uri: "file:///Users/dvic/Development/pluto/pluto/mix.exs", mix_project?: true, mix_env: nil, mix_target: nil, env_variables: %{}, project_module: nil, entropy: 25904})
Restart: :permanent
Shutdown: 5000
Type: :worker
20:07:36.395 [error] Process #PID<0.1561.0> terminating
** (exit) {:error, {:already_started, #PID<0.1559.0>}}
    (stdlib 6.0) gen_server.erl:2038: :gen_server.init_it/6
    (stdlib 6.0) proc_lib.erl:329: :proc_lib.init_p_do_apply/3
Initial Call: LXical.Server.Project.Node.init/1
Ancestors: [:"pluto::supervisor", LXical.Server.ProjectSupervisor, LXical.Server.Supervisor, #PID<0.1533.0>]
Message Queue Length: 0
Messages: []
Links: [#PID<0.1541.0>]
Dictionary: []
Trapping Exits: false
Status: :running
Heap Size: 6772
Stack Size: 29
Reductions: 225484
20:07:36.395 [error] Child {LXical.Server.Project.Node, "pluto"} of Supervisor :"pluto::supervisor" failed to start
** (exit) {:error, {:already_started, #PID<0.1559.0>}}
Start Call: LXical.Server.Project.Node.start_link(%LXical.Project{root_uri: "file:///Users/dvic/Development/pluto/pluto", mix_exs_uri: "file:///Users/dvic/Development/pluto/pluto/mix.exs", mix_project?: true, mix_env: nil, mix_target: nil, env_variables: %{}, project_module: nil, entropy: 25904})
Restart: :permanent
Shutdown: 5000
Type: :worker
20:07:36.401 [error] Process #PID<0.1562.0> terminating
** (exit) {:error, {:already_started, #PID<0.1559.0>}}
    (stdlib 6.0) gen_server.erl:2038: :gen_server.init_it/6
    (stdlib 6.0) proc_lib.erl:329: :proc_lib.init_p_do_apply/3
Initial Call: LXical.Server.Project.Node.init/1
Ancestors: [:"pluto::supervisor", LXical.Server.ProjectSupervisor, LXical.Server.Supervisor, #PID<0.1533.0>]
...
@dvic
Copy link
Author

dvic commented Jul 8, 2024

If I fix the nil stopped_by, I just get: 20:17:10.323 [warn] The node has died. Restarting after deleting the build directory, with no further logs.

@scottming
Copy link
Collaborator

I noticed that this issue only occurs in OTP 27. I conducted some simple investigations but have not yet identified the specific cause.

I tried starting Node using the command line, but it does not halt. I'm not sure if OTP 27 has made changes to the implementation of port.

@dvic
Copy link
Author

dvic commented Jul 9, 2024

Yes, I can confirm that switching to OTP 26 fixes the issue.

@dvic dvic changed the title Crash with Elixir 1.17.2 and OTP 27.0 Crash with OTP 27.0 Jul 9, 2024
@Moosieus
Copy link
Collaborator

Moosieus commented Jul 9, 2024

Do you see any errors in .lexical/project.log when the crash occurs?

Happy path here seems to be:

  • Receive :call of {:stop, stop_timeout}
  • Stop the project node via project_rpc/3
  • Save from to state.stopped_by
  • Reply :ok once the node's down

However if the project node dies for any other reason, we end up at the error you shared above.

@Moosieus
Copy link
Collaborator

Moosieus commented Jul 9, 2024

What I've done to repro:

  • Remote into a project like so: ./bin/debug_shell.sh lexical
  • List node names w/ :net_adm.names()
  • Brutally kill a project-* node using this script
  • See .lexical/lexical.log and ./lexical/project.log for the resulting error

@dvic
Copy link
Author

dvic commented Jul 10, 2024

Do you see any errors in .lexical/project.log when the crash occurs?

Happy path here seems to be:

  • Receive :call of {:stop, stop_timeout}
  • Stop the project node via project_rpc/3
  • Save from to state.stopped_by
  • Reply :ok once the node's down

However if the project node dies for any other reason, we end up at the error you shared above.

I only have a lexical.log, I don't see any other errors. This is the full output:

❯ tail -f .lexical/lexical.log
Shutdown: 5000
Type: :worker
10:21:52.348 [info] Child LXical.Server.Project.SearchListener of Supervisor :"pluto::supervisor" started
Pid: #PID<0.1558.0>
Start Call: LXical.Server.Project.SearchListener.start_link(%LXical.Project{root_uri: "file:///Users/dvic/Development/pluto/pluto", mix_exs_uri: "file:///Users/dvic/Development/pluto/pluto/mix.exs", mix_project?: true, mix_env: nil, mix_target: nil, env_variables: %{}, project_module: nil, entropy: 36412})
Restart: :permanent
Shutdown: 5000
Type: :worker
10:21:52.349 [info] Lexical Initialized
10:21:52.350 [info] opened file:///Users/dvic/Development/pluto/pluto/lib/pluto/store.ex
10:22:28.867 [info] Child IEx.Config of Supervisor IEx.Supervisor started
Pid: #PID<0.1571.0>
Start Call: IEx.Config.start_link([])
Restart: :permanent
Shutdown: 5000
Type: :worker
10:22:28.867 [info] Child IEx.Broker of Supervisor IEx.Supervisor started
Pid: #PID<0.1572.0>
Start Call: IEx.Broker.start_link([])
Restart: :permanent
Shutdown: 5000
Type: :worker
10:22:28.867 [info] Child IEx.Pry of Supervisor IEx.Supervisor started
Pid: #PID<0.1573.0>
Start Call: IEx.Pry.start_link([])
Restart: :permanent
Shutdown: 5000
Type: :worker
10:22:28.867 [info] Application iex started at :"[email protected]"
10:22:36.637 [info] Child of Supervisor :inet_gethost_native_sup started
Pid: #PID<0.1578.0>
Start Call: :inet_gethost_native.init([])
10:22:36.637 [info] Child :inet_gethost_native_sup of Supervisor :kernel_safe_sup started
Pid: #PID<0.1577.0>
Start Call: :inet_gethost_native.start_link()
Restart: :temporary
Shutdown: 1000
10:22:58.609 [info] Compiled pluto in 66.3 seconds
10:23:09.567 [warn] 'global' at node :"[email protected]" requested disconnect from node :"[email protected]" in order to prevent overlapping partitions
10:23:09.567 [warn] The node has died. Restarting after deleting the build directory

I was also logged in when the LSP started, and it seems it started two managers?

❯ ./bin/debug_shell.sh pluto
Erlang/OTP 27 [erts-15.0] [source] [64-bit] [smp:10:10] [ds:10:10:10] [async-threads:1] [jit]

Interactive Elixir (1.17.2) - press Ctrl+C to exit (type h() ENTER for help)
iex([email protected])1> :net_adm.names()
{:ok,
 [
   {~c"manager-pluto-36412", 52097},
   {~c"project-pluto-36412", 52099},
   {~c"shell", 52119}
 ]}
iex([email protected])2> :net_adm.names()
{:ok,
 [
   {~c"manager-pluto-36412", 52097},
   {~c"project-pluto-36412", 52099},
   {~c"shell", 52119}
 ]}
iex([email protected])3> :net_adm.names()
{:ok,
 [
   {~c"manager-pluto-36412", 52097},
   {~c"shell", 52119},
   {~c"project-pluto-36412", 52136},
   {~c"manager-lexical-41512", 52148},
   {~c"project-lexical-41512", 52163}
 ]}
iex([email protected])4>

@Moosieus Moosieus reopened this Jul 11, 2024
@scohen
Copy link
Collaborator

scohen commented Jul 16, 2024

@dvic do you have two editors running? If not having two managers seems like it's a problem.

@dvic
Copy link
Author

dvic commented Jul 16, 2024

@dvic do you have two editors running? If not having two managers seems like it's a problem.

No I believe I was not running two editors but not 100% sure, maybe I started hx in lexical. Also, now that I look at the output I previously posted the two managers are for different projects: manager-pluto-36412 and manager-lexical-41512.

@scohen
Copy link
Collaborator

scohen commented Jul 18, 2024

@dvic I take it this project isn't open source? I haven't been able to reproduce it locally.

@scottming
Copy link
Collaborator

@dvic I take it this project isn't open source? I haven't been able to reproduce it locally.

@scohen

Are you unable to reproduce the OTP 27 node down issue locally? It seems we just need to use this version:

asdf local elixir 1.17.1-otp-27 && asdf local erlang 27.0

Then after running mix package and opening the lexical project itself, you can see these warnings and constant restarts.

10:13:18.740 [warn] The node has died. Restarting after deleting the build directory
10:14:16.274 [warn] The node has died. Restarting after deleting the build directory

@dvic
Copy link
Author

dvic commented Jul 18, 2024

@dvic I take it this project isn't open source? I haven't been able to reproduce it locally.

Unfortunately, yes. Can you reproduce it in lexical itself with the steps outlined above? Otherwise I can try to cook up a reproducible repo.

@scohen
Copy link
Collaborator

scohen commented Jul 19, 2024

We're investigating, @zachallaun can reproduce some failures, hopefully they're the same as what you're seeing.

@altdsoy
Copy link

altdsoy commented Jul 22, 2024

Hi there...
Sorry to hijack this thread..
I see that OP has a crash when running on OTP 27.0
I'm trying Lexical but I simpy got the message that it's not even compatible with Elixir 1.17.2 and OTP 27.0.1:

[info] FATAL: Lexical is not compatible with Elixir 1.17.2

Lexical is compatible with the following versions of Elixir:

  >= 1.13.0
  >= 1.14.0
  >= 1.15.3
  >= 1.16.0


FATAL: Lexical is not compatible with Erlang/OTP 27.0.1

Lexical is compatible with the following versions of Erlang/OTP:

  >= 24.3.4
  >= 25.0.0
  >= 26.0.2

Which is also confirmed from the docs here: https://github.com/lexical-lsp/lexical/blob/main/pages/installation.md

So what's the current situation if I'm running Elixir 1.17.2 with OTP 27?

Thanks!

@scohen
Copy link
Collaborator

scohen commented Jul 22, 2024

@altdsoy please don't hijack threads, open an issue instead. Your problem is because you're using a released version of lexical and haven't built the latest main branch. Once this issue is fixed, we'll release.

@zachallaun
Copy link
Collaborator

I believe this is fixed by #796. Would someone who's experienced this crash be able to pull the branch za-ets-compressed, re-package Lexical, and confirm that it's fixed?

@dvic
Copy link
Author

dvic commented Jul 23, 2024

I believe this is fixed by #796. Would someone who's experienced this crash be able to pull the branch za-ets-compressed, re-package Lexical, and confirm that it's fixed?

I can confirm that that branch fixes the issue, thanks @zachallaun! Too bad about the increased memory usage, but that's all we can do for now I guess. Is this a known issue with OTP 27 or do we have to report it?

@zachallaun
Copy link
Collaborator

Is this a known issue with OTP 27 or do we have to report it?

I don't believe it's a known issue. I'm working on a report now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants