Notebook kernel crashed on Linux systems #744

Open
ChromatinRemodeling opened this issue Apr 22, 2025 · 15 comments
Labels
bug Something isn't working

Comments

@ChromatinRemodeling

I tried to reproduce the results in simulation.ipynb in a freshly installed Python 3.11 environment. On Windows everything worked fine, but on Linux systems the notebook kernel always crashes on the following lines:

Is = bm.ones(1000) * 20.  # 100 ms
_ = runner.run(inputs=Is)

This behavior is consistently observed in Colab, as well as in WSL and Ubuntu. Most of the time the error message is:

The Kernel crashed while executing code in the current cell or a previous cell. 
Please review the code in the cell(s) to identify a possible cause of the failure. 
Click [here](https://aka.ms/vscodeJupyterKernelCrash) for more info. 
View Jupyter log for further details.

Once or twice, however, I got a JAX-related error instead, which I have been unable to reproduce.

ChromatinRemodeling added the bug label on Apr 22, 2025

@Routhleck
Member

Could you provide the Jupyter log?

@ChromatinRemodeling
Author

Sure.

On WSL:

Visual Studio Code (1.99.3, wsl, desktop)
Jupyter Extension Version: 2025.3.0.
Python Extension Version: 2025.4.0.
Pylance Extension Version: 2025.4.1.
Platform: linux (x64).
Home = /home/trli
Temp Storage folder ~/.vscode-server/data/User/globalStorage/ms-toolsai.jupyter/version-2025.3.0
Workspace folder ~/lab/BrainPy
15:55:26.507 [info] Starting Kernel (Python Path: ~/lab/BrainPy/.pixi/envs/default/bin/python, VirtualEnv, 3.11.12) for '~/lab/BrainPy/simulation.ipynb' (disableUI=true)
15:55:27.205 [info] Process Execution: ~/lab/BrainPy/.pixi/envs/default/bin/python -m pip list
15:55:27.218 [info] Process Execution: ~/lab/BrainPy/.pixi/envs/default/bin/python -c "import ipykernel; print(ipykernel.__version__); print("5dc3a68c-e34e-4080-9c3e-2a532b2ccb4d"); print(ipykernel.__file__)"
15:55:27.227 [info] Process Execution: ~/lab/BrainPy/.pixi/envs/default/bin/python -m ipykernel_launcher --f=/run/user/1000/jupyter/runtime/kernel-v306402bdd2757f65721ad1c4adc721a50a10861b2.json
    > cwd: ~/lab/BrainPy
15:55:29.193 [info] Kernel successfully started
15:55:29.205 [info] Process Execution: ~/lab/BrainPy/.pixi/envs/default/bin/python /home/~/.vscode-server/extensions/ms-toolsai.jupyter-2025.3.0-linux-x64/pythonFiles/printJupyterDataDir.py
15:55:30.011 [error] Widget Error: Failed to access CDN https://unpkg.com/ after 0 attempt(s), TypeError: Failed to fetch
15:56:18.307 [error] Disposing session as kernel process died ExitCode: undefined, Reason: 
15:56:24.864 [info] Restart requested ~/lab/BrainPy/simulation.ipynb
15:56:24.869 [warn] Cancel all remaining cells due to dead kernel
15:56:24.884 [info] Process Execution: ~/lab/BrainPy/.pixi/envs/default/bin/python -c "import ipykernel; print(ipykernel.__version__); print("5dc3a68c-e34e-4080-9c3e-2a532b2ccb4d"); print(ipykernel.__file__)"
15:56:24.897 [info] Process Execution: ~/lab/BrainPy/.pixi/envs/default/bin/python -m ipykernel_launcher --f=/run/user/1000/jupyter/runtime/kernel-v30b6b13b27902705e04f4629999cf1e196497217d.json
    > cwd: ~/lab/BrainPy
15:56:25.335 [info] Restarted 2d2a6462-78b3-44b0-8465-6954e8466945

@ChromatinRemodeling
Author

On Colab: app.log

@Routhleck
Member

OK, I see. Could you provide your hardware info and software versions?
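
For anyone gathering the same information, here is a minimal sketch that prints the installed versions of the packages discussed in this thread. It uses only the standard library's `importlib.metadata`; the `PACKAGES` list and the `collect_versions` helper are illustrative names, not part of BrainPy.

```python
import platform
from importlib import metadata

# Package names taken from the version lists posted in this thread.
PACKAGES = [
    "brainpy", "brainstate", "braintaichi", "brainunit",
    "brainevent", "taichi", "jax", "jaxlib",
]


def collect_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    print("python", platform.python_version(), "on", platform.platform())
    for name, version in collect_versions(PACKAGES).items():
        print(f"{name:<12} {version or 'not installed'}")
```

Run it with the same interpreter the notebook kernel uses (the `.pixi/envs/default/bin/python` path in the logs above), otherwise it reports a different environment.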

@ChromatinRemodeling
Author

On Ubuntu 24.04:

Visual Studio Code (1.99.3, undefined, desktop)
Jupyter Extension Version: 2025.3.0.
Python Extension Version: 2025.4.0.
Pylance Extension Version: 2025.4.1.
Platform: linux (x64).
Home = /home/chromatin_remodeling
Temp Storage folder ~/.config/Code/User/globalStorage/ms-toolsai.jupyter/version-2025.3.0
Workspace folder ~/newData/BrainPy/Brainpy
19:05:25.924 [info] Starting Kernel (Python Path: ~/newData/BrainPy/Brainpy/.pixi/envs/default/bin/python, VirtualEnv, 3.10.17) for '~/newData/BrainPy/Brainpy/src/simulation.ipynb' (disableUI=true)
19:05:26.038 [info] Process Execution: ~/newData/BrainPy/Brainpy/.pixi/envs/default/bin/python -c "import ipykernel; print(ipykernel.__version__); print("5dc3a68c-e34e-4080-9c3e-2a532b2ccb4d"); print(ipykernel.__file__)"
19:05:26.044 [info] Process Execution: ~/newData/BrainPy/Brainpy/.pixi/envs/default/bin/python -m ipykernel_launcher --f=/run/user/1001/jupyter/runtime/kernel-v333068a3e4aa9ddf5cb47b860757b5b4bde603d3e.json
    > cwd: ~/newData/BrainPy/Brainpy/src
19:05:26.056 [info] Process Execution: ~/newData/BrainPy/Brainpy/.pixi/envs/default/bin/python -m pip list
19:05:26.506 [info] Kernel successfully started
19:05:26.517 [info] Process Execution: ~/newData/BrainPy/Brainpy/.pixi/envs/default/bin/python /home/~/.vscode/extensions/ms-toolsai.jupyter-2025.3.0-linux-x64/pythonFiles/printJupyterDataDir.py
19:05:41.099 [error] Disposing session as kernel process died ExitCode: undefined, Reason:

Installed packages:

brainpy           2.6.0.post20250420
brainstate        0.1.0.post20250420
braintaichi       0.0.4
brainunit         0.0.8
jax               0.5.3
jaxlib            0.5.3
...

The full list is here:

Ubuntu.txt

@ChromatinRemodeling
Author

Here I am using a Dell Inc. Precision 3660 with a 13th Gen Intel® Core™ i9-13900K × 32 and an RTX 4090.

@ChromatinRemodeling
Author

On WSL, python 3.11.12:

brainevent                0.0.1.post20250422
brainpy                   2.6.0.post20241205
brainstate                0.1.0.post20250423
braintaichi               0.0.4
brainunit                 0.0.8
jax                       0.5.2
jax-cuda12-pjrt           0.5.1
jax-cuda12-plugin         0.5.1
jaxlib                    0.5.1
...

Full list:

WSL.txt

Here I am using a Lenovo Legion R9000X2021R with an AMD Ryzen 7 5800H with Radeon Graphics and an RTX 3060. The WSL distribution is Ubuntu 24.04.

@Routhleck
Member

I notice "No CUDA driver API detected" in the logs.
Have you properly configured the NVIDIA driver and CUDA toolkit in Linux?

Can you successfully run nvidia-smi and nvcc -V and get normal outputs from both commands?

@ChromatinRemodeling
Author

I can get normal outputs on my WSL. I cannot get normal outputs on my Ubuntu computer because I haven't configured it. But I suppose the problem isn't caused by GPU-related issues, because bm.set_platform('cpu') is called at the beginning of the notebook.
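
One way to double-check that assumption is to ask JAX directly which backend it selected. The sketch below uses only `jax.default_backend()` and `jax.devices()`; it assumes that `bm.set_platform('cpu')` ends up configuring JAX's platform, which this thread does not confirm. `jax_backend_report` is a hypothetical helper name.

```python
def jax_backend_report():
    """Return (backend_name, device_names), or None if JAX cannot be imported."""
    try:
        import jax
    except ImportError:
        return None
    # default_backend() is a string such as "cpu" or "gpu";
    # devices() lists the devices of that backend.
    return jax.default_backend(), [str(device) for device in jax.devices()]


if __name__ == "__main__":
    report = jax_backend_report()
    if report is None:
        print("jax is not installed in this environment")
    else:
        backend, devices = report
        print("backend:", backend)
        print("devices:", devices)
```

Calling this in a notebook cell right after `bm.set_platform('cpu')` should show whether the run is actually on the CPU backend.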

@Routhleck
Member

I tested and found that the problem is with JAX version 0.5.3, which seems to be incompatible with BrainTaichi, causing a memory overflow bug. I recommend downgrading JAX to 0.4.38.

pip install jax==0.4.38

@ChromatinRemodeling
Author

I tried to downgrade JAX on WSL and Ubuntu, but to no avail. The bug persists. The "No CUDA driver API detected" log is from Colab, where I use only CPU. I also tried to downgrade JAX on Colab, but other pre-installed packages prevented me from doing so.

I guess there really is a memory overflow. Is there any technique I could use to track down the overflow myself? Also, given jax==0.4.38, could you tell me which versions of brainevent, brainpy, brainstate, braintaichi, and brainunit you are using? I am having trouble finding a working combination.
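
One general-purpose technique for a kernel that dies silently is to run the same code as a plain script in a child process with Python's `faulthandler` enabled, so that a fatal signal produces a Python-level traceback on stderr instead of just a dead kernel. This is a sketch using only the standard library; `run_isolated` and `describe_exit` are illustrative names, and the code string in the demo is a placeholder for the notebook cells exported as a script.

```python
import signal
import subprocess
import sys


def run_isolated(code, timeout=600.0):
    """Run `code` in a fresh interpreter with faulthandler enabled."""
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


def describe_exit(returncode):
    """Translate a subprocess return code into a readable status."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # On POSIX, a negative return code means the child was killed by that signal.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    # Placeholder: substitute the contents of the crashing notebook cells here.
    result = run_isolated("import sys; print('python', sys.version_info[:2])")
    print(describe_exit(result.returncode))
    print(result.stderr)  # faulthandler writes its traceback here on a crash
```

If the child reports something like `killed by SIGSEGV`, the traceback in `stderr` shows which Python call was active when the native code crashed, which narrows down where to point a native debugger.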

@ChromatinRemodeling
Author

Also, I think I found something interesting. In brainpy's package metadata, jaxlib[cuda12_pip] is still specified. But according to the JAX changelog, the cuda12_pip extra for jax was removed in 0.6.0. This may or may not be relevant to the failing transition to JAX 0.6.0.

I tried to edit the metadata like this:

Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy (>=1.15)
Requires-Dist: tqdm
Provides-Extra: cpu
Requires-Dist: jax (<=0.5.2,>=0.4.13) ; extra == 'cpu'
Requires-Dist: jaxlib (<=0.5.2,>=0.4.13) ; extra == 'cpu'
Requires-Dist: numba ; extra == 'cpu'
Requires-Dist: braintaichi ; extra == 'cpu'
Provides-Extra: cpu_mini
Requires-Dist: jax (<=0.5.2,>=0.4.13) ; extra == 'cpu_mini'
Requires-Dist: jaxlib (<=0.5.2,>=0.4.13) ; extra == 'cpu_mini'
Provides-Extra: cuda12
Requires-Dist: jax[cuda12] (<=0.5.2,>=0.4.13) ; extra == 'cuda12'
Requires-Dist: jaxlib (<=0.5.2,>=0.4.13) ; extra == 'cuda12'
Requires-Dist: numba ; extra == 'cuda12'
Requires-Dist: braintaichi ; extra == 'cuda12'
Provides-Extra: cuda12_mini
Requires-Dist: jax[cuda12] (<=0.5.2,>=0.4.13) ; extra == 'cuda12_mini'
Requires-Dist: jaxlib ; extra == 'cuda12_mini'
Provides-Extra: tpu
Requires-Dist: jaxlib[tpu] ; extra == 'tpu'
Requires-Dist: numba ; extra == 'tpu'

This removed the warnings during pip install brainpy, although the simulation still cannot be run.

I guess the dependency_links.txt in the package can also be removed, since it is deprecated.
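
To inspect the declared dependencies without opening the METADATA file by hand, the standard library's `importlib.metadata.requires` returns the `Requires-Dist` entries of an installed distribution. The sketch below filters them for a substring such as `cuda12_pip`; the two helper names are illustrative, and the output depends entirely on which brainpy build is installed.

```python
from importlib import metadata


def declared_requirements(dist_name):
    """Return the Requires-Dist strings of an installed distribution, or []."""
    try:
        # requires() returns None when the distribution declares no requirements.
        return metadata.requires(dist_name) or []
    except metadata.PackageNotFoundError:
        return []


def requirements_mentioning(requirements, needle):
    """Keep only the requirement strings that contain `needle`."""
    return [req for req in requirements if needle in req]


if __name__ == "__main__":
    reqs = declared_requirements("brainpy")
    for req in requirements_mentioning(reqs, "cuda12_pip"):
        print(req)
```

An empty result means either that brainpy is not installed in the current environment or that no requirement mentions the substring.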

@Routhleck
Member

Sorry for the delayed response. Through GDB debugging, I found that the issue occurs when BrainTaichi loads the kernel module, not during the execution of the kernel itself. We will fix this issue soon and release a new version of BrainTaichi.

In the meantime, I’d like to share a workaround using an older stable version. You can install it by running the following commands:

pip install "jax[cuda12]==0.4.38"
pip install brainpy==2.6.0.post20241025
pip install brainpylib
pip install taichi==1.7.0

Hope this helps.

@ChromatinRemodeling
Author

Thanks! The older stable version works smoothly.

@Routhleck
Member

Hi, we have released a new BrainTaichi version. You can try it with the latest brainpy:

pip install brainpy -U
pip install "jax[cuda12]==0.4.38"
pip install taichi==1.7.3
