
Upgrade to CUDA 12.5, TensorFlow 2.18.0, Keras 3.8.0, and Nvidia driver 550.54.15 #5061

Open
metrizable opened this issue Jan 27, 2025 · 3 comments
Assignees: metrizable
Labels: announce (For announcements for upcoming VM image updates), feature

Comments

@metrizable
Contributor

metrizable commented Jan 27, 2025


The versions of CUDA, TensorFlow, and Keras, along with the related ecosystem of packages and libraries pre-installed in the Colab runtime, have been upgraded. TensorFlow 2.18.0 is the latest version released by the TensorFlow team, and this upgrade brings several features, bug fixes, optimizations, and other changes. Find out what's new in TensorFlow 2.18 in the TensorFlow Blog post!

The upgrade to CUDA 12.5, TensorFlow 2.18.0, Keras 3.8.0 and related packages and libraries keeps the versions pre-installed in Colab up-to-date with the current scientific computing ecosystem.
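A quick way to confirm which versions a runtime is actually using is to run a cell like the minimal sketch below; the expected values in the comments are the ones from this announcement.

```python
# Verify the pre-installed versions in a fresh Colab cell.
import tensorflow as tf
import keras

print("TensorFlow:", tf.__version__)  # expected: 2.18.0
print("Keras:", keras.__version__)    # expected: 3.8.0

# Shell checks for the toolkit and driver (Colab's "!" runs shell commands).
!nvcc --version   # CUDA toolkit; expected: 12.5
!nvidia-smi       # NVIDIA driver; expected: 550.54.15
```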

Colab’s fallback runtime version

The fallback runtime version temporarily provides access to the last runtime version from before the upgrade described above, and will be available until mid-February. Its purpose is to give users a temporary mechanism for smoothly upgrading their notebooks to be compatible with Colab's current runtime version. It is available from the Command Palette via the "Use fallback runtime version" command when connected to a runtime. Note that this setting does not persist across sessions: the command must be invoked in each new session.

@metrizable metrizable added the announce and feature labels Jan 27, 2025
@metrizable metrizable pinned this issue Jan 27, 2025
@metrizable metrizable self-assigned this Jan 28, 2025
@krish07751

Hi, your upgrade has completely broken our working system. How do I fall back to CUDA 12.4? CUDA 12.5 is behaving strangely, and I am about to lose an important client because of this :(
Please test your upgrades before big-banging them onto paying customers' environments!
```
...
from /content/openpose/3rdparty/caffe/src/caffe/layers/cudnn_softmax_layer.cpp:4:
/usr/local/cuda/include/cub/util_device.cuh: In static member function ‘static typename AgentT::TempStorage& cub::CUB_200400___CUDA_ARCH_LIST___NS::detail::vsmem_helper_impl::get_temp_storage(cub::CUB_200400___CUDA_ARCH_LIST___NS::NullType&, cub::CUB_200400___CUDA_ARCH_LIST___NS::detail::vsmem_t&)’:
/usr/local/cuda/include/cub/util_device.cuh:160:63: error: ‘blockIdx’ was not declared in this scope
  160 |     static_cast<char*>(vsmem.gmem_ptr) + (vsmem_per_block * blockIdx.x));
      |                                                             ^~~~~~~~
/usr/local/cuda/include/cub/util_device.cuh: In static member function ‘static bool cub::CUB_200400___CUDA_ARCH_LIST___NS::detail::vsmem_helper_impl::discard_temp_storage(typename AgentT::TempStorage&)’:
/usr/local/cuda/include/cub/util_device.cuh:201:38: error: ‘threadIdx’ was not declared in this scope
  201 |     const std::size_t linear_tid = threadIdx.x;
      |                                    ^~~~~~~~~
/usr/local/cuda/include/cub/util_device.cuh:202:50: error: ‘blockDim’ was not declared in this scope
  202 |     const std::size_t block_stride = line_size * blockDim.x;
...
```
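The errors above suggest CUDA 12.5's newer cub headers are being pulled into a host-only compilation unit, which older Caffe/OpenPose builds did not trigger. The supported route back is the fallback runtime described above. As an unsupported alternative, here is a rough sketch of installing the 12.4 toolkit side by side and pointing the build at it; it assumes NVIDIA's apt repository is configured in the Colab image and that the build honors PATH/CUDA_HOME.

```python
# Unsupported sketch: install CUDA toolkit 12.4 alongside 12.5 in a Colab cell.
!apt-get -y install cuda-toolkit-12-4

import os
# Put the 12.4 toolchain first so nvcc and the cub headers come from 12.4.
os.environ["PATH"] = "/usr/local/cuda-12.4/bin:" + os.environ["PATH"]
os.environ["CUDA_HOME"] = "/usr/local/cuda-12.4"
os.environ["LD_LIBRARY_PATH"] = (
    "/usr/local/cuda-12.4/lib64:" + os.environ.get("LD_LIBRARY_PATH", "")
)

!nvcc --version  # should now report release 12.4
```

Changes made through os.environ are inherited by the shells that `!` spawns, so a subsequent rebuild of OpenPose in the same session should pick up the 12.4 toolchain.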

@Mahsa-M-90

Dear metrizable,
Since last Thursday, I have not been able to run my model on GPU RAM, due to the TensorFlow upgrade in Colab. Despite being on the Colab Pro+ plan, the model now runs only in system RAM, which makes it impossible to continue working because it is far too slow. I would be very thankful if you could guide me on how to solve this problem.
I never had this problem before last week: I was always able to run the same model using 30 GB of GPU RAM on the A100 runtime accelerator. Now, however, Colab allocates only 1 GB of GPU RAM for the same model!
Thank you for your consideration.
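A first diagnostic here is to check whether TensorFlow can see the GPU at all; if it cannot, every op silently falls back to the CPU and system RAM. A minimal sketch using standard TensorFlow APIs (no assumptions beyond an attached GPU runtime):

```python
import tensorflow as tf

# An empty list here means TensorFlow cannot see the GPU, so all work
# runs on the CPU and is held in system RAM.
print(tf.config.list_physical_devices("GPU"))

# Log where ops are placed and run a small test computation.
tf.debugging.set_log_device_placement(True)
x = tf.random.normal((1024, 1024))
y = tf.matmul(x, x)  # should log placement on /device:GPU:0 if the GPU is visible
```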

@StevenGee3398

I suddenly get the error
```
CUDA status Error: file: ./src/convolutional_kernels.cu: func: cuda_convert_f32_to_f16() line: 138

CUDA Error: the provided PTX was compiled with an unsupported toolchain.
Darknet error location: ./src/convolutional_kernels.cu, cuda_convert_f32_to_f16(), line #138
CUDA Error: the provided PTX was compiled with an unsupported toolchain.: Success
backtrace (12 entries)
```

when running my YOLOv4 Darknet training code with CUDA, which worked just fine before. Any ideas?
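This message usually means the driver's JIT compiler is older than the toolkit that produced the embedded PTX; the 550-series driver predates CUDA 12.5, so a Darknet binary built against the upgraded 12.5 toolkit can hit it. One possible workaround is to rebuild on the upgraded runtime so the binary carries SASS for the actual GPU instead of relying on PTX JIT. A sketch, where the ./darknet path and the sm_75 (T4) target are assumptions for illustration:

```python
# Rebuild Darknet against the runtime's current toolkit, emitting SASS for the
# actual GPU so the driver never has to JIT-compile PTX from a newer toolchain.
!cd darknet && make clean && make GPU=1 CUDNN=1 ARCH="-gencode arch=compute_75,code=sm_75"
```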
