GPUs Can't Access CUDA Drivers

Incident Report for Grid AI

Resolved

All systems have been updated and the incident is now considered resolved.
Posted Aug 19, 2021 - 03:20 UTC

Monitoring

We released a patch and the issue has been resolved in production. We will continue monitoring and will mark the incident fully resolved tomorrow if no further problems are identified.
Posted Aug 19, 2021 - 01:17 UTC

Update

We have identified a solution and will be releasing a fix soon.
Posted Aug 18, 2021 - 21:20 UTC

Identified

We have identified an issue with the underlying AMIs. We are working on a solution.
Posted Aug 18, 2021 - 18:53 UTC

Update

We have traced the issue to the mechanism that mounts drivers into Sessions and Experiments. We are investigating the root cause.
Posted Aug 18, 2021 - 18:39 UTC

Investigating

We have identified that CUDA drivers are not available in GPU-enabled environments. We are actively investigating the source of this issue.
Posted Aug 18, 2021 - 16:38 UTC

This incident affected: Grid Service.