These days ROCm support is more common than it was a few years ago, so you’re no longer entirely dependent on CUDA for machine learning. (Although I wish fewer tools required non-CUDA users to manually install Torch in their venv because the auto-installer assumes CUDA. At least take a parameter or something if you don’t want to implement autodetection.)
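To make the "take a parameter" idea concrete, here is a rough sketch of what an installer could do. Everything in it is an assumption for illustration: the function names are made up, the detection just checks whether `rocminfo` or `nvidia-smi` is on the PATH (which is a guess, not proof that a working GPU is present), and the version tags `cu121` and `rocm6.0` are examples that may be outdated, so check them against the PyTorch install page.

```python
import shutil

# Base of the PyTorch wheel indexes; the per-backend suffixes below are
# example tags and may be outdated (assumption, check the install page).
PYTORCH_INDEX = "https://download.pytorch.org/whl"
EXAMPLE_TAGS = {"cuda": "cu121", "rocm": "rocm6.0", "cpu": "cpu"}


def detect_backend(which=shutil.which):
    """Guess the backend from vendor tools on PATH (a heuristic only)."""
    if which("rocminfo"):      # ROCm checked first; the order is arbitrary
        return "rocm"
    if which("nvidia-smi"):
        return "cuda"
    return "cpu"


def torch_index_url(backend="auto", which=shutil.which):
    """Return a wheel index URL for an explicit or auto-detected backend."""
    if backend == "auto":
        backend = detect_backend(which)
    if backend not in EXAMPLE_TAGS:
        raise ValueError(f"unknown backend: {backend!r}")
    return f"{PYTORCH_INDEX}/{EXAMPLE_TAGS[backend]}"
```

A tool would then pass the result to something like `pip install torch --index-url <url>`. The point is only that an explicit `backend` parameter with an `"auto"` default is cheap to offer, and the user can still override a wrong guess.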
Nvidia’s Linux drivers are generally a bit behind AMD’s; e.g. driver versions before 555 tended not to play well with Wayland.
Also, Nvidia’s drivers tend not to give any meaningful information when something goes wrong. There’s typically just an error code for “the driver has crashed”, no matter why it crashed.
Personal anecdote for the last one: I had a wonky 4080, and tracing the problem to the card took months because the logs (on both Linux and Windows) didn’t contain any error information beyond “something bad happened”. The behavior had dozens of possible causes: “the 4080 is unstable if you use XMP on some mainboards”, “some BIOS setting might need to be changed”, “sometimes the card doesn’t like a specific CPU/PSU/RAM/mainboard”, or “it’s a manufacturing defect”.
Sure, manufacturing defects can happen to anyone; I can’t fault Nvidia for that. But the combination of useless logs and 4000-series cards having so many things they can possibly (but rarely) get hung up on made error diagnosis incredibly painful. I finally just bought a 7900 XTX instead. It’s slower, but I like the driver better.
Oh yeah, the equation completely changes for the cloud. I’m only familiar with local usage where you can’t easily scale out of your resource constraints (and into budgetary ones). It’s certainly easier to pivot to a different vendor/ecosystem locally.
By the way, AMD does have one additional edge locally: They tend to put more VRAM into consumer GPUs at a comparable price point – for example, the 7900 XTX competes with the 4080 on price but has as much memory as a 4090 (24 GB, versus the 4080’s 16 GB). In systems with one or few GPUs (like a hobbyist mixed-use machine) those few extra gigabytes can make a real difference. Of course this leads to a trade-off between Nvidia’s superior speed and AMD’s superior capacity.
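As a back-of-the-envelope illustration of why those gigabytes matter, here is a small sketch that estimates whether a model's weights fit into a given amount of VRAM. It is deliberately crude: it only counts the weights, the 2 GiB of headroom for everything else (activations, caches, the desktop) is a number I made up, and the model sizes are hypothetical.

```python
def weights_gib(params_billion, bytes_per_param):
    """Memory taken by the weights alone, in GiB."""
    return params_billion * 1e9 * bytes_per_param / 2**30


def fits(params_billion, bytes_per_param, vram_gib, headroom_gib=2.0):
    """Rough check: weights plus an assumed fixed headroom vs. VRAM size."""
    return weights_gib(params_billion, bytes_per_param) + headroom_gib <= vram_gib


# Hypothetical models stored at 16-bit precision (2 bytes per parameter).
for size in (7, 10):
    print(f"{size}B: 16 GB card -> {fits(size, 2, 16)}, "
          f"24 GB card -> {fits(size, 2, 24)}")
```

Under these assumptions a 7B model at 16 bits squeezes into 16 GB, while a 10B model (about 18.6 GiB of weights) only fits on the 24 GB card. Real usage depends heavily on context length, batch size and quantization, so treat this as a sanity check, not a sizing tool.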