Deep Learning's Memory Crisis Solved? Introducing the 'Lean Model' Revolution
What we really need is DeepSeek 671B quality on a normal computer. Period.
It all needs so much memory because it's just badly programmed; the models are full of unnecessary baggage that bloats them.
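For scale, here is a rough back-of-envelope sketch of the memory question (only the 671B figure comes from this post; the bytes-per-weight values are just the standard precision levels):

    # Memory needed just to hold 671 billion parameters in RAM,
    # before activations, KV cache, or any runtime overhead.
    PARAMS = 671e9  # parameter count claimed above for DeepSeek 671B

    for label, bytes_per_param in [
        ("FP16/BF16", 2.0),   # full-precision weights as typically shipped
        ("INT8", 1.0),        # common quantization target
        ("4-bit", 0.5),       # aggressive quantization
        ("2-bit", 0.25),      # near the current practical floor
    ]:
        print(f"{label:>10}: {PARAMS * bytes_per_param / 2**30:6.0f} GiB")

That prints roughly 1250, 625, 312, and 156 GiB. So even at 4 bits per weight, the raw weights alone are an order of magnitude beyond a normal desktop's RAM; closing the rest of the gap is where pruning, distillation, or loading only the active experts (DeepSeek 671B is a mixture-of-experts with roughly 37B parameters active per token) would have to come in.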
Of course, cutting-edge hardware needs to be sold, and as long as you claim the big models require it, someone will buy it.
That only works until someone publishes a repository on GitHub that does exactly what I described.
This will be laid out in a few papers in the near future, and then Nvidia will be stuck with its overengineered, overpriced machines.