Deep Learning's Memory Crisis Solved? Introducing the 'Lean Model' Revolution
What we really need is DeepSeek 671B quality on a normal computer. Period.
It all needs so much memory because it's just badly programmed; the models are full of unnecessary baggage that bloats them.
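For scale, here is a rough back-of-envelope sketch of the memory question (only the 671B figure comes from this post; the bytes-per-weight values are just the standard precision levels):

    # Memory needed just to hold 671 billion parameters in RAM,
    # before activations, KV cache, or any runtime overhead.
    PARAMS = 671e9  # parameter count claimed above for DeepSeek 671B

    for label, bytes_per_param in [
        ("FP16/BF16", 2.0),   # full-precision weights as typically shipped
        ("INT8", 1.0),        # common quantization target
        ("4-bit", 0.5),       # aggressive quantization
        ("2-bit", 0.25),      # near the current practical floor
    ]:
        print(f"{label:>10}: {PARAMS * bytes_per_param / 2**30:6.0f} GiB")

That prints roughly 1250, 625, 312, and 156 GiB. So even at 4 bits per weight, the raw weights alone are an order of magnitude beyond a normal desktop's RAM; closing the rest of the gap is where pruning, distillation, or loading only the active experts (DeepSeek 671B is a mixture-of-experts with roughly 37B parameters active per token) would have to come in.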
Of course, cutting-edge hardware needs to be sold, and as long as you claim the big models require it, someone will buy it.
That only works until someone publishes a repository on GitHub that does exactly what I described.
This will be laid out in a few papers in the near future, and then Nvidia will be stuck with its overengineered, overpriced machines.