> DualPipe was created and developed by Jiashi Li and Chengqi Deng and Wenfeng Liang.
A CEO who codes.
When my company was still working closely with CN factories a few years ago (before the bans / clients no longer wanting to work with companies working with China, etc.), the CEOs of the factories we worked with had all been electronics engineers at that company or another before; they could all jump in, debug schematics, solder, and write firmware themselves. And they did. These were places with massive campuses, towering buildings full of robots, and a few (relative to the massive space) employees doing maintenance and prototyping.
It sounds so much more reasonable to have a director who is actually technical, doesn't it? I'm absolutely amazed at how this (in the East) contrasts with the understanding (in the West) that directors need to know finance, strategic planning, and marketing rather than the actual nuance of the work.
To be blunt, this is exactly what is wrong with the “leadership” mindset in the West: decisions are often made without understanding the “nuances”, yet they are confident it will work.
"developed" and "codes" have different meanings.
Yes but in this context, they are very close to each other in meaning.
Besides, Liang does indeed code a significant amount and has contributed to almost all of their published papers.
I attached all 3 algorithms - 1F1B (1 forward, 1 backward), ZB1P (zero-bubble pipeline parallelism) and DualPipe - as a picture here: https://x.com/danielhanchen/status/1894937006352031832 for those interested :)
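Not DeepSeek's actual schedulers, just a rough Python sketch (the stage/microbatch counts are made up for illustration) of a naive GPipe-style schedule - every stage runs all its forward chunks, then all its backward chunks - so you can see where the idle "white squares" (bubbles) come from. 1F1B, ZB1P and DualPipe in the picture reorder and overlap those chunks to shrink that idle fraction.

    # Toy illustration of pipeline "bubbles" (idle slots), not DeepSeek's actual scheduler.
    # Assumes a naive GPipe-style schedule: every stage runs all forward chunks,
    # then all backward chunks, with each chunk taking one time slot.

    def naive_schedule(stages: int, microbatches: int):
        """Return schedule[stage] = list of slots: 'F' (forward), 'B' (backward), '.' (idle)."""
        total_slots = 2 * (microbatches + stages - 1)
        schedule = [["."] * total_slots for _ in range(stages)]
        for s in range(stages):
            for m in range(microbatches):
                # forward of microbatch m reaches stage s after s earlier stages have run it
                schedule[s][s + m] = "F"
                # backwards start once all forwards are done and flow in reverse stage order
                schedule[s][microbatches + (stages - 1) + (stages - 1 - s) + m] = "B"
        return schedule

    def bubble_fraction(schedule):
        slots = sum(len(row) for row in schedule)
        idle = sum(row.count(".") for row in schedule)
        return idle / slots

    if __name__ == "__main__":
        sched = naive_schedule(stages=4, microbatches=6)
        for s, row in enumerate(sched):
            print(f"stage {s}: " + " ".join(row))
        print(f"idle (bubble) fraction: {bubble_fraction(sched):.0%}")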
Maybe add Chimera as well?
https://arxiv.org/pdf/2107.06925
It looks as if Chimera has marginally fewer bubbles than DualPipe?
Oh more nice pictures :)
Off topic, but this is the Rick and Morty episode where Rick creates a perfectly level space.
The symmetry is uuugh.
You'll have to refresh my memory :) Is there like a Youtube clip for it?
https://www.youtube.com/watch?v=-MwCJpEuC44
Sorry, for us utter simpletons, can someone explain what it does?
It makes it so that having more GPUs makes inference run faster. The worst case has been that you can only use the extra memory and gain no speed at all.
In very simple words: it is one way to reduce the white squares in the picture from @danielhanchen[1].
In more complex words: imagine a processor that takes 10 clock cycles to process each instruction, but can also accept a new input on every clock cycle and start processing it in a pipeline. After the first input you have to wait ten clock cycles for the first result, but if you keep feeding the input line every cycle, you also get an output every cycle from then on.
In the case of GPUs, it is not just a matter of a single pipeline, but many in parallel. Depending on your data and algorithm, it can be thousands in parallel.
[1] https://x.com/danielhanchen/status/1894937006352031832
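To make the analogy concrete, here's a tiny, self-contained simulation of that hypothetical 10-cycle pipeline (made-up numbers, not GPU code): the first result appears after 10 cycles, and after that one result comes out every cycle.

    # Toy model of the pipelined-processor analogy above, not DeepSeek's code.
    # A 10-stage pipeline: each input takes 10 cycles end-to-end, but a new input
    # can enter on every cycle, so steady-state throughput is one result per cycle.
    from collections import deque

    PIPELINE_DEPTH = 10  # cycles each input spends in the pipeline

    def simulate(num_inputs: int):
        in_flight = deque()   # (input_id, cycle_when_it_finishes)
        results = []
        cycle, next_input = 0, 0
        while len(results) < num_inputs:
            # retire whatever finishes this cycle
            while in_flight and in_flight[0][1] == cycle:
                results.append((cycle, in_flight.popleft()[0]))
            # feed one new input per cycle while any remain
            if next_input < num_inputs:
                in_flight.append((next_input, cycle + PIPELINE_DEPTH))
                next_input += 1
            cycle += 1
        return results

    if __name__ == "__main__":
        for finish_cycle, input_id in simulate(5):
            print(f"result for input {input_id} ready at cycle {finish_cycle}")
        # -> first result at cycle 10, then one more every cycle: 10, 11, 12, 13, 14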
I hope all the open sourcing DeepSeek is doing encourages American labs to do more of the same. Surely they'll realize their momentum is more of a moat than their tech at any one point in time.
Does this remind anyone else of the Pied Piper compression algorithm?
Middle out or something?
Hmm, wasn't there also supposed to be SM re-allocation? It doesn't look like it was included; I may be mis-remembering the explanation.