winrid 18 hours ago

Wonder what hardware they are using and what the actual iops numbers are

  • tucnak 14 hours ago

    I wouldn't be surprised if it's some stock AWS instance type, probably not even NVMe-enabled, or some kind of metal instance bottlenecked by EBS. The downside of the cloud push in SaaS is that companies have been unable to tap into the ever-changing storage landscape, NVMe-oF, and so on. It's a self-inflicted wound more than anything.

    • jitl 10 hours ago

      I work at Notion.

      It’s stock RDS on EBS, which as you say is super slow compared to Genuine Local Storage. Any large scan or large result set query that doesn’t fall into the Postgres cache or the underlying EBS cache is frustratingly slow.

      However, almost all of our reads are “get row by ID” lookups that hit memcached. Our query pattern is mostly recursive graph traversal, which Postgres is bad at; with our table size, even a 20x faster disk wouldn’t make it feasible to do in SQL. We also copy “notion database” data into a stateful caching service, but that also needs specialized indexing for user-defined schemas, another thing Postgres can’t handle (my last project was an improvement to that service).
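To make the access pattern above concrete, here is a minimal sketch of a recursive traversal built from cache-first "get row by ID" lookups. Everything in it is an assumption for illustration: the `BlockRow` shape, the `RowSource` interface, and the `withCache` and `loadSubtree` helpers are hypothetical, and an in-memory `Map` stands in for memcached.

```typescript
// Hypothetical row shape; the thread does not describe the real schema.
interface BlockRow {
  id: string;
  childIds: string[];
}

// Anything that can answer a "get row by ID" lookup.
interface RowSource {
  get(id: string): Promise<BlockRow | undefined>;
}

// Cache-aside lookup: try the cache, fall back to the database on a miss,
// then populate the cache. The Map is a stand-in for memcached.
function withCache(cache: Map<string, BlockRow>, db: RowSource): RowSource {
  return {
    async get(id: string): Promise<BlockRow | undefined> {
      const hit = cache.get(id);
      if (hit !== undefined) return hit;
      const row = await db.get(id);
      if (row !== undefined) cache.set(id, row);
      return row;
    },
  };
}

// Breadth-first traversal: one by-ID lookup per node, one round of
// parallel lookups per level of the graph.
async function loadSubtree(rows: RowSource, rootId: string): Promise<BlockRow[]> {
  const seen = new Set<string>([rootId]);
  const result: BlockRow[] = [];
  let frontier: string[] = [rootId];
  while (frontier.length > 0) {
    const fetched = await Promise.all(frontier.map((id) => rows.get(id)));
    const next: string[] = [];
    for (const row of fetched) {
      if (row === undefined) continue; // dangling reference; skip it
      result.push(row);
      for (const childId of row.childIds) {
        if (!seen.has(childId)) {
          seen.add(childId);
          next.push(childId);
        }
      }
    }
    frontier = next;
  }
  return result;
}
```

With a warm cache, each level of the traversal costs only cache round trips, which is why disk speed matters less for this workload than for scans.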

      The place we really suffer from the EBS slowness is with full table scan data migrations, stuff like “for all blocks in Notion, block = f(block)”. For this we use a “Big Data” approach, reading from DB dumps in S3 instead of from Postgres directly.
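The “block = f(block)” migration over a dump might be sketched as below. This is not the actual pipeline: the newline-delimited JSON dump format, the `Block` shape, and the `migrateDump` helper are all assumptions, and emitting only changed rows is a design choice of this sketch.

```typescript
// Hypothetical block shape; only `id` is assumed to exist on every row.
interface Block {
  id: string;
  [key: string]: unknown;
}

// Apply f to every block in a dump (assumed: one JSON object per line) and
// return only the blocks that f actually changed, so that a later
// write-back step touches as few rows as possible.
function migrateDump(lines: string[], f: (block: Block) => Block): Block[] {
  const changed: Block[] = [];
  for (const line of lines) {
    if (line.trim() === "") continue; // tolerate blank lines in the dump
    const block = JSON.parse(line) as Block;
    const updated = f(block);
    // Cheap change detection; assumes f preserves key order.
    if (JSON.stringify(updated) !== JSON.stringify(block)) {
      changed.push(updated);
    }
  }
  return changed;
}
```

Because the scan reads a dump rather than the live database, its speed is bounded by dump throughput and not by EBS-backed Postgres reads.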

      Ultimately I’d prefer to serve every read from your device’s local storage, which is my current project.

      • choilive 8 hours ago

        > Ultimately I’d prefer to serve every read from your device’s local storage, which is my current project.

        Is that for native desktop/mobile clients only or web as well?

        • jitl 7 hours ago

          All platforms; we already cache and serve locally using SQLite (see https://www.notion.com/blog/how-we-sped-up-notion-in-the-bro...) but the cache is built as an optimization -- loads look like `isFastDevice ? (await loadLocal(request) ?? await loadRemote(request)) : (race(loadLocal(request), loadRemote(request)))`

          We're working on redesigning the data architecture to be more truly "local first" where every read should come from local, and falling back to the server is a rare exception. We want this to look like `await loadLocal(request) ?? await sync({ priority: request }).then(() => loadLocal(request))`

          (Note: for features like content that make sense offline. Some features, like user invitation and management, only make sense online)
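The two one-liners in this comment can be expanded into a runnable sketch. The signatures here are assumptions: the comment names `loadLocal`, `loadRemote`, `race`, `sync`, and `isFastDevice` but does not show their types, so `LoadDeps`, `raceDefined`, and both function names below are hypothetical. In particular, `raceDefined` assumes that `race` should ignore a cache miss (`undefined`) instead of resolving with it, which plain `Promise.race` would not do.

```typescript
// Hypothetical types; the comment does not show the real signatures.
type BlockId = string;
type BlockData = string;
type Loader = (request: BlockId) => Promise<BlockData | undefined>;

interface LoadDeps {
  isFastDevice: boolean;
  loadLocal: Loader;
  loadRemote: Loader;
  sync: (opts: { priority: BlockId }) => Promise<void>;
}

// Assumed semantics for `race`: resolve with the first result that is not a
// miss, and resolve with undefined only if every source misses.
// The first rejection propagates to the caller.
function raceDefined<T>(sources: Promise<T | undefined>[]): Promise<T | undefined> {
  return new Promise<T | undefined>((resolve, reject) => {
    let pending = sources.length;
    if (pending === 0) resolve(undefined);
    for (const source of sources) {
      source.then((value) => {
        pending -= 1;
        if (value !== undefined) resolve(value);
        else if (pending === 0) resolve(undefined);
      }, reject);
    }
  });
}

// Cache as an optimization: the server stays a first-class read path.
async function loadCacheAsOptimization(
  deps: LoadDeps,
  request: BlockId,
): Promise<BlockData | undefined> {
  if (deps.isFastDevice) {
    return (await deps.loadLocal(request)) ?? (await deps.loadRemote(request));
  }
  return raceDefined([deps.loadLocal(request), deps.loadRemote(request)]);
}

// Local first: every read comes from local storage; a miss triggers a
// prioritized sync and then a second local read.
async function loadLocalFirst(
  deps: LoadDeps,
  request: BlockId,
): Promise<BlockData | undefined> {
  const local = await deps.loadLocal(request);
  if (local !== undefined) return local;
  await deps.sync({ priority: request });
  return deps.loadLocal(request);
}
```

The difference in shape is that the second version never returns server data directly: the server only ever writes into the local store, and reads observe the result.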

          • choilive 7 hours ago

            Oh nice. I had dismissed offline/local web apps for anything decently complex for a long time because of the file system limitations. But it looks like this will finally make that practical with OPFS & WASM SQLite. Thanks for sharing.

      • tucnak 7 hours ago

        Much appreciated, & good luck with your project!

      • winrid 6 hours ago

        Yeah I figured it was EBS :) the default max iops is like 16k... an i3en can usually destroy 10 default-config EBS backed instances :P

        Thanks for the info and GL with your project!