When repeatedly running "generate a story about X" on different models and then simply asking for the next part, one thing that stands out is that many LLMs will gladly swap characters in their output. For example, X asks Y to do something, Y does it, and then Y says "thank you X for doing this". But obviously the mix-ups are much more varied than that.
Most likely because there is no mechanism in these models that would let them build a spatial or relational model of the entities.
I once asked it to emulate being air traffic control so I could practice for a pilot exam. It generated a full transcript of a pilot character called "you" talking to air traffic control...
op: https://arxiv.org/abs/2504.16884
Of course they can do it, if they are trained on a large number of data pairs consisting of various texts and annotations of who does what in those texts. Then they will predict the correct tokens describing who did what.
LLMs are pretty good at preserving who did what when they translate from one language to another. That's because the translation examples they are trained on correctly preserve who did what.
Maybe read the paper first?
> This study asked whether Large Language Models (LLMs) understand sentences in the minimal sense of representing “who did what to whom”. In Experiment 1, we found that the overall geometry of LLM distributed activity patterns failed to capture this information: similarities between sentences reflected whether they shared syntax more than whether they shared thematic role assignments. Human judgments, in contrast, were strongly driven by this aspect of meaning.
> In Experiment 2, we found limited evidence that thematic role information was available even in a subset of hidden units. Whereas activity patterns in subsets of hidden units often allowed for significant classification of whether sentence pairs had shared vs. opposite thematic role assignments, the effect sizes were small; even the best-performing case appeared to lag behind humans, and its representation of thematic roles did not seem robust across syntactic structures.
> However, thematic role information was reliably available in a large number of attention heads, demonstrating LLMs have the capacity to extract thematic role information. In some cases, information present in attention heads descriptively exceeded human performance.
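For anyone skimming, Experiment 2 and the attention-head analysis boil down to a probing setup: train a simple classifier on the model's internal activations to tell whether two sentences assign thematic roles the same way or swap them. Here is a minimal sketch of that kind of probe, assuming a small open model (gpt2), an arbitrary layer/head choice, and toy sentence pairs rather than the paper's actual stimuli or protocol:

```python
# Rough sketch of a thematic-role probe over attention-head activations.
# Model, layer/head indices, and the sentence pairs are illustrative assumptions,
# not the paper's exact setup.
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from transformers import AutoModel, AutoTokenizer

MODEL = "gpt2"  # placeholder; the paper evaluates several LLMs
tok = AutoTokenizer.from_pretrained(MODEL)
tok.pad_token = tok.eos_token  # gpt2 has no pad token by default
model = AutoModel.from_pretrained(MODEL, output_attentions=True)
model.eval()

def head_features(sentence: str, layer: int, head: int) -> np.ndarray:
    """Flatten one attention head's pattern for a sentence, padded to a fixed length."""
    ids = tok(sentence, return_tensors="pt", truncation=True,
              max_length=12, padding="max_length")
    with torch.no_grad():
        out = model(**ids)
    attn = out.attentions[layer][0, head]  # (seq, seq) attention weights
    return attn.flatten().numpy()

# Toy sentence pairs: label 1 = same thematic roles, 0 = roles swapped.
pairs = [
    ("The dog chased the cat.", "The cat was chased by the dog.", 1),
    ("The dog chased the cat.", "The cat chased the dog.", 0),
    ("The chef thanked the waiter.", "The waiter was thanked by the chef.", 1),
    ("The chef thanked the waiter.", "The waiter thanked the chef.", 0),
]

layer, head = 5, 3  # a single head to probe; the paper sweeps over all of them
X = np.stack([np.concatenate([head_features(a, layer, head),
                              head_features(b, layer, head)])
              for a, b, _ in pairs])
y = np.array([label for _, _, label in pairs])

probe = LogisticRegression(max_iter=1000)
scores = cross_val_score(probe, X, y, cv=2)
print(f"layer {layer} head {head}: probe accuracy {scores.mean():.2f}")
```

The point of the sketch is the shape of the analysis (classifier accuracy per head, compared across heads and against human judgments), not its results; on four toy pairs the accuracy number is meaningless.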