
Prelude: Why This Matters

The name FaustARP comes from my least favorite song on Radiohead's In Rainbows. I chose it because it made me realize that even the things we find "ugly" deserve a second look. For a long time, I couldn't understand why that track didn't sit right with me — until I realized it was the vocals. When I listened to just the instrumental, I found one of the most beautiful arrangements I'd ever heard.

I remember listening to it on a train. When the track ended, I overheard a family nearby. Their youngest daughter said something funny, and only after hearing her parents laugh did she start laughing too. It struck me how naturally she had learned to connect observation with response — to associate a sound with an emotional state. That small moment made me think about how humans build internal models of the world, connecting what we perceive into something that carries meaning.

That observation became the foundation of my curiosity about how AI systems represent the world. Most AI approaches are very good at pattern matching within a fixed context, but they struggle to carry understanding forward across time. They generate plausible responses without genuinely grounding their knowledge in experience. I became interested in architectures that get closer to forming real cognitive abstractions — systems that can learn, remember, and reason the way biological minds do.

When I talked about this with my therapist, she pointed out that what drives me isn't a desire to build products, but the need to understand how minds work. FaustARP is my attempt to translate that curiosity into a concrete research program — one that directly addresses the long-horizon memory limitations I've observed in current AI architectures.

My Work: Long-Term Memory (LTMemory)

The core innovation I've developed over the past year is LTMemory — a persistent memory mechanism that augments a world model's perception pipeline. The goal is to give the model access to what it has learned across its entire lifetime, not just its recent context window. I began this work on an earlier generative architecture before migrating to a world model foundation that I felt was better suited to the problem of long-horizon reasoning.

The documentation below reflects the baseline LTMemory design and logic. It will be updated as the research progresses through successive versions.

World Model Foundation

Encoder + Posterior
Converts observations into a compressed representation of the current moment.
Recurrent State
A continuously updated context that accumulates patterns across timesteps — a running summary of recent experience.
Prior
Predicts what comes next without observing it directly — the model's imagination of future states.
Latent Embedding
A compressed snapshot of what matters in the current moment. This is where LTMemory integrates.
Full State
The complete working representation: the current snapshot combined with accumulated context. Together they answer "what is happening and what led here?"
World Model
The prediction engine: given the full state and an action, it forecasts the next observation, expected reward, and whether the episode continues.
```mermaid
---
config:
  layout: dagre
---
flowchart TB
    X["Observation"] --> ENC["Encoder"]
    ENC --> POST["Posterior"]
    H_PREV["Recurrent State"] --> POST & S["Full State"] & PRIOR["Prior"]
    POST --> Z["Latent Embedding"]
    PRIOR -. imagination .-> Z
    Z --> S
    S --> GRU["Sequence Model"] & WM["World Model"] & AC["Agent"]
    A_PREV["Prev Action"] --> GRU
    GRU --> H_NEXT["Next Recurrent State"]
    WM --> R["Reward"] & C["Continue"] & RECON["Reconstruction"]
    AC --> A["Action"]
    style S fill:#4a5568,stroke:#5e7ce2,stroke-width:2px,color:#fff
    style Z fill:#5e7ce2,stroke:#fff,stroke-width:2px,color:#fff
```
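For readers who prefer code, one timestep of this pipeline can be sketched as follows. This is a conceptual toy, not the FaustARP implementation: random linear maps stand in for learned modules, and all dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions, not the real model sizes.
OBS, LATENT, HIDDEN, ACTION = 64, 32, 128, 4

# Random "weights" stand in for learned modules.
W_enc = rng.normal(size=(OBS, HIDDEN)) * 0.01
W_post = rng.normal(size=(HIDDEN + HIDDEN, LATENT)) * 0.01
W_prior = rng.normal(size=(HIDDEN, LATENT)) * 0.01
W_gru = rng.normal(size=(LATENT + ACTION + HIDDEN, HIDDEN)) * 0.01

def world_model_step(obs, h_prev, a_prev):
    """One timestep of the recurrent world model diagrammed above."""
    e = obs @ W_enc                                    # Encoder
    z = np.concatenate([e, h_prev]) @ W_post           # Posterior: encoding + recurrent state
    z_hat = h_prev @ W_prior                           # Prior: recurrent state only ("imagination")
    s = np.concatenate([z, h_prev])                    # Full state: snapshot + accumulated context
    h_next = np.tanh(np.concatenate([z, a_prev, h_prev]) @ W_gru)  # Simplified sequence model
    return s, h_next, z, z_hat
```

The key structural point the sketch preserves: the posterior sees the observation, while the prior sees only the recurrent state, which is what makes imagination possible without new input.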

The diagram above shows the world model foundation that FaustARP builds on. Before describing the first version, I want to explain how LTMemory slots into this pipeline at a conceptual level.

LTMemory: Conceptual Overview

How the flow works

1. Observe
The system receives an observation (a live frame or a replayed experience) together with prior context.
2. Perceive
The observation is encoded into a feature representation — a structured description of "what is visible right now."
3. Recall
The current perception is used to query the model's long-term memory store. Relevant past knowledge is retrieved as a memory context signal.
4. Arbitrate
A learned arbitration mechanism weighs the current perception against the retrieved memory. Novel inputs are trusted more; familiar inputs defer to memory. This mirrors how biological attention and surprise interact.
5. Integrate
The arbitrated signal produces a unified representation that carries both immediate context and long-term knowledge into the downstream model.
6. Learn
Two complementary training objectives keep the memory store accurate and prevent forgetting: one anchors it to current experience, the other consolidates past knowledge through replay.
```mermaid
graph TD
    subgraph Inputs
        IMG[Observation]
        PREV_S["Prior Context"]
    end
    subgraph "LTMemory Encoder"
        IMG --> CNN[Perception Module]
        CNN --> FLAT[Feature Representation]
        subgraph "Long-Term Memory"
            MEM["Memory Store"]
            FLAT --> QUERY[Query]
            MEM --> KEYS[Keys]
            MEM --> VALS[Values]
            QUERY --> SCORES[Relevance Scores]
            KEYS --> SCORES
            SCORES --> WEIGHTS[Attention Weights]
            WEIGHTS --> CTX[Memory Context]
            VALS --> CTX
            FLAT --> GATE_IN[Current Signal]
            CTX --> GATE_MEM[Memory Signal]
            GATE_IN --> ARBIT{"Arbitration\n(Novelty-Based)"}
            GATE_MEM --> ARBIT
            ARBIT --> FUSED[Integrated Representation]
        end
        FUSED --> OUT_PROJ[Output]
    end
    subgraph "Training"
        OUT_PROJ --> POST["Posterior"]
        OUT_PROJ --> PRIOR["Prior"]
        POST --> Z_LATENT["Latent State"]
        Z_LATENT --> RECON[Reconstruction]
        RECON -.->|Active Learning| IMG
        PRIOR -.->|Memory Consolidation| Z_LATENT
    end
    subgraph "Experience Replay"
        REPLAY[Replay Buffer] -->|Sampled Experience| IMG
    end
    style IMG fill:#5e7ce2,stroke:#fff,stroke-width:2px,color:#fff
    style ARBIT fill:#ff9f43,stroke:#fff,stroke-width:2px,color:#fff
    style Z_LATENT fill:#5e7ce2,stroke:#fff,stroke-width:2px,color:#fff
```
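The recall, arbitrate, and integrate steps (3 through 5 above) can be sketched as a single function. This is a conceptual sketch under assumed names, with a deliberately simple novelty signal (one minus the peak attention weight); in FaustARP the arbitration is a learned mechanism, not this heuristic.

```python
import numpy as np

def softmax(x):
    x = x - x.max()  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum()

def ltmemory_forward(features, mem_keys, mem_values):
    """Recall + arbitrate + integrate for one observation.

    `features` is the current perception (step 2); `mem_keys` and
    `mem_values` form the long-term memory store. Illustrative only.
    """
    # Recall: scaled attention over the memory store.
    scores = mem_keys @ features                       # relevance of each memory slot
    weights = softmax(scores / np.sqrt(len(features))) # attention weights
    context = weights @ mem_values                     # memory context signal

    # Arbitrate: a low peak attention weight means nothing in memory
    # matched well, so the input is novel and should be trusted more.
    novelty = 1.0 - weights.max()
    fused = novelty * features + (1.0 - novelty) * context

    return fused, weights, novelty
```

The gating line is the heart of the idea: novel inputs pass through largely unchanged, while familiar inputs defer to what memory already knows.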

How LTMemory Works

LTMemory is a hybrid memory system built for long-horizon tasks: a dual-process design that balances immediate perception against accumulated knowledge through the novelty-based arbitration described above.

Chronological Research Log

FaustARPv0 — Initial Hypothesis: The Perception Bottleneck

What I thought the problem was: I believed the world model's limitations stemmed from its relatively simple perception component. It felt like the model was processing each observation in isolation, without any sense of what it had seen before. My hypothesis was that augmenting the perception pipeline with long-term memory access would substantially improve sample efficiency and long-horizon performance.

In hindsight, I moved too fast and without a deep enough understanding of the full model architecture. I was modifying one component while assuming the effects would flow through to the parts I actually cared about — something that turned out to be only partially true.

My approach: Integrate LTMemory into the world model's perception encoder.

What I tested: I ran FaustARP V0 against a standard world model baseline on CartPole, expecting clear improvements in both convergence speed and final performance.

FaustARP vs baseline performance plot
V0 Results: FaustARP V0 vs. baseline on CartPole. LTMemory improved performance, though not in the way I expected.

What worked: LTMemory did produce measurable benefits — faster early convergence and higher final returns. FaustARP V0 reached stronger performance with fewer environment steps.

What I missed: The improvements were more modest than predicted. More importantly, I realized I didn't fully understand why the changes were helping — which made it difficult to know where to go next.

Key insight: Studying the architecture more carefully, I discovered that the encoder primarily influences the learning signal during training rather than the model's imagination capability. This meant that augmenting the "eye" wasn't directly improving the "mind's" ability to reason about the future.

What I learned: The encoder modification acts as a learning accelerator by providing cleaner training targets, but it doesn't reshape how the model builds and uses its internal world representation. The deeper bottleneck was elsewhere.
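To make that distinction concrete, here is what an imagination rollout looks like in sketch form (function names are placeholders, not the actual API). The encoder never appears in this loop: imagined latents come from the prior alone, which is why improving the encoder accelerates training without directly reshaping imagination.

```python
def imagine(h, policy, prior, sequence_model, horizon=15):
    """Roll the model forward in imagination, with no observations.

    `prior`, `policy`, and `sequence_model` are callables standing in
    for the model components; note the absence of any encoder call.
    """
    trajectory = []
    for _ in range(horizon):
        z_hat = prior(h)               # imagined latent from recurrent state only
        a = policy(z_hat, h)           # agent acts on the imagined state
        h = sequence_model(z_hat, a, h)  # advance the recurrent state
        trajectory.append((z_hat, a))
    return trajectory
```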

Retrospective note: The benchmark task was too simple to stress-test the memory system properly. Early results looked modest because both models had hit the performance ceiling of the environment, not because the changes weren't working.

I was also carrying a confound from a pre-trained feature extractor in early runs, which partly explained the faster initial gains. Removing it gave a cleaner picture of what LTMemory was actually contributing.

v0.5 — Revised Hypothesis: The Reasoning Bottleneck

How V0 changed my thinking: Once I controlled for the confound and ran clean benchmarks, I understood the architecture more clearly. The encoder improvement helps during training, but the model's core long-horizon reasoning — how it builds and maintains context over time — lives deeper in the architecture. That became my new focus.

Before changing anything, I ran fresh benchmarks to establish a clean baseline and confirmed that LTMemory still produced consistent improvements even without the confound:

In the CartPole evaluation videos, FaustARP V0 visibly holds the pole stable for the first 80 steps despite being only around 25 episodes into training. This early behavioral stability is not observed in the baseline. (4th video below)

Baseline — Episode 00 — Standard performance
FaustARP V0 — Episode 01 — Memory-augmented encoder
FaustARP V0 — Episode 02 — Extended horizon test
FaustARP V0 — Episode 03 — Key evidence

Why this matters

FaustARP V0's stabilization in the first 80 steps, only ~25 episodes into training, has no counterpart in the baseline run; it demonstrates faster convergence and more robust early-phase control.

V0 vs V0.5 CartPole comparison
V0 vs. V0.5 — CartPole comparison after removing the pre-training confound.

Without the pre-trained extractor, V0.5 performance on CartPole aligns more closely with the baseline. This was expected — and useful — because it confirmed that CartPole is too simple to reveal the benefits of a long-term memory system. I moved to a more demanding continuous control task where temporal context actually matters.

New experiment: I ran FaustARP V0.5 against the baseline on a continuous locomotion task (Walker2D), comparing behavior across multiple evaluation episodes.

Baseline — Episode 1
Baseline — Episode 2
Baseline — Episode 3
FaustARP V0.5 — Episode 1
FaustARP V0.5 — Episode 2
FaustARP V0.5 — Episode 3
V0 vs V0.5 Walker2D at 100k steps
V0 vs. V0.5 — Walker2D at 100,000 training steps.

Key outcome: The locomotion experiments confirmed the revised hypothesis. FaustARP V0.5 achieved the training efficiency improvements I originally expected from V0, on a task where long-horizon context is actually load-bearing.

I then introduced a deliberate stress test: both models were reset after 100k steps of training to evaluate resilience and recovery.
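The stress-test protocol can be sketched as follows. Names like `reset_dataset` and the exact reset semantics are assumptions on my part; the point is the shape of the experiment, not the real training loop.

```python
def recovery_stress_test(make_agent, env, total_steps=200_000, reset_at=100_000):
    """Train normally, clear the dataset partway through, keep training.

    Comparing the return curve before and after `reset_at` shows how
    well an agent recovers from losing its replayed experience.
    """
    agent = make_agent(env)
    returns = []
    for step in range(total_steps):
        returns.append(agent.train_step(env))
        if step == reset_at:
            agent.reset_dataset()  # the disruption under test
    return returns
```

Running this for both the baseline and FaustARP on the same environment and seed schedule yields the recovery comparison shown below.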

V0 vs V0.5 Walker2D recovery test
Recovery test — continued training after a dataset reset at 100k steps.

The baseline failed to recover from the reset and performance degraded significantly — likely because it had no mechanism to anchor its representation to what it had learned before the disruption. FaustARP recovered quickly and returned to its prior performance trajectory, which suggests the long-term memory component provides genuine resilience, not just faster early-stage learning.

Key Results and Lessons Learned

Main finding: LTMemory works — and the reason matters.

My initial hypothesis was partially right: augmenting the perception component did improve performance. But the mechanism was different from what I expected. Understanding why it works — rather than just that it does — changed the direction of the research.

What I Got Wrong

- I assumed the perception encoder was the core bottleneck. It turned out to be a learning accelerator; the deeper bottleneck was in how the model builds and maintains context over time.
- A pre-trained feature extractor confounded the early V0 runs and inflated the initial gains.
- CartPole was too simple to stress-test a long-term memory system, so the environment's performance ceiling masked what LTMemory was actually contributing.

Direction Forward

With LTMemory validated at the encoder level, the natural next step is to understand how it behaves when placed deeper in the architecture — closer to where the model's reasoning and planning actually happen. There are also open questions about the best way to combine multiple memory processes within a single system, and how to build the tooling needed to monitor and interpret model behavior at scale.

Longer term, making the model's internal reasoning legible — not just to researchers but potentially to the users of systems built on it — is a direction I find compelling both technically and philosophically.

Open question: How do biological systems maintain coherent representations over very long timescales? Is there a structural principle being used that AI architectures haven't yet captured, or is it largely a question of scale and the right training objectives?

V1: A Second Look

V1 represents a significant step forward from the work above. It is a tested, working implementation that brings the LTMemory concept into a transformer-based architecture, extending the approach beyond the original world model foundation.

In the coming months, the focus will shift to rigorous benchmarking of the expanded system and exploring how the memory components interact across different positions in the architecture.

If you've read this far, thank you. I hope this gives a clear enough picture of the work and the thinking behind it. I wish you the very best.