№ 09
Auto-Generative Self Corpus AGSC
noun phrase · AI · self-quantification · machine learning · personal data
Definition
A longitudinal, multimodal dataset generated continuously and automatically as a byproduct of an individual's cognitive, physiological, and behavioral activity — requiring no manual curation — that serves as the ground truth training signal for a personalized AI model of that individual.
Problem
Building a model of a person requires data about that person. Existing approaches either rely on manual journaling and tagging (high effort, low completeness) or synthetic model-generated data (low fidelity, circular). Neither produces a ground truth signal rich enough to model the full complexity of an individual.
Insight
The AGSC reframes the problem: a person's LLM conversations, biometric streams, behavioral logs, and movement patterns are already a continuous, high-resolution record of who they are. The corpus doesn't need to be built — it needs to be recognized and collected. The act of living generates it.
Origin
Emerged from the observation that the most accurate model of a person is not one trained on what they say about themselves, but on what they actually do — and that digital life now produces exactly that signal, passively and continuously.
The Auto-Generative Self Corpus reframes what it means to build a model of a person. Rather than treating data collection as a deliberate act — journaling, tagging, curating — the AGSC treats it as a passive consequence of living in a digitally instrumented world.
Distinguished from synthetic corpus generation (model-produced data) and manual second-brain systems (human-curated data) by its passive accumulation architecture: the corpus grows by living, not by labeling.