Meadow Mind completes reaction tasks at a 0.4-second decision speed, with no RL training and no reward.
You only prepare the perception and the rule; the decision core is packaged. import meadow_mind and go.
Status: DROPPING. Iron rule: always include a velocity or trend term.C → fire main engineFive steps to wire a task into Meadow Mind and run it with zero training.
angle 0.13, spin 0.9 into
"the pole tilts right, spinning fast". Buckets are enough: small/big, fast/slow.Task(memory=True);
the task is about maintaining a state (balance, landing, tracking) → keep it off,
repetition is the job. The runner also hints when it detects looping.mind.check(task) asks item by item; CartPole passed 7 of 8
(one miss allowed), landing 5/5, maze 7/7, MountainCar 3/3.
A failed exam needs no training: the perception sentence is usually incomplete;
rephrase and re-check.You are a Meadow Mind task integration engineer. Given a game's observation and action description, produce: 1) perceive(obs): translate numeric observations into one English situation sentence. Bucket values (small/big, fast/slow), always include a velocity or trend term, arbitrate multi-objective states into ONE uppercase Status keyword. 2) rule: one English sentence, a one-layer mapping from status keywords to option letters (no nesting). 3) options: multiple choice (A=..., B=...) mapped to env actions. No free-form. 4) Decide on memory: "is revisiting the same state a failure signal?" Yes (maze/exploration/dead ends) → Task(memory=True), write perceive(obs, task), use task.seen(key) to annotate (safe, already visited), and add "prefer unvisited directions" to the rule. Regulation tasks (balance/landing): keep it OFF — annotations measurably hurt. Unsure → off; the runner hints on loops. 5) sanity: enumerate every situation with its expected letter (the exam; include annotated situations if memory is on). Then run mind.check(task), one miss allowed; on failure only rephrase, never touch the model.
Every frame in every video corresponds to one real model decision. No scripted policy, no edited speed-ups.
def perceive(obs): th, thv = obs[2], obs[3] tilt = "right" if th > 0 else "left" spin = "right" if thv > 0 else "left" speed = "fast" if abs(thv) > abs(th) else "slow" return f"The pole tilts {tilt}. The spin is {spin}, {speed} spin."
if leg1 or leg2: if vy < -0.1: return "Status: DROPPING." # touched but sinking: keep cushioning return "Status: LANDED."
rule = ("Rule: push in the same direction the car is moving, " "to pump energy like a swing. If not moving, push left.")
Without memory, a dead end means pacing at its mouth forever.
Add a memory cue to the perception sentence, and it backs out and routes around.
Task(memory=True).
Visited states accumulate automatically; annotate the perception sentence with
task.seen(). No model changes, no training.
Off by default; the runner hints when it detects looping.
st = "safe" if cell in visited: st = "safe, already visited"
| RL (PPO / SAC) | Meadow Mind | |
|---|---|---|
| policy source | reward engineering + training (hours to days) | one sentence (seconds) |
| samples | 10⁵ to 10⁷ env steps | 0 |
| changing behavior | retrain | edit words |
| interpretability | black-box weights | rule and every decision are readable |
| decision latency | 0.1 to 1 ms | ~0.4 s (honest weakness) |
| continuous precision, high-rate control | strong | discrete multiple choice (honest weakness) |
RL needs reward because the policy hides inside weights and can only be carved by a scalar signal. Meadow Mind's policy is a readable sentence: env scores are just report cards and never enter the decision loop. Reward is replaced by outcome feedback: the episode trace points at the wrong sentence, and you edit it. LunarLander went from crash to landing with one ten-second cushioning line.
Honest limits: the reaction floor is ~0.4s (~2Hz); tighter deadlines (a 1-meter pole, Pong trajectory prediction) are out of reach today; the perceiver is human-designed, a teaching division of labour. Next version: layered perception that acts as soon as confidence crosses a threshold, targeting ~0.15s.