// dev journal — philknows.com

PhilKnows

Teaching an AI to play Final Fantasy IV on a real SNES — hardware, vision loops, controller emulation, and everything that broke along the way.

ACTIVE BUILD
WholeEnchilada v10
PLATFORM: SNES (1991 hardware)
GAME: FF4 (US "FF2")
// dev journal — active
This project is being built in iterations. Latest: WE10 — Baron Castle Mapping
Read the Journal

THE ORIGIN

The Question
Can an AI actually play a 1991 JRPG?
Not emulated. Not ROM-hacked. On real physical hardware — a Super Nintendo from 1991, a real cartridge, a real CRT signal. The idea: use Claude's vision capabilities to watch the screen, reason about the game state, and send controller inputs through hardware that emulates a real controller. No save states. No cheat codes. No APIs into the game's memory. Just pixels and buttons.
The Constraint
Real Hardware Only
The SNES outputs via composite or RGB. An Elgato capture card digitizes the signal into frames Python can read. An Arduino Micro emulates an actual SNES controller — the game has no idea it's talking to anything other than a first-party Nintendo pad. Every button press is a real electrical signal on real hardware.

THE HARDWARE

Signal Chain
SNES → OSSC → Elgato 4K → PC → Claude
The SNES outputs a native NTSC RGB signal via a Capacitor RGB NTSC cable. That feeds into an OSSC 1.8 (Kaico edition), which line-multiplies the 240p signal into clean HDMI. An Elgato 4K60 capture card ingests the stream into the Windows gaming PC running Python. OpenCV reads frames directly from the capture device — no screen recording, no emulator. Raw pixel-perfect NTSC output, handed to Claude Vision.
Controller Layer
Arduino Micro as SNES Pad
An Arduino Micro wires directly into the SNES controller port. It speaks the SNES latch/clock/data protocol natively. The PC sends button commands over USB serial; the Arduino converts them into timed electrical pulses the console reads as controller input. Early bugs: CH340 driver conflicts on Windows, wire color identification on the SNES connector, and interrupt-based timing to hit the SNES's 16ms polling window reliably.
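The real shifting happens in the Arduino firmware, but the bit order it has to honor is the documented SNES latch/clock/data protocol: after a latch pulse the console clocks out 16 bits, active low, in a fixed button order. A minimal Python sketch of that report packing (helper names here are ours, not the project's):

```python
# Pack a set of held buttons into the 16-bit report an SNES controller
# shifts out after a latch pulse. Active low: 0 = pressed, 1 = released.
# Bits 12-15 are unused and always read high.

# Button order the console clocks out, bit 0 first.
SNES_ORDER = ["B", "Y", "Select", "Start",
              "Up", "Down", "Left", "Right",
              "A", "X", "L", "R"]

def pack_report(pressed: set[str]) -> int:
    """Build the 16-bit shift-register value for a set of held buttons."""
    report = 0xFFFF  # all bits high = nothing pressed
    for i, name in enumerate(SNES_ORDER):
        if name in pressed:
            report &= ~(1 << i)  # pull that button's bit low
    return report & 0xFFFF

# e.g. holding only B clears bit 0: pack_report({"B"}) == 0xFFFE
```

Miss the latch-to-clock timing and the console samples the wrong bits — which is exactly the garbage-input failure mode described below.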
Debugging Milestones
What Broke, What We Learned
Getting reliable button presses required moving from polling to interrupt-driven timing. The SNES controller protocol is strict — miss the latch window and the console reads garbage. We also had to identify SNES controller wire pinouts by probing with a multimeter since vintage cables have no consistent color standard.
SUPER NINTENDO
1991 · NTSC
OSSC 1.8 (KAICO)
Line Multiplier
ELGATO 4K60
Capture Card
WINDOWS GAMING PC
Python · OpenCV
ARDUINO MICRO
Controller Emulator
SONY TRINITRON
13" CRT · Display Only

SIGNAL CHAIN

Every pixel Claude sees traveled this path before it became a decision.

Main chain:
SNES (1991 hardware) → 240p NTSC RGB → OSSC 1.8 (Kaico, line multiplier) → HDMI → Elgato 4K60 → USB3 → Windows gaming PC (Python / OpenCV · WholeEnchilada.py) → base64 frames → Claude Vision (Sonnet) → JSON → Arduino Micro (USB serial) → LATCH / CLOCK / DATA → SNES controller port — read as controller input.

Display branch:
HDMI→composite downscaler (HDMI in · RCA out) → composite → Sony Trinitron 13" CRT (KV-13FM12) — display only.

THE ARCHITECTURE

Vision Loop
Frame Capture → Claude Vision → Action
The core loop runs on the Windows gaming PC: capture a frame from the Elgato via OpenCV, encode it as base64, send it to Claude Vision with a system prompt describing the game and expected JSON output format, parse the response, and fire the corresponding button sequence to the Arduino over USB serial. Target loop time: under 3 seconds per decision cycle.
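That loop can be sketched as a single decision cycle. The capture, API, and serial calls below (`capture_frame`, `ask_claude`, `send_buttons`) are stand-ins for the real OpenCV, Anthropic, and pyserial calls — only the encode/parse/validate steps are shown concretely, and the JSON field names are illustrative:

```python
# One decision cycle: frame -> base64 -> model -> JSON -> button sequence.
# Collaborators are injected so the pure steps stay testable off-hardware.
import base64
import json

VALID_BUTTONS = {"Up", "Down", "Left", "Right", "A", "B", "X", "Y",
                 "L", "R", "Start", "Select"}

def encode_frame(jpeg_bytes: bytes) -> str:
    """Base64-encode a JPEG frame for the vision API payload."""
    return base64.b64encode(jpeg_bytes).decode("ascii")

def parse_action(response_text: str) -> list[str]:
    """Parse the model's JSON reply into a validated button sequence."""
    data = json.loads(response_text)
    buttons = data.get("buttons", [])
    bad = [b for b in buttons if b not in VALID_BUTTONS]
    if bad:
        raise ValueError(f"model hallucinated buttons: {bad}")
    return buttons

def decision_cycle(capture_frame, ask_claude, send_buttons) -> list[str]:
    """Run one pass of the loop and return the buttons that were fired."""
    b64 = encode_frame(capture_frame())   # OpenCV grab + JPEG in the real build
    reply = ask_claude(b64)               # vision prompt + frame -> JSON text
    buttons = parse_action(reply)
    send_buttons(buttons)                 # USB serial write to the Arduino
    return buttons
```

Rejecting unknown buttons at the parse step is what keeps a hallucinated action from ever reaching the serial line.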
RAG Layer
Walkthrough Dictionary + HTML Corpus
To give the AI actual game knowledge, we built a RAG system backed by a structured walkthrough dictionary. The AI identifies its current location from the frame, queries the dictionary for what it should do there, and uses that context to inform its action decision. In theory. In practice, this layer has been the hardest to get right.
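A toy version of that lookup, assuming the dictionary is keyed by normalized location name (the entries and key scheme here are illustrative, not the project's real data):

```python
# Walkthrough-dictionary retrieval: the model names a location from the
# frame; we normalize it and fetch the objective for that spot.
WALKTHROUGH = {
    "baron_castle_throne_room": "Speak to the King, then exit south.",
    "mist_cave_entrance": "Head north; prepare for the Mist Dragon.",
}

def lookup(location: str) -> str:
    """Return the walkthrough hint for a location, with a safe fallback."""
    key = location.strip().lower().replace(" ", "_")
    return WALKTHROUGH.get(key, "Unknown location -- explore and report exits.")
```

The fallback string matters as much as the hits: when localization fails, the model still needs an instruction rather than an empty context.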
RAG Corpus
Scraping HTML for Game Knowledge
Rather than hand-crafting all game knowledge, we've been scraping HTML from FF4 walkthrough and wiki sources to build a richer RAG corpus. Parse the pages, chunk the content, embed it — give the model something to actually retrieve against when it needs to know what's in the next room or how a boss mechanic works.
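The chunking step in the middle of that pipeline can be sketched with plain string slicing — overlapping windows so retrieval can pull room- or boss-sized passages without cutting a mechanic description in half. The sizes here are assumptions, not tuned values:

```python
# Split cleaned walkthrough text into overlapping character chunks
# ahead of embedding. Overlap keeps sentences that straddle a chunk
# boundary retrievable from at least one chunk.
def chunk_text(text: str, size: int = 400, overlap: int = 80) -> list[str]:
    """Split text into fixed-size character chunks with overlap."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks
```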
Model Experiments
Llama 13B & 7B — Local Inference Trials
We've been testing local models alongside Claude — specifically Llama 13B and 7B — to evaluate whether a smaller on-device model could handle parts of the decision loop without hitting the API. Results so far: context reasoning degrades noticeably at 7B, and 13B gets closer but still struggles with structured JSON output consistency under game-state complexity.
Safety Systems
Stuck Detection & State Persistence
Cecil has a talent for getting stuck in corners and pressing the same direction forever. We built stuck detection — if the same action fires N times without a screen change, break the loop and reorient. Persistent game state tracks what's been tried, where Cecil has been, and what the current objective is across decision cycles.
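A minimal detector in the spirit of the description above: hash each frame, and if the same (action, frame) pair repeats N times in a row, signal the loop to break out and reorient. The threshold and the choice of a full-frame hash are illustrative:

```python
# Stuck detection: N identical (action, frame) observations in a row
# means Cecil is walking into a wall. Hashing the raw frame bytes is
# the cheapest "did the screen change?" check.
import hashlib

class StuckDetector:
    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.last = None      # (action, frame_hash) from the previous cycle
        self.repeats = 0

    def observe(self, action: str, frame_bytes: bytes) -> bool:
        """Return True when the loop should break out and reorient."""
        key = (action, hashlib.sha1(frame_bytes).hexdigest())
        if key == self.last:
            self.repeats += 1
        else:
            self.last, self.repeats = key, 1
        return self.repeats >= self.threshold
```

An exact hash is brittle against NTSC capture noise; a perceptual difference over a downscaled frame would be the more forgiving variant.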

VERSIONS

v1 — v3
Proof of concept. Basic frame capture working. Arduino sending button presses. First time Cecil moved on screen under AI control. No game knowledge, pure reaction.
v4 — v5
Structured JSON responses. Moved from freeform AI text to a strict JSON action schema. Cleaner parsing, fewer hallucinated button sequences. Still no walkthrough context.
v6
WholeEnchilada6 — RAG introduced. Walkthrough dictionary added. First attempt at location-aware decision making. Stuck detection added. State persistence across cycles.
// the name: phil likes enchiladas. a lot. "the whole enchilada" also happens to be an idiom for stuffing an absurd amount of logic into one script and hitting run. it felt appropriate. nobody is complaining.
v7 — v8
Stability and timing improvements. Interrupt-based Arduino timing. Loop reliability improvements. Vision prompt tuning. Context window management for long sessions.
v9 — current
WholeEnchilada9. Most stable build to date. Vision loop and RAG functional but Cecil still lacks fundamental game literacy — no internal model of exploration, movement, or interaction basics. This is the rewrite target.
Parallel track
Local model experiments. Testing Llama 13B and 7B as alternatives to API calls. HTML scraping pipeline built to feed a richer RAG corpus from walkthrough and wiki sources. Neither local model matches Claude's structured output reliability yet.

THE STACK

// HARDWARE
Super Nintendo (1991)
The actual computer running the game
Capacitor RGB Cable (NTSC)
Native RGB signal out of SNES MultiAV
OSSC 1.8 (Kaico)
240p line multiplier → clean HDMI
Elgato 4K60
HDMI capture → host device input
Windows Gaming PC
Orchestration host — Python, OpenCV, serial
Arduino Micro
SNES controller emulation over serial
Sony Trinitron 13"
HDMI→composite downscaler · display only
// SOFTWARE
Python
Orchestration, vision loop, RAG, serial comms
OpenCV
Frame reading & preprocessing
HTML Scraper
FF4 walkthrough/wiki corpus for RAG pipeline
Walkthrough Dict (RAG)
Location-aware game knowledge retrieval
// AI / MODELS
Claude Vision (Sonnet)
Primary brain — frame interpretation & action decisions
Llama 13B
Local inference experiment — context reasoning
Llama 7B
Local inference experiment — speed vs. quality
// THE GAME
FF4 / FF2 US (SNES)
The game. Cecil's eternal burden.

WHERE WE ARE NOW

WholeEnchilada v9 is stable. The hardware pipeline is solid — frames are being captured, the Arduino is reliable, serial comms don't drop. The problem is upstairs.

Cecil doesn't move with purpose. The vision loop fires, the RAG queries, the JSON comes back — but Cecil has no internal model of what it means to explore a room. He doesn't know that rooms have exits, that you walk in directions to find them, that you press A to interact. The AI knows the walkthrough but doesn't know how to be a player.

Next up: a ground-up rewrite of the vision prompt and action schema with explicit game-literacy reasoning baked in. Teaching Cecil the basics before asking him to clear the Mist Cave.

Hardware ✓ · Serial Comms ✓ · Vision Loop ✓ · RAG — Needs Rewrite · Game Literacy — Missing · Llama 7B/13B — Evaluating · HTML Corpus — In Progress · v10 — In Progress

WHERE WE ARE GOING

Mist Dragon — Final Fantasy IV SNES