ce46e29e38
Two bugs from the initial run:
1. workspace_files is a list of {source, dest} entries, not
{path, content}, and the source files live in PinchBench's assets/
directory, not tasks/. Now checks both the tasks/ and assets/
directories.
2. LLM judge tasks (writing, research) scored 0% because the judge
wasn't implemented. Now uses codewhale exec as the judge: it sends
the rubric and workspace contents, then parses a JSON score response.
Also strips ANSI escape codes and control characters from judge output
to prevent JSON parse failures.
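
For reference, a minimal sketch of the output cleanup described in item 2.
This is not the code from this commit: the function names, the regexes,
and the "score" key in the judge's JSON are assumptions made for
illustration only.

```python
import json
import re

# ANSI escapes: CSI sequences such as "\x1b[31m", plus two-byte escapes.
ANSI_RE = re.compile(r"\x1b(?:\[[0-?]*[ -/]*[@-~]|[@-Z\\-_])")
# Control characters other than tab, newline, and carriage return.
CONTROL_RE = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def clean_judge_output(raw: str) -> str:
    """Remove ANSI escapes first, then any leftover control characters."""
    return CONTROL_RE.sub("", ANSI_RE.sub("", raw))


def parse_judge_score(raw: str) -> float:
    """Extract the outermost JSON object and return its score.

    Assumes (hypothetically) that the judge replies with an object
    containing a numeric "score" field.
    """
    cleaned = clean_judge_output(raw)
    start = cleaned.find("{")
    end = cleaned.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in judge output")
    payload = json.loads(cleaned[start:end + 1])
    return float(payload["score"])
```

Stripping before slicing matters: a raw escape byte inside a JSON string
makes json.loads reject the payload, which matches the parse failures
this commit guards against.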