diff --git a/crates/tui/src/prompts/base.md b/crates/tui/src/prompts/base.md
index 4e342bdb..18c824ba 100644
--- a/crates/tui/src/prompts/base.md
+++ b/crates/tui/src/prompts/base.md
@@ -117,6 +117,8 @@ The dispatcher runs parallel tool calls simultaneously. Serializing independent
 
 RLM is a persistent Python REPL for context that is too large or too repetitive to keep in the parent transcript. Open a named session with `rlm_open`, run bounded code with `rlm_eval`, read large returned payloads through `handle_read`, tune feedback with `rlm_configure`, and close finished sessions with `rlm_close`.
 
+The loaded source is available inside the REPL as `_context`; `_ctx` and `content` are compatibility aliases. Prefer `peek`, `search`, `chunk`, and `context_meta` for bounded inspection instead of printing the whole string.
+
 Inside the REPL, use deterministic Python for exact work and the RLM helper functions for semantic work. The current helper family is `peek`, `search`, `chunk`, `context_meta`, `sub_query`, `sub_query_batch`, `sub_query_map`, `sub_query_sequence`, `sub_rlm`, `finalize`, and `evaluate_progress`. These are in-REPL helpers, not separate model-visible tools. Four patterns, not one — choose based on the shape of the work:
 
 The RLM paper's core design is symbolic state: the long input and intermediate values live in the REPL environment, not copied into the root model context. Inspect with bounded slices, transform with Python, batch child calls programmatically, and keep large intermediate strings in variables or `var_handle`s. Do not paste the whole body back into a prompt or verbalize a long list of sub-calls when a loop can launch them.
diff --git a/crates/tui/src/prompts/base.txt b/crates/tui/src/prompts/base.txt
index 8862148f..775d5005 100644
--- a/crates/tui/src/prompts/base.txt
+++ b/crates/tui/src/prompts/base.txt
@@ -9,7 +9,7 @@ Your default workflow for tasks estimated at 5+ concrete steps:
 2. **Execute** — work through each checklist item, updating status as you go.
 3. **For complex initiatives only**, add `update_plan` as high-level strategy. Do not mirror the checklist into a second tracker.
 4. **For parallel work**, open sub-agent sessions with `agent_open` — each does one thing well. Use `agent_eval` for follow-ups or completion state, and `agent_close` to cancel or release a session. Link them to Work/checklist items in your thinking.
-5. **Only when an input genuinely doesn't fit your context window** — a whole file > ~50K tokens, a long transcript, a multi-document corpus — use persistent RLM sessions: `rlm_open` loads the input into a named Python REPL, `rlm_eval` runs bounded analysis, `handle_read` reads returned `var_handle`s, `rlm_configure` adjusts feedback/depth, and `rlm_close` releases the session. For shorter inputs, use `read_file` and reason directly.
+5. **Only when an input genuinely doesn't fit your context window** — a whole file > ~50K tokens, a long transcript, a multi-document corpus — use persistent RLM sessions: `rlm_open` loads the input into a named Python REPL, where the loaded source is `_context` with `_ctx` and `content` aliases. `rlm_eval` runs bounded analysis, `handle_read` reads returned `var_handle`s, `rlm_configure` adjusts feedback/depth, and `rlm_close` releases the session. For shorter inputs, use `read_file` and reason directly.
 6. **For persistent cross-session memory**, use `note` sparingly for important decisions, open blockers, and architectural context.
 
 **Key principle**: make your work visible in one place. The sidebar shows Work / Tasks / Agents / Context. Keep the Work checklist current; it is the primary progress surface. `update_plan` appears there only as optional strategy when it has real content.
diff --git a/crates/tui/src/repl/runtime.rs b/crates/tui/src/repl/runtime.rs
index e14f0229..b6edfc95 100644
--- a/crates/tui/src/repl/runtime.rs
+++ b/crates/tui/src/repl/runtime.rs
@@ -944,10 +944,11 @@ if _ctx_file:
     except Exception as e:
         _sys.stderr.write(f"[bootstrap] failed to load context: {e}\n")
 content = _context
+_ctx = _context
 
 _BOOTSTRAP_NAMES = {
     "_SID","_REQ","_RESP","_FINAL","_ERR","_RUN","_END","_DONE","_READY",
-    "_rpc","_ctx_file","_context","_slice_chars","_slice_lines","_BOOTSTRAP_NAMES","_main_loop",
+    "_rpc","_ctx_file","_context","_ctx","_slice_chars","_slice_lines","_BOOTSTRAP_NAMES","_main_loop",
     "_emit_final","_json_safe","_slice_text","_prompt_with_slice",
     "_normalize_dependency_mode","_batch_dependency_error",
     "llm_query","llm_query_batched","rlm_query","rlm_query_batched",
@@ -1146,10 +1147,10 @@ mod tests {
             .await
             .expect("spawn");
         let round = rt
-            .execute("print(content == _context, 'context' in globals(), 'ctx' in globals())")
+            .execute("print(content == _context, _ctx == _context, 'context' in globals(), 'ctx' in globals())")
             .await
             .expect("execute");
-        assert!(round.stdout.contains("True False False"));
+        assert!(round.stdout.contains("True True False False"));
         rt.shutdown().await;
     }
 
diff --git a/crates/tui/src/rlm/prompt.rs b/crates/tui/src/rlm/prompt.rs
index 42a00217..8b3d0101 100644
--- a/crates/tui/src/rlm/prompt.rs
+++ b/crates/tui/src/rlm/prompt.rs
@@ -32,7 +32,7 @@ The REPL exposes:
 - `finalize(value, confidence=None)` - end the loop with a final answer and optional confidence.
 - `print(...)` - diagnostic output. The driver feeds you a truncated preview next round.
 
-Variables, imports, and any other state persist across rounds. There is no `context` or `ctx` variable. Use `peek`, `search`, `chunk`, and `context_meta`.
+Variables, imports, and any other state persist across rounds. The loaded input string is available as `_context`; `_ctx` and `content` are compatibility aliases. Prefer bounded helpers for inspection. There is no `context` or `ctx` variable. Use `peek`, `search`, `chunk`, and `context_meta`.
 
 Contract: every turn, output exactly one ` ```repl ` block of Python and nothing else. No prose-only turns. No "I will do X"; emit the code that does X.
 
@@ -157,6 +157,7 @@ mod tests {
     #[test]
     fn rlm_prompt_does_not_publicize_context_variables() {
         let s = body();
+        assert!(s.contains("`_ctx` and `content` are compatibility aliases"));
         assert!(s.contains("There is no `context` or `ctx` variable"));
         assert!(!s.contains("len(context)"));
         assert!(!s.contains("chunk_context"));
diff --git a/crates/tui/src/tools/handle.rs b/crates/tui/src/tools/handle.rs
index a811640e..6f0e1a81 100644
--- a/crates/tui/src/tools/handle.rs
+++ b/crates/tui/src/tools/handle.rs
@@ -241,6 +241,10 @@ impl ToolSpec for HandleReadTool {
                     "type": "string",
                     "description": "Small JSONPath subset: $, .field, [index], [*], and ['field']."
                 },
+                "introspect": {
+                    "type": "boolean",
+                    "description": "Return supported projections, size hints, and copy-pasteable examples for this handle."
+                },
                 "max_chars": {
                     "type": "integer",
                     "description": "Maximum characters to return in this projection. Defaults to 12000; hard-capped at 50000."
@@ -296,6 +300,7 @@ impl ToolSpec for HandleReadTool {
                 line_range_projection(record, start, end, max_chars)
             }
             Projection::JsonPath(path) => jsonpath_projection(record, &path, max_chars)?,
+            Projection::Introspect => introspect_projection(record),
         };
 
         ToolResult::json(&output).map_err(|e| ToolError::execution_failed(e.to_string()))
@@ -320,6 +325,7 @@ enum Projection {
         end: usize,
     },
     JsonPath(String),
+    Introspect,
 }
 
 fn parse_handle(value: &Value) -> Result {
@@ -382,12 +388,23 @@ fn parse_projection(input: &Value) -> Result {
     count += usize::from(input.get("range").is_some());
     count += usize::from(input.get("count").and_then(Value::as_bool).unwrap_or(false));
     count += usize::from(input.get("jsonpath").is_some());
+    count += usize::from(
+        input
+            .get("introspect")
+            .and_then(Value::as_bool)
+            .unwrap_or(false),
+    );
     if count != 1 {
-        return Err(ToolError::invalid_input(
-            "handle_read: provide exactly one of `slice`, `range`, `count: true`, or `jsonpath`",
-        ));
+        return Err(ToolError::invalid_input(projection_usage_hint()));
     }
 
+    if input
+        .get("introspect")
+        .and_then(Value::as_bool)
+        .unwrap_or(false)
+    {
+        return Ok(Projection::Introspect);
+    }
     if input.get("count").and_then(Value::as_bool).unwrap_or(false) {
         return Ok(Projection::Count);
     }
@@ -443,6 +460,14 @@ fn parse_projection(input: &Value) -> Result {
     Ok(Projection::Range { start, end })
 }
 
+fn projection_usage_hint() -> String {
+    "handle_read: provide exactly one projection: `slice`, `range`, `count: true`, `jsonpath`, or `introspect: true`. \
+     Examples: {\"handle\":{\"kind\":\"var_handle\",\"session_id\":\"rlm:abc\",\"name\":\"final_1\"},\"slice\":{\"start\":0,\"end\":500}}; \
+     {\"handle\":\"rlm:abc/final_1\",\"count\":true}; \
+     {\"handle\":\"rlm:abc/final_1\",\"introspect\":true}."
+        .to_string()
+}
+
 fn count_projection(record: &HandleRecord) -> Value {
     match &record.value {
         HandleValue::Text(text) => json!({
@@ -462,6 +487,33 @@ fn count_projection(record: &HandleRecord) -> Value {
     }
 }
 
+fn introspect_projection(record: &HandleRecord) -> Value {
+    let string_handle = format!("{}/{}", record.handle.session_id, record.handle.name);
+    let object_handle = json!(record.handle.clone());
+    let mut projections = vec![
+        json!({"name": "count", "example": {"handle": string_handle, "count": true}}),
+        json!({"name": "slice_chars", "example": {"handle": object_handle.clone(), "slice": {"start": 0, "end": 500}}}),
+        json!({"name": "range_lines", "example": {"handle": object_handle.clone(), "range": {"start": 1, "end": 20}}}),
+    ];
+    if matches!(record.value, HandleValue::Json(_)) {
+        projections.push(
+            json!({"name": "jsonpath", "example": {"handle": object_handle, "jsonpath": "$"}}),
+        );
+    }
+
+    json!({
+        "handle": record.handle,
+        "projection": "introspect",
+        "value_type": match &record.value {
+            HandleValue::Text(_) => "text",
+            HandleValue::Json(value) => json_type(value),
+        },
+        "length": record.handle.length,
+        "repr_preview": record.handle.repr_preview,
+        "projections": projections,
+    })
+}
+
 fn slice_projection(
     record: &HandleRecord,
     start: usize,
@@ -794,6 +846,31 @@ mod tests {
         assert_eq!(body["length"], 2);
     }
 
+    #[tokio::test]
+    async fn handle_read_introspects_object_handle_with_examples() {
+        let ctx = ctx();
+        let handle = {
+            let mut store = ctx.runtime.handle_store.lock().await;
+            store.insert_json("rlm:test", "items", json!({"items": [{"a": 1}]}))
+        };
+
+        let result = HandleReadTool
+            .execute(json!({"handle": handle, "introspect": true}), &ctx)
+            .await
+            .expect("execute");
+        let body: Value = serde_json::from_str(&result.content).expect("json");
+        assert_eq!(body["projection"], "introspect");
+        assert_eq!(body["handle"]["kind"], "var_handle");
+        assert!(
+            body["projections"]
+                .as_array()
+                .expect("projection examples")
+                .iter()
+                .any(|entry| entry["name"] == "jsonpath"),
+            "json handles should advertise jsonpath examples"
+        );
+    }
+
     #[tokio::test]
     async fn handle_read_projects_jsonpath_subset() {
         let ctx = ctx();
@@ -830,7 +907,10 @@ mod tests {
             .execute(json!({"handle": handle}), &ctx)
             .await
             .expect_err("projection required");
-        assert!(err.to_string().contains("exactly one"));
+        let message = err.to_string();
+        assert!(message.contains("exactly one"));
+        assert!(message.contains("slice"));
+        assert!(message.contains("introspect"));
     }
 
     #[tokio::test]
diff --git a/crates/tui/src/tui/ui.rs b/crates/tui/src/tui/ui.rs
index dcff99b8..25981edb 100644
--- a/crates/tui/src/tui/ui.rs
+++ b/crates/tui/src/tui/ui.rs
@@ -3684,7 +3684,14 @@ pub(crate) fn apply_engine_error_to_app(
     let recoverable = envelope.recoverable;
     let message = envelope.message.clone();
     let severity = envelope.severity;
+    let turn_was_in_progress =
+        app.is_loading || matches!(app.runtime_turn_status.as_deref(), Some("in_progress"));
     streaming_thinking::finalize_current(app);
+    if turn_was_in_progress {
+        app.finalize_streaming_assistant_as_interrupted();
+        app.finalize_active_cell_as_interrupted();
+        app.runtime_turn_status = Some("failed".to_string());
+    }
     app.streaming_state.reset();
     app.streaming_message_index = None;
     app.streaming_thinking_active_entry = None;
diff --git a/crates/tui/src/tui/ui/tests.rs b/crates/tui/src/tui/ui/tests.rs
index ae00d54c..12d8855a 100644
--- a/crates/tui/src/tui/ui/tests.rs
+++ b/crates/tui/src/tui/ui/tests.rs
@@ -5158,6 +5158,45 @@ fn recoverable_engine_error_does_not_enter_offline_mode() {
     let _ = ErrorEnvelope::transient("");
 }
 
+#[test]
+fn stream_error_marks_active_turn_failed_without_waiting_for_turn_complete() {
+    use crate::error_taxonomy::ErrorEnvelope;
+
+    let mut app = create_test_app();
+    app.is_loading = true;
+    app.runtime_turn_id = Some("turn_decode_error".to_string());
+    app.runtime_turn_status = Some("in_progress".to_string());
+    handle_tool_call_started(
+        &mut app,
+        "tool-running",
+        "exec_shell",
+        &serde_json::json!({"command": "cargo test --workspace"}),
+    );
+    assert!(app.active_cell.is_some(), "precondition: live tool cell");
+
+    apply_engine_error_to_app(
+        &mut app,
+        ErrorEnvelope::classify("chunk decode error".to_string(), true),
+    );
+
+    assert!(!app.is_loading);
+    assert_eq!(app.runtime_turn_status.as_deref(), Some("failed"));
+    assert!(
+        app.active_cell.is_none(),
+        "stream error should flush live cells so no row stays visually running"
+    );
+    assert!(
+        app.history.iter().any(|cell| {
+            matches!(
+                cell,
+                crate::tui::history::HistoryCell::Error { message, .. }
+                    if message.contains("chunk decode error")
+            )
+        }),
+        "stream decode error should remain visible in transcript"
+    );
+}
+
 /// Hard failures (auth, billing, malformed request) DO need to flip offline
 /// mode so subsequent typed messages get queued instead of silently lost
 /// against a broken upstream.
diff --git a/docs/TOOL_SURFACE.md b/docs/TOOL_SURFACE.md
index 09a17534..95c9df7b 100644
--- a/docs/TOOL_SURFACE.md
+++ b/docs/TOOL_SURFACE.md
@@ -178,10 +178,11 @@ Large RLM outputs should come back as `var_handle`s. Use `handle_read` for
 bounded text slices, line ranges, counts, or JSONPath projections instead of
 replaying the full value into the parent transcript.
 
-Inside `rlm_eval`, the loaded source is available as `_context`; `content` is
-also bound as a convenience alias because agents naturally reach for it during
-Python analysis. The shorter `context` and `ctx` names are intentionally not
-bound so user variables can use them without colliding with the bootstrap.
+Inside `rlm_eval`, the loaded source is available as `_context`; `_ctx` and
+`content` are also bound as compatibility aliases because agents naturally
+reach for them during Python analysis. The shorter `context` and `ctx` names
+are intentionally not bound so user variables can use them without colliding
+with the bootstrap.
 
 Child-call timeouts are session policy: use `rlm_configure` with
 `sub_query_timeout_secs` before running a large fan-out. The helpers