Files
codewhale/crates/tui
Hunter Bown f94bedf1ea feat(tools): swap read_file PDF path to bundled pdf-extract; pdftotext now opt-in
Before v0.8.32 the PDF branch in `read_file` shelled out to `pdftotext`
(Poppler), which meant first-time users on hosts without it saw the
tool return a `binary_unavailable` sentinel and had to install Poppler
before the model could open a PDF. The `pdf-extract` crate already
powered URL-fetched PDFs in `web_run`, so this change reuses it for
the local PDF path too — extraction is now zero-dependency on every
supported host. The `pages` parameter still filters by 1-indexed
inclusive page range; both whole-file and per-page variants run with
no system dependency.

Users with column-heavy or complex-table PDFs (academic papers,
financial filings) where `pdftotext -layout` still wins can opt into
the external path with `prefer_external_pdftotext = true` in
`~/.config/deepseek/settings.toml`. When set, the previous Poppler
dispatch returns — including the `binary_unavailable` install hint
when the binary is missing on PATH. `deepseek doctor` reframes
`pdftotext` as optional and explains the opt-in instead of treating
it as a missing dependency.

Tests round-trip a checked-in academic PDF through the new path and
verify the `pages` window still slices correctly. The pre-v0.8.32
"binary_unavailable on missing pdftotext" branch is exercised against
the new opt-in setting via a serialised env-var guard.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 00:21:24 -05:00
..
2026-05-11 23:07:02 -05:00