Files
codewhale/crates
Ziang Xie 950860768f fix(web_search): complete HTML entity decoding with numeric character references (#822)
* fix(web_search): complete HTML entity decoding with numeric character references

The decode_html_entities function only handled 7 named entities and lacked
support for decimal (&#NN;) and hex (&#xHH;) numeric character references,
which are common in search engine result snippets. This caused garbled text.

Replace the hard-coded replace chains with a regex-based decoder that
handles:
- All named entities (amp, lt, gt, quot, apos, nbsp, copy, reg, mdash,
  ndash, lsquo, rsquo, ldquo, rdquo, hellip)
- Decimal numeric references (A → A)
- Hex numeric references (A → A)
- Unknown entities are passed through unchanged

* style(web_search): apply rustfmt

---------

Co-authored-by: Hunter Bown <hmbown@gmail.com>
2026-05-06 02:58:20 -05:00
..