> This benchmark evaluates the ability of multimodal language models to interpret handwritten editorial corrections in printed text. Using annotated scans from Charles Dickens' "Little Dorrit," we challenge models to accurately capture human editing intentions.
https://dorrit.pairsys.ai/