• bionicjoey@lemmy.ca
    1 year ago

    The usual problem with OCR’ing PDFs is that the OCR doesn’t understand the document layout. If a page is formatted as two columns, the OCR tends to read straight across both of them, and the resulting text is a mess.
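
To make the two-column problem concrete, here is a minimal sketch in plain Python. It assumes the OCR step has already produced word boxes as `(x, y, text)` tuples with the origin at the top-left of the page. The function names and the midpoint split are illustrative assumptions, not how any particular OCR tool works.

```python
from typing import List, Tuple

# Hypothetical word box: (x, y, text), origin at the top-left of the page.
Word = Tuple[float, float, str]


def naive_order(words: List[Word]) -> List[str]:
    """Read top-to-bottom, left-to-right across the whole page."""
    return [w[2] for w in sorted(words, key=lambda w: (w[1], w[0]))]


def column_aware_order(words: List[Word], page_width: float) -> List[str]:
    """Read the left half of the page fully, then the right half.

    Splitting at the page midpoint is a simplifying assumption; real
    layouts need actual column detection.
    """
    mid = page_width / 2

    def by_row(w: Word) -> Tuple[float, float]:
        return (w[1], w[0])

    left = sorted((w for w in words if w[0] < mid), key=by_row)
    right = sorted((w for w in words if w[0] >= mid), key=by_row)
    return [w[2] for w in left + right]


words = [
    (50, 100, "Left-1"), (350, 100, "Right-1"),
    (50, 120, "Left-2"), (350, 120, "Right-2"),
]

print(naive_order(words))              # lines from both columns interleaved
print(column_aware_order(words, 600))  # left column first, then right
```

The naive ordering interleaves lines from the two columns, which is the "mess" described above; the column-aware version only works here because the split point is known in advance.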

    • anon@lemm.ee
      1 year ago

      I’m willing to bet that, since most scientific papers use that two-column format, this OCR will either take that into account or be dead on arrival.