feat: alt text ocr via tesseract.js
5 unresolved threads
Created by: ShittyKopper
ocr is performed client side, in the browser. tested on firefox (desktop and mobile)
models are currently downloaded from cdn.jsdelivr.net (and cached in indexeddb by tesseract.js itself). i didn't want to bundle them, so as not to bloat the repo and images, but it shouldn't be hard to do if desired (they're around 2-10 MB per language anyway)
a potential future optimization: keep the created worker in memory for a few minutes, so it doesn't have to be created and destroyed every time ocr with the same language is requested. i didn't want to overcomplicate it for now
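for reference, a rough sketch of what that worker reuse could look like. this is not part of the PR: `IdleWorkerCache`, `Terminable`, `acquire` and `release` are hypothetical names, and the `create` callback is assumed to wrap whatever currently creates the tesseract.js worker for a language. the only thing assumed about the worker itself is that it has a `terminate()` method.

```typescript
// Hypothetical minimal shape of a worker: anything that can be terminated.
interface Terminable {
  terminate(): Promise<unknown>;
}

interface CacheEntry<W> {
  worker: Promise<W>;
  users: number; // OCR jobs currently using this worker
  timer?: ReturnType<typeof setTimeout>; // pending idle eviction, if any
}

class IdleWorkerCache<W extends Terminable> {
  private entries = new Map<string, CacheEntry<W>>();
  private create: (lang: string) => Promise<W>;
  private idleMs: number;

  constructor(create: (lang: string) => Promise<W>, idleMs: number) {
    this.create = create;
    this.idleMs = idleMs;
  }

  // Returns the cached worker for `lang`, creating it on first use.
  acquire(lang: string): Promise<W> {
    const existing = this.entries.get(lang);
    if (existing) {
      clearTimeout(existing.timer); // cancel a pending eviction
      existing.timer = undefined;
      existing.users++;
      return existing.worker;
    }
    const entry: CacheEntry<W> = { worker: this.create(lang), users: 1 };
    this.entries.set(lang, entry);
    // If creation fails, forget the entry so the next call retries.
    entry.worker.catch(() => {
      if (this.entries.get(lang) === entry) this.entries.delete(lang);
    });
    return entry.worker;
  }

  // Call once per acquire() when the OCR job is done.
  // When nobody is using the worker, the idle countdown starts.
  release(lang: string): void {
    const entry = this.entries.get(lang);
    if (!entry) return;
    entry.users = Math.max(0, entry.users - 1);
    if (entry.users > 0) return;
    entry.timer = setTimeout(() => {
      this.entries.delete(lang);
      entry.worker.then((w) => w.terminate()).catch(() => {});
    }, this.idleMs);
  }
}
```

usage would be roughly `const worker = await cache.acquire(lang)`, run the recognition, then `cache.release(lang)` in a `finally`. the user count is there so a second job in the same language doesn't get its worker terminated out from under it.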
i tried my best at error handling, but if model downloading fails it does not seem to return a proper error, so the spinner keeps spinning forever and the only indication that something's off is the browser console log. no clue how to fix this
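one possible workaround, sketched under assumptions: if the promise for worker creation or recognition never settles when the download fails, racing it against a timer would at least let the ui stop the spinner and show an error. `withTimeout` is a hypothetical helper, not something from tesseract.js, and this does not fix the underlying missing error, it only bounds the wait.

```typescript
// Rejects with an Error if `work` has not settled within `ms` milliseconds.
// `what` is only used to build the error message.
function withTimeout<T>(work: Promise<T>, ms: number, what: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${what} timed out after ${ms} ms`)),
      ms,
    );
  });
  // Whichever settles first wins; the timer is always cleaned up afterwards.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}
```

the caller would wrap the ocr promise in `withTimeout(...)` and handle the rejection like any other error. note that the timeout does not cancel the underlying work, so a worker that was created would still need to be terminated separately, and the timeout has to be generous enough for slow connections downloading a 10 MB model.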
other than those, should be ready to go