Support OCR to extract text from images for alt text
What feature would you like implemented?
Allow users to generate alt text for an image by running OCR to extract any text the image contains.
Why should we add this feature?
Transcribing alt text from images can be time-consuming and tedious. For images that consist primarily of text (e.g. screenshots of chat platforms, screenshots of blogs, etc.), it is useful to be able to use OCR to extract that text.
Of course, no OCR system is perfect, and this is not intended to replace users writing their own alt text, but rather to supplement it. For example, a user might upload an image which contains lots of text, run OCR on it to generate an initial alt text, and then edit the generated text to fix any errors or add any additional information they feel is important.
An example from Mastodon (tech.lgbt):
Using something such as a multi-modal LLM to generate alt text which describes the contents of an image is not within the scope of this proposal. Such a thing may be useful, but it should be a separate proposal. This proposal covers only extracting the textual elements of an image through OCR, which tends to be substantially more reliable than an LLM at transcribing text.
This would only be used while a user is composing a post, and only if the user requests it (e.g. by clicking a button or link which triggers it, such as in the above image). It would not be used to add alt text to posts which do not have any (from the current instance or otherwise). That is out of scope.
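As a rough illustration of the "run OCR, then let the user edit the result" flow described above, here is a minimal TypeScript sketch of the step between the OCR engine and the editable alt text field. It is only a sketch under assumptions: the function name `cleanOcrText` and the `maxLength` parameter are hypothetical, no particular OCR engine is assumed, and nothing here reflects existing Sharkey code.

```typescript
// Hypothetical helper: tidy raw OCR output into a draft the user can edit.
// `raw` is whatever string the chosen OCR engine returned for the image.
// `maxLength` is an assumed cap on alt text length, supplied by the caller.
function cleanOcrText(raw: string, maxLength: number): string {
  const lines = raw
    .split(/\r?\n/)
    // Collapse runs of whitespace that OCR engines often emit between words.
    .map((line) => line.replace(/\s+/g, " ").trim())
    // Drop blank lines so the draft stays compact.
    .filter((line) => line.length > 0);

  const text = lines.join("\n");

  // Truncate to the cap (counted in UTF-16 code units) so the draft
  // always fits the alt text field; the user can still edit it afterwards.
  return text.length > maxLength ? text.slice(0, maxLength) : text;
}
```

In a client, this would run only after the user presses the OCR button while composing a post, and the returned string would be placed in the alt text box as a starting point rather than saved automatically.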
Version
Currently not using Sharkey (possibly considering it); however, afaik this feature does not exist in Sharkey.
Instance
Same as above.
Contribution Guidelines
By submitting this issue, you agree to follow our Contribution Guidelines
- I agree to follow this project's Contribution Guidelines
- I have searched the issue tracker for similar requests, and this is not a duplicate.