This is an idea I’ve been toying with for a bit. A ton of media includes unimportant information that doesn’t need to be stored pixel-perfect. Storing large portions of the image data as text will save substantial amounts of storage, and as on-device image generation becomes commonplace, I think digital memories will become the main way people capture the world around them. I think this will inevitably be the next form of media capture (photography and video). It won’t replace other methods or formats, but I could see phone cameras defaulting to saving images as digital memories to save on storage.
I love this and have had similar thoughts in relation to my nonverbal kid wanting to keep memories in a way that lets them point out different parts and link multiple things together to make new stories, comments, or hypotheticals. It’s important to have the context, the parts, and the named things all relating together. I don’t know much about it, but there is a thing called a “sidecar” file that can be associated with media. There are also some moves to make EXIF data more standardized. So there’s a chance this could be done in an open format.
OP sounds like he’s making a data compression pitch, but I think you have the better idea. I think surrounding the picture with a lot of contextual data about when/why/how it was taken will absolutely help with recall and with connecting it to related concepts.
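To make the sidecar idea a bit more concrete, here is a rough Python sketch of what such a file could hold: the when/why/how context, the named parts of the picture, and links between those parts. Everything here is an assumption for illustration only. The `.memory.json` suffix, the field names, the box layout, and the example values are all made up, and this is not an existing standard or anything EXIF defines.

```python
import json
from pathlib import Path


def sidecar_path(image_path):
    """photo.jpg -> photo.jpg.memory.json (hypothetical naming convention)."""
    p = Path(image_path)
    return p.with_name(p.name + ".memory.json")


def build_memory(when, why, how, parts, links):
    """Bundle capture context, named parts, and relations between parts."""
    names = {part["name"] for part in parts}
    for link in links:
        # Every link must point at parts that are actually named in the image.
        if link["from"] not in names or link["to"] not in names:
            raise ValueError(f"link refers to an unnamed part: {link}")
    return {
        "context": {"when": when, "why": why, "how": how},
        "parts": parts,
        "links": links,
    }


def write_sidecar(image_path, memory):
    """Write the memory as JSON next to the image and return the new path."""
    path = sidecar_path(image_path)
    path.write_text(json.dumps(memory, indent=2), encoding="utf-8")
    return path


def read_sidecar(image_path):
    """Load the memory that belongs to an image."""
    return json.loads(sidecar_path(image_path).read_text(encoding="utf-8"))


def related_to(memory, name):
    """Names of parts linked to `name`, in either direction."""
    out = []
    for link in memory["links"]:
        if link["from"] == name:
            out.append(link["to"])
        elif link["to"] == name:
            out.append(link["from"])
    return out


# Made-up example values; boxes are [x, y, width, height] as fractions of the image.
example = build_memory(
    when="2024-06-01T15:30:00",
    why="first trip to the park with the new ball",
    how="phone camera, handheld",
    parts=[
        {"name": "dog", "box": [0.10, 0.40, 0.30, 0.35]},
        {"name": "ball", "box": [0.55, 0.70, 0.08, 0.08]},
    ],
    links=[{"from": "dog", "to": "ball", "relation": "is chasing"}],
)
```

Calling `write_sidecar("photo.jpg", example)` would put the JSON next to the picture, and `related_to(example, "dog")` walks the links, which is the kind of "point at a part and see what it connects to" lookup described above. Plain JSON is used here only because it is open and readable; an established sidecar format might be a better fit in practice.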