Small rant : Basically, the title. Instead of answering every question, if it instead said it doesn’t know the answer, it would have been trustworthy.
Small rant : Basically, the title. Instead of answering every question, if it instead said it doesn’t know the answer, it would have been trustworthy.
I did Google that fwiw and the answer I got was that sparse autoencoders work so that it checks the output aligns with the input
If it’s unknowable if the input is correct, won’t it still be subject to outputting confidently incorrect information