@Gaywallet I’m coming to think that expecting models to produce human-like values and underlying representations is a mistake, and we should recognize them as cognition tools which are entirely possible to misuse.
Why? LLMs get worse at tasks the more you train them with RLHF, and those with access to the base models will use them unfiltered, gaining a significant intelligence-at-scale advantage. The masses will be given the moralized, literally dumber version.