Tech experts are starting to doubt that ChatGPT and A.I. ‘hallucinations’ will ever go away: ‘This isn’t fixable’

Experts are starting to doubt it, and even OpenAI CEO Sam Altman is a bit stumped.

  • Zeshade@lemmy.world
    1 year ago

    In my limited experience the issue is often that the “chatbot” doesn’t even check what it says now against what it said a few paragraphs above. It contradicts itself in very obvious ways. Shouldn’t a different algorithm that adds some sort of separate logic check be able to help tremendously? Or a check to ensure recipes are edible (for this specific application)? A bit like those physics-informed NNs.

    • Zeth0s@lemmy.world
      1 year ago

      That’s called context. For ChatGPT it is a bit less than 4k tokens (a token is roughly a word or a piece of one). Using the API it goes up to a bit less than 32k. Alternative models go up to a bit less than 64k.

      The model wouldn’t know anything you said before that.

      That is one of the biggest limitations of the current generation of LLMs.
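A minimal sketch of what that limit means in practice, assuming the simplest possible policy: keep the newest messages that fit and drop the rest. The `count_tokens` and `fit_to_context` helpers are hypothetical, and counting one token per whitespace-separated word is a deliberate simplification; real tokenizers count differently.

```python
def count_tokens(text: str) -> int:
    # Assumption: one whitespace-separated word counts as one token.
    # Real tokenizers split text differently, so real counts will differ.
    return len(text.split())


def fit_to_context(messages: list, max_tokens: int) -> list:
    """Keep the most recent messages whose total size fits in max_tokens."""
    kept = []
    used = 0
    # Walk backwards from the newest message and stop once the budget is spent.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    "I am allergic to peanuts",        # 5 tokens under the assumption above
    "Suggest a quick dinner recipe",   # 5 tokens
    "Make it spicy please",            # 4 tokens
]
visible = fit_to_context(history, max_tokens=9)
print(visible)
```

With a budget of 9, the allergy message no longer fits, so a model fed only `visible` has no way to take it into account.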

      • Womble@lemmy.world
        1 year ago

        That’s not 100% true. They also work by modifying the meanings of words based on context, and those modified meanings then propagate forwards indefinitely. But yes, direct context is limited, so things outside it aren’t directly used.

        • Zeth0s@lemmy.world
          1 year ago

          They don’t really change the meaning of the words. They just look for the “best” words given the recent context, taking into account the different possible meanings of the words.

          • Womble@lemmy.world
            1 year ago

            No, they do. That’s one of the key innovations of LLMs: the attention step, where they propagate information from related words into each other based on context. From https://www.understandingai.org/p/large-language-models-explained-with?r=cfv1p:

            For example, in the previous section we showed a hypothetical transformer figuring out that in the partial sentence “John wants his bank to cash the,” his refers to John. Here’s what that might look like under the hood. The query vector for his might effectively say “I’m seeking: a noun describing a male person.” The key vector for John might effectively say “I am: a noun describing a male person.” The network would detect that these two vectors match and move information about the vector for John into the vector for his.
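The query/key matching in that quote can be sketched with a toy version of scaled dot-product attention. Everything numeric here is an assumption made up for illustration: the 2-d vectors, the token list, and the `attend` helper are hypothetical, and a real transformer learns much larger vectors and runs many attention heads in parallel.

```python
import math


def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Return attention weights and the weighted mix of values for one query."""
    scale = math.sqrt(len(query))
    # Dot product of the query with each key measures how well they match.
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    mixed = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, mixed


# Toy 2-d vectors, invented for illustration only.
tokens = ["John", "bank", "cash"]
keys = [[4.0, 0.0], [0.0, 4.0], [0.0, 3.0]]     # "I am: ..."
values = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.5]]   # information each token carries
query_his = [4.0, 0.0]                          # "I'm seeking: a male person"

weights, new_his = attend(query_his, keys, values)
best = tokens[weights.index(max(weights))]
print(best, [round(w, 3) for w in weights])
```

The query for "his" lines up with the key for "John", so almost all of the weight lands there and `new_his` ends up carrying John's value vector.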

    • cryball@sopuli.xyz
      1 year ago

      Shouldn’t a different algorithm that adds some sort of separate logic check be able to help tremendously?

      Maybe, but it might not be that simple. The issue is that one would have to design that logic in a manner that can be verified by a human. At that point the logic would be quite specific to a single task and not generally useful, so the benefit of the AI would be almost nil.

      • postmateDumbass@lemmy.world
        1 year ago

        And if there were an algorithm that was better at determining what was or was not the goal, why is that algorithm not used in the first place?

    • doggle@lemmy.world
      1 year ago

      They do keep context to a point, but they can’t hold everything in their memory. Otherwise, the longer a conversation went on, the slower and more performance-intensive that logic check would become. Server CPUs are not cheap, and AI models are already performance-intensive.
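A rough back-of-the-envelope sketch of why longer context gets expensive, assuming standard self-attention, which scores every token against every other token. The `attention_pairs` helper is hypothetical and ignores every other cost in the model.

```python
def attention_pairs(num_tokens: int) -> int:
    # Standard self-attention scores each token against every token,
    # so the number of scores grows with the square of the context length.
    return num_tokens * num_tokens


for n in (1_000, 4_000, 32_000):
    print(f"{n} tokens -> {attention_pairs(n)} pairwise scores")

# Going from a 4k to a 32k context is 8x the tokens...
ratio = attention_pairs(32_000) / attention_pairs(4_000)
print(ratio)
```

Under that assumption, 8x the context means 64x the pairwise scores per layer.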

    • kromem@lemmy.world
      1 year ago

      Shouldn’t a different algorithm that adds some sort of separate logic check be able to help tremendously?

      You, in your “limited experience”, pretty much exactly described the fix.

      The problem is that most applications of LLMs right now are low-hanging fruit, because the technology is so new.

      And those low-hanging-fruit applications are generally averse to paying 2-10x the query cost, in both time and money, just to fix things like jailbreaking or hallucinations, which is what multiple passes, especially with additional context lookups, would require.

      But you will very likely see multiple companies throwing themselves at exactly these kinds of scenarios in the next 18 months, with a focus on more business-critical LLM integrations.

      To put it in perspective, this is like people looking at AIM messenger back in the day and saying that the New York Times has nothing to worry about regarding the growth of social media.

      We’re still very much in the infancy of this technology in real-world application. Because of that, a lot of the present issues that aren’t fixable within the core product itself don’t yet have mature secondary markets built around fixing them.

      So far, yours was actually the most informed comment in this thread I’ve seen - well done!
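A minimal sketch of the multi-pass idea discussed in this comment: generate, run a separate check, and retry on failure. All names here are hypothetical. `fake_model` replays canned answers in place of a real LLM call, and `is_edible` is a deliberately crude word-list check standing in for a real validator.

```python
from typing import Callable, Optional, Tuple


def generate_with_check(
    prompt: str,
    generate: Callable[[str], str],
    check: Callable[[str], bool],
    max_attempts: int = 3,
) -> Tuple[Optional[str], int]:
    """Regenerate until check passes; return (answer, number of model calls)."""
    calls = 0
    for _ in range(max_attempts):
        answer = generate(prompt)
        calls += 1
        if check(answer):
            return answer, calls
    return None, calls  # every attempt failed the check


# Hypothetical stand-ins: a real system would call an LLM and a real validator.
canned = iter(["Glue pizza", "Tomato soup"])


def fake_model(prompt: str) -> str:
    return next(canned)


INEDIBLE = {"glue"}


def is_edible(recipe: str) -> bool:
    return not any(word in INEDIBLE for word in recipe.lower().split())


answer, calls = generate_with_check("Suggest a recipe", fake_model, is_edible)
print(answer, calls)
```

The first answer fails the check, so the query costs two model calls instead of one, and more if the check is itself a model call. The check is also specific to recipes, which is the trade-off raised elsewhere in the thread.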

      • Zeshade@lemmy.world
        1 year ago

        Thanks! And thanks for your insights. Yes, I meant that my experience using LLMs is limited to just asking Bing Chat questions about everyday problems, like I would with a friend who “knows everything”. But I never looked at the science of formulating “perfect prompts” like I sometimes hear about. I do have some experience in AI/ML development in general.

    • Eezyville@sh.itjust.works
      1 year ago

      Contradicting itself? Not staying consistent? Looks like it’s passed the Turing test to me. Seems very human.