• fluxion@lemmy.world
    2 days ago

    Yes, weights for individual words/phrases/tokens which, given a particular prompt or set of keywords, might reproduce the original training data almost in its entirety. Hence why it is so obvious when these models have been trained on copyrighted material.

    Similarly, I don’t digitally store music in my head verbatim; I store some fuzzy version that I can still reproduce fairly closely when prompted, and I’d still get sued if I charged money for performing or recording it, because the “weightings” in my neurons are just an implementation detail of how my brain works, not some active or purposeful attempt to transform the music in any appreciable way.

    • Zetta@mander.xyz
      2 days ago

      given a particular prompt or set of keywords, might reproduce the original training data almost in its entirety.

      What you describe here is called memorization, and it is generally considered a flaw or bug rather than a feature; it tends to happen with low-quality training data or not enough data. As far as I understand, this isn’t much of a problem for frontier LLMs given the large datasets they’ve been trained on.
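As a toy sketch of what memorization looks like (nothing to do with how real frontier models are built), here is a bigram model trained on a single tiny sentence. With so little data, every word has exactly one possible continuation, so generation can only replay the training text verbatim:

```python
from collections import defaultdict

# Toy illustration of memorization: a bigram "model" trained on a tiny
# corpus. With this little data there is only one continuation for each
# word, so generation reproduces the training text verbatim.
corpus = "the quick brown fox jumps over a lazy dog".split()

# "Training": count which word follows each word.
follows = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev].append(nxt)

def generate(prompt: str, length: int) -> str:
    out = [prompt]
    for _ in range(length):
        options = follows.get(out[-1])
        if not options:
            break  # no known continuation; stop
        # Greedy decoding: pick the most common continuation.
        out.append(max(set(options), key=options.count))
    return " ".join(out)

print(generate("the", 20))
# prints: the quick brown fox jumps over a lazy dog
```

With a large, diverse corpus each word would have many competing continuations and the counts would smooth out, which is the intuition behind why memorization shows up most with small or repetitive training sets.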

      Either way, just like a photocopier, an LLM can be used to infringe copyright if that’s what someone is trying to do with it; the tool itself doesn’t infringe anything.