• sp3ctr4l@lemmy.dbzer0.com
      5 days ago

      I’ve been toying with Qwen3.

      On my steam deck.

      The 8B-parameter model runs stably.

      It’s open source too!

      Alpaca is a neat little flatpak that containerizes everything and makes running local models so easy that I can literally do it without a mouse or keyboard.

      • JustAnotherKay@lemmy.world
        21 hours ago

        Oh my god I feel so stupid. I’ve been going back and forth on whether it was worth de-atomizing my Steam Deck to spin up Alpaca in Docker. I forgot they have a flatpak.

        • sp3ctr4l@lemmy.dbzer0.com
          18 hours ago

          Bazzite also has podman, though not specifically docker, in the core OS.

          So… I have spun up one local LLM in Alpaca, told it what hardware, OS, and environment it is in/on, told it to generate a context prompt to inform itself of all that… and it’s now helping me try to figure out how/if it is possible to set up a podman container/environment… for LLMs that either Alpaca does not yet support, or I am too stupid to figure out.

          Alpaca even has tools. You can give an LLM the ability to search the web for something and find some info or whatnot.

          ROCm on a Deck seems to kind of sort of work via … basically you spoof your GPU ID in the podman environment, and then… you would either have to do the ole allocate-more-RAM-to-the-GPU thing, or attempt to edit the LLM’s config and such, to try to run in a much-lower-than-expected VRAM situation.

          (WIP)

          Presumably you could tell it to do a lot of things, but that seems like a bad idea lol. Anyway, yeah, I was able to just tell it ‘go online and look up bazzite, familiarize yourself with pertinent details, reformulate context prompt.’

      • sp3ctr4l@lemmy.dbzer0.com
        4 days ago

        Unless I am missing something:

        Most people do not have a local LLM in their pocket right now.

        Most people have a client app that talks to a remote LLM, which ‘lives’ in an ecologically and economically dubious mega-datacenter, in their pocket right now.

        • GamingChairModel@lemmy.world
          4 days ago

          Plenty of the AI functions on phones are on-device. I know the iPhone can handle several text-processing tasks (summarizing, translating) offline, and they have an API for third-party developers to use on-device models. And the Pixels have Gemini Nano on-device for certain offline functions.

  • SSUPII@sopuli.xyz
    5 days ago

    If we assume a CD, you can probably fit a 256M-parameter model on it. But it will LOAD.
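
A rough way to sanity-check the CD claim is to multiply the parameter count by the bytes stored per parameter. The sketch below assumes a 700 MB disc and counts only the raw weights (no tokenizer, metadata, or file-format overhead); the function names and the bit widths tried are illustrative assumptions, not anything stated in the thread.

```python
# Assumption: a standard CD-R holding roughly 700 MB (decimal megabytes).
CD_CAPACITY_BYTES = 700 * 1000**2


def model_size_bytes(n_params: int, bits_per_param: int) -> int:
    """Approximate size of the weights alone at a given precision."""
    return n_params * bits_per_param // 8


def fits_on_cd(n_params: int, bits_per_param: int,
               capacity: int = CD_CAPACITY_BYTES) -> bool:
    """True if the raw weights fit within the assumed disc capacity."""
    return model_size_bytes(n_params, bits_per_param) <= capacity


if __name__ == "__main__":
    # Try a 256M-parameter model at a few common precisions.
    for bits in (16, 8, 4):
        size_mb = model_size_bytes(256_000_000, bits) / 1000**2
        verdict = "fits" if fits_on_cd(256_000_000, bits) else "does not fit"
        print(f"256M params at {bits}-bit: about {size_mb:.0f} MB, {verdict}")
```

Under these assumptions, a 256M-parameter model takes about 512 MB at 16-bit precision, so it fits with room to spare, and lower precisions shrink it further.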

      • khepri@lemmy.world
        4 days ago

        That’s so crazy that an LLM capable of doing anything at all can be that small! That leaves room for like an entire .avi episode of Family Guy at DVD resolution on there, which is the natural choice for the remaining space of course.

        • tetris11@feddit.uk
          4 days ago

          A 4K episode of Family Guy using H.265 (HEVC), assuming not too many cutaway gags, could produce a file of about 240MB. You could probably fit a 480i episode of South Park in the remaining 60MB.

      • khepri@lemmy.world
        4 days ago

        Wow, just popped it onto my very slow desktop and this little model rips haha. I really think tiny LLMs with a good LoRA on top are going to be a huge deal going forward

    • lime!@feddit.nu
      4 days ago

      there’s also tinyllama, which is somewhere around 600MB. it’s hilariously inept. it’s like someone jpeg-compressed a robot.

      also you’re only gonna load off of that cd once so it’ll perform fine.

    • nomorebillboards@lemmy.world
      4 days ago

      I thought it was more like 10-20GB to start out with a usable (but somewhat stupid) model.

      Are you confusing the size of the dataset with the size of the model?

    • yellow [she/her]@lemmy.blahaj.zone
      4 days ago

      It’s not the LLM that does the web searching, but the software stack around it. On its own, an LLM is just a text completer. What you’d need is a frontend like OpenWebUI or Perplexica that would ask the LLM for, say, five internet search queries that could return useful information for the prompt, throw those queries into SearxNG, and then pipe the results into the LLM’s context for it to use.
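
The loop described above (ask the model for queries, run them, feed the results back in) can be sketched in a few lines. This is only an illustration of the idea, not how OpenWebUI, Perplexica, or SearxNG actually implement it: `llm` and `search` are hypothetical callables you would supply yourself, and the prompt wording is made up.

```python
from typing import Callable, List


def answer_with_search(
    prompt: str,
    llm: Callable[[str], str],           # hypothetical: text in, completion out
    search: Callable[[str], List[str]],  # hypothetical: query in, result snippets out
    n_queries: int = 5,
) -> str:
    # Step 1: ask the model for search queries, one per line.
    raw = llm(
        f"List {n_queries} web search queries, one per line, "
        f"that would help answer:\n{prompt}"
    )
    queries = [line.strip() for line in raw.splitlines() if line.strip()]
    queries = queries[:n_queries]

    # Step 2: run each query through the search backend and collect snippets.
    snippets: List[str] = []
    for query in queries:
        snippets.extend(search(query))

    # Step 3: place the results in the model's context and ask the real question.
    context = "\n".join(f"- {snippet}" for snippet in snippets)
    return llm(
        f"Search results:\n{context}\n\n"
        f"Using these results, answer:\n{prompt}"
    )
```

Passing the model and the search backend in as plain functions keeps the sketch independent of any particular server or client library; in practice each would wrap whatever local model runner and search instance you have.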

      As for the models themselves, any decently-sized one that was released fairly recently would work. If you’re looking specifically for open-source rather than open-weight models (meaning that the training data and methodologies were also released rather than just the model weights), GPT-OSS 20B/120B and the OLMo models are recent standouts there. If not, the Qwen3 series are pretty good. (There are other good models out there, this is just what I remember off the top of my head.)