• 1 Post
  • 15 Comments
Joined 1 year ago
Cake day: March 22nd, 2024

  • I don’t understand.

    Ollama is not actually Docker, right? It runs the same llama.cpp engine; it’s just embedded inside the wrapper app, not containerized. It does have a Docker preset you can use, yeah.

    And basically every LLM project ships a Docker container. I know for a fact llama.cpp, TabbyAPI, Aphrodite, Lemonade, vLLM and SGLang do. It’s basically standard. There’s all sorts of wrappers around them too.

    You are 100% right about security though; in fact there’s a huge concern with compromised Python packages. This one almost got me: https://pytorch.org/blog/compromised-nightly-dependency/

    This is actually a huge advantage for llama.cpp, as it’s free of Python and external dependencies by design. This is very unlike ComfyUI, which pulls in a gazillion external repos. Theoretically the main llama.cpp repo could be compromised, but it’s a single, very well monitored point of failure, and literally every “outside” architecture and feature is implemented from scratch, making it harder to sneak stuff in.
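    To make the supply-chain point concrete: the standard mitigation on the Python side is hash-pinning, i.e. refusing any artifact whose digest you haven’t audited. Here’s a minimal sketch of the idea; the filename and digest below are hypothetical placeholders, and pip’s built-in equivalent is `pip install --require-hashes -r requirements.txt`.

    ```python
    # Minimal sketch: refuse to trust a downloaded wheel unless its sha256
    # matches a digest you recorded when you audited it. The filename and
    # digest here are hypothetical placeholders.
    import hashlib
    import sys

    PINNED = {
        "somepackage-1.0.0-py3-none-any.whl": "0123abcd...",  # placeholder digest
    }

    def sha256_of(path: str) -> str:
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    if __name__ == "__main__":
        path = sys.argv[1]
        name = path.rsplit("/", 1)[-1]
        expected = PINNED.get(name)
        if expected is None or sha256_of(path) != expected:
            sys.exit(f"refusing {name}: unknown or mismatched hash")
        print(f"{name}: hash OK")
    ```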


  • OK.

    Then LM Studio, with Qwen3 30B IQ4_XS and low-temperature MinP sampling.

    That’s what I’m trying to say, though: there is no one-click solution, and claiming otherwise is kind of a lie. LLMs work a bajillion times better with just a little personal configuration. They are not magic boxes; they are specialized tools.

    Random example: on a Mac? Grab an MLX quant, it’ll be way faster and better.

    Nvidia gaming PC? TabbyAPI with an exl3 quant. Small GPU laptop? ik_llama.cpp. APU? Lemonade. Raspberry Pi? That’s important to know!

    What do you ask it to do? Set timers? Look at pictures? Cooking recipes? Search the web? Read documents? Do you need speed or accuracy?

    This is one reason ollama is so suboptimal, the other being plain bad defaults (Q4_0 quants, a 2048-token context, no imatrix or anything outside GGUF, bad sampling last I checked, chat template errors, bugs with certain models, I can go on). A lot of people just try “ollama run,” I guess, then assume local LLMs are bad when it doesn’t work right. A sketch of doing that configuration yourself is below.
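    As a concrete example of that “little personal configuration”: a llama.cpp server exposes the sampling knobs directly on its native /completion endpoint. This is just a sketch; the port, prompt, and values are assumptions to tune per model, but it shows the settings a wrapper would otherwise default for you:

    ```python
    # Rough sketch: low-temperature MinP sampling against a local llama.cpp
    # server (launch it yourself with a sane context size instead of a 2048
    # default). Port, prompt, and sampling values are assumptions; tune them
    # for your model.
    import json
    import urllib.request

    payload = {
        "prompt": "Explain MinP sampling in one sentence.",
        "temperature": 0.3,  # low temperature = less randomness
        "min_p": 0.05,       # drop tokens below 5% of the top token's probability
        "top_k": 0,          # disable top-k so MinP does the filtering
        "top_p": 1.0,        # disable top-p for the same reason
        "n_predict": 128,    # cap response length
    }

    req = urllib.request.Request(
        "http://127.0.0.1:8080/completion",  # llama.cpp server's default port
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["content"])
    ```

    Kobold.cpp and LM Studio expose the same knobs in their UIs; the point is that you choose them instead of trusting a wrapper’s defaults.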




  • TBH you should fold this into localllama? Or open source AI?

    I have very mixed (mostly bad) feelings on ollama. In a nutshell, they’re kinda Twitter attention grabbers who give zero credit (or contribution) to the underlying framework, llama.cpp. And that’s just the tip of the iceberg: they’ve made lots of controversial moves, and it seems like they’re headed for commercial enshittification.

    They’re… slimy.

    They like to pretend they’re the only way to run local LLMs and blot out any other discussion, which is why I feel kinda bad about a dedicated ollama community.

    It’s also a highly suboptimal way for most people to run LLMs, especially if you’re willing to tweak.

    I would always recommend Kobold.cpp, TabbyAPI, ik_llama.cpp, Aphrodite, LM Studio, the llama.cpp server, SGLang, the AMD Lemonade server, or any number of other backends over ollama. Literally anything but ollama.


    …TL;DR I don’t like the idea of focusing on ollama at the expense of other backends. The community should be about running LLMs locally, not ollama specifically.






  • They are GPUs.

    All of them, even the H100, B100, and MI300X, have texture units, pixel shaders, everything. They are graphics cards at a low level. Only the MI300X is missing ROPs; the Nvidia cards have them (and can run realtime games on Linux), and they can all be used in Blender and such.

    The compute programming languages they use are, fundamentally, hacked-up abstractions that map onto the same GPU hardware found in consumer parts.

    That’s the whole point: they’re architected as GPUs so that they stay backwards compatible, since everything was built on the days when consumer gaming GPUs were hacked to be used for compute.


    Are there more dedicated accelerators? Yes. They’re called ASICs, or application-specific integrated circuits. That’s technically a broad term, but its connotation is mostly purpose-built compute hardware.


    Paraphrasing Wikipedia: https://en.wikipedia.org/wiki/2025_India–Pakistan_conflict#Analysis

    [The Times] reported that India felt frustrated after Donald Trump’s public claims of mediating a cease-fire, which presented both countries as equals and downplayed the terrorist attack that triggered the conflict; India had hoped any U.S. involvement would remain discreet, and Trump’s portrayal of both countries on equal terms was seen by Indian officials as politically sensitive and diplomatically frustrating…

    On 21 June, Pakistan announced it would nominate Donald Trump for the Nobel Peace Prize, citing his role in brokering the ceasefire. Pakistan credited Trump’s diplomatic intervention, though India denied any U.S. mediation.

    Like it says, it seems India assumed the modest level of mediation would stay confidential (a clear miscalculation on their part), while Pakistan, err, trumped up the magnitude of the intervention to paint themselves in a better light, possibly because they’re at a military disadvantage and felt grateful for the help.

    Seems like there was some backchannel involvement from many countries (like “Saudi Arabia, Iran, the UAE and the UK” and indeed the US), but Trump couldn’t help himself and loudly claimed credit before the ceasefire was even announced.

    Now India’s annoyed (hence their flat denial).

    I like this explanation; it ‘fits’ all the involved characters, including Trump blotting out the sun and killing any nuance to the situation.




  • On the two subs I frequented:

    • /r/thelastairbender is just cultish and shallow now. I abandoned it, but that’s painful, as it was basically the only sane place left where the fandom had any critical mass. /c/thelastairbender is nice, but very quiet.

    • /r/localllama has… lost its intelligence? Like no one seems to experiment or talk technically anymore; the good discussion is on GitHub or shattered across Discords, while the ‘critical mass’ is in the AI Bro black hole of Twitter and LinkedIn. I read it, but never post anymore. localllama here is better, but smaller and downvoted to hell.

    Also, I’ve been shadowbanned on like 4 accounts across 3 different IPs/machines, with no explanation and no recourse. I never post anything political or even remotely provocative (unless links to Lemmy count) and only visit those two subs, so… yeah, kinda sick of that.