I’ve just rediscovered ollama, and it’s come a long way: it has reduced the very difficult task of hosting your own LLM locally (and getting it running on a GPU) to simply installing a deb! It also works on Windows and Mac, so it can help everyone.
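
To give a sense of how simple it is once the package is installed, here’s a minimal sketch using the official `ollama` Python client (`pip install ollama`), assuming the ollama service is running and a model such as Mistral has already been pulled with `ollama pull mistral` — the model name and prompt are just examples:

```python
# Minimal sketch: chat with a locally hosted model through the ollama Python client.
# Assumes the ollama service is running and "mistral" has already been pulled.
import ollama

response = ollama.chat(
    model="mistral",  # example model; any pulled model name works here
    messages=[{"role": "user", "content": "In one sentence, what is a quantized LLM?"}],
)
print(response["message"]["content"])
```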

I’d like to see Lemmy become useful for specific technical niches, instead of everyone trying to guess the best existing community, which is subjective and makes information hard to find. So I created [email protected] as a place for everyone to discuss ollama, ask questions, and help each other out!

So please join, subscribe, and feel free to post questions, tips, and projects, and help out where you can!

Thanks!

    • brucethemoose@lemmy.world · 17 hours ago

      Totally depends on your hardware, and what you tend to ask it. What are you running? What do you use it for? Do you prefer speed over accuracy?

      • WhirlpoolBrewer@lemmings.world · 2 hours ago

        I have an M2 MacBook Pro (Apple silicon) and would kind of like to replace Google’s Gemini as my go-to LLM. I think I’d probably like to run something like Mistral. Currently I do have Ollama and some version of Mistral installed, but I almost never use it, since it’s on my laptop and not my phone.

        I’m not big on LLMs, and if I could find one that I can run locally and that helps me get off Google Search and Gemini, that would be awesome. Currently I use a combo of Firefox, Qwant, Google Search, and Gemini for my daily needs. I’m not big on the direction Firefox is headed, I’ve heard there are arguments against Qwant, and using Gemini feels like the wrong answer given my beliefs and opinions.

        I’m looking for something better, without sinking too much time into something I may only sort of like. Tall order, I know, but I figured I’d give you as much info as I can.

        • brucethemoose@lemmy.world · 15 hours ago

          OK.

          Then try LM Studio, with Qwen3 30B at IQ4_XS, low temperature, and MinP sampling.
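
          For example, hitting LM Studio’s local OpenAI-compatible server (default port 1234) from Python could look roughly like this — the model identifier, and whether your LM Studio build forwards `min_p` to the backend sampler, are assumptions to check:

          ```python
          # Rough sketch: query LM Studio's local OpenAI-compatible endpoint with
          # low temperature and MinP sampling. The model name is an example; whether
          # "min_p" is passed through depends on your LM Studio version.
          import requests

          payload = {
              "model": "qwen3-30b-a3b",  # example identifier for the loaded model
              "messages": [{"role": "user", "content": "Summarize: local LLMs need a bit of tuning."}],
              "temperature": 0.2,        # low temperature, as suggested above
              "min_p": 0.05,             # MinP cutoff (assumption: forwarded to the sampler)
          }
          resp = requests.post("http://localhost:1234/v1/chat/completions", json=payload, timeout=120)
          print(resp.json()["choices"][0]["message"]["content"])
          ```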

          That’s what I’m trying to say, though: there is no one-click solution; that idea is kind of a lie. LLMs work a bajillion times better with just a little personal configuration. They are not magic boxes; they are specialized tools.

          Random example: on a Mac? Grab an MLX conversion of the model; it’ll be way faster and better.

          Nvidia gaming PC? TabbyAPI with an exl3 quant. Laptop with a small GPU? ik_llama.cpp. APU? Lemonade. Raspberry Pi? That’s important to know!
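
          For the Mac/MLX route above, a minimal sketch with the `mlx-lm` package (`pip install mlx-lm`) might look like this — the exact mlx-community repo name is an assumption, so substitute whatever MLX conversion you actually want:

          ```python
          # Minimal sketch: run an MLX-converted model on Apple silicon with mlx-lm.
          # The repo name below is an assumption; pick any MLX conversion you like.
          from mlx_lm import load, generate

          model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")
          print(generate(model, tokenizer, prompt="Write a haiku about local LLMs.", max_tokens=100))
          ```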

          What do you ask it to do? Set timers? Look at pictures? Cooking recipes? Search the web? Look at documents? Do you need it to be fast, or accurate?

          This is one reason ollama is so suboptimal; the other is just bad defaults (Q4_0 quants, 2048-token context, no imatrix or anything outside GGUF, bad sampling last I checked, chat template errors, bugs with certain models, I could go on). A lot of people just try “ollama run”, I guess, then assume local LLMs are bad when it doesn’t work well.
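
          If you do stick with ollama, you can at least override some of those defaults per request. A hedged sketch with the official Python client — option names follow llama.cpp/ollama conventions, and the model name is just an example:

          ```python
          # Sketch: override some of ollama's defaults per request via the "options" dict.
          # Option names follow llama.cpp/ollama conventions; support can vary by version.
          import ollama

          response = ollama.chat(
              model="mistral",  # example model name
              messages=[{"role": "user", "content": "Give me three dinner ideas using lentils."}],
              options={
                  "num_ctx": 8192,     # raise the context window from the 2048-token default
                  "temperature": 0.3,  # lower temperature for more focused answers
                  "min_p": 0.05,       # MinP sampling, if your version supports it
              },
          )
          print(response["message"]["content"])
          ```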