I've just created c/Ollama!

catty@lemmy.world · edit-2 1 day ago

I've just created c/Ollama!

WhirlpoolBrewer@lemmings.world · 8 hours ago

I have a MacBook 2 pro (Apple silicon) and would kind of like to replace Google’s Gemini as my go-to LLM. I think I’d like to run something like Mistral, probably. Currently I do have Ollama and some version of Mistral running, but I almost never used it as it’s on my laptop, not my phone.

I’m not big on LLMs and if I can find an LLM that I run locally and helps me get off of using Google Search and Gimini, that could be awesome. Currently I use a combo of Firefox, Qwant, Google Search, and Gemini for my daily needs. I’m not big into the direction Firefox is headed, I’ve heard there are arguments against Qwant, and using Gemini feels like the wrong answer for my beliefs and opinions.

I’m looking for something better without too much time being sunk into something I may only sort of like. Tall order, I know, but I figured I’d give you as much info as I can.

brucethemoose@lemmy.world · edit-2 8 hours ago

Honestly perplexity, the online service, is pretty good.

As for local running, one question first: how much RAM does your Mac have? This is basically the factor for what model you can and should run.

WhirlpoolBrewer@lemmings.world · 8 hours ago

brucethemoose@lemmy.world · edit-2 7 hours ago

8GB?

You might be able to run Qwen3 4B: https://huggingface.co/mlx-community/Qwen3-4B-4bit-DWQ/tree/main

But honestly you don’t have enough RAM to spare, and even a small model might bog things down. I’d run Open Web UI or LM Studio with a free LLM API, like Gemini Flash, or pay a few bucks for something off openrouter. Or maybe Cerebras API.

…Unfortunely, LLMs are very RAM intensive, and >4GB (more realistically like 2GB) is not going to be a good experience :(

WhirlpoolBrewer@lemmings.world · 7 hours ago

Good to know. I’d hate to buy a new machine strictly for running an LLM. Could be an excuse to pickup something like a Framework 16, but realistically, I don’t see myself doing that. I think you might be right about using something like Open Web UI or LM Studio.

brucethemoose@lemmy.world · edit-2 5 hours ago

Yeah, just paying for LLM APIs is dirt cheap, and they (supposedly) don’t scrape data. Again I’d recommend Openrouter and Cerebras! And you get your pick of models to try from them.

Even a framework 16 is not good for LLMs TBH. The Framework desktop is (as it uses a special AMD chip), but it’s very expensive. Honestly the whole hardware market is so screwed up, hence most ‘local LLM enthusiasts’ buy a used RTX 3090 and stick them in desktops or servers, as no one wants to produce something affordable apparently :/

~> psudojo@witsEnd <~@ioc.exchange · 5 hours ago

@brucethemoose @WhirlpoolBrewer

*1650 and it works like a charm 🤌🏾

brucethemoose@lemmy.world · edit-2 3 hours ago

1650

You mean GPU? Yeah, it’s good, I was strictly talking about purchasing a laptop for LLM usage, as most are less than ideal for the money. Laptop vram pools are relatively small and SO-DIMMS are usually very slow.

Things will get much better once the “Max” AMD SKUs proliferate.

brucethemoose@lemmy.world · edit-2 8 hours ago

Actually, to go ahead and answer, the “fastest” path would be LM Studio (which supports MLX quants natively and is not time intensive to install), and a DWQ quantization (which is a newer, higher quality variant of MLX models).

Hopefully one of these models, depending on how much RAM you have:

https://huggingface.co/mlx-community/Qwen3-14B-4bit-DWQ-053125

https://huggingface.co/mlx-community/Magistral-Small-2506-4bit-DWQ

https://huggingface.co/mlx-community/Qwen3-30B-A3B-4bit-DWQ-0508

https://huggingface.co/mlx-community/GLM-4-32B-0414-4bit-DWQ

With a bit more time invested, you could try to set up Open Web UI as an alterantive interface (which has its own built in web search like Gemini): https://openwebui.com/

And then use LM Studio (or some other MLX backend, or even free online API models) as the ‘engine’

Alternatively, especially if you have a small RAM pool, Gemma 12B QAT Q4_0 is quite good, and you can run it with LM Studio or anything else that supports a GGUF. Not sure about 12B-ish thinking models off the top of my head, I’d have to look around.

WhirlpoolBrewer@lemmings.world · 8 hours ago

This is all new to me, so I’ll have to do a bit of homework on this. Thanks for the detailed and linked reply!

brucethemoose@lemmy.world · edit-2 7 hours ago

I was a bit mistaken, these are the models you should consider:

https://huggingface.co/mlx-community/Qwen3-4B-4bit-DWQ

https://huggingface.co/AnteriorAI/gemma-3-4b-it-qat-q4_0-gguf

https://huggingface.co/unsloth/Jan-nano-GGUF (specifically the UD-Q4 or UD-Q5 file)

they are state-of-the-art at this size, as far as I know.

WhirlpoolBrewer@lemmings.world · 7 hours ago

Awesome, I’ll give these a spin and see how it goes. Much appreciated!