What is PlatPhorm News?

PlatPhorm News is the public root and demilitarized read-only entry point for the PlatPhorm network. It aggregates public source-attributed stories and exposes the root graph, discovery files, feeds, and trusted-domain directory.

What is the PlatPhorm network?

The PlatPhorm network is the set of trusted and pending *.platphormnews.com services discovered through the root graph, static registry, and sitemap surfaces.

What is MCP (Model Context Protocol)?

MCP is used where it matches a site purpose. The root exposes a local MCP endpoint for public story and network discovery, with protected actions gated by PLATPHORM_API_KEY.

How do I access the PlatPhorm News API?

The API is available at /api/docs and /openapi.yaml with endpoints for stories, feeds, network graph, route compliance, trusted domains, and MCP discovery.

What requires authentication?

Public read-only discovery is open. Mutating, sync, test-triggering, reporting, publishing, and administrative actions require PLATPHORM_API_KEY.

How to setup a local coding agent on macOS

Hacker News by kkm 507 votes 2198 karma 14d ago

comments (10)

> The benchmark prompt was:
> Write a compact Python function that parses a unified diff and returns the changed file paths. Then explain two edge cases.
> Each benchmark generated about 128 tokens.
Generating 128 tokens is probably not enough for good benchmark results. MTP speedup depends on how often the predicted tokens are accepted. In my experience, the very early output has a higher acceptance rate, so short testing can give false positive speedups.
llama.cpp includes a tool specifically for benchmarking that will sweep the arguments for you so you don't have to restart the server and send it prompts:
https://github.com/ggml-org/llama.cpp/blob/master/tools/llam...
EDIT: Also the section about downloading the models should have mentioned that llama.cpp has a "-hf" argument that will download the models for you. I appreciate the author for sharing their experience, but for beginners this might not be the best guide to use.

Aurornis 14d ago
I wrote a similar post some time ago just used ollama and opencode https://blog.kulman.sk/running-local-llm-coding-server/

ig0r0 14d ago
Not sure you really need huggingface-cli to download anything if you're just using llama.cpp. You can pass `-hf ...` and it will download the models for you. Set `LLAMA_CACHE` to change where the downloads go:
```
  LLAMA_CACHE="models" ./llama-server \
    -hf unsloth/gemma-4-31B-it-GGUF:UD-Q4_K_XL \
    ...
```
c-hendricks 14d ago
I've been quite impressed with DeepSeek v4 Flash running via antirez's ds4[0].
It feels like a GPT-4 class model in terms of "stored knowledge" but is better at long-horizon tool calling than any of the GPT-4 class models.
Running on a 128GB MBP M4 Max, I'm getting ~24 t/s on generation and ~200 t/s on prefill. I was expecting it to feel slow, and it certainly does when e.g. generating code, but it's surprisingly useful as a "machine orchestrator" for simple tasks.
For non-agentic usecases, it's a decent enough model to converse with, and has the benefit of being entirely self-contained/private.
[0]https://github.com/antirez/ds4

jumploops 14d ago
I have used omlx.ai with great success to both download multiple mlx models (including gemma and qwen) suited for my hardware AND to be able to automagically launch both open-source and close-source (claude code, codex) harnesses using these models. All from a web or desktop UI
You would not need to follow a blog post with omlx IMHO

vladgur 14d ago
FYI you can open Claude code in the terminal, point it at this article and just tell it to "do it", if you're feeling extra lazy

jmkni 14d ago
Useful stuff in here that I wish I'd seen a few days ago :-)
I am not convinced that the MTP setup for the QAT model adds very much in terms of speed on my M1 Max, but it is definitely worth experimenting with.
Fiddling about with local models has done so much for my conceptual understanding of what is going on.
FWIW and YMMV but I also found the Gemma 4 MTP head was occasionally breaking markup in Opencode, causing the thinking to display untidily and ultimately in some cases missing the stop token. So I've stopped using MTP there for now.
Recent Qwen 3.6 models have developer role support so it will occasionally surprise you with a structured multiple choice questionnaire.

dofm 14d ago
For high Ram (unified), and relatively middling to lowish Tflops and bandwidth GB/s, usually MoEs are most hopeful. The current top-1 in the (iq, tok/s, @ context depth) ranks for me (M2 Max, 96gb) is DeepSeek-V4-Flash REAP25 <65gb gguf + ds4-server + pi agent. Not better than cloud API ofc, but useful enough to endure if I need to. E.g on a non-Internet 4h flight the battery (local llm draws 60w) held long enough. REAP supporting ds4 branch here
https://github.com/ljubomirj/ds4/tree/reap-compact-support
DS4F dropping to unusable <10 tok/s only at 784K context (!!) makes a big difference.

ljosifov 14d ago
>64 GB
Thats the rub. I have an M4 with 48G. I wonder if it is worth testing this out.
My past attempts (with Ollama and various LLMs) were too slow to use.

reddit_clone 14d ago
I poured a couple days into custom Burn inference for Qwen3-Coder-Next only to find it doesn't come with a speculative decoder, so on my M4 Max I can't push it much further than 120t/s. That's still kinda slow, though still faster than llama.cpp's 70.9t/s and MLX's 80.6t/s with the same model. Claude Fable 5 is recommending I use the Qwen3 MTP -- I worry that will compromise the quality somewhat, but might give it a try to see if I can get more usable speeds.

LoganDark 14d ago