Goose CLI Agent and Dedicated Container Inference Deploy Netflix's void-model from Hugging Face in One Session

5/9/2026, 1:14:28 AM

Blaine Kasten published a practical walkthrough on 2026-05-08 showing that a Goose CLI agent, combined with a Dedicated Container Inference skill, can turn a Hugging Face model (netflix/void-model) into a runnable container and inference service in a single session, replacing what would otherwise be days of manual containerization and server configuration. For builders, this means moving from model release to hands-on testing much sooner and shortening the feedback loop for bring-your-own-container (BYOC) deployments.

Kasten documents a three-step setup that developers can replicate. First, add the dedicated containers skill with the command: npx skills add togethercomputer/skills. Second, start a Goose session and issue a single prompt, for example: I want to deploy this model on togethers dedicated containers https://huggingface.co/netflix/void-model. Third, let the agent generate the container and server configuration and produce a runnable repository.

Under the hood, the agent pulls model metadata from Hugging Face, selects an appropriate inference-server configuration for the model architecture, generates container spec files and wiring, and emits a ready repository. The walkthrough cites an example output repository named blainekasten/together-void-model-container on GitHub. The dedicated containers skill supplies the domain knowledge the agent needs to configure Dedicated Container Inference infrastructure for BYOC deployments.

Kasten includes concrete commands for validating the deployment and running inference. He shows the tg beta jig submit command with a JSON payload to send a video and mask, for example: tg beta jig submit --watch --payload '{"video_url": "https://github.com/Netflix/void-model/raw/refs/heads/main/sample/lime/input_video.mp4", "quadmask_url": "https://github.com/Netflix/void-model/raw/refs/heads/main/sample/lime/quadmask_0.mp4", "prompt": "Empty park bench with fallen leaves on the ground", "use_pass2": false}'.
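Hand-quoting a JSON payload inside a shell command is easy to get wrong. As a minimal sketch, the command above can be assembled from a Python dictionary. The subcommand, flags, and payload fields are copied from the walkthrough; the helper name build_submit_command is hypothetical, and the script only prints the command rather than running it.

```python
import json
import shlex

# Payload fields copied from the walkthrough's example command.
payload = {
    "video_url": "https://github.com/Netflix/void-model/raw/refs/heads/main/sample/lime/input_video.mp4",
    "quadmask_url": "https://github.com/Netflix/void-model/raw/refs/heads/main/sample/lime/quadmask_0.mp4",
    "prompt": "Empty park bench with fallen leaves on the ground",
    "use_pass2": False,
}


def build_submit_command(payload: dict) -> str:
    """Return a shell command string that submits `payload` (hypothetical helper)."""
    # json.dumps emits JSON literals, so Python's False becomes false.
    body = json.dumps(payload)
    # shlex.quote wraps the JSON so the shell passes it as a single argument.
    return "tg beta jig submit --watch --payload " + shlex.quote(body)


command = build_submit_command(payload)
print(command)
```

Building the payload as a dictionary keeps the boolean and string fields typed correctly, and shlex.quote handles prompts that contain quotes or other shell-sensitive characters.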

The post notes that void-model removes objects from video along with the physical interactions they induce, and that inference calls are asynchronous and return a request identifier for polling. To illustrate job lifecycle handling, Kasten shows an example asynchronous response returned by the deployment, containing fields such as: {"model": "void-byoc", "request_id": "019dc0f3-3c73-7a3f-b4b6-87ad06091180", "status": "running", "claimed_at": "2026-04-24T19:24:19.447457Z", "created_at": "2026-04-24T19:24:19.444567Z", "done_at": null, "info": null, "inputs": {...}}. That explicit example helps builders implement polling and client-side job management.
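A client-side polling loop around such a response might look like the sketch below. It rests on assumptions the post does not confirm: that a non-null done_at marks a finished job, and that the caller supplies a fetch_status function (hypothetical) that returns the current job record for a request identifier. The "done" status and the done_at timestamp in the canned data are invented for the demo; only "running" appears in the source.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_status: Callable[[str], dict],
    request_id: str,
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job record has a non-null done_at, then return it."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_status(request_id)
        # Assumption: a non-null done_at means the job has finished.
        if job.get("done_at") is not None:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"job {request_id} still {job.get('status')!r} after {timeout_s}s"
            )
        sleep(interval_s)


# Demo with canned records standing in for a real status call.
RID = "019dc0f3-3c73-7a3f-b4b6-87ad06091180"
canned = iter([
    {"request_id": RID, "status": "running", "done_at": None},
    {"request_id": RID, "status": "running", "done_at": None},
    # Hypothetical terminal record; the real field values may differ.
    {"request_id": RID, "status": "done", "done_at": "2026-04-24T19:25:00Z"},
])
result = wait_for_job(lambda rid: next(canned), RID, sleep=lambda s: None)
print(result["status"])
```

Passing the status call and the sleep function as parameters keeps the loop independent of any particular client library and makes it testable without network access or real delays.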

Kasten frames the demo as part of a broader shift in which agents bridge the expertise gap around containerization, server configuration, and model-specific environment setup, reducing the lag between a model's release and hands-on experimentation. For practitioners, the approach promises faster, reproducible BYOC deployments into production-grade GPU environments and a clearer path to testing new models on release day.

Sources

Together AI Blog · 5/8/2026
