Introducing Hito 2B: Structured Reasoning in a Small Model
We are releasing Hito 2B, our most capable small model yet. With a novel Cognitive Framework that organizes thinking into explicit stages, Hito 2B achieves strong reasoning performance while remaining efficient enough to run on consumer hardware.
Today we are releasing Hito 2B, a 2-billion-parameter language model that brings structured reasoning to the small model space. Fine-tuned from Qwen3.5-2B using our proprietary training methodology, Hito 2B represents a significant step forward in what compact models can achieve.
The Problem with Small Model Reasoning
Small language models have traditionally struggled with complex reasoning tasks. They can generate fluent text, but when faced with multi-step problems, they often lose track of their own logic, make inconsistent claims, or fail to self-correct obvious errors.
We asked ourselves: what if the model's reasoning process was visible and structured, rather than hidden in an opaque chain-of-thought? What if we could teach a model to think in stages?
Introducing the Cognitive Framework
Hito 2B uses a novel Cognitive Framework that organizes thinking into explicit, nested tags within a <think>...</think> envelope. These are not just decorative labels. They constrain the model's policy distribution, forcing it to allocate generation steps to each cognitive stage sequentially.
The framework includes five cognitive stages:
- Comprehension: Understanding the problem (<understand>, <curious>, <connect>)
- Retrieval: Accessing relevant knowledge (<recall>, <compare>, <simulate>)
- Deliberation: Working through the logic (<logic>, <plan>, <anticipate>, <imagine>)
- Verification: Checking the work (<doubt>, <verify>, <careful>)
- Metacognition: Reflecting on the process (<reflect>, <honest>, <limits>, <emotion>)
This structured approach enables something powerful: first-class self-correction within a single response. The sequence <doubt> followed by <verify> followed by an updated <commit> allows the model to catch and fix its own mistakes in real-time, observable in the output rather than hidden across multiple turns.
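Because the stages are explicit tags rather than free-form prose, the reasoning trace is machine-checkable. The sketch below illustrates this with a hypothetical transcript (our own construction, not actual model output) and a small parser that recovers the stage sequence:

```python
import re

# Hypothetical Hito 2B transcript illustrating the doubt -> verify
# self-correction sequence described above (not actual model output).
transcript = (
    "<think>"
    "<understand>Find x^3 + 1/x^3 given x + 1/x = 3.</understand>"
    "<logic>Cube the identity: (x + 1/x)^3 = x^3 + 1/x^3 + 3(x + 1/x).</logic>"
    "<doubt>Did I expand the cube correctly?</doubt>"
    "<verify>27 = x^3 + 1/x^3 + 9, so x^3 + 1/x^3 = 18.</verify>"
    "</think>"
    "The answer is 18."
)

def stage_sequence(text: str) -> list[str]:
    """Return the opening cognitive tags in the order they appear."""
    return re.findall(
        r"<(understand|curious|connect|recall|compare|simulate|"
        r"logic|plan|anticipate|imagine|doubt|verify|careful|"
        r"reflect|honest|limits|emotion)>",
        text,
    )

print(stage_sequence(transcript))  # -> ['understand', 'logic', 'doubt', 'verify']
```

A downstream harness could use exactly this kind of check to confirm that a verification stage actually followed deliberation before trusting the answer.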
Benchmark Performance
The results speak for themselves. In head-to-head comparisons with the Qwen3.5-2B base model under matched conditions:
| Benchmark | Category | Hito 2B | Base | Delta |
|---|---|---|---|---|
| GSM8K | Math word problems | 60% | 25% | +35 |
| MATH-500 | Competition math | 15% | 5% | +10 |
| ARC-Challenge | Scientific reasoning | 75% | 65% | +10 |
| HumanEval-style | Code synthesis | 95% | 90% | +5 |
| Macro average | Reasoning | 61.3% | 46.3% | +15.0 |
The +35 point improvement on GSM8K is particularly notable. This benchmark has been a persistent challenge for small models, and Hito 2B's structured reasoning approach makes a dramatic difference.
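The macro averages in the last row follow directly from the four per-benchmark scores:

```python
# Per-benchmark accuracies from the table above (percent).
hito = {"GSM8K": 60, "MATH-500": 15, "ARC-Challenge": 75, "HumanEval-style": 95}
base = {"GSM8K": 25, "MATH-500": 5, "ARC-Challenge": 65, "HumanEval-style": 90}

macro_hito = sum(hito.values()) / len(hito)  # 61.25, reported as 61.3%
macro_base = sum(base.values()) / len(base)  # 46.25, reported as 46.3%
print(macro_hito, macro_base, macro_hito - macro_base)  # -> 61.25 46.25 15.0
```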
Efficiency That Matters
Perhaps surprisingly, structured reasoning also improves efficiency. By constraining the model to follow a defined cognitive path, we prevent the "unproductive expansion loops" that plague many reasoning models.
- Median thinking length: ~25% shorter than base model
- Typical response time: Under 10 seconds on hard problems (vs. 33 seconds for base)
- No quality sacrifice: Shorter responses with better answers
What Hito 2B Can Do
We have validated Hito 2B across diverse reasoning challenges:
- Abstract Reasoning: Solves ARC-AGI grid puzzles (fluid intelligence tests)
- Symbolic Mathematics: Derives competition-level algebra solutions
- Statistical Reasoning: Identifies confounding variables and correlation-causation gaps
- Bayesian Reasoning: Correctly computes posterior probabilities, overcoming base-rate neglect
- Deductive Logic: Solves Knights-and-Knaves puzzles via systematic case analysis
- Self-Referential Reasoning: Engages metacognitively with its own nature without making false claims of consciousness
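The systematic case analysis behind Knights-and-Knaves puzzles is worth making concrete. Here is a minimal brute-force solver for a classic two-islander instance (the puzzle is our own illustrative example, not one from the evaluation set):

```python
from itertools import product

# Classic puzzle: A says "We are both knaves."
# Knights always tell the truth; knaves always lie.
def consistent(a_is_knight: bool, b_is_knight: bool) -> bool:
    statement = (not a_is_knight) and (not b_is_knight)  # "we are both knaves"
    # A knight's statement must be true; a knave's must be false.
    return statement == a_is_knight

solutions = [
    (a, b) for a, b in product([True, False], repeat=2) if consistent(a, b)
]
print(solutions)  # -> [(False, True)]: A is a knave, B is a knight
```

Enumerating all truth assignments and filtering for consistency is exactly the case analysis the model is asked to perform in prose.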
How to Use Hito 2B
Hito 2B is available today through multiple channels:
Python (Transformers)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "hitonet/hito-2b", torch_dtype="auto", device_map="auto", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained("hitonet/hito-2b", trust_remote_code=True)

messages = [{"role": "user", "content": "If x + 1/x = 3, what is x^3 + 1/x^3?"}]
inputs = tokenizer.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True, enable_thinking=True
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=4000, temperature=0.7, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```
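Since the reasoning arrives inside a <think>...</think> envelope, applications that only want the final answer can strip it after decoding. A minimal sketch (assuming the envelope appears verbatim in the decoded text, as described above):

```python
import re

def strip_thinking(decoded: str) -> str:
    """Remove the <think>...</think> envelope, keeping only the final answer."""
    return re.sub(r"<think>.*?</think>", "", decoded, flags=re.DOTALL).strip()

sample = "<think><logic>27 - 9 = 18</logic></think>The answer is 18."
print(strip_thinking(sample))  # -> The answer is 18.
```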
Ollama (GGUF Quantizations)
```shell
# Recommended (1.4 GB)
ollama run hf.co/hitonet/hito-2b-GGUF:Q5_K_M

# Smaller footprint (1.2 GB)
ollama run hf.co/hitonet/hito-2b-GGUF:Q4_K_M

# Lossless (3.6 GB)
ollama run hf.co/hitonet/hito-2b-GGUF:F16
```
Hosted API
```shell
curl https://api.hitonet.com/v1/chat/completions \
  -H "Authorization: Bearer $HITONET_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "hito-2b", "messages": [{"role": "user", "content": "Hello"}]}'
```
New users get $1 in free API credits at platform.hitonet.com.
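The /v1/chat/completions path suggests an OpenAI-compatible request schema; if that assumption holds, the curl example translates to Python with only the standard library. A sketch (build_chat_request is our own helper name):

```python
import json
import urllib.request

def build_chat_request(api_key: str, content: str) -> urllib.request.Request:
    """Build the same request the curl example sends, assuming an
    OpenAI-compatible /v1/chat/completions schema."""
    payload = {
        "model": "hito-2b",
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        "https://api.hitonet.com/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("YOUR_API_KEY", "Hello")
# Send with: urllib.request.urlopen(req)
print(req.full_url, req.get_method())
```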
Training Methodology
Hito 2B was trained using a two-stage proprietary pipeline:
Stage 1: Progressive LoRA Merging (PLM)
Multiple rounds of LoRA fine-tuning on curated structured-reasoning data, with each round's adapter merged into the base before the next. This internalizes the Cognitive Framework grammar while retaining base capabilities.
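At the weight level, each round's merge is just folding the adapter's low-rank product back into the base matrix before the next round trains a fresh adapter. A toy NumPy sketch of that generic mechanic (illustrative only; matrix sizes, rank, and scaling are our stand-in values, not the proprietary pipeline's):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2                      # toy hidden size and LoRA rank
W = rng.normal(size=(d, d))      # stand-in for a base weight matrix

def merge_lora(W, A, B, alpha=16):
    """Fold one round's low-rank adapter (B @ A) into the base weights,
    with the usual alpha / rank scaling."""
    return W + (alpha / A.shape[0]) * (B @ A)

# Progressive merging: each round fine-tunes a new adapter against the
# previously merged weights, then folds it in before the next round.
for round_ in range(3):
    A = rng.normal(scale=0.01, size=(r, d))  # stand-in for a trained adapter
    B = rng.normal(scale=0.01, size=(d, r))
    W = merge_lora(W, A, B)

print(W.shape)  # merged weights keep the base shape: (8, 8)
```

The key property is that after each merge the model is again a plain dense checkpoint, so the next round's adapter trains against everything learned so far.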
Stage 2: Group Relative Policy Optimization (GRPO)
A custom reward formula with explicit reasoning-answer consistency signals, trained on our proprietary reasoning dataset. This reinforces behaviors that produce measurable capability gains.
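While the custom reward formula is proprietary, the group-relative part of GRPO is standard: each sampled completion is scored against the mean and spread of its own sampling group. A minimal sketch of that advantage computation:

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Standard GRPO-style advantage: normalize each completion's reward
    by its group's mean and standard deviation."""
    mu = statistics.fmean(rewards)
    sigma = statistics.pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# One prompt, four sampled completions scored by the reward function.
rewards = [1.0, 0.0, 1.0, 0.0]
print(group_relative_advantages(rewards))  # roughly [+1, -1, +1, -1]
```

Completions that beat their group average get positive advantage and are reinforced; within-group normalization removes the need for a separate value network.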
Licensing
Hito 2B is released under the Hitonet Community License:
- Personal/hobby use: Yes, with attribution
- Academic research: Yes, with attribution and citation
- Non-commercial open-source: Yes, with attribution
- Commercial use: Requires written permission (contact [email protected])
Get Started
Download Hito 2B today:
- Hugging Face: huggingface.co/hitonet/hito-2b
- GGUF Quantizations: huggingface.co/hitonet/hito-2b-GGUF
- API Access: platform.hitonet.com
- Chat Interface: chat.hitonet.com
We cannot wait to see what you build with Hito 2B.