AI-Sciantist: An Autonomous Research Loop That Never Sleeps
Research is iterative. You have an idea, implement it, train it, evaluate it, decide whether to keep it, and repeat. Most of that loop is mechanical — so I automated it.
AI-Sciantist is a self-directed, closed-loop research system that runs the full experimental cycle without human intervention. Because not everyone can work 24/7.
The loop
1. IDEATE → scan codebase, generate focused hypotheses
2. IMPLEMENT → AI coding agent (Aider/OpenCode) writes the code
3. TRAIN → submit SLURM/LSF job to HPC cluster
4. EVALUATE → pull W&B metrics, compute unified score
5. DECIDE → keep if better, revert if worse
6. ITERATE → update memory, loop back to step 1
Each candidate experiment runs in an isolated git worktree on its own feature branch. If the metric improves, the commit is merged to the base branch. If not, it's reverted.
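To make the keep-or-revert step concrete, here is a minimal sketch of the DECIDE logic and of the git command that would create an isolated worktree. The `Candidate` class, the `exp/` branch prefix, the `../worktrees/` path, and the `main` base branch are all illustrative assumptions, not Sciantist's actual names.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str     # hypothetical experiment identifier
    score: float  # unified score produced by the EVALUATE step


def worktree_add_cmd(candidate: Candidate, base_branch: str = "main") -> list[str]:
    """Build the git command that creates a worktree on a new feature branch."""
    branch = f"exp/{candidate.name}"         # branch naming is an assumption
    path = f"../worktrees/{candidate.name}"  # worktree location is an assumption
    # `git worktree add -b <new-branch> <path> <commit-ish>`
    return ["git", "worktree", "add", "-b", branch, path, base_branch]


def decide(candidate: Candidate, best_score: float, higher_is_better: bool = True) -> str:
    """Return "keep" if the candidate beats the current best, else "revert"."""
    if higher_is_better:
        improved = candidate.score > best_score
    else:
        improved = candidate.score < best_score
    return "keep" if improved else "revert"
```

A tie counts as "revert" in this sketch, on the assumption that a change which does not improve the metric is not worth keeping.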
Multi-expert ideation
The key insight: a single LLM prompt tends to produce generic ideas, while specialized expert personas produce focused, high-quality hypotheses. Sciantist deploys eight experts:
| Expert | Focus |
|---|---|
| Hyperparameter | LR, warmup, scheduler, weight decay, dropout, batch size, EMA |
| Architecture | Backbone swaps, depth/width, normalization, attention, adapters |
| Loss | Weighted losses, auxiliary losses, focal variants, label smoothing |
| Data | Preprocessing, augmentation, loading strategies, generalization |
| GPU Utilization | Throughput, mixed precision, compilation, memory |
| Optimizer | AdamW/SGD/Lion alternatives, beta tuning, decoupled weight decay |
| High-Risk/High-Reward | Novel, experimental ideas that challenge assumptions |
| Web-Research | Latest papers, preprints, conference proceedings, trends |
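One way to picture the persona approach is a prompt per expert, each restricted to its own focus area. The sketch below reuses the names and focus strings from the table. The prompt wording, the `build_prompts` function, and the `n_ideas` parameter are assumptions for illustration, not the prompts Sciantist actually sends.

```python
# Expert names and focus areas copied from the table above.
EXPERTS = {
    "Hyperparameter": "LR, warmup, scheduler, weight decay, dropout, batch size, EMA",
    "Architecture": "Backbone swaps, depth/width, normalization, attention, adapters",
    "Loss": "Weighted losses, auxiliary losses, focal variants, label smoothing",
    "Data": "Preprocessing, augmentation, loading strategies, generalization",
    "GPU Utilization": "Throughput, mixed precision, compilation, memory",
    "Optimizer": "AdamW/SGD/Lion alternatives, beta tuning, decoupled weight decay",
    "High-Risk/High-Reward": "Novel, experimental ideas that challenge assumptions",
    "Web-Research": "Latest papers, preprints, conference proceedings, trends",
}


def build_prompts(codebase_summary: str, n_ideas: int = 3) -> dict[str, str]:
    """Return one ideation prompt per expert persona (hypothetical wording)."""
    prompts = {}
    for name, focus in EXPERTS.items():
        prompts[name] = (
            f"You are the {name} expert. Stay within this focus: {focus}.\n"
            f"Propose {n_ideas} focused, testable hypotheses for this codebase:\n"
            f"{codebase_summary}"
        )
    return prompts
```

Each prompt would then go to the LLM separately, so no single response has to cover all eight areas at once.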
The unified metric
Different experiments optimize different metrics. Sciantist uses a configurable weighted metric composition to compare them fairly:
\[ S = \sum_i w_i \, m_i \]
where \(w_i\) is the weight and \(m_i\) is the value of metric \(i\). For example:

    metric_weights:
      val/accuracy: 0.6
      val/f1_score: 0.4
    metric_higher_is_better: true
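As a worked example, assuming the unified score is a plain weighted sum of the configured metrics, the config above could be evaluated like this. The function name `unified_score` and the choice to raise on a missing metric are my assumptions.

```python
def unified_score(metrics: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of metric values; every weighted metric must be present."""
    missing = [name for name in weights if name not in metrics]
    if missing:
        # Failing loudly avoids silently scoring a run on a partial metric set.
        raise KeyError(f"metrics missing from run: {missing}")
    return sum(weight * metrics[name] for name, weight in weights.items())


weights = {"val/accuracy": 0.6, "val/f1_score": 0.4}
run_metrics = {"val/accuracy": 0.90, "val/f1_score": 0.80, "train/loss": 0.31}
score = unified_score(run_metrics, weights)
```

With an accuracy of 0.90 and an F1 of 0.80, the score is 0.6 × 0.90 + 0.4 × 0.80 = 0.86. Metrics without a weight, such as `train/loss` here, are ignored.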
Human-in-the-loop
The system supports real-time steering without restarts. You write instructions to config/user_prompt.md, and the system reads them at the next ideation step:

    # Start the loop
    uv run scian

    # In another terminal, steer it
    echo "Focus on learning rate scheduling — try cosine annealing with warm restarts." \
      > config/user_prompt.md

    # The next candidate will prioritize your instruction automatically
The instruction is injected as high-priority context and overrides the default ideation tendencies when the two conflict.
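Steering without a restart mostly comes down to re-reading the file at the start of every ideation step. A minimal sketch, assuming the instruction is simply prepended to the default context; the function names and the header text are hypothetical:

```python
from pathlib import Path


def load_user_instruction(path: Path) -> str:
    """Return the current steering text, or "" if the file is absent or empty."""
    try:
        return path.read_text(encoding="utf-8").strip()
    except FileNotFoundError:
        return ""


def build_ideation_context(default_context: str, prompt_path: Path) -> str:
    """Put the user's instruction ahead of the default ideation context."""
    instruction = load_user_instruction(prompt_path)
    if not instruction:
        return default_context
    return (
        "HIGH-PRIORITY USER INSTRUCTION (overrides the defaults below on conflict):\n"
        f"{instruction}\n\n{default_context}"
    )
```

Because the file is read on every call, an edit made while a job is training takes effect at the next ideation step and no process needs to be signaled.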
Cluster integration
Sciantist supports both SLURM and LSF schedulers via SSH to remote HPC systems. You define cluster profiles in YAML:

    hpc-cluster:
      cluster_target: hpc-cluster
      ssh_target: user@hpc.example.edu
      scheduler: slurm
      submit_extra_args: "--gres=gpu:4"
Jobs are submitted, polled at configurable intervals, and their W&B run snapshots are automatically extracted (with retry logic for transient API failures).
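The sketch below shows how a profile like the one above might turn into a submit command, plus a generic retry helper for flaky API calls. It only builds the command list and does not run it. The function names, the exponential backoff policy, and the broad `except Exception` are assumptions; real code would catch only the network errors it expects.

```python
import shlex
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def build_submit_command(ssh_target: str, scheduler: str, script: str,
                         extra_args: str = "") -> list[str]:
    """Build an ssh command that submits `script` on the remote scheduler."""
    args = shlex.split(extra_args)
    if scheduler == "slurm":
        remote = shlex.join(["sbatch", *args, script])
    elif scheduler == "lsf":
        # bsub reads the job script from standard input.
        remote = shlex.join(["bsub", *args]) + " < " + shlex.quote(script)
    else:
        raise ValueError(f"unsupported scheduler: {scheduler!r}")
    return ["ssh", ssh_target, remote]


def with_retry(fn: Callable[[], T], attempts: int = 3, base_delay: float = 1.0,
               sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying with exponential backoff; re-raise the last failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("unreachable")  # keeps type checkers satisfied
```

For example, `build_submit_command("user@hpc.example.edu", "slurm", "train.sh", "--gres=gpu:4")` yields a list that can be passed to `subprocess.run`, and the metrics fetch can be wrapped as `with_retry(fetch_snapshot)`, where `fetch_snapshot` is whatever callable pulls the W&B run data.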
Why this matters
Research throughput is often limited by human bottlenecks — not creativity, but the mechanical cycle of implement → train → evaluate → decide. By automating that cycle, Sciantist lets you focus on the high-level direction while the system explores the solution space in parallel.
It's not replacing researchers — it's giving them a tireless assistant that runs experiments 24/7, on a cluster, with 8 different perspectives, and remembers everything.
::: note
The code is open source under the Apache 2.0 license. It requires Python 3.12+, an OpenAI-compatible LLM API, and optionally SSH access to a SLURM/LSF cluster.
:::