AI-Sciantist: An Autonomous Research Loop That Never Sleeps

Research is iterative. You have an idea, implement it, train it, evaluate it, decide whether to keep it, and repeat. Most of that loop is mechanical — so I automated it.

AI-Sciantist is a self-directed, closed-loop research system that runs the full experimental cycle without human intervention. Because not everyone can work 24/7.

The loop

1. IDEATE       → scan codebase, generate focused hypotheses
2. IMPLEMENT    → AI coding agent (Aider/OpenCode) writes the code
3. TRAIN        → submit SLURM/LSF job to HPC cluster
4. EVALUATE     → pull W&B metrics, compute unified score
5. DECIDE       → keep if better, revert if worse
6. ITERATE      → update memory, loop back to step 1
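To make the control flow concrete, here is a minimal sketch of the six steps as a plain Python loop. It is not Sciantist's actual code: `run_loop`, its callable parameters, and the memory record layout are hypothetical names chosen for illustration, and the comparison assumes a higher score is better.

```python
from typing import Callable


def run_loop(
    ideate: Callable[[list[dict]], str],
    implement: Callable[[str], str],
    train: Callable[[str], str],
    evaluate: Callable[[str], float],
    baseline: float,
    iterations: int,
) -> tuple[float, list[dict]]:
    """Run the six-step cycle with injected step functions (all hypothetical)."""
    memory: list[dict] = []  # record of every attempt, fed back into ideation
    best = baseline
    for _ in range(iterations):
        idea = ideate(memory)      # 1. IDEATE
        change = implement(idea)   # 2. IMPLEMENT
        run_id = train(change)     # 3. TRAIN
        score = evaluate(run_id)   # 4. EVALUATE
        kept = score > best        # 5. DECIDE (assumes higher is better)
        if kept:
            best = score
        # 6. ITERATE: remember the outcome, then loop back to ideation
        memory.append({"idea": idea, "score": score, "kept": kept})
    return best, memory


# Stub steps standing in for the LLM, the coding agent, and the cluster.
scores = iter([0.58, 0.71, 0.66])
best, memory = run_loop(
    ideate=lambda mem: f"idea-{len(mem) + 1}",
    implement=lambda idea: f"patch for {idea}",
    train=lambda change: f"run of {change}",
    evaluate=lambda run_id: next(scores),
    baseline=0.60,
    iterations=3,
)
```

With these stub scores only the second candidate beats the running best, so it is the only one marked as kept.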

Each candidate experiment runs in an isolated git worktree on its own feature branch. If the metric improves, the commit is merged to the base branch. If not, it's reverted.
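The worktree lifecycle can be sketched as a pair of functions that return the git commands for one candidate. This is an illustration under assumptions, not the project's implementation: the `exp/<name>` branch scheme, the `../worktrees/<name>` path, the `main` base branch, and the reading of "reverted" as "discard the branch" are all mine.

```python
def start_commands(name: str, base: str = "main") -> list[list[str]]:
    """Commands that give a candidate its own branch and working directory."""
    branch = f"exp/{name}"         # assumed branch naming scheme
    path = f"../worktrees/{name}"  # assumed worktree location
    # `git worktree add -b` creates the branch from `base` and checks it out
    # in a separate directory, leaving the main checkout untouched.
    return [["git", "worktree", "add", "-b", branch, path, base]]


def finish_commands(name: str, improved: bool) -> list[list[str]]:
    """Commands to run from the base checkout once the candidate is scored."""
    branch = f"exp/{name}"
    path = f"../worktrees/{name}"
    commands: list[list[str]] = []
    if improved:
        # Keep: merge the candidate branch into the base branch.
        commands.append(["git", "merge", "--no-edit", branch])
    # Either way, the worktree directory is no longer needed.
    commands.append(["git", "worktree", "remove", "--force", path])
    if not improved:
        # Discard: delete the unmerged branch.
        commands.append(["git", "branch", "-D", branch])
    return commands
```

Returning the commands as lists keeps the sketch side-effect free; each list could be passed to `subprocess.run(..., check=True)` to execute it.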

Multi-expert ideation

The key insight: a single LLM prompt tends to produce generic ideas, whereas specialized expert personas produce focused, higher-quality hypotheses. Sciantist deploys eight experts:

Expert	Focus
Hyperparameter	LR, warmup, scheduler, weight decay, dropout, batch size, EMA
Architecture	Backbone swaps, depth/width, normalization, attention, adapters
Loss	Weighted losses, auxiliary losses, focal variants, label smoothing
Data	Preprocessing, augmentation, loading strategies, generalization
GPU Utilization	Throughput, mixed precision, compilation, memory
Optimizer	AdamW/SGD/Lion alternatives, beta tuning, decoupled weight decay
High-Risk/High-Reward	Novel, experimental ideas that challenge assumptions
Web-Research	Latest papers, preprints, conference proceedings, trends
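One plausible way to turn the table above into persona prompts is shown below. The expert names and focus areas come from the table; the prompt wording, `build_expert_prompt`, and the example codebase summary are assumptions for illustration, and the real prompts may look quite different.

```python
# Expert personas and their focus areas, copied from the table above.
EXPERTS = {
    "Hyperparameter": "LR, warmup, scheduler, weight decay, dropout, batch size, EMA",
    "Architecture": "Backbone swaps, depth/width, normalization, attention, adapters",
    "Loss": "Weighted losses, auxiliary losses, focal variants, label smoothing",
    "Data": "Preprocessing, augmentation, loading strategies, generalization",
    "GPU Utilization": "Throughput, mixed precision, compilation, memory",
    "Optimizer": "AdamW/SGD/Lion alternatives, beta tuning, decoupled weight decay",
    "High-Risk/High-Reward": "Novel, experimental ideas that challenge assumptions",
    "Web-Research": "Latest papers, preprints, conference proceedings, trends",
}


def build_expert_prompt(expert: str, codebase_summary: str, n_ideas: int = 3) -> str:
    """Build one persona-specific ideation prompt (hypothetical wording)."""
    focus = EXPERTS[expert]  # raises KeyError for an unknown expert
    return (
        f"You are the {expert} expert on a research team.\n"
        f"Focus only on: {focus}.\n"
        f"Propose {n_ideas} concrete, testable hypotheses for this codebase:\n"
        f"{codebase_summary}"
    )


# One prompt per expert; each would be sent to the LLM separately.
prompts = {name: build_expert_prompt(name, "ResNet-50 image classifier") for name in EXPERTS}
```

Keeping each persona in its own prompt is what narrows the focus: every call sees only one row of the table.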

The unified metric

Different experiments optimize different metrics. To compare them on a single scale, Sciantist combines the metrics into a configurable weighted average:

\[ M_{\text{unified}} = \frac{\sum_{i} w_i \cdot m_i}{\sum_{i} w_i} \]

where \(w_i\) is the weight and \(m_i\) is the value of metric \(i\). For example:

metric_weights:
  val/accuracy: 0.6
  val/f1_score: 0.4
metric_higher_is_better: true
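As a worked example of the formula, the following sketch computes the unified score for the weights above and applies the keep-or-revert comparison. The function names and the metric values (0.90, 0.80, 0.91, 0.82) are made up for illustration; only the weights and the `metric_higher_is_better` flag come from the config.

```python
def unified_metric(metrics: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of the configured metrics: sum(w_i * m_i) / sum(w_i)."""
    total_weight = sum(weights.values())
    if total_weight == 0:
        raise ValueError("metric weights must not sum to zero")
    # A metric missing from `metrics` raises KeyError rather than counting as 0.
    return sum(w * metrics[name] for name, w in weights.items()) / total_weight


def is_improvement(candidate: float, baseline: float, higher_is_better: bool = True) -> bool:
    """Strict comparison, so a tie does not count as an improvement."""
    return candidate > baseline if higher_is_better else candidate < baseline


weights = {"val/accuracy": 0.6, "val/f1_score": 0.4}

# Illustrative values: (0.6 * 0.90 + 0.4 * 0.80) / 1.0 is about 0.86
baseline = unified_metric({"val/accuracy": 0.90, "val/f1_score": 0.80}, weights)
# (0.6 * 0.91 + 0.4 * 0.82) / 1.0 is about 0.874
candidate = unified_metric({"val/accuracy": 0.91, "val/f1_score": 0.82}, weights)

keep = is_improvement(candidate, baseline, higher_is_better=True)
```

Dividing by the sum of the weights means the weights do not have to add up to 1; `{"val/accuracy": 3, "val/f1_score": 2}` would give the same score.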

Human-in-the-loop

The system supports live steering without restarts. You write instructions to config/user_prompt.md, and the system reads them at the next ideation step:

# Start the loop
uv run scian

# In another terminal, steer it
echo "Focus on learning rate scheduling — try cosine annealing with warm restarts." \
  > config/user_prompt.md

# The next candidate will prioritize your instruction automatically

The instruction is injected as high-priority context and overrides the default ideation tendencies when the two conflict.
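Reading the steering file fresh at every ideation step is what makes restarts unnecessary. A minimal sketch of that idea follows; `load_steering`, `build_ideation_prompt`, and the header text are hypothetical, and the actual injection format is not documented here.

```python
from pathlib import Path


def load_steering(path: Path) -> str:
    """Read the steering file on every call so edits take effect immediately."""
    try:
        return path.read_text(encoding="utf-8").strip()
    except FileNotFoundError:
        return ""  # no file means no steering


def build_ideation_prompt(base_prompt: str, steering: str) -> str:
    """Place the user's instruction ahead of the default ideation prompt."""
    if not steering:
        return base_prompt
    return (
        "HIGH-PRIORITY USER INSTRUCTION (takes precedence on conflict):\n"
        f"{steering}\n\n"
        f"{base_prompt}"
    )


# Called at the start of each ideation step, never cached between steps.
prompt = build_ideation_prompt(
    "Propose the next experiment.",
    load_steering(Path("config/user_prompt.md")),
)
```

Because the file is re-read rather than cached, emptying or deleting it returns the loop to its default behavior at the following step.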

Cluster integration

Sciantist supports both SLURM and LSF schedulers via SSH to remote HPC systems. You define cluster profiles in YAML:

hpc-cluster:
  cluster_target: hpc-cluster
  ssh_target: user@hpc.example.edu
  scheduler: slurm
  submit_extra_args: "--gres=gpu:4"
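To show how such a profile could translate into an actual submission, here is a sketch that builds the SSH command without running it. `build_submit_command` and the `train.sh` script name are hypothetical. `sbatch --parsable` (prints just the job ID) and `bsub < script` (reads the job script from stdin) are standard scheduler usage, but whether Sciantist invokes them this way is an assumption.

```python
import shlex


def build_submit_command(profile: dict[str, str], script_path: str) -> list[str]:
    """Build the local `ssh` argv that submits `script_path` on the cluster."""
    scheduler = profile["scheduler"]
    extra = shlex.split(profile.get("submit_extra_args", ""))
    if scheduler == "slurm":
        # --parsable makes sbatch print only the job ID, which is easy to poll on.
        remote = shlex.join(["sbatch", "--parsable", *extra, script_path])
    elif scheduler == "lsf":
        # bsub reads the job script (with its #BSUB directives) from stdin.
        remote = shlex.join(["bsub", *extra]) + " < " + shlex.quote(script_path)
    else:
        raise ValueError(f"unsupported scheduler: {scheduler!r}")
    # The remote command is one string; ssh hands it to the remote shell.
    return ["ssh", profile["ssh_target"], remote]


profile = {
    "cluster_target": "hpc-cluster",
    "ssh_target": "user@hpc.example.edu",
    "scheduler": "slurm",
    "submit_extra_args": "--gres=gpu:4",
}
command = build_submit_command(profile, "train.sh")
```

The resulting list can be passed to `subprocess.run(command, capture_output=True, text=True, check=True)`; with SLURM, the captured stdout would then hold the job ID.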

Sciantist submits jobs, polls them at configurable intervals, and automatically extracts their W&B run snapshots (with retry logic for transient API failures).
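Retry logic for transient failures is commonly written as exponential backoff. The sketch below shows that pattern in general; `with_retries`, its defaults, and the choice of retryable exceptions are assumptions, and `flaky_fetch` is a stand-in for a metrics request rather than a real W&B call.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    fetch: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fetch`, retrying with exponential backoff on transient errors."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fetch()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2**attempt)  # wait 1s, 2s, 4s, ...
    raise AssertionError("unreachable")


# Stand-in for a metrics request that fails twice before succeeding.
calls = {"n": 0}


def flaky_fetch() -> dict[str, float]:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient API failure")
    return {"val/accuracy": 0.91}


delays: list[float] = []  # record the waits instead of actually sleeping
snapshot = with_retries(flaky_fetch, sleep=delays.append)
```

Only the listed exception types are retried, so a genuine bug such as a `KeyError` fails immediately instead of being masked by repeated attempts.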

Why this matters

Research throughput is often limited by human bottlenecks — not creativity, but the mechanical cycle of implement → train → evaluate → decide. By automating that cycle, Sciantist lets you focus on the high-level direction while the system explores the solution space in parallel.

It's not replacing researchers — it's giving them a tireless assistant that runs experiments 24/7, on a cluster, with 8 different perspectives, and remembers everything.

::: note
The code is open source under the Apache 2.0 license. It requires Python 3.12+, an OpenAI-compatible LLM API, and optionally SSH access to a SLURM/LSF cluster.
:::