← Back to blog

AI-Sciantist: An Autonomous Research Loop That Never Sleeps

Research is iterative. You have an idea, implement it, train it, evaluate it, decide whether to keep it, and repeat. Most of that loop is mechanical — so I automated it.

AI-Sciantist is a self-directed, closed-loop research system that runs the full experimental cycle without human intervention. Because not everyone can work 24/7.

The loop

1. IDEATE       → scan codebase, generate focused hypotheses
2. IMPLEMENT    → AI coding agent (Aider/OpenCode) writes the code
3. TRAIN        → submit SLURM/LSF job to HPC cluster
4. EVALUATE     → pull W&B metrics, compute unified score
5. DECIDE       → keep if better, revert if worse
6. ITERATE      → update memory, loop back to step 1

Each candidate experiment runs in an isolated git worktree on its own feature branch. If the metric improves, the commit is merged to the base branch. If not, it's reverted.

Multi-expert ideation

The key insight: a single LLM prompt produces generic ideas. But specialized expert personas produce focused, high-quality hypotheses. Sciantist deploys 8 experts:

Expert Focus
Hyperparameter LR, warmup, scheduler, weight decay, dropout, batch size, EMA
Architecture Backbone swaps, depth/width, normalization, attention, adapters
Loss Weighted losses, auxiliary losses, focal variants, label smoothing
Data Preprocessing, augmentation, loading strategies, generalization
GPU Utilization Throughput, mixed precision, compilation, memory
Optimizer AdamW/SGD/Lion alternatives, beta tuning, decoupled weight decay
High-Risk/High-Reward Novel, experimental ideas that challenge assumptions
Web-Research Latest papers, preprints, conference proceedings, trends

The unified metric

Different experiments optimize different metrics. Sciantist uses a configurable weighted metric composition to compare them fairly:

\[ M_{\text{unified}} = \frac{\sum_{i} w_i \cdot m_i}{\sum_{i} w_i} \]

where \(w_i\) is the weight and \(m_i\) is the value of metric \(i\). For example:

metric_weights:
  val/accuracy: 0.6
  val/f1_score: 0.4
metric_higher_is_better: true

Human-in-the-loop

The system supports live, real-time steering without restarts. You write instructions to user_prompt.md, and the system reads them at the very next ideation step:

# Start the loop
uv run scian

# In another terminal, steer it
echo "Focus on learning rate scheduling — try cosine annealing with warm restarts." \
  > config/user_prompt.md

# The next candidate will prioritize your instruction automatically

This is injected as high-priority context, overriding default ideation tendencies when there's a conflict.

Cluster integration

Sciantist supports both SLURM and LSF schedulers via SSH to remote HPC systems. You define cluster profiles in YAML:

hpc-cluster:
  cluster_target: hpc-cluster
  ssh_target: user@hpc.example.edu
  scheduler: slurm
  submit_extra_args: "--gres=gpu:4"

Jobs are submitted, polled at configurable intervals, and their W&B run snapshots are automatically extracted (with retry logic for transient API failures).

Why this matters

Research throughput is often limited by human bottlenecks — not creativity, but the mechanical cycle of implement → train → evaluate → decide. By automating that cycle, Sciantist lets you focus on the high-level direction while the system explores the solution space in parallel.

It's not replacing researchers — it's giving them a tireless assistant that runs experiments 24/7, on a cluster, with 8 different perspectives, and remembers everything.

::: note The code is open source under the Apache 2.0 license. It requires Python 3.12+, an OpenAI-compatible LLM API, and optionally SSH access to a SLURM/LSF cluster. :::