Skip to main content

Custom Environments

Build and customise Docker images for your specific research needs.

Key concept: DS01 containers ARE your Python environment - you don't need venv or conda inside containers. See Python Environments in Containers for why.


Two Approaches

DS01 supports two ways to customise your environment:

ApproachSpeedReproducibilityVersion ControlledBest For
Edit DockerfileSlower (rebuild)✓ Full✓ YesResearch, collaboration, long-term projects
Quick installFast (pip in running container)*PartialLimitedQuick experiments, testing packages

The '*Quick Install' method involves adding new pkgs to your running container using pip/apt, then saving the pkg changes back into your image when you retire the container. This is offered by default in the container retiree workflow, which uses docker commit under the hood. This is fine for adding new packages, but not recommended for ongoing projects.

Recommendation: Use Dockerfiles for research work - they're shareable, reproducible, and version-controlled. image update will handle this for you.


Best Practice

Simplest, most robust:

Keep `requirements.txt` file up to date with any new pkg requirements, then use `image update` CLI to scan that and update.

DS01 has developed this CLI for you that handles all of the below on your behalf.

The below, is just what is being done under the hood, or if you want more granular control.

Best practice for reproducible research.

Where Dockerfiles Live

~/workspace/my-project/Dockerfile

Each project has its own Dockerfile - version controlled with your code.

Step 1: Edit Dockerfile

# Edit your project's Dockerfile
vim ~/workspace/my-thesis/Dockerfile

Example: Add Transformers packages

FROM aime/pytorch:2.8.0-cuda12.4

# Install Python packages
RUN pip install --no-cache-dir \
transformers>=4.30.0 \
datasets>=2.12.0 \
accelerate>=0.20.0 \
bitsandbytes>=0.41.0

# Install system packages if needed
RUN apt-get update && apt-get install -y \
ffmpeg \
libsm6 \
libxext6 \
&& rm -rf /var/lib/apt/lists/*

# Configure Jupyter (optional)
RUN pip install jupyterlab ipywidgets

WORKDIR /workspace

Step 2: Rebuild Image

image-update my-thesis --rebuild # Rebuild without prompts
# Or use: docker build -t ds01-<uid>/my-thesis:latest ~/workspace/my-thesis/

What happens:

  • Reads ~/workspace/my-thesis/Dockerfile
  • Builds new Docker image
  • Tags as ds01-username/my-thesis:latest
  • Old image replaced

Time: 5-10 minutes (depends on packages)

Note: Most users should use image-update (interactive GUI) instead of editing Dockerfiles directly. See Option 2 below.

Step 3: Recreate Container

# Launch with new image
project launch my-thesis --open

Your workspace files are safe - only container is recreated.

Step 4: Verify Packages

# Inside container
python -c "import transformers; print(transformers.__version__)"

Quick Install Method (quick fix during experimentation)

For rapid experimentation - install packages into running container, save later.

Install in Running Container

# Launch container
project launch my-project --open

# Inside container: Install packages
pip install transformers datasets

# Use them immediately
python
>>> import transformers
>>> # Works!

Option 1: Save to Image (Quick, Non-Reproducible)

When you retire a container, DS01 automatically detects newly installed packages and offers to save them via docker commit:

# Exit container
exit

# Retire - will prompt to save if new packages detected
container retire my-project

The interactive prompt offers to commit changes to the image

What happens:

  • If new packages detected, prompts to commit container changes to image
  • New containers get these packages
  • BUT: Not in Dockerfile, not version-controlled, not shareable

Trade-offs:

  • Fast - no rebuild needed
  • Not reproducible - colleagues don't know what's installed
  • Not version-controlled - can't track changes
  • Image bloat - auto-cleanup may remove dangling layers

Best Approach:

# 1. Note what you installed
pip list | grep transformers
# transformers==4.30.2

# 2. Exit and use interactive package manager
exit
image-update # Select image, add "transformers==4.30.2"

# 3. Recreate container
project launch my-project

Trade-offs:

  • ✓ Reproducible - changes saved to Dockerfile
  • ✓ Version controlled - tracked in git
  • ✓ Shareable - colleagues can rebuild
  • ✓ User-friendly - no manual Dockerfile editing
  • ✗ Slower - requires rebuild

Option 3: Edit Dockerfile Directly (Advanced)

For advanced users who prefer manual control:

# 1. Note what you installed
pip list | grep transformers
# transformers==4.30.2

# 2. Exit and edit Dockerfile
exit
vim ~/workspace/my-project/Dockerfile
# Add: RUN pip install transformers>=4.30.0

# 3. Rebuild image
image-update my-project --rebuild
# Or: docker build -t ds01-<uid>/my-project:latest ~/workspace/my-project/

# 4. Recreate container
project launch my-project

Common Customisation Patterns

Adding ML Libraries

# Computer Vision
RUN pip install --no-cache-dir \
opencv-python \
Pillow \
albumentations \
timm

# NLP
RUN pip install --no-cache-dir \
transformers \
datasets \
tokenizers \
sentencepiece \
spacy

# General ML
RUN pip install --no-cache-dir \
scikit-learn \
xgboost \
lightgbm \
catboost

Adding System Packages

RUN apt-get update && apt-get install -y \
vim \
tmux \
htop \
git-lfs \
&& rm -rf /var/lib/apt/lists/*

Note: Always clean up apt cache (rm -rf /var/lib/apt/lists/*) to keep images small.

Installing from Requirements File

# Copy requirements into image
COPY requirements.txt /tmp/requirements.txt

# Install from file
RUN pip install --no-cache-dir -r /tmp/requirements.txt

On host, create requirements.txt:

echo "transformers>=4.30.0" > ~/workspace/my-project/requirements.txt
echo "datasets>=2.12.0" >> ~/workspace/my-project/requirements.txt

Pinning Package Versions

# Exact version (most reproducible)
RUN pip install torch==2.0.1

# Minimum version (more flexible)
RUN pip install torch>=2.0.0

# Version range
RUN pip install "torch>=2.0.0,<2.2.0"

Best practice: Use >= for research (get bug fixes), exact versions for production.

Installing from GitHub

# Install specific commit/branch
RUN pip install git+https://github.com/huggingface/transformers.git@main

# Or specific tag
RUN pip install git+https://github.com/huggingface/transformers.git@v4.30.0

Managing Multiple Environments

Each project can have different packages:

# Project 1: PyTorch + Vision
~/workspace/cv-project/
└── Dockerfile # torchvision, OpenCV, Pillow

# Project 2: TensorFlow + NLP
~/workspace/nlp-project/
└── Dockerfile # transformers, spacy

# Project 3: Reinforcement Learning
~/workspace/rl-project/
└── Dockerfile # gym, stable-baselines3

Switching is seamless:

project launch cv-project # Uses CV environment
exit
project launch nlp-project # Uses NLP environment

Viewing Current Packages

# Inside container
pip list

# Specific package
pip show transformers

# Packages not in base image
pip list --not-required

Troubleshooting

"Package not found" After Rebuild

Symptom: Added package to Dockerfile, rebuilt, but import fails in container.

Causes:

  1. Using old container - need to recreate
  2. Typo in package name
  3. Package installation failed (check build logs)

Fix:

# Always recreate after rebuild
container retire my-project
project launch my-project --open

# Check build logs if package missing
image-update my-project 2>&1 | grep -A 5 "ERROR"

"Build failed: No space left"

Cause: Docker images filling disk.

Fix:

# Remove unused images
docker image prune -a

# Remove specific old image
image-delete old-project-name

"Cannot import package installed with pip"

Inside container, you used pip install foo, but after recreating, it's gone.

Cause: Package not in Dockerfile - only existed in removed container.

Fix: Use image-update to add the package permanently:

image-update # Select image, add "foo"

Advanced: Edit Dockerfile directly:

vim ~/workspace/my-project/Dockerfile
# Add: RUN pip install foo
image-update my-project --rebuild

"Package conflict" During Build

Symptom:

ERROR: transformers 4.35.0 requires tokenizers>=0.14, but you have tokenizers 0.13.0

Fix: Specify compatible versions:

RUN pip install --no-cache-dir \
transformers>=4.35.0 \
tokenizers>=0.14.0

Or let pip resolve:

RUN pip install --no-cache-dir transformers
# Automatically installs compatible tokenizers

Base Images

DS01 provides AIME base images:

Base ImageFrameworkCUDAPython
aime/pytorch:2.8.0-cuda12.4PyTorch 2.8.012.43.11
aime/tensorflow:2.16.1-cuda12.3TensorFlow 2.16.112.33.11
aime/jax:0.4.23-cuda12.3JAX 0.4.2312.33.11

What's included in base images:

  • CUDA + cuDNN
  • Framework (PyTorch/TensorFlow/JAX)
  • Basic tools (git, vim, wget, curl)
  • Common libraries (numpy, pandas, matplotlib)

Check what's in your base:

# Inside container from fresh image
pip list
dpkg -l # System packages

Image Commands Reference

# Create new image
image-create my-project

# List your images
image-list

# Update image (interactive GUI - recommended)
image-update # Select image, add/remove packages

# Rebuild image after manual Dockerfile edit (advanced)
image-update my-project --rebuild

# Delete image
image-delete old-project

Quick package update:

# Add packages to image (updates Dockerfile and rebuilds)
image-update my-project --add "transformers datasets"

Best Practices

1. Keep Dockerfiles in Version Control

cd ~/workspace/my-project
git add Dockerfile
git commit -m "Add transformers for BERT experiments"
git push

Benefits:

  • Colleagues can reproduce your environment
  • Track what changed over time
  • Rollback if package breaks something

2. Comment Your Dockerfile

# Fine-tuning BERT models
RUN pip install transformers datasets

# Data augmentation for images
RUN pip install albumentations

# Experiment tracking
RUN pip install wandb tensorboard

Future you will thank present you.

# Bad - many RUN commands (slow, large image)
RUN pip install transformers
RUN pip install datasets
RUN pip install tokenizers

# Good - one RUN command (fast, smaller image)
RUN pip install --no-cache-dir \
transformers \
datasets \
tokenizers

4. Use --no-cache-dir

RUN pip install --no-cache-dir transformers

Why: Saves ~50MB per image - pip cache not needed in container.

5. Clean Up After apt-get

RUN apt-get update && apt-get install -y \
package1 \
package2 \
&& rm -rf /var/lib/apt/lists/*

Why: Removes package cache, keeps image smaller.

6. Test Locally First

# Install in running container
project launch test-env --open
pip install some-new-package

# Test it works
python -c "import some_new_package"

# Then add to Dockerfile
exit
vim ~/workspace/test-env/Dockerfile

Workflow Comparison

Dockerfile Workflow (Reproducible)

Edit Dockerfile → Rebuild image → Recreate container → Use new packages

Time: 10-15 minutes total

When: Research work, collaboration, long-term projects

Quick Install Workflow (Fast)

Install in container → Save to image → Done

Time: 1 minute

When: Quick experiments, testing packages, one-off needs

Convert to Dockerfile later:

# After quick install testing, make it permanent
pip list | grep package-name # Check version
image-update # Add package to image via GUI

Advanced Topics

Multi-Stage Builds

Build tools in one stage, copy results to another (smaller final image):

# Build stage
FROM aime/pytorch:2.8.0-cuda12.4 AS builder
RUN pip install --no-cache-dir transformers
RUN pip download --dest /packages some-large-package

# Final stage
FROM aime/pytorch:2.8.0-cuda12.4
COPY --from=builder /packages /packages
RUN pip install --no-cache-dir /packages/*

Rarely needed for DS01 - only if building from source.

Environment Variables

# Set environment variables
ENV TRANSFORMERS_CACHE=/workspace/.cache/transformers
ENV HF_HOME=/workspace/.cache/huggingface

# Use in Python
# Cache stored in workspace (persistent)

Custom Jupyter Extensions

# Install Jupyter extensions
RUN pip install jupyterlab ipywidgets jupyter-book

# Enable extensions
RUN jupyter nbextension enable --py widgetsnbextension
RUN jupyter labextension install @jupyter-widgets/jupyterlab-manager

Next Steps

Launch with your custom environment:

project launch my-project --open

Learn more Docker:

Set up development tools:

Understand the concepts: