Skip to main content

Command Reference

Complete reference for all DS01 commands with examples and options.


Quick Index

Container Lifecycle:

Image Management:

Project Setup:

User Setup:

System:


Container Lifecycle Commands

container-deploy

Create and start a container (L3 orchestrator)

Syntax:

container-deploy <project-name> [OPTIONS]

Options:

  • --open - Create and enter terminal immediately
  • --background - Create and start in background
  • --gpu <N> - Request N GPUs
  • --framework <name> - Specify framework (pytorch, tensorflow, jax)
  • --image <name> - Use specific image
  • --guided - Educational mode with explanations
  • --help - Show help message

Examples:

# Interactive mode
container-deploy my-project

# Create and enter
container-deploy my-project --open

# Background mode
container-deploy my-project --background

# Guided mode
container-deploy my-project --guided

What it does:

  1. Checks resource availability
  2. Runs container-create (allocates GPU, creates container)
  3. Prompts for startup mode or uses flags
  4. Runs container-start or container-run based on choice

Equivalent to:

container-create my-project
container-run my-project # or container-start

container-retire

Stop and remove a container, free GPU (L3 orchestrator)

Syntax:

container-retire <project-name> [OPTIONS]

Options:

  • -f, --force - Skip confirmation prompts
  • --save-packages - Automatically save new packages to image (no prompt)
  • --images - Also remove the Docker image after retiring
  • --dry-run - Show what would be done
  • --guided - Educational mode
  • --help - Show help message

Examples:

# Interactive mode (with confirmations)
container-retire my-project

# Skip confirmations
container-retire my-project --force

# Also remove Docker image
container-retire my-project --images

# Auto-save new packages before retiring
container-retire my-project --save-packages

What it does:

  1. Confirms you want to retire (unless --force)
  2. Detects new packages and offers to save (or auto-saves with --save-packages)
  3. Runs container-stop (stops container)
  4. Runs container-remove (removes container, frees GPU)
  5. Optionally prompts to remove Docker image (or auto-removes with --images)

Equivalent to:

container-stop my-project
container-remove my-project

container-create

Create a container with GPU allocation (L2 atomic)

Syntax:

container-create <project-name> [OPTIONS]

Options:

  • --gpu <N> - Request N GPUs (default: 1)
  • --framework <name> - Base framework
  • --image <name> - Specific Docker image
  • --guided - Educational mode
  • --help - Show help

Examples:

# Create with default settings
container-create my-project

# Request multiple GPUs
container-create my-project --gpu 2

# Use specific image
container-create my-project --image ds01-$(whoami)/custom:latest

What it does:

  1. Checks resource limits
  2. Allocates GPU via gpu_allocator.py
  3. Creates container with resource limits
  4. Mounts workspace
  5. Container remains in created/stopped state

Does not: Start the container (use container-start or container-run)


container-start

Start container in background (L2 atomic)

Syntax:

container-start <project-name>

Examples:

# Start container
container-start my-project

# Check it's running
container-list

What it does:

  1. Starts the container
  2. Container runs in background
  3. Returns immediately

To enter running container: Use container-run or docker exec


container-run

Start (if stopped) and enter container (L2 atomic)

Syntax:

container-run <project-name>

Examples:

# Enter running or start and enter stopped container
container-run my-project

# You're now inside container
user@my-project:/workspace$

What it does:

  1. If stopped: Starts container
  2. Opens interactive shell inside container
  3. You can exit with exit or Ctrl+D

Note: Container keeps running after you exit (unless you stop it)


container-pause

Freeze container processes (L2 atomic)

Syntax:

container-pause <project-name>

Examples:

# Pause container
container-pause my-project

# Processes frozen, GPU still allocated

What it does:

  1. Freezes all container processes (SIGSTOP)
  2. GPU remains allocated
  3. Memory state preserved
  4. Use container-unpause to resume

Use case: Temporarily free CPU while keeping GPU and state


container-unpause

Resume frozen container (L2 atomic)

Syntax:

container-unpause <project-name>

Examples:

# Resume paused container
container-unpause my-project

# Processes continue where they left off

What it does:

  1. Resumes all frozen processes (SIGCONT)
  2. Container continues running normally

container-stop

Stop a running container (L2 atomic)

Syntax:

container-stop [project-name] [OPTIONS]

Options:

  • -f, --force - Force stop (kill immediately)
  • -t, --timeout SECS - Timeout before force kill (default: 10)
  • -a, --all - Stop all your containers
  • -v, --verbose - Show detailed shutdown process
  • --keep-container - Don't prompt to remove container
  • --guided - Educational mode

Examples:

container-stop my-project # Graceful stop
container-stop my-project --force # Force stop immediately
container-stop --all # Stop all your containers

What it does:

  1. Stops container gracefully (SIGTERM)
  2. GPU marked as stopped (timer starts)
  3. Container still exists (can restart)

To free GPU immediately: Use container-retire instead


container-remove

Remove container and free GPU (L2 atomic)

Syntax:

container-remove [project-name] [OPTIONS]

Options:

  • -a, --all - Remove all your stopped containers
  • -i, --images - Also remove associated Docker images
  • -v, --volumes - Also remove anonymous volumes
  • -f, --force - Skip all prompts
  • --skip-removal-confirm - Skip removal confirmation only
  • --dry-run - Show what would be removed
  • --guided - Educational mode

Examples:

container-remove my-project # Remove specific container
container-remove my-project --images # Also remove Docker image
container-remove --all # Remove all stopped containers
container-remove --all --images --dry-run # Preview bulk removal

What it does:

  1. Removes container
  2. Frees GPU immediately
  3. Workspace files remain safe

Warning: Cannot be undone (but easy to recreate from image)


container-list

List your containers (L2 atomic)

Syntax:

container-list [OPTIONS]

Options:

  • --all - Include stopped containers (default: running only)

Examples:

# List running containers
container-list

# List all (including stopped)
container-list --all

# Example output:
# NAME STATUS GPU UPTIME
# my-project Running 0:1 2h 34m
# experiment-1 Running 0:2 45m

What it shows:

  • Container names
  • Status (Running, Stopped)
  • Allocated GPU
  • Uptime

container-stats

Show resource usage (L2 atomic)

Syntax:

container-stats [project-name]

Examples:

# All your containers
container-stats

# Specific container
container-stats my-project

# Example output:
# CONTAINER CPU % MEM USAGE/LIMIT MEM % GPU MEM
# my-project 245% 12.5GB / 64GB 19.5% 18.2GB

What it shows:

  • CPU usage (% across all cores)
  • Memory usage and limit
  • Memory percentage
  • GPU memory usage (if available)

Image Management Commands

image-create

Build custom Docker image (L2 atomic)

Syntax:

image-create [project-name] [OPTIONS]

Options:

  • --framework <name> - Base framework (pytorch, tensorflow, jax)
  • --guided - Educational mode
  • --help - Show help

Examples:

# Interactive mode
image-create

# Specify project and framework
image-create my-project --framework pytorch

What it does:

  1. Phase 1: Choose base framework (PyTorch, TensorFlow, JAX)
  2. Phase 2: Add Jupyter Lab and extensions
  3. Phase 3: Add data science packages
  4. Phase 4: Add custom packages (optional)
  5. Builds image and tags as ds01-<user>/<project>:latest

Time: 5-15 minutes (first build), faster with cached layers


image-list

List your Docker images (L2 atomic)

Syntax:

image-list [OPTIONS]

Options:

  • -a, --all - Show all images (not just yours)
  • -s, --size - Show image sizes
  • -d, --detailed - Show detailed info with Dockerfile locations
  • --guided - Educational mode

Examples:

image-list # List your images
image-list --all # List all images on server
image-list --detailed # Show detailed info with Dockerfile locations

# Example output:
# REPOSITORY TAG SIZE CREATED
# ds01-alice/my-project latest 8.2GB 2 days ago
# ds01-alice/experiment latest 7.9GB 1 week ago

image-update

Update existing image with package management (L2 atomic)

Syntax:

image-update [project-name] [OPTIONS]

Options:

  • --rebuild - Rebuild image without modifying Dockerfile
  • --no-cache - Force rebuild without cache
  • --add "pkg1 pkg2" - Add packages directly
  • --edit - Edit Dockerfile manually (advanced)
  • --help - Show help

Examples:

# Recommended: Interactive GUI
image-update # Select image, add/remove packages

# Advanced: After manual Dockerfile edit
image-update my-project --rebuild # Rebuild without prompts
image-update my-project --no-cache # Force complete rebuild

What it does:

  1. Without arguments: Opens interactive GUI to add/remove packages
  2. With --rebuild: Rebuilds from existing Dockerfile
  3. Uses layer cache for speed (unless --no-cache)

When to use:

  • No arguments (recommended): Add/remove packages via GUI
  • --rebuild: After manual Dockerfile edit
  • --no-cache: Force clean rebuild

image-delete

Remove Docker image (L2 atomic)

Syntax:

image-delete [image-name...] [OPTIONS]

Options:

  • --all - Delete all your images (with confirmation)
  • -f, --force - Force removal (stop/remove containers first)
  • --keep-dockerfile - Don't delete the associated Dockerfile
  • --guided - Educational mode

Examples:

image-delete my-project # Remove image (with confirmation)
image-delete img1 img2 img3 # Bulk delete multiple images
image-delete --all # Remove all your images
image-delete my-project --force # Force remove (stops containers first)
image-delete my-project --keep-dockerfile # Keep Dockerfile for later rebuild

Warning: Containers using this image must be removed first (or use --force)


Project Setup Commands

project-init

Complete project initialisation (L4 wizard)

Syntax:

project-init [project-name] [OPTIONS]

Options:

  • --guided - Educational mode
  • --help - Show help

Examples:

# Interactive mode
project-init

# Specify project name
project-init my-research

What it does:

  1. dir-create - Creates ~/workspace/<project>/
  2. git-init - Initializes Git repository
  3. readme-create - Generates README.md
  4. image-create - Builds custom Docker image
  5. container-deploy - Deploys first container

Equivalent to running all those commands individually


project-launch

Launch container for existing project (L4 wizard)

Syntax:

project-launch [project-name] [OPTIONS]

Options:

  • --guided - Educational mode
  • --open - Start and enter terminal (skip prompt)
  • --background - Start in background (skip prompt)
  • --rebuild - Force rebuild image even if exists
  • --help - Show help

Examples:

# Interactive mode (select from project list)
project-launch

# Launch specific project
project-launch my-thesis

# Launch and enter terminal
project-launch my-thesis --open

# Force image rebuild
project-launch my-thesis --rebuild

What it does:

  1. Shows menu of projects in ~/workspace/ (if no name given)
  2. Checks if Docker image exists for project
  3. If no image: runs image-create automatically
  4. Runs container-deploy to start container

Key difference from container-deploy:

  • project-launch = Smart (handles image creation automatically)
  • container-deploy = Direct (requires image to exist)

dir-create

Create workspace directory (L2 atomic)

Syntax:

dir-create <project-name>

Examples:

# Create project directory
dir-create my-project

# Created: ~/workspace/my-project/

git-init

Initialize Git repository (L2 atomic)

Syntax:

git-init <project-name>

Examples:

# Initialize Git in workspace
git-init my-project

# Creates ~/workspace/my-project/.git/
# Adds .gitignore for Python/data science

readme-create

Generate README.md (L2 atomic)

Syntax:

readme-create <project-name>

Examples:

# Create README
readme-create my-project

# Generates ~/workspace/my-project/README.md
# with project template

User Setup Commands

user-setup

Complete onboarding wizard (L4 wizard)

Syntax:

user-setup

Aliases: user setup, new-user

Examples:

# Run onboarding
user-setup

What it does:

  1. ssh-setup - Creates SSH keys
  2. project-init - Initializes first project
  3. vscode-setup - Configures VSCode Remote (optional)

Time: 15-20 minutes

See: First-Time Setup Guide


ssh-setup

Configure SSH keys (L2 atomic)

Syntax:

ssh-setup [OPTIONS]

Options:

  • --key-type <type> - Key type (default: ed25519)
  • --no-passphrase - Create without passphrase
  • --help - Show help

Examples:

# Interactive setup
ssh-setup

# Non-interactive
ssh-setup --key-type ed25519 --no-passphrase

What it does:

  1. Generates SSH key pair
  2. Displays public key (add to GitHub/GitLab)
  3. Configures SSH agent (optional)

vscode-setup

Configure VSCode Remote (L2 atomic)

Syntax:

vscode-setup [OPTIONS]

Examples:

# Interactive setup
vscode-setup

# Generates VSCode config for remote development

See: VSCode Remote Guide


System Commands

dashboard

System status dashboard

Syntax:

dashboard [SECTION] [OPTIONS]

Options:

  • --watch, -w - Watch mode (2s refresh)
  • --full - Show all sections expanded
  • --json - JSON output for scripting

Sections:

  • gpu - GPU utilization diagram
  • cpu - CPU usage by user diagram
  • system - CPU, Memory, Disk bars
  • mig-config - MIG partition configuration (only relevant when MIG is enabled)
  • containers - All containers with stats
  • users - Per-user resource summary
  • temp - GPU temperatures and power
  • allocations [N] - Recent N GPU allocations
  • alerts - Active alerts and warnings

Examples:

dashboard # Default compact view
dashboard --full # All sections expanded
dashboard --watch # Live monitoring
dashboard gpu # GPU section only
dashboard containers # Container list
dashboard alerts # Check for issues
dashboard allocations 20 # Last 20 GPU allocations

Common Patterns

Complete Workflow

# First time setup
user-setup

# Create a new project (one-time)
project init my-thesis --type=cv

# Daily workflow
project launch my-thesis --open
# Work...
exit
container retire my-thesis

# Build custom environment (manual)
image-create my-project
container-deploy my-project --open

# Multiple containers (parallel experiments)
container-deploy exp-1 --background
container-deploy exp-2 --background
container-deploy exp-3 --background

# Clean up
container-retire exp-1
container-retire exp-2
container-retire exp-3

Debugging

Replace <project-name> with your actual project name.

# Check container status
container-list
container-stats <project-name>

# View logs
docker logs <project-name>._.$(whoami)

# Enter container
container-run <project-name>

# Check GPU
nvidia-smi

Getting Help

Command Help

All commands support --help:

container-deploy --help
image-create --help

Guided Mode

Most commands support --guided:

container-deploy my-project --guided
image-create --guided

Guided mode:

  • Explains each step
  • Educational prompts
  • Recommended for beginners

Environment Variables

DS01 Variables

# System installation
DS01_INSTALL_DIR=/opt/ds01-infra

# Workspace base
DS01_WORKSPACE=~/workspace

Docker Variables

# Docker socket
DOCKER_HOST=unix:///var/run/docker.sock

# Default image registry
DOCKER_REGISTRY=<internal-registry>

Exit Codes

Standard exit codes:

  • 0 - Success
  • 1 - General error
  • 2 - Misuse of command (invalid arguments)
  • 3 - Resource unavailable (no GPUs, quota exceeded)
  • 4 - Container/image not found
  • 5 - Permission denied

Example:

container-deploy my-project
echo $? # Check exit code

Tips & Tricks

Tab Completion

Commands support tab completion:

container-de<Tab> # Completes to: container-deploy
container-deploy my-<Tab> # Completes project name (if exists)

Command History

# Search history
Ctrl+R
# Type: container-deploy

# Recall last container-deploy command
!container-deploy

Aliases

Create shortcuts in ~/.bashrc:

alias cdeploy='container-deploy'
alias cretire='container-retire'
alias clist='container-list'

Next Steps

Learn Workflows

Daily patterns:

Project setup:

Container management:

Troubleshooting

Common issues:

Best practices:

Need examples?Daily Usage Patterns

Having issues?Troubleshooting