Workspaces & Persistence
Deep dive into stateless compute, persistent storage, and cloud architecture patterns.
Part of Educational Computing Context - Career-relevant knowledge beyond DS01 basics.
Just want the essentials? See Key Concepts: Workspaces and Persistence for a shorter overview.
Understanding stateless/stateful separation is critical for cloud computing. This guide explains persistence patterns, file organisation, and how these concepts transfer to AWS, Kubernetes, and production systems.
The Golden Rule
Everything in /workspace (inside container) = Safe and permanent
Everything else = Temporary and will be lost
What's Persistent vs Ephemeral
Persistent (Always Safe) ✓
| Location | What It Is | Survives Container Removal? |
|---|---|---|
| ~/workspace/<project>/ | Your code, data, results | ✓ Yes |
| ~/dockerfiles/ | Image blueprints | ✓ Yes |
| Docker images | Environment blueprints | ✓ Yes |
| ~/.ssh/ | SSH keys | ✓ Yes |
Ephemeral (Temporary) ✗
| Location | What It Is | Survives Container Removal? |
|---|---|---|
| Container instance | Running container | ✗ No |
| /tmp (in container) | Temporary files | ✗ No |
| /home/<user> (in container, outside /workspace) | Container home dir | ✗ No |
| Container processes | Running Python, Jupyter, etc. | ✗ No |
| GPU allocation | Assigned GPU | ✗ No |
Understanding DS01 File Locations
On the Host (DS01 Server)
/home/your-username/ # Your home directory
├── workspace/ # ← PERSISTENT: Your projects
│ ├── project-1/
│ │ ├── data/
│ │ ├── notebooks/
│ │ ├── models/
│ │ └── README.md
│ ├── project-2/
│ └── experiment-3/
├── dockerfiles/ # ← PERSISTENT: Image blueprints
│ ├── project-1.Dockerfile
│ └── project-2.Dockerfile
└── .ssh/ # ← PERSISTENT: SSH keys
├── id_ed25519
└── id_ed25519.pub
Inside a Container
/ # Container root
├── workspace/ # ← MOUNTED from ~/workspace/<container-name>/
│ ├── data/ # Files here = PERSISTENT
│ ├── notebooks/
│ └── train.py
├── tmp/ # ← EPHEMERAL: Deleted on container removal
├── home/
│ └── your-username/ # ← EPHEMERAL (except /workspace mount)
└── opt/
└── conda/ # ← EPHEMERAL (but can rebuild from image)
Key insight: /workspace in container is actually ~/workspace/<container-name>/ on host, mounted into the container. DS01 containers are designed to be project-associated - each container maps to one project directory by default.
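The rule can be turned into a quick check before a script writes anything. The sketch below is illustrative only: `is_persistent` is a hypothetical helper, not part of DS01, and it assumes the default mount at /workspace described above.

```python
from pathlib import PurePosixPath

# Assumption: the project directory is mounted at /workspace (the DS01 default).
WORKSPACE_ROOT = PurePosixPath("/workspace")


def is_persistent(path: str, root: PurePosixPath = WORKSPACE_ROOT) -> bool:
    """Return True if an absolute container path lies inside the workspace mount."""
    p = PurePosixPath(path)
    if not p.is_absolute() or ".." in p.parts:
        # Relative paths and paths containing ".." are not resolved here,
        # so they are treated as unsafe.
        return False
    return p == root or root in p.parents


for candidate in ["/workspace/models/model.pt", "/tmp/model.pt", "/workspace-old/x"]:
    label = "persistent" if is_persistent(candidate) else "ephemeral"
    print(f"{candidate} -> {label}")
```

The comparison is done on path components rather than string prefixes, so a sibling directory such as /workspace-old is not mistaken for the mount.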
How Workspace Mounting Works
The Mount
When you create a container:
container-deploy my-project
DS01 automatically runs (internally):
# Project dir → /workspace
docker run \
  -v ~/workspace/my-project:/workspace \
  ...
This means:
- Files you save to /workspace/ (inside container)
- Are actually saved to ~/workspace/my-project/ (on host)
- Survive container removal
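To make the mapping concrete, here is a minimal sketch that translates a container path into the matching host path. `container_to_host` is a hypothetical helper, and the host home directory /home/alice is an assumed example; it only covers the default mount, not a custom --workspace location.

```python
from pathlib import PurePosixPath


def container_to_host(container_path: str, project: str, host_home: str) -> str:
    """Translate a path under /workspace to its location on the host.

    Assumes the default mount ~/workspace/<project> -> /workspace.
    Raises ValueError for paths outside /workspace (they have no host copy).
    """
    rel = PurePosixPath(container_path).relative_to("/workspace")
    return str(PurePosixPath(host_home) / "workspace" / project / rel)


print(container_to_host("/workspace/models/best-model.pt", "my-project", "/home/alice"))
```

A path such as /tmp/temp.txt raises ValueError, which mirrors the fact that nothing outside the mount exists on the host.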
Why project-specific? DS01's mental model is that containers are project-associated. This keeps projects isolated and encourages good organisation. If you need different behaviour, use the --workspace flag:
container-create my-container --workspace ~/workspace # Mount all projects
container-create my-container --workspace ~/other/path # Custom location
Visualisation
┌─────────────────────────────────────────┐
│ DS01 Host (Persistent) │
│ │
│ ~/workspace/my-project/ │
│ ├── data/ │
│ ├── models/ │
│ └── train.py │
│ ↕ │
│ (mounted as /workspace) │
│ ↕ │
│ ┌────────────────────────────────┐ │
│ │ Container (Ephemeral) │ │
│ │ │ │
│ │ /workspace/ ← mounted │ │
│ │ ├── data/ │ │
│ │ ├── models/ │ │
│ │ └── train.py │ │
│ │ │ │
│ │ /tmp/ ← NOT mounted │ │
│ │ └── temp.txt ✗ LOST │ │
│ └────────────────────────────────┘ │
└─────────────────────────────────────────┘
Best Practices for File Organisation
Recommended Structure
~/workspace/<project>/
├── README.md # Project documentation
├── requirements.txt # Python packages (for Dockerfile)
├── .gitignore # Git ignored files
├── .keep-alive # Prevent auto-stop (optional)
│
├── data/ # Datasets
│ ├── raw/ # Original data
│ ├── processed/ # Cleaned data
│ └── README.md # Data documentation
│
├── notebooks/ # Jupyter notebooks
│ ├── 01-exploration.ipynb
│ ├── 02-training.ipynb
│ └── 03-analysis.ipynb
│
├── src/ # Source code
│ ├── __init__.py
│ ├── data.py # Data loading
│ ├── model.py # Model definition
│ └── train.py # Training script
│
├── models/ # Trained models
│ ├── checkpoint-001.pt
│ ├── checkpoint-002.pt
│ └── best-model.pt
│
├── results/ # Experiment outputs
│ ├── metrics.json
│ ├── plots/
│ └── logs/
│
└── tests/ # Unit tests
└── test_model.py
Why This Structure?
Separation of concerns:
- data/: Input
- src/: Code
- models/: Outputs (trained weights)
- results/: Analysis
- notebooks/: Exploration
Reproducibility:
- README.md: How to use
- requirements.txt: What packages are needed
- src/: Reusable code
- tests/: Verify correctness
Collaboration:
- Clear organisation
- Easy to navigate
- Standard structure
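If you would rather script the layout than create it by hand, something like the following could work. It is a sketch only: `scaffold` is a hypothetical helper (DS01's own `project init` may do this differently), and the demo runs in a temporary directory so nothing real is touched.

```python
import tempfile
from pathlib import Path

# Directories taken from the recommended structure above.
SUBDIRS = [
    "data/raw", "data/processed", "notebooks", "src",
    "models", "results/plots", "results/logs", "tests",
]


def scaffold(project_root: Path) -> list[Path]:
    """Create the recommended sub-directories and return the ones newly created."""
    created = []
    for sub in SUBDIRS:
        directory = project_root / sub
        if not directory.exists():
            directory.mkdir(parents=True, exist_ok=True)
            created.append(directory)
    # Make src/ importable as a package, as in the layout above.
    (project_root / "src" / "__init__.py").touch()
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for directory in scaffold(Path(tmp) / "my-project"):
            print("created", directory.relative_to(tmp))
```

Running it a second time on the same project creates nothing and leaves existing files alone.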
Working Inside Containers
Always Start in Workspace
# Inside container - you should already be here by default
alice@my-project:~$ pwd
/workspace
# If not, go there
cd /workspace
Save Everything to Workspace
Good:
# Save model to workspace
torch.save(model.state_dict(), '/workspace/models/model.pt')
# Log to workspace
with open('/workspace/results/log.txt', 'a') as f:
    f.write(f'Epoch {epoch}: Loss {loss}\n')
# Cache to workspace
cache_dir = '/workspace/.cache'
Bad:
# DON'T save to /tmp
torch.save(model.state_dict(), '/tmp/model.pt') # ✗ LOST on container removal
# DON'T save to home (outside workspace); note that Python does not expand '~' in paths
torch.save(model.state_dict(), '/home/your-username/model.pt') # ✗ LOST
# DON'T save to root
torch.save(model.state_dict(), '/model.pt') # ✗ LOST
Environment Variables for Common Paths
Set in your code or shell:
# Inside container
export DATA_DIR="/workspace/data"
export MODEL_DIR="/workspace/models"
export RESULTS_DIR="/workspace/results"
# Use in Python
import os
data_dir = os.environ['DATA_DIR']
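Note that `os.environ['DATA_DIR']` raises `KeyError` when the variable is not set, for example in a fresh shell. A small sketch of a more forgiving pattern follows; `path_from_env` is a hypothetical helper, and the defaults are simply the paths from the exports above.

```python
import os
from pathlib import Path


def path_from_env(name: str, default: str) -> Path:
    """Read a directory from the environment, falling back to a default."""
    return Path(os.environ.get(name, default))


# Defaults mirror the exports shown above.
DATA_DIR = path_from_env("DATA_DIR", "/workspace/data")
MODEL_DIR = path_from_env("MODEL_DIR", "/workspace/models")
RESULTS_DIR = path_from_env("RESULTS_DIR", "/workspace/results")

print(DATA_DIR, MODEL_DIR, RESULTS_DIR)
```

With this pattern the same script runs unchanged whether or not the variables were exported.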
Docker Images vs Containers vs Workspaces
Three Layers of Persistence
1. Docker Image (Persistent Blueprint)
- Contains: OS, Python, packages, Dockerfile instructions
- Created with: image-create or docker build
- Survives: Container removal, system reboot
- Location: Docker storage (/var/lib/docker/)
- Purpose: Environment reproducibility
2. Container Instance (Ephemeral)
- Contains: Running processes, writable filesystem layer
- Created with: container-deploy or docker run
- Survives: Stop/start (unless removed)
- Does NOT survive: container-retire or docker rm
- Purpose: Temporary compute environment
3. Workspace (Persistent Data)
- Contains: Your code, data, results
- Created with: mkdir ~/workspace/<project> (or via project init)
- Survives: Everything (container removal, image deletion, reboots)
- Location: ~/workspace/<project>/ on host → /workspace/ in container
- Purpose: Permanent storage
Lifecycle Example
# Setup
image-create # Create image (PERSISTENT)
container-deploy my-project # Create container (EPHEMERAL)
cd /workspace
echo "Hello" > file.txt # Save to workspace (PERSISTENT)
exit
# Done with GPU work
container-retire my-project # Container DELETED
# Image STILL EXISTS
# Workspace STILL EXISTS
# Later: Need GPU again
container-deploy my-project # New container from same image
cd /workspace
cat file.txt # "Hello" - file persists!
Common Scenarios
Scenario 1: Container Crashed
What happens:
- Container stops unexpectedly
- GPU released
- Container instance still exists (stopped state)
Your files:
- ✓ Workspace files: Safe
- ✓ Image: Safe
- ✗ Running processes: Terminated
- ✗ Unsaved work (RAM): Lost
Recovery:
# Restart container
container-start my-project
# Or
container-run my-project
# Your files are there
ls /workspace
Scenario 2: Container Retired
What happens:
- Container stopped
- Container removed
- GPU released
Your files:
- ✓ Workspace files: Safe
- ✓ Image: Safe
- ✗ Container instance: Deleted
Recovery:
# Recreate from same image
container-deploy my-project
# Same environment, same files
Scenario 3: Image Deleted
What happens:
- Image removed from Docker storage
- Containers from this image can't be created
Your files:
- ✓ Workspace files: Safe
- ✗ Image: Deleted
- ✗ Packages installed: Need to reinstall
Recovery:
# Rebuild image
image-create # Reinstall packages
# Or use base image temporarily
container-deploy my-project --framework pytorch
# Your workspace files unaffected
Scenario 4: Accidentally Deleted Workspace
What happens:
- Workspace directory deleted
- Your code, data, results LOST
Your files:
- ✗ Workspace files: DELETED
- ✓ Image: Safe (can recreate environment)
- ✗ Data: Lost (unless in Git or backups)
Prevention:
# Use Git for code
cd ~/workspace/my-project
git init
git remote add origin <your-repo>
git push
# Backup data regularly
rsync -avz ~/workspace/my-project/ backup-location/
# Don't run rm -rf in workspace!
Backup Strategies
1. Version Control (Git)
For code:
cd ~/workspace/my-project
git init
git add src/ notebooks/ README.md requirements.txt
git commit -m "Initial commit"
git remote add origin git@github.com:your-username/project.git
git push -u origin main
Advantages:
- Version history
- Collaboration
- Remote backup
- Reproducibility
What to commit:
- ✓ Source code (src/)
- ✓ Notebooks (notebooks/)
- ✓ Documentation (README.md)
- ✓ Configuration (requirements.txt, configs)
- ✗ Data files (too large)
- ✗ Model weights (too large)
- ✗ Results (generated)
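One way to enforce the list above is a .gitignore that excludes the large and generated directories. The sketch below writes one; `ensure_gitignore` and the exact pattern list are illustrative assumptions, so adjust them to your project. The data/ sub-directories are listed individually so that data/README.md can still be committed.

```python
import tempfile
from pathlib import Path

# Illustrative patterns based on the "what to commit" list above.
IGNORE_PATTERNS = [
    "data/raw/", "data/processed/", "models/", "results/",
    "__pycache__/", ".cache/",
]


def ensure_gitignore(project_root: Path) -> list[str]:
    """Add any missing patterns to .gitignore and return the ones added."""
    gitignore = project_root / ".gitignore"
    existing = gitignore.read_text().splitlines() if gitignore.exists() else []
    missing = [p for p in IGNORE_PATTERNS if p not in existing]
    if missing:
        gitignore.write_text("\n".join(existing + missing) + "\n")
    return missing


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print("added:", ensure_gitignore(Path(tmp)))
```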
2. Data Storage
For datasets:
- Store on DS01's shared data directory (if available)
- Download from source (reproducible)
- Use dataset management tools (DVC, LFS)
Don't:
- Commit large datasets to Git (slow, bloated)
- Store only in container (temporary)
3. Model Checkpoints
During training:
# Save checkpoints periodically
for epoch in range(epochs):
    train(...)
    if epoch % 10 == 0:
        torch.save({
            'epoch': epoch,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'loss': loss,
        }, f'/workspace/models/checkpoint-{epoch:03d}.pt')
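Checkpoints only help if a new container can pick up where the old one stopped. As a rough sketch, the helper below finds the most recent checkpoint by the epoch number in its file name. `latest_checkpoint` is a hypothetical name, and it assumes the `checkpoint-<epoch>.pt` naming used in the loop above.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional

CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)\.pt$")


def latest_checkpoint(model_dir: Path) -> Optional[Path]:
    """Return the checkpoint with the highest epoch number, or None if there is none."""
    best, best_epoch = None, -1
    for candidate in model_dir.glob("checkpoint-*.pt"):
        match = CHECKPOINT_RE.match(candidate.name)
        if match and int(match.group(1)) > best_epoch:
            best, best_epoch = candidate, int(match.group(1))
    return best


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for epoch in (0, 10, 20):
            (Path(tmp) / f"checkpoint-{epoch:03d}.pt").touch()
        print(latest_checkpoint(Path(tmp)).name)
```

In a training script, the returned path can be passed to `torch.load`, and the `model_state_dict` and `optimizer_state_dict` entries restored with `load_state_dict` before the loop resumes from the saved `epoch`.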
Final models:
- Save to workspace (/workspace/models/)
- Copy to permanent storage
- Upload to model registry (Hugging Face Hub, etc.)
4. Results and Logs
Experiment tracking:
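- Weights & Biases, MLflow, TensorBoard
- Hosted trackers such as Weights & Biases upload metrics to the cloud as you log them
- MLflow and TensorBoard write to local files by default, so keep their output under /workspace or configure a remote MLflow tracking server
- Metrics stored in the cloud survive workspace deletion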
import wandb
wandb.init(project="my-project")
wandb.log({"loss": loss, "accuracy": acc})
# Logged to cloud, safe even if workspace deleted
Monitoring Disk Usage
Check Workspace Size
# Total size of all projects
du -sh ~/workspace
# Size of each project
du -sh ~/workspace/*
# Detailed breakdown
du -h ~/workspace/my-project | sort -hr | head -20
Find Large Files
# Files over 1GB
find ~/workspace -type f -size +1G -exec ls -lh {} \;
# Largest files
find ~/workspace -type f -exec ls -s {} \; | sort -nr | head -20
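The same information can be gathered from Python, which is handy inside a notebook. This is a sketch with hypothetical helper names (`dir_size`, `largest_projects`). It sums apparent file sizes, so the numbers can differ slightly from `du`, which counts disk blocks.

```python
import os
import tempfile
from pathlib import Path


def dir_size(path: Path) -> int:
    """Total size in bytes of the files under path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def largest_projects(workspace: Path, top: int = 5) -> list[tuple[str, int]]:
    """Return (project name, bytes) pairs, largest first."""
    sizes = [(p.name, dir_size(p)) for p in workspace.iterdir() if p.is_dir()]
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top]


if __name__ == "__main__":
    workspace = Path.home() / "workspace"
    if workspace.is_dir():
        for name, size in largest_projects(workspace):
            print(f"{size / 1e9:8.2f} GB  {name}")
```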
Clean Up
# Remove temporary files
rm -rf ~/workspace/*/tmp
rm -f ~/workspace/*/*.tmp
# Remove old checkpoints
find ~/workspace/*/models -name "checkpoint-*.pt" -mtime +30 -delete
# Clean Python cache
find ~/workspace -type d -name __pycache__ -exec rm -rf {} +
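Because deletion is permanent, it can be worth listing what would be removed first. The following sketch does a dry run for old checkpoints. `old_checkpoints` is a hypothetical helper, it assumes the `<project>/models/checkpoint-*.pt` layout recommended earlier, and its age cutoff only roughly matches the rounding used by `find -mtime`.

```python
import os
import tempfile
import time
from pathlib import Path


def old_checkpoints(workspace: Path, days: int = 30) -> list[Path]:
    """List checkpoints under <workspace>/*/models last modified more than `days` days ago."""
    cutoff = time.time() - days * 24 * 60 * 60
    matches = workspace.glob("*/models/checkpoint-*.pt")
    return sorted(p for p in matches if p.stat().st_mtime < cutoff)


if __name__ == "__main__":
    # Demo in a throwaway directory: one stale checkpoint, one fresh.
    with tempfile.TemporaryDirectory() as tmp:
        models = Path(tmp) / "my-project" / "models"
        models.mkdir(parents=True)
        stale = models / "checkpoint-001.pt"
        fresh = models / "checkpoint-002.pt"
        stale.touch()
        fresh.touch()
        forty_days_ago = time.time() - 40 * 24 * 60 * 60
        os.utime(stale, (forty_days_ago, forty_days_ago))
        for path in old_checkpoints(Path(tmp)):
            print("would delete:", path.name)
```

Nothing is deleted here; once the list looks right, the files can be removed with `Path.unlink()` or the `find` command above.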
Check Your Quota
# Your disk quota (if enforced)
quota -s
# Total disk usage
df -h | grep home
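A long-running job can also check free space itself before writing a large file. A minimal sketch using the standard library follows; `free_fraction` is a hypothetical helper, and like `df` it reports on the whole filesystem, not on your personal quota.

```python
import shutil
from pathlib import Path


def free_fraction(path: str) -> float:
    """Fraction of the filesystem containing `path` that is still free."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return usage.free / usage.total


if __name__ == "__main__":
    home = str(Path.home())
    print(f"{free_fraction(home):.0%} free on the filesystem holding {home}")
```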
Docker Image Storage
Images Take Disk Space
# List images with sizes
docker images
# Total size
docker system df
# Detailed breakdown
docker system df -v
Clean Up Old Images
# Remove dangling (untagged) images
docker image prune
# Remove all images not used by any container (careful!)
docker image prune -a
# Remove specific image
docker rmi ds01-$(whoami)/old-project:latest
Troubleshooting
"Where are my files?"
Check both locations:
# On host
ls ~/workspace/<project-name>/
# Inside container (should match) - replace <project-name>
docker exec <project-name>._.$(whoami) ls /workspace/
"Files disappeared after container removal"
Likely saved outside workspace:
# Check if files in workspace
ls ~/workspace/<project-name>/
# If empty, check container (if still exists) - replace <project-name>
docker exec <project-name>._.$(whoami) ls /tmp/
docker exec <project-name>._.$(whoami) sh -c 'ls ~/' # quoted so ~ expands inside the container, not on the host
Prevention: Always save to /workspace
"Can't write to workspace"
Check permissions:
# On host
ls -ld ~/workspace/my-project/
# Should be owned by you
# If not, fix:
sudo chown -R $(whoami):$(whoami) ~/workspace/my-project/
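The same check can be made from inside a script, so a job fails early with a clear message and not hours into training. This is a sketch; `can_write` is a hypothetical helper and only tests the permissions of the current user.

```python
import os
import tempfile


def can_write(directory: str) -> bool:
    """True if `directory` exists and the current user may create files in it."""
    return os.path.isdir(directory) and os.access(directory, os.W_OK | os.X_OK)


if __name__ == "__main__":
    for directory in ["/workspace", tempfile.gettempdir()]:
        print(directory, "writable" if can_write(directory) else "NOT writable")
```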
"Out of disk space"
Check usage:
du -sh ~/workspace/* # Workspace
docker system df # Images/containers
# Clean up
rm -rf ~/workspace/old-project/
docker image prune
Best Practices Summary
✓ Do This
- Save everything to /workspace
- Use Git for code (push regularly)
- Organise projects (data/, src/, models/, etc.)
- Save checkpoints frequently
- Clean up old data periodically
- Use experiment tracking (W&B, MLflow)
✗ Avoid This
- Don't save to /tmp or ~ (outside workspace)
- Don't commit large files to Git
- Don't leave containers running with unsaved work
- Don't delete workspace without backups
- Don't fill disk - clean up regularly
Next Steps
- Daily Usage Patterns - Start working
- Ephemeral Containers - Understand the philosophy