Python Environments in Containers

Why you don't need venv/conda in DS01.


The Key Insight

DS01 containers ARE your Python environment.

Each container provides complete isolation - you don't need venv, conda, or virtualenv inside containers.


Containers vs Virtual Environments

Traditional Setup                | DS01 Approach
Create venv/conda env            | Container provides isolation
pip install in venv              | Packages installed at image build time
Activate environment             | Just select the container's Python
Manage multiple Python versions  | Each project has its own container/image
requirements.txt + manual setup  | Dockerfile defines everything

What You DON'T Need

Inside DS01 containers, you don't need to:

  • Create virtual environments (python -m venv): The container itself is your isolated environment. There's no system Python to protect, no other projects to conflict with. Every package you install affects only this container.

  • Use conda environments (conda create): Conda's main value is managing Python versions and compiled dependencies. Your container already has a specific Python version and pre-compiled CUDA libraries. Adding conda just adds complexity.

  • Worry about environment activation: No source venv/bin/activate, no conda activate myenv. When you enter a container, you're already in the environment. python just works.

  • Manage multiple Python versions: Your image specifies the Python version. One project needs Python 3.10 while another needs 3.11? Use different images and different containers. No pyenv, no version conflicts.

  • Deal with PATH issues: No more "which python am I using?" confusion. The container has one Python at /usr/bin/python. Your packages are in the system site-packages. Everything is where you expect it.
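The points above can be checked from inside a container. The following is a minimal sketch using only the standard library; the function name describe_interpreter is illustrative and not part of DS01.

```python
import sys


def describe_interpreter() -> dict:
    """Report which Python is running and whether a virtual environment is active."""
    return {
        "executable": sys.executable,
        "version": f"{sys.version_info.major}.{sys.version_info.minor}",
        # Inside a venv/virtualenv, sys.prefix points at the environment while
        # sys.base_prefix points at the interpreter it was created from.
        "in_virtualenv": sys.prefix != sys.base_prefix,
    }


info = describe_interpreter()
print(info)
```

In a container set up as described here, one would expect the executable to sit under /usr/bin and in_virtualenv to be False. Note that this check only recognises venv/virtualenv; it does not detect conda environments.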


Installing Packages

Use the interactive package manager:

image-update # Select image, add packages
container-deploy

Advanced: Edit Dockerfile directly:

vim ~/workspace/<project-name>/Dockerfile
# Add: RUN pip install transformers datasets torch
image-update <project-name> --rebuild

Benefits:

  • Packages persist across container restarts: When packages are baked into the image, they're there every time you deploy a new container. No reinstalling transformers for the 50th time.

  • Reproducible environment: Your Dockerfile is a complete specification. Run image-create on any machine with Docker and you get the same environment: the same Python version and CUDA setup, and the same package versions provided they are pinned in the Dockerfile.

  • Fast container startup: Packages are already installed in the image. Container deployment takes 30 seconds, not 10 minutes of pip install. You're working immediately.

At Runtime (Temporary)

For quick experiments:

# Inside container
pip install package-name

Or in Jupyter notebooks:

# Use %pip (Jupyter magic), not !pip
%pip install package-name

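Why %pip? The %pip magic installs into the environment of the running kernel, so the kernel can find the newly installed package. !pip runs whichever pip is first on the shell's PATH, which is not guaranteed to be the kernel's, and may also require a kernel restart.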
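In a plain script there is no %pip, but a similar guarantee (installing into the interpreter that is actually running) is commonly obtained by invoking pip as a module through sys.executable. The sketch below only builds the command and does not run it; pip_install_command is a hypothetical helper name.

```python
import sys


def pip_install_command(package: str) -> list:
    """Build a pip command bound to the interpreter running this code."""
    # "python -m pip" uses the pip that belongs to this interpreter,
    # rather than whichever "pip" happens to come first on PATH.
    return [sys.executable, "-m", "pip", "install", package]


cmd = pip_install_command("package-name")
print(" ".join(cmd))
```

To actually install, the list can be passed to subprocess.run(cmd, check=True). As with any runtime install, the package is gone once the container is removed.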

Note: Runtime installs are lost when the container is removed. Add frequently used packages to your Dockerfile.

container retire offers the option to write newly installed packages back into the image. This is only a stopgap, because the underlying Dockerfile remains unchanged and a rebuild from it will not include those packages.
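One way to move runtime installs into the Dockerfile is to look up the versions currently installed and turn them into a pinned RUN line. This is a sketch under assumptions: the helper names format_run_line and installed_pins are illustrative, and the resulting line still has to be pasted into the Dockerfile by hand.

```python
from importlib import metadata


def format_run_line(pins: dict) -> str:
    """Format {name: version} as a single Dockerfile RUN instruction."""
    specs = " ".join(f"{name}=={version}" for name, version in sorted(pins.items()))
    return f"RUN pip install {specs}"


def installed_pins(names: list) -> dict:
    """Look up installed versions; names that are not installed are skipped."""
    pins = {}
    for name in names:
        try:
            pins[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return pins


# Fixed example input, using the versions shown elsewhere on this page.
print(format_run_line({"transformers": "4.30.0", "datasets": "2.13.0"}))
# RUN pip install datasets==2.13.0 transformers==4.30.0
```

Inside a container, format_run_line(installed_pins(["transformers", "datasets"])) would produce the line for whatever versions are installed at that moment.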


Selecting Python in VS Code

When working with notebooks in VS Code:

  1. Click "Select Kernel" or the kernel indicator
  2. Choose "Python Environments"
  3. Select /usr/bin/python

Note: You may see both /usr/bin/python and /bin/python listed - they're identical (symlinks). Either works.
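If in doubt, the claim that two listed interpreters are the same file can be checked by resolving symlinks. A small sketch; same_file is an illustrative name, and the two paths are taken from the note above, so they may not exist outside a DS01 container.

```python
import os


def same_file(path_a: str, path_b: str) -> bool:
    """Return True if both paths resolve to the same location after following symlinks."""
    return os.path.realpath(path_a) == os.path.realpath(path_b)


print(same_file("/usr/bin/python", "/bin/python"))
```

If this prints True, the two entries in the kernel picker refer to the same interpreter.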


Troubleshooting

"Module not found" after pip install

In Jupyter:

  • If you used !pip install, restart the kernel
  • Better: use %pip install next time

In terminal:

  • Verify you're inside the container, not on the host
  • Check with which python - should be /usr/bin/python

Package installed but not importable

Check you're in the right environment:

# Inside container
which python # Should be /usr/bin/python
pip list | grep <package-name>
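The same check can be run from Python, which also separates "installed" from "importable". This matters because the name passed to pip and the name used in import can differ. A minimal sketch with the standard library; check_package is a hypothetical helper.

```python
import importlib.util
from importlib import metadata


def check_package(dist_name: str, import_name: str = "") -> dict:
    """Report whether a distribution is installed and whether its module can be found."""
    import_name = import_name or dist_name
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None
    # find_spec locates a top-level module without importing it.
    spec = importlib.util.find_spec(import_name)
    return {"installed_version": version, "importable": spec is not None}


# The distribution is called "scikit-learn" but the module is "sklearn".
print(check_package("scikit-learn", "sklearn"))
```

A result with a version but importable set to False suggests the package was installed for a different interpreter than the one running the check.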

Kernel won't connect (VS Code)

  • Reload VS Code window: Ctrl+Shift+P → "Developer: Reload Window"
  • Check Jupyter output: Ctrl+Shift+P → "Jupyter: Show Output"

Why This Approach?

Industry Standard

This is how production ML works:

  • Docker: Standard for ML deployment
  • Kubernetes: Containers are the unit of deployment
  • Cloud ML: SageMaker, Vertex AI, etc. use containers

Reproducibility

# This Dockerfile IS your environment
FROM pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime
RUN pip install transformers==4.30.0 datasets==2.13.0

Share the Dockerfile, and anyone can recreate your environment with the same pinned package versions.

Isolation

Each project gets its own container:

  • No package conflicts between projects
  • No "it works on my machine" problems
  • Clean separation of concerns

Common Patterns

Per-Project Environments

~/workspace/
├── thesis/
│   ├── Dockerfile    # PyTorch + transformers
│   └── ...
├── course-ml/
│   ├── Dockerfile    # sklearn + pandas
│   └── ...
└── experiment/
    ├── Dockerfile    # JAX + optax
    └── ...

Each project has its own Dockerfile = its own environment.

Sharing Environments

Same base, different projects:

# Both projects can use the same base image
FROM aime/pytorch:24.09

Exact reproduction:

# Share your Dockerfile
cp ~/workspace/project/Dockerfile ~/shared/

Next Steps