mlc-patched.py - Custom Image Support Strategy
Date: 2025-11-12
Purpose: Document the minimal patch to AIME v2's mlc.py to support DS01 custom images
Problem Statement
AIME v2's mlc.py CANNOT accept custom images:
- Lines 1534-1612: Framework/version MUST exist in ml_images.repo catalog
- No mechanism to bypass catalog lookup
- Custom images (built with DS01's image-create) cannot be used
DS01 Requirement:
- Users build custom images: FROM aimehub/pytorch... + RUN pip install packages
- Container creation needs to use these custom images, not catalog images
- Must preserve AIME's container setup (UID/GID matching, labels, etc.)
Solution: mlc-patched.py
Approach: Create a MINIMALLY modified version of mlc.py that:
- Accepts a new --image flag for custom images
- Bypasses catalog lookup when a custom image is provided
- Preserves ALL other AIME v2 logic (about 98% of the script unchanged; see Patch Summary)
- Maintains compatibility with AIME's ecosystem
Why Patch vs Wrapper:
- mlc.py is 2,400 lines of sophisticated Python logic
- Includes: user creation, volume management, GPU detection, interactive mode
- Rewriting this in a wrapper = 2000+ lines of duplicated code
- Patching ~50 lines = 2% change, 98% AIME reuse
Patch Specification
Change 1: Add --image Argument
File: mlc-patched.py
Line: ~100 (in parser_create section)
# EXISTING:
parser_create.add_argument('framework', nargs='?', type=str, help='Name of the framework (Pytorch, Tensorflow).')
parser_create.add_argument('version', nargs='?', type=str, help='Version of the framework.')
# ADD:
parser_create.add_argument(
    '--image',
    type=str,
    default=None,
    help='Custom Docker image to use (bypasses catalog lookup). Image must exist locally.'
)
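To see how the new flag interacts with the optional positionals, here is a minimal standalone sketch. It is not AIME's actual parser: the `container_name` positional and the `build_parser` helper are assumptions made for illustration, and only the `framework`, `version`, and `--image` definitions mirror the lines above.

```python
import argparse


def build_parser():
    """Build a stripped-down stand-in for the `create` subcommand parser."""
    parser = argparse.ArgumentParser(prog="mlc-patched")
    subparsers = parser.add_subparsers(dest="command")
    parser_create = subparsers.add_parser("create")
    # Assumed positional; the real mlc.py may define it differently.
    parser_create.add_argument("container_name", nargs="?", type=str)
    parser_create.add_argument("framework", nargs="?", type=str)
    parser_create.add_argument("version", nargs="?", type=str)
    # The DS01 addition: optional, so existing invocations are unaffected.
    parser_create.add_argument("--image", type=str, default=None)
    return parser


# Custom image workflow: version is omitted and falls back to None.
custom = build_parser().parse_args(
    ["create", "my-cv-project", "pytorch", "--image=my-cv-project-john"]
)

# Catalog workflow: no --image, so args.image stays None.
catalog = build_parser().parse_args(["create", "my-project", "pytorch", "2.7.1"])
```

Because `--image` defaults to `None`, a plain truthiness check on `args.image` is enough to pick the branch in Change 2.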
Change 2: Modify Image Resolution Logic
File: mlc-patched.py
Lines: ~1534-1612 (framework/version validation section)
# ADD at line ~1533 (before "Extract framework, version and docker image from the ml_images.repo file"):
# ========== DS01 PATCH: Custom Image Support ==========
if args.image:
    # Custom image provided - bypass catalog lookup
    selected_docker_image = args.image

    # Validate image exists locally
    try:
        result = subprocess.run(
            ['docker', 'image', 'inspect', selected_docker_image],
            capture_output=True,
            text=True
        )
        if result.returncode != 0:
            print(f"\n{ERROR}Custom image not found:{RESET} {INPUT}{selected_docker_image}{RESET}")
            print(f"{HINT}Build it first with: image-create{RESET}\n")
            sys.exit(1)
    except Exception as e:
        # e.g. docker binary missing; sys.exit above is not caught here (SystemExit is not an Exception)
        print(f"\n{ERROR}Error checking custom image:{RESET} {e}\n")
        sys.exit(1)

    # For custom images, framework/version are optional (labels only)
    selected_framework = args.framework or "custom"
    selected_version = args.version or "latest"
    print(f"\n{INFO}Using custom image:{RESET} {INPUT}{selected_docker_image}{RESET}")
    print(f"{NEUTRAL}Framework: {selected_framework}, Version: {selected_version}{RESET}\n")

    # Catalog lookup is skipped - go straight to container name validation
    validated_container_name, validated_container_tag = get_container_name(args.container_name, user_name, args.command, args.script)
    # Execution resumes at workspace selection (line ~1618) once this if/else ends
    # ... (rest of script continues normally)
else:
    # ORIGINAL AIME LOGIC: Use catalog (existing lines ~1534-1612, indented one level)
    # Extract framework, version and docker image from the ml_images.repo file
    framework_version_docker = extract_from_ml_images(repo_file, architecture)
    # ... (existing code continues)
# ========== END DS01 PATCH ==========
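The custom-image branch can also be pulled into small functions so it can be tested without a Docker daemon. The following is only a sketch under that assumption: `image_exists_locally` and `resolve_custom_image` are hypothetical helper names that do not exist in mlc.py, and raising `LookupError` stands in for the print-and-exit handling shown above.

```python
import subprocess


def image_exists_locally(image, run=subprocess.run):
    """Return True if `docker image inspect` succeeds for the given image."""
    try:
        result = run(
            ["docker", "image", "inspect", image],
            capture_output=True,
            text=True,
        )
    except OSError:
        # Raised when the docker binary is missing or not executable.
        return False
    return result.returncode == 0


def resolve_custom_image(image, framework=None, version=None,
                         exists=image_exists_locally):
    """Return (image, framework, version) for a custom image.

    framework and version are used only for labels, so they fall back to
    the same defaults as the patch: "custom" and "latest".
    """
    if not exists(image):
        raise LookupError(f"Custom image not found: {image}")
    return image, framework or "custom", version or "latest"
```

Injecting `run` and `exists` keeps the Docker call replaceable in tests; the patch itself can keep the inline form if touching fewer lines matters more.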
Change 3: Update build_docker_create_command Call
File: mlc-patched.py
Line: ~1850 (where docker create command is built)
No changes needed - function already accepts selected_docker_image parameter!
Change 4: Add DS01 Label
File: mlc-patched.py
Line: ~1425 (in build_docker_create_command)
# EXISTING labels:
'--label', f'{container_label}.FRAMEWORK={selected_framework}-{selected_version}',
'--label', f'{container_label}.GPUS={num_gpus}',
# ADD:
'--label', f'{container_label}.DS01_MANAGED=true',
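# Use the parsed value: '"--image" in sys.argv' misses the --image=NAME form used in the examples below.
# If args is not in scope inside build_docker_create_command, pass args.image in as a parameter.
'--label', f'{container_label}.CUSTOM_IMAGE={args.image or ""}',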
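One detail worth checking when building the CUSTOM_IMAGE label: a membership test against `sys.argv` matches only the two-token form `--image NAME`, not `--image=NAME` as used in the usage examples. A minimal sketch of a label builder that takes the parsed value instead; `ds01_labels` is a hypothetical helper, and `"aime.mlc"` is assumed here as the value of `container_label`.

```python
def ds01_labels(container_label, custom_image):
    """Return the extra docker-create label arguments added by the DS01 patch."""
    return [
        "--label", f"{container_label}.DS01_MANAGED=true",
        # Empty string when the catalog workflow is used (custom_image is None).
        "--label", f"{container_label}.CUSTOM_IMAGE={custom_image or ''}",
    ]


# The --image=NAME form arrives as a single argv token,
# so an exact-match membership test does not see the flag.
argv = ["mlc-patched", "create", "my-cv-project", "pytorch",
        "--image=my-cv-project-john"]
flag_seen_by_membership_test = "--image" in argv

custom_labels = ds01_labels("aime.mlc", "my-cv-project-john")
catalog_labels = ds01_labels("aime.mlc", None)
```

Here `flag_seen_by_membership_test` is `False` even though the flag was given, which is why the sketch relies on the value argparse already parsed.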
Patch Summary
Total Changes:
- ~15 lines for --image argument
- ~35 lines for custom image logic
- ~2 lines for DS01 labels
- Total: ~52 lines added to 2,400-line script = 2.2% change
Preserved:
- 100% of AIME's user creation logic
- 100% of volume mounting logic
- 100% of GPU detection logic
- 100% of interactive mode
- 100% of label system
- 100% of container lifecycle
Usage Examples
AIME Catalog Workflow (unchanged)
mlc-patched create my-project pytorch 2.7.1
# Works exactly like mlc create
DS01 Custom Image Workflow (NEW)
# 1. Build custom image (DS01's image-create)
image-create my-cv-project -f pytorch -t cv
# Result: my-cv-project-john (FROM aimehub/pytorch + custom packages)
# 2. Create container from custom image
mlc-patched create my-cv-project pytorch --image=my-cv-project-john
# Uses custom image, bypasses catalog, applies all AIME setup
DS01 Wrapper Integration
# mlc-create-wrapper.sh detects custom image
if [ -n "$CUSTOM_IMAGE" ]; then
    # RESOURCE_LIMITS and GPU_ARGS are left unquoted so they split into separate arguments
    mlc-patched create "$NAME" "$FRAMEWORK" --image="$CUSTOM_IMAGE" \
        $RESOURCE_LIMITS $GPU_ARGS
else
    mlc-patched create "$NAME" "$FRAMEWORK" "$VERSION" \
        $RESOURCE_LIMITS $GPU_ARGS
fi
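The same branching can be expressed as an argument-list builder, which avoids shell quoting concerns if the wrapper logic is ever moved into Python. This is a sketch only: `build_mlc_argv` is a hypothetical helper, and `extra_args` stands in for whatever RESOURCE_LIMITS and GPU_ARGS expand to.

```python
def build_mlc_argv(name, framework, version=None, custom_image=None,
                   extra_args=()):
    """Build the mlc-patched command line as a list of arguments."""
    argv = ["mlc-patched", "create", name, framework]
    if custom_image:
        # Custom image workflow: the catalog version is not needed.
        argv.append(f"--image={custom_image}")
    elif version:
        # Catalog workflow: framework and version select the image.
        argv.append(version)
    argv.extend(extra_args)
    return argv
```

The resulting list can be handed to `subprocess.run(argv)` without going through a shell.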
Testing Plan
Test 1: AIME Catalog (Compatibility)
mlc-patched create test1 pytorch 2.7.1
# Expected: Identical to mlc create
# Verify: docker inspect shows aimehub/pytorch image
Test 2: Custom Image
image-create test-img -f pytorch
mlc-patched create test2 pytorch --image=test-img-john
# Expected: Container created from custom image
# Verify: docker inspect shows test-img-john
Test 3: Custom Image Not Found
mlc-patched create test3 pytorch --image=nonexistent
# Expected: Error message + exit
Test 4: Labels Preserved
docker inspect test2._.1001 --format '{{json .Config.Labels}}' | jq
# Expected: All aime.mlc.* labels + DS01_MANAGED=true
Test 5: mlc open Compatibility
mlc open test2
# Expected: Works with patched containers
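Test 4 can be automated by parsing the JSON that `docker inspect --format '{{json .Config.Labels}}'` prints. The sketch below assumes the label prefix is `aime.mlc` (as in the expected output above) and checks only the label names that appear in this document; `label_problems` is a hypothetical helper.

```python
import json


def label_problems(labels_json, prefix="aime.mlc",
                   required=("FRAMEWORK", "GPUS", "DS01_MANAGED")):
    """Return a list of human-readable problems found in a container's labels."""
    # docker prints "null" when a container has no labels at all.
    labels = json.loads(labels_json) or {}
    problems = [
        f"missing label {prefix}.{name}"
        for name in required
        if f"{prefix}.{name}" not in labels
    ]
    managed = labels.get(f"{prefix}.DS01_MANAGED")
    if managed is not None and managed != "true":
        problems.append(f"{prefix}.DS01_MANAGED is {managed!r}, expected 'true'")
    return problems
```

An empty list means the container carries both the AIME labels checked here and the DS01 marker.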
File Structure
/opt/ds01-infra/
├── aime-ml-containers/                # Untouched AIME v2 submodule
│   ├── mlc.py                         # Original AIME v2 (2,400 lines)
│   └── ml_images.repo                 # Framework catalog
├── scripts/
│   └── docker/
│       ├── mlc-patched.py             # NEW: Patched version (~2,450 lines)
│       ├── mlc-create-wrapper.sh      # Updated to call mlc-patched.py
│       ├── get_resource_limits.py
│       └── gpu_allocator.py
Maintenance Strategy
Updating AIME v2:
- Pull latest AIME v2 submodule: git submodule update --remote
- Diff check: diff aime-ml-containers/mlc.py scripts/docker/mlc-patched.py
- If AIME changed significantly, re-apply the ~50-line patch to the new version
- Test all 5 test cases above
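The diff check can be turned into a number that is easy to watch over time. The sketch below counts added and removed lines with Python's standard `difflib`; the function names are hypothetical, and the idea of comparing the result against the ~50-line budget is an assumption about how the check might be used.

```python
import difflib


def patch_size(original_lines, patched_lines):
    """Return (added, removed) line counts between two versions of a file."""
    matcher = difflib.SequenceMatcher(
        a=original_lines, b=patched_lines, autojunk=False
    )
    added = removed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed += i2 - i1
        if tag in ("replace", "insert"):
            added += j2 - j1
    return added, removed


def patch_size_of_files(original_path, patched_path):
    """Convenience wrapper that reads both files from disk."""
    with open(original_path, encoding="utf-8") as f:
        original = f.read().splitlines()
    with open(patched_path, encoding="utf-8") as f:
        patched = f.read().splitlines()
    return patch_size(original, patched)
```

Calling `patch_size_of_files("aime-ml-containers/mlc.py", "scripts/docker/mlc-patched.py")` after a submodule update shows whether the two files have drifted beyond the expected patch size.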
Upstreaming to AIME:
- Custom image support could be contributed back to AIME
- Add --image flag as optional feature
- Maintain backward compatibility (flag is optional)
- Would benefit other AIME users wanting custom packages
Decision: Proceed with mlc-patched.py
Rationale:
- ✓ Minimal change (2.2% of codebase)
- ✓ Maximum AIME reuse (97.8%)
- ✓ Maintainable (50 lines to sync on updates)
- ✓ Clean separation (AIME submodule untouched)
- ✓ Backward compatible (catalog workflow unchanged)
- ✓ No duplication (vs 2000+ line wrapper)
Next Steps:
- Create mlc-patched.py
- Test compatibility with AIME catalog
- Test custom image workflow
- Update mlc-create-wrapper.sh
- Document in INTEGRATION_STRATEGY_v2.md