Self Serve Labs transformed Cisco's internal lab infrastructure into a modern, automated platform serving over 1,000 engineers globally. Built from scratch over 5 years, this enterprise-scale system revolutionized how network engineers access and manage complex lab environments.
A comprehensive web platform providing Netflix-like browsing of lab topologies with automated provisioning, real-time console access, and intelligent resource scheduling.
Core Codebase Statistics:
├── 179 Python modules (excluding migrations)
├── 36,583 lines of Python code (production code only)
├── 72 HTML templates (responsive web interface)
├── 79 JavaScript files (frontend functionality)
└── 33 CSS stylesheets (custom styling)
Development Scale:
├── 1,791 total commits over a 5-year development cycle
├── 1,633 commits by lead engineer (91% individual contribution)
├── 151 feature-related commits (new functionality)
└── 5-year active development (2019-2024)
Git History Insights:
├── Peak development years: 2020 (546 commits), 2019 (437 commits)
├── Sustained maintenance: 368 commits (2022), 253 commits (2021)
├── Recent enhancements: 186 commits (2023), active through 2024
└── Multi-contributor project: 7 developers, with primary ownership by the lead engineer
Architecture Complexity:
├── 8 major Django applications with interconnected business logic
├── 50+ database models with complex relationships
├── 200+ API endpoints with authentication and permissions
├── 100+ background task definitions for distributed processing
└── Multiple integration points: VMware, NetBox, Guacamole, Redis, PostgreSQL
Technology Diversity:
├── Python (36,583 LOC) - Backend logic, automation, API
├── JavaScript (79 files) - Real-time UI, WebSocket handling
├── HTML/Templates (72 files) - Responsive web interface
├── CSS (33 files) - Custom styling and responsive design
└── SQL (1 file) - Database schema and complex queries
Quality Metrics:
├── Comprehensive error handling with Sentry integration
├── Production deployment serving real enterprise users
├── Session recording and audit trail capabilities
├── Multi-tenant security with role-based access control
└── High availability Redis Sentinel cluster architecture
├── device/                 # Physical & virtual device management
│   ├── models.py           # Complex device relationships & state management
│   ├── tasks.py            # Distributed automation tasks via Huey
│   └── nornir/             # Network automation inventory integration
├── topology/               # Lab topology definitions & scheduling
│   ├── models.py           # PostgreSQL range queries for time-based resources
│   └── views.py            # Complex reservation conflict resolution
├── reservation/            # Booking system with real-time features
│   ├── consumers.py        # WebSocket consumers for console proxy
│   └── tasks.py            # Background provisioning pipeline
└── vmware_integration/
    ├── vsphere.py          # vCenter API automation & connection pooling
    └── tasks.py            # VM lifecycle management
-- Complex time-based resource scheduling with PostgreSQL ranges
-- (the integer "=" operator inside a GiST exclusion constraint requires the btree_gist extension)
CREATE TABLE reservation (
    time_range tstzrange NOT NULL,
    topology_id INTEGER REFERENCES topology,
    -- Reject any two rows for the same topology whose time ranges overlap
    EXCLUDE USING gist (topology_id WITH =, time_range WITH &&)
);

-- Device relationship modeling for physical/virtual infrastructure
CREATE TABLE topology_device (
    device_ptr_id INTEGER PRIMARY KEY,
    topology_id INTEGER REFERENCES topology,
    apply_customizations BOOLEAN DEFAULT true,
    reboot_seconds INTEGER DEFAULT 300
);
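The exclusion constraint above relies on the range overlap operator `&&`. As a minimal sketch of what that operator checks, assuming half-open `[start, end)` bounds (the PostgreSQL default for `tstzrange`), the same test can be written in plain Python. The names `TimeRange`, `overlaps`, and `conflicts` are illustrative and not part of the project.

```python
from datetime import datetime, timedelta, timezone
from typing import List, NamedTuple


class TimeRange(NamedTuple):
    """Half-open interval [start, end), like tstzrange's default bounds."""
    start: datetime
    end: datetime


def overlaps(a: TimeRange, b: TimeRange) -> bool:
    """Plain-Python counterpart of `a && b` for half-open ranges."""
    return a.start < b.end and b.start < a.end


def conflicts(existing: List[TimeRange], candidate: TimeRange) -> List[TimeRange]:
    """Return the existing bookings the candidate would collide with."""
    return [r for r in existing if overlaps(r, candidate)]


base = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
booked = [TimeRange(base, base + timedelta(hours=2))]  # 09:00-11:00

# Starts exactly when the first booking ends: allowed, the end bound is exclusive
back_to_back = TimeRange(base + timedelta(hours=2), base + timedelta(hours=4))
# Starts in the middle of the first booking: rejected
clashing = TimeRange(base + timedelta(hours=1), base + timedelta(hours=3))

print(conflicts(booked, back_to_back))   # []
print(len(conflicts(booked, clashing)))  # 1
```

Because the end bound is exclusive, back-to-back reservations on the same topology do not count as a conflict. The database constraint enforces the same rule atomically, which an application-level check alone cannot guarantee under concurrent writes.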
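from channels.generic.websocket import AsyncJsonWebsocketConsumer
from django.conf import settings
from guacamole.client import GuacamoleClient  # assumed: the pyguacamole package


class ReservationConsoleConsumer(AsyncJsonWebsocketConsumer):
    async def connect(self):
        # Guacamole protocol implementation
        self.guac_client = GuacamoleClient(
            host=settings.GUACD_HOSTS[0],  # Load balanced
            port=settings.GUACD_PORT,
        )
        # Accept the WebSocket handshake; without this the connection is rejected
        await self.accept()

    async def console_message(self, event):
        # Real-time session recording with typescript conversion
        await self.record_session_data(event['message'])
        await self.send_json(event['message'])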
import asyncio
from collections import defaultdict

from netmiko import ConnectHandler


class TerminalServerPool:
    def __init__(self):
        self.connections = {}
        # One lock per terminal-server port so concurrent callers share a connection
        self.connection_locks = defaultdict(asyncio.Lock)

    async def get_connection(self, terminal_server, port, read_timeout=300):
        # read_timeout: callers pass the device's reboot_seconds (defaults to 300)
        conn_key = f"{terminal_server.hostname}:{port}"
        async with self.connection_locks[conn_key]:
            if conn_key not in self.connections:
                # Netmiko connection with custom session handling
                # (ConnectHandler blocks while it logs in)
                connection = ConnectHandler(
                    device_type=terminal_server.device_type,
                    host=terminal_server.console_ip,
                    username=terminal_server.username,
                    password=decrypt(terminal_server.password),  # project helper
                    timeout=30,
                    read_timeout_override=read_timeout or 300,
                )
                self.connections[conn_key] = connection
            return self.connections[conn_key]
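The pool above combines a connection dictionary with one `asyncio.Lock` per key. A minimal, dependency-free sketch of that pattern follows. `KeyedConnectionPool` and `fake_connect` are hypothetical stand-ins (the latter replaces the real console login), so this shows the locking behaviour only, not the project's actual pool.

```python
import asyncio
from collections import defaultdict


class KeyedConnectionPool:
    """Lazily opens one connection per key; concurrent callers share it."""

    def __init__(self, connect):
        self._connect = connect                  # async factory: key -> connection
        self._connections = {}
        self._locks = defaultdict(asyncio.Lock)  # one lock per key

    async def get(self, key):
        # Only one caller per key can be inside this block at a time, so the
        # "check then create" sequence cannot race with itself.
        async with self._locks[key]:
            if key not in self._connections:
                self._connections[key] = await self._connect(key)
            return self._connections[key]


opened = []


async def fake_connect(key):
    """Stand-in for a slow console login."""
    await asyncio.sleep(0.01)
    opened.append(key)
    return object()


async def main():
    pool = KeyedConnectionPool(fake_connect)
    # Five concurrent requests for one port, plus one request for another port
    return await asyncio.gather(
        *(pool.get("ts1:2001") for _ in range(5)),
        pool.get("ts1:2002"),
    )


results = asyncio.run(main())
print(sorted(opened))  # ['ts1:2001', 'ts1:2002']
```

Keying the locks per port, rather than using one global lock, lets logins to different console ports proceed in parallel while duplicate logins to the same port are avoided.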
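from huey.contrib.djhuey import HUEY, task


@task()
def provision_reservation(reservation_id):
    # Fail fast if the reservation no longer exists
    Reservation.objects.get(id=reservation_id)

    # Pipeline with error handling and rollback.
    # Huey runs the steps in order, passes each return value to the next
    # step, and stops the chain if a step raises.
    pipeline = (
        clone_vms_if_needed.s(reservation_id)
        .then(configure_network_vlans)
        .then(setup_console_access)
        .then(run_device_automation)
        .then(send_ready_notification)
    )

    # Execute with result tracking; a failed step re-raises here
    results = HUEY.enqueue(pipeline)
    return results.get(blocking=True)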
Production Stack:
├── Backend: Django 3.2 + PostgreSQL + Redis Sentinel cluster
├── Async Processing: Huey distributed task queue with Redis backend
├── Real-time: ASGI/Channels WebSocket consumers for console proxy
├── Integrations: VMware vSphere, NetBox, Guacamole, Webex Teams
└── Automation: Netmiko 4.x for network device orchestration
from pyVim.connect import SmartConnect
from pyVmomi import vim


class VsphereConnection:
    def __init__(self, config):
        # ssl_context is created elsewhere in the module (not shown)
        self.si = SmartConnect(
            host=config['HOST'],
            user=config['USER'],
            pwd=config['PASS'],
            port=config['PORT'],
            sslContext=ssl_context
        )

    def clone_vm_if_needed(self, device):
        # resource_pool, template, target_folder and vm_name are looked up
        # from the vCenter inventory before this point (not shown)

        # Complex VM customization with network reconfiguration
        clone_spec = vim.vm.CloneSpec(
            powerOn=True,
            template=False,
            location=vim.vm.RelocateSpec(pool=resource_pool),
            config=vim.vm.ConfigSpec(
                numCPUs=device.cores,
                memoryMB=device.memory_gb * 1024,
                deviceChange=self.build_network_devices(device)
            )
        )

        # Async clone with progress tracking
        task = template.CloneVM_Task(
            folder=target_folder,
            name=vm_name,
            spec=clone_spec
        )
        return self.wait_for_task(task)
from psycopg2.extras import DateTimeTZRange


def get_available_time_slots(topology, start_date, end_date):
    # Single query using PostgreSQL range operators
    occupied_ranges = Reservation.objects.filter(
        topology=topology,
        time_range__overlap=DateTimeTZRange(start_date, end_date)
    ).values_list('time_range', flat=True)

    # Range arithmetic for conflict-free scheduling
    available_slots = subtract_ranges(
        target_range=DateTimeTZRange(start_date, end_date),
        occupied_ranges=occupied_ranges
    )
    return available_slots
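The `subtract_ranges` helper called above is not shown in the source. The following is one possible sketch of the range arithmetic it implies, assuming half-open `(start, end)` tuples rather than `DateTimeTZRange` objects. It works for any comparable bounds (integers here, datetimes equally) and should not be read as the project's actual implementation.

```python
from typing import Iterable, List, Tuple

Range = Tuple[int, int]  # half-open [start, end); bounds only need to be comparable


def subtract_ranges(target_range: Range, occupied_ranges: Iterable[Range]) -> List[Range]:
    """Return the gaps left inside target_range after removing occupied_ranges."""
    start, end = target_range
    free: List[Range] = []
    cursor = start  # everything before the cursor is already accounted for
    for occ_start, occ_end in sorted(occupied_ranges):
        if occ_end <= cursor:
            continue  # lies entirely before the cursor
        if occ_start >= end:
            break     # sorted by start, so nothing later can intersect the window
        if occ_start > cursor:
            free.append((cursor, occ_start))  # gap before this booking
        cursor = max(cursor, occ_end)
        if cursor >= end:
            break
    if cursor < end:
        free.append((cursor, end))  # trailing gap
    return free


# Hours of one day, with two overlapping bookings and one that runs past midnight
print(subtract_ranges((0, 24), [(9, 11), (10, 14), (20, 30)]))  # [(0, 9), (14, 20)]
```

Sorting the occupied ranges first lets a single pass with a moving cursor handle overlapping and out-of-order bookings without merging them beforehand.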
from cache_memoize import cache_memoize  # django-cache-memoize


@cache_memoize(timeout=300)
def get_topology_availability(topology_id, date_range):
    # Complex availability calculation with caching
    return calculate_pod_availability(topology_id, date_range)
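To illustrate what a 300-second memoization window means for the availability lookup, here is a small in-process sketch using only the standard library. It is an assumption-laden stand-in: `ttl_memoize` is a hypothetical decorator, and unlike a decorator backed by the Django cache it does not share results between worker processes.

```python
import time
from functools import wraps


def ttl_memoize(timeout, clock=time.monotonic):
    """Cache results per argument tuple for `timeout` seconds (in-process only)."""
    def decorator(fn):
        cache = {}  # args -> (stored_at, value)

        @wraps(fn)
        def wrapper(*args):
            now = clock()
            hit = cache.get(args)
            if hit is not None and now - hit[0] < timeout:
                return hit[1]            # still fresh: skip the expensive call
            value = fn(*args)
            cache[args] = (now, value)
            return value

        wrapper.invalidate = cache.clear  # e.g. call after a new reservation is saved
        return wrapper
    return decorator


calls = []
fake_now = [0.0]  # injectable clock so the example does not need to sleep


@ttl_memoize(timeout=300, clock=lambda: fake_now[0])
def availability(topology_id):
    calls.append(topology_id)
    return f"slots-for-{topology_id}"


availability(7)
availability(7)       # served from the cache
fake_now[0] = 301.0
availability(7)       # expired, so recomputed
print(calls)          # [7, 7]
```

The trade-off is staleness: a cached answer can be up to `timeout` seconds old, so a booking flow would still need to rely on the database constraint for the final conflict check.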
Created an intuitive browsing interface allowing engineers to discover and book complex network topologies with the same ease as streaming a movie.
Implemented a custom WebSocket architecture enabling engineers to access network device consoles directly through the web browser, with session recording capabilities.
Built a sophisticated scheduling engine using PostgreSQL range queries to prevent conflicts and maximize utilization of expensive lab equipment.
Developed a multi-step automation pipeline handling VM cloning, network configuration, device setup, and notification delivery, with error handling and rollback capabilities.
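The rollback behaviour described for the automation pipeline can be sketched as follows. This is a simplified, synchronous illustration under stated assumptions: each step is paired with an undo action, and the step names, `run_pipeline`, and `ProvisioningError` are hypothetical rather than taken from the project.

```python
from typing import Callable, List, Tuple

# (name, action, undo); both callables receive the shared context dict
Step = Tuple[str, Callable[[dict], object], Callable[[dict], object]]


class ProvisioningError(Exception):
    """Raised after a failed step once completed steps have been undone."""


def run_pipeline(steps: List[Step], context: dict) -> None:
    log = context.setdefault("log", [])
    completed: List[Step] = []
    for step in steps:
        name, action, _undo = step
        try:
            action(context)
        except Exception as exc:
            log.append(f"failed:{name}")
            # Undo in reverse order so later steps are unwound first
            for done_name, _action, undo in reversed(completed):
                undo(context)
                log.append(f"undone:{done_name}")
            raise ProvisioningError(f"{name} failed: {exc}") from exc
        completed.append(step)
        log.append(f"done:{name}")


def fail(_ctx):
    raise RuntimeError("console port busy")


def noop(_ctx):
    return None


steps = [
    ("clone_vms", lambda c: c.update(vms=["r1", "r2"]), lambda c: c.pop("vms")),
    ("configure_vlans", lambda c: c.update(vlan=120), lambda c: c.pop("vlan")),
    ("setup_console", fail, noop),
    ("notify", noop, noop),
]

ctx = {}
try:
    run_pipeline(steps, ctx)
except ProvisioningError as err:
    print(err)  # setup_console failed: console port busy
print(ctx["log"])
```

After the failure, the cloned VMs and VLAN assignment recorded in the context are removed again and the notification step never runs, which is the property that keeps a half-provisioned lab from being handed to an engineer.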
Self Serve Labs demonstrates enterprise-scale software engineering capabilities:
This project showcases the ability to architect, build, and maintain complex enterprise systems that deliver real business value while handling the technical challenges of scale, reliability, and integration complexity.
Your network engineers are spending 3 hours setting up a lab that should take minutes. Expensive equipment sits idle due to poor scheduling. Critical testing is delayed because of configuration errors. Meanwhile, development teams are blocked waiting for lab access, and your infrastructure ROI suffers.
“Browse Network Topologies Like Streaming Movies”
Traditional lab management required deep technical knowledge and complex manual processes. Our platform created an intuitive discovery experience:
The Result: Engineers could find and book complex labs in under 2 minutes.
“Access Any Network Device Through Your Web Browser”
Console access traditionally required VPN connections, terminal servers, and complex authentication. Our WebSocket implementation delivered:
The Magic: Engineers could access Cisco routers, switches, and firewalls directly through the web interface with full session recording.
“From Lab Request to Ready-to-Use in Under 10 Minutes”
Manual lab setup involved dozens of error-prone steps. Our distributed automation pipeline delivered:
The Power: Complete lab environments provisioned automatically with enterprise-grade reliability.
Production Components:
├── Django 3.2 + PostgreSQL     # Robust web framework with advanced DB features
├── Redis Sentinel Cluster      # High-availability caching and session management
├── Huey Distributed Queue      # Background task processing with failover
├── Django Channels/ASGI        # Real-time WebSocket communication
├── VMware vSphere APIs         # VM automation and infrastructure management
├── Netmiko 4.x                 # Network device automation and configuration
└── Guacamole Protocol          # Console proxy with session recording
Scenario: Need complex multi-vendor lab for feature testing
Traditional Process:
With Self Serve Labs:
Enterprise Infrastructure Management:
Cisco Systems relied on Self Serve Labs for mission-critical network engineering operations, proving the platform's reliability and business value in a demanding enterprise environment.
Self Serve Labs proves that complex enterprise challenges demand sophisticated technical solutions. This project demonstrates:
Self Serve Labs: Where enterprise-scale engineering meets practical business transformation.