18 KiB
Story 10: Service Level Agreement (SLA) Management
Story Overview
As a service delivery manager
I want to define and track Service Level Agreements for applications
So that I can document expected service levels, maintenance windows, and responsibilities
Story Type: Feature (Service Management)
Priority: Medium
Estimated Effort: 6-8 days
Dependencies: Story 2 (Applications), Story 3 (Contacts)
Business Value
SLAs define the expected service levels, maintenance windows, and responsibilities for applications. This enables:
- Clear service level expectations
- Maintenance window planning
- Critical period identification
- Responsibility mapping
- Service delivery tracking
- Foundation for incident management
Scope
In Scope
✅ SLA CRUD operations
✅ SLA-Application association (one-to-one)
✅ Core SLA metrics (availability, response time, reaction time)
✅ Maintenance window management
✅ Critical period tracking
✅ SLA-Contact associations (responsible parties)
✅ Diagram storage (draw.io JSON format)
✅ Dependencies documentation
Out of Scope
❌ Automated SLA monitoring/compliance
❌ Incident management integration
❌ Automated alerts for SLA breaches
❌ Real-time availability tracking
❌ Historical compliance reporting
Database Schema
-- SLA entity
CREATE TABLE sla (
id UUID PRIMARY KEY,
application_id UUID UNIQUE REFERENCES application(id) ON DELETE CASCADE,
availability_percentage DECIMAL(5,2) NOT NULL, -- e.g., 99.90
min_avg_response_time_seconds INTEGER NOT NULL,
reaction_time VARCHAR(255) NOT NULL, -- Free text, e.g., "Within 4 hours for P1 incidents"
diagram_json JSONB, -- draw.io diagram in JSON format
dependencies_description TEXT,
responsibility_scope TEXT,
created_at TIMESTAMP NOT NULL,
updated_at TIMESTAMP NOT NULL,
CONSTRAINT check_availability CHECK (availability_percentage >= 0 AND availability_percentage <= 100),
CONSTRAINT check_response_time CHECK (min_avg_response_time_seconds > 0)
);
-- Maintenance Windows
CREATE TABLE maintenance_window (
id UUID PRIMARY KEY,
sla_id UUID NOT NULL REFERENCES sla(id) ON DELETE CASCADE,
day_of_week VARCHAR(10) NOT NULL, -- MONDAY, TUESDAY, etc.
start_time TIME NOT NULL, -- e.g., 02:00
end_time TIME NOT NULL, -- e.g., 06:00
created_at TIMESTAMP NOT NULL,
CONSTRAINT check_time_order CHECK (end_time > start_time),
CONSTRAINT check_day_of_week CHECK (day_of_week IN ('MONDAY', 'TUESDAY', 'WEDNESDAY', 'THURSDAY', 'FRIDAY', 'SATURDAY', 'SUNDAY'))
);
-- Critical Periods
CREATE TABLE critical_period (
id UUID PRIMARY KEY,
sla_id UUID NOT NULL REFERENCES sla(id) ON DELETE CASCADE,
start_datetime TIMESTAMP NOT NULL,
end_datetime TIMESTAMP NOT NULL,
description TEXT,
created_at TIMESTAMP NOT NULL,
CONSTRAINT check_period_order CHECK (end_datetime > start_datetime)
);
-- Computed column for duration in minutes
ALTER TABLE critical_period ADD COLUMN duration_minutes INTEGER GENERATED ALWAYS AS
(EXTRACT(EPOCH FROM (end_datetime - start_datetime)) / 60) STORED;
-- SLA-Contact junction (many-to-many)
CREATE TABLE sla_contact (
sla_id UUID NOT NULL REFERENCES sla(id) ON DELETE CASCADE,
contact_id UUID NOT NULL REFERENCES contact(id) ON DELETE CASCADE,
PRIMARY KEY (sla_id, contact_id)
);
-- Indexes
CREATE INDEX idx_sla_application ON sla(application_id);
CREATE INDEX idx_maint_window_sla ON maintenance_window(sla_id);
CREATE INDEX idx_maint_window_day ON maintenance_window(day_of_week);
CREATE INDEX idx_critical_period_sla ON critical_period(sla_id);
CREATE INDEX idx_critical_period_dates ON critical_period(start_datetime, end_datetime);
CREATE INDEX idx_sla_contact_sla ON sla_contact(sla_id);
CREATE INDEX idx_sla_contact_contact ON sla_contact(contact_id);
Key Business Logic
SLA-Application Relationship
- Each application can have at most ONE SLA (one-to-one)
- SLA is optional (not all applications require formal SLAs)
- When creating SLA, validate application doesn't already have one
- When deleting SLA, application is not affected (cascade delete from SLA side)
Maintenance Windows
- Represent recurring weekly maintenance periods
- Each window defines: day of week, start time, end time
- Multiple windows can be defined per SLA (e.g., nightly 2-4 AM every day)
- Validate end time > start time (no cross-midnight windows in MVP)
- Display in local time (time zone handling future enhancement)
Critical Periods
- Represent specific date/time ranges requiring higher service levels
- Examples: Black Friday, tax season, year-end closing
- Duration in minutes is computed automatically
- Can overlap (multiple critical periods active simultaneously)
- Historical critical periods are kept for audit
Contacts
- Multiple contacts can be responsible for an SLA
- Typical roles: Service Delivery Manager, Technical Lead, Escalation Point
Key Endpoints
SLA Management
GET /api/slas- List all SLAs (paginated)GET /api/slas/{id}- Get SLA details with all related dataGET /api/applications/{appId}/sla- Get SLA for specific applicationPOST /api/slas- Create new SLAPUT /api/slas/{id}- Update SLADELETE /api/slas/{id}- Delete SLAGET /api/slas/{id}/contacts- Get responsible contactsPOST /api/slas/{id}/contacts/{contactId}- Add contact to SLADELETE /api/slas/{id}/contacts/{contactId}- Remove contact from SLA
Maintenance Windows
GET /api/slas/{slaId}/maintenance-windows- List windows for SLAPOST /api/slas/{slaId}/maintenance-windows- Add maintenance windowPUT /api/maintenance-windows/{id}- Update maintenance windowDELETE /api/maintenance-windows/{id}- Delete maintenance window
Critical Periods
GET /api/slas/{slaId}/critical-periods- List critical periods for SLAGET /api/critical-periods/active- Get currently active critical periodsGET /api/critical-periods/upcoming?days={30}- Get upcoming critical periodsPOST /api/slas/{slaId}/critical-periods- Add critical periodPUT /api/critical-periods/{id}- Update critical periodDELETE /api/critical-periods/{id}- Delete critical period
DTOs
export enum DayOfWeek {
MONDAY = 'MONDAY',
TUESDAY = 'TUESDAY',
WEDNESDAY = 'WEDNESDAY',
THURSDAY = 'THURSDAY',
FRIDAY = 'FRIDAY',
SATURDAY = 'SATURDAY',
SUNDAY = 'SUNDAY'
}
export interface SLA {
id: string;
application: { id: string; name: string };
availabilityPercentage: number; // 99.9
minAvgResponseTimeSeconds: number; // 2
reactionTime: string; // "Within 4 hours for P1 incidents"
diagramJson?: object; // draw.io diagram as JSON
dependenciesDescription?: string;
responsibilityScope?: string;
maintenanceWindows: MaintenanceWindow[];
criticalPeriods: CriticalPeriod[];
responsibleContacts: ContactSummary[];
createdAt: Date;
updatedAt: Date;
}
export interface MaintenanceWindow {
id: string;
slaId: string;
dayOfWeek: DayOfWeek;
startTime: string; // "02:00"
endTime: string; // "06:00"
createdAt: Date;
}
export interface CriticalPeriod {
id: string;
slaId: string;
startDatetime: Date;
endDatetime: Date;
durationMinutes: number; // Computed
description?: string;
isActive: boolean; // Computed: current time between start and end
createdAt: Date;
}
export interface CreateSLARequest {
applicationId: string;
availabilityPercentage: number;
minAvgResponseTimeSeconds: number;
reactionTime: string;
diagramJson?: object;
dependenciesDescription?: string;
responsibilityScope?: string;
maintenanceWindows?: CreateMaintenanceWindowRequest[];
criticalPeriods?: CreateCriticalPeriodRequest[];
contactIds?: string[];
}
export interface UpdateSLARequest {
availabilityPercentage?: number;
minAvgResponseTimeSeconds?: number;
reactionTime?: string;
diagramJson?: object;
dependenciesDescription?: string;
responsibilityScope?: string;
}
export interface CreateMaintenanceWindowRequest {
dayOfWeek: DayOfWeek;
startTime: string; // "HH:MM"
endTime: string; // "HH:MM"
}
export interface CreateCriticalPeriodRequest {
startDatetime: Date;
endDatetime: Date;
description?: string;
}
Frontend Components
SLADetailComponent (Application Detail Tab)
- Displayed as tab in Application Detail view
- If no SLA exists: "Create SLA" button
- If SLA exists: Full SLA details display
- Sections:
- Core Metrics
- Availability: 99.9% (with visual gauge)
- Response Time: 2 seconds
- Reaction Time: "Within 4 hours for P1 incidents"
- Maintenance Windows
- Weekly calendar view showing all maintenance windows
- List view: Day, Time Range
- Add/Edit/Delete maintenance window
- Critical Periods
- Timeline view of critical periods
- Highlight currently active periods
- List: Start, End, Duration, Description
- Add/Edit/Delete critical period
- Architecture Diagram
- Render draw.io diagram (if present)
- Edit button (opens diagram editor)
- Dependencies & Scope
- Dependencies description (formatted text)
- Responsibility scope (formatted text)
- Responsible Contacts
- List of contacts with roles
- Add/Remove contacts
- Core Metrics
- "Edit SLA" and "Delete SLA" buttons
SLAFormComponent
-
Reactive form with sections:
Core Metrics:
- Application (dropdown, disabled in edit mode)
- Availability Percentage (number input, 0-100, 2 decimals)
- Min Avg Response Time (number input, seconds)
- Reaction Time (text input or textarea)
Maintenance Windows:
- Repeatable section (add/remove)
- Day of Week (dropdown)
- Start Time (time picker)
- End Time (time picker)
- Visual: Weekly calendar preview
Critical Periods:
- Repeatable section (add/remove)
- Start DateTime (datetime picker)
- End DateTime (datetime picker)
- Description (text input)
- Duration auto-computed and displayed
Documentation:
- Diagram (upload draw.io JSON or use embedded editor)
- Dependencies Description (rich text editor)
- Responsibility Scope (rich text editor)
Responsible Contacts:
- Multi-select dropdown
-
Validation:
- Availability: 0-100, required
- Response time: > 0, required
- Reaction time: required
- Maintenance window: end time > start time
- Critical period: end datetime > start datetime
- Application: must not already have SLA
-
Support create and edit modes
MaintenanceWindowCalendarComponent
- Visual representation of maintenance windows
- Weekly calendar grid (7 days × 24 hours)
- Highlight maintenance windows
- Click to add/edit window
- Different colors for different window types (if needed)
CriticalPeriodTimelineComponent
- Timeline visualization of critical periods
- Show past, current, and future periods
- Highlight active periods (green background)
- Click period to edit/view details
- Add new period button
SLAListComponent (Admin View)
- Table of all SLAs in system
- Columns: Application, Availability, Response Time, Maintenance Windows Count, Critical Periods Active, Actions
- Filter by application
- Sort by availability, response time
- Quick view of SLA summary
- Navigate to application detail
DiagramEditorComponent (Optional Integration)
- Embed draw.io editor for diagram creation
- Save diagram as JSON
- Load existing diagram for editing
- Full-screen editing mode
- Alternative: Upload draw.io file, store JSON
Acceptance Criteria
Backend
- SLA can be created for an application
- SLA creation fails if application already has SLA
- SLA validates availability percentage (0-100)
- SLA validates response time (> 0)
- SLA can be updated
- SLA can be deleted (cascade to windows and periods)
- Maintenance windows can be added to SLA
- Maintenance window validates end time > start time
- Multiple maintenance windows can exist per SLA
- Critical periods can be added to SLA
- Critical period validates end datetime > start datetime
- Critical period duration is computed correctly
- Multiple critical periods can exist per SLA
- Contacts can be added to SLA
- Contacts can be removed from SLA
- SLA details include all related entities
- Active critical periods query works correctly
- All endpoints authenticated
- Tests pass (>80% coverage)
Frontend
- User can view SLA in application detail
- If no SLA, "Create SLA" button is shown
- User can create new SLA
- Form validates all required fields
- Form validates ranges (0-100 for availability)
- Form validates time/datetime logic
- User can add multiple maintenance windows
- User can remove maintenance windows
- Weekly calendar preview shows maintenance windows
- User can add multiple critical periods
- User can remove critical periods
- Critical period duration is auto-calculated
- Currently active critical periods are highlighted
- User can upload/edit diagram
- Diagram is rendered correctly
- User can add/remove responsible contacts
- User can edit existing SLA
- User can delete SLA (with confirmation)
- All sections render correctly
- Success/error notifications displayed
- Tests pass (>70% coverage)
Testing Scenarios
Scenario 1: Create SLA with Basic Metrics
- Navigate to Application Detail → SLA tab
- Click "Create SLA"
- Fill in core metrics:
- Availability: 99.9%
- Response Time: 2 seconds
- Reaction Time: "P1: 1 hour, P2: 4 hours, P3: 1 business day"
- Skip windows/periods for now
- Click "Save"
- Verify success notification
- Verify SLA details displayed
- Verify availability gauge shows 99.9%
Scenario 2: Add Maintenance Windows
- Navigate to existing SLA
- Click "Edit"
- Add maintenance window:
- Day: Monday
- Start: 02:00
- End: 04:00
- Add another window:
- Day: Wednesday
- Start: 02:00
- End: 04:00
- Save
- Verify weekly calendar shows both windows
- Verify windows listed in detail view
Scenario 3: Add Critical Period
- Navigate to existing SLA
- Click "Edit"
- Add critical period:
- Start: 2026-11-25 00:00 (Black Friday)
- End: 2026-11-27 23:59
- Description: "Black Friday weekend sales event"
- Verify duration computed: 4,319 minutes
- Save
- Verify critical period listed
- Navigate to dashboard
- If period is active, verify highlighted
Scenario 4: Validation - Availability Range
- Create/Edit SLA
- Set availability: 101%
- Attempt to save
- Verify error: "Availability must be between 0 and 100"
- Set availability: -5%
- Verify same error
- Set availability: 99.5%
- Verify validation passes
Scenario 5: Validation - Time Order
- Edit SLA
- Add maintenance window:
- Day: Tuesday
- Start: 06:00
- End: 02:00 (before start)
- Attempt to save
- Verify error: "End time must be after start time"
Scenario 6: Prevent Duplicate SLA
- Application already has SLA
- Attempt to create another SLA for same application
- Verify error: "Application already has an SLA"
- Edit existing SLA instead
Scenario 7: Add Diagram
- Navigate to SLA detail
- Click "Edit"
- Upload draw.io diagram JSON
- Save
- Verify diagram renders in detail view
- Click "Edit Diagram"
- Modify in embedded editor
- Save changes
- Verify updated diagram displayed
Scenario 8: Manage Responsible Contacts
- Navigate to SLA detail
- Click "Add Contact"
- Select "John Doe (Service Delivery Manager)"
- Click "Add"
- Verify contact appears in list
- Add another contact: "Jane Smith (Technical Lead)"
- Verify both contacts listed
- Remove "John Doe"
- Verify only "Jane Smith" remains
Scenario 9: View Active Critical Periods
- Create critical period starting yesterday, ending tomorrow
- Navigate to dashboard or SLA list
- Verify period marked as "Active" (green highlight)
- View critical period detail
- Verify "Currently Active" badge displayed
Scenario 10: Weekly Maintenance Calendar View
- Navigate to SLA detail
- View maintenance windows section
- Verify weekly calendar displayed
- Verify all windows highlighted on calendar
- Hover over window
- Verify tooltip shows time range
- Click on calendar day/time
- Verify opens "Add Maintenance Window" with day pre-selected
Integration Points
Application Detail Enhancement
Add "SLA" tab showing:
- Full SLA details if exists
- Create SLA button if not exists
- Quick metrics summary in overview tab
Dashboard Enhancement (Optional)
Add widgets:
- Applications with/without SLAs count
- Active critical periods count
- Upcoming critical periods (next 7 days)
- Maintenance windows today
Performance Considerations
- Index on application_id for fast SLA lookup
- Index on critical period dates for active/upcoming queries
- Store diagram JSON in JSONB for efficient querying (PostgreSQL)
- Paginate SLA list if count grows large
- Optimize loading of related entities (maintenance windows, critical periods, contacts)
Diagram Integration Options
Option 1: Upload draw.io JSON
- User creates diagram in draw.io desktop app
- Exports as JSON
- Uploads JSON file in SLA form
- Backend stores JSON in JSONB column
- Frontend renders using draw.io viewer library
Option 2: Embedded draw.io Editor
- Integrate draw.io embed library
- User edits diagram directly in application
- Save diagram as JSON to backend
- Requires:
<iframe>or draw.io integration library
Option 3: External Link Only (Simplest MVP)
- Store only URL to diagram (in SharePoint, Confluence, etc.)
- Display link, open in new tab
- No rendering in LDPv2
Recommendation for MVP: Option 3 (external link), upgrade to Option 1 or 2 in Phase 2
Future Enhancements (Phase 3)
- Automated SLA compliance monitoring
- Real-time availability tracking
- Incident management integration
- Automated alerts during critical periods
- SLA breach notifications
- Historical compliance reports
- SLA comparison across applications
- Automated maintenance window scheduling
- Time zone support for global teams
- Integration with monitoring tools (Datadog, New Relic)
- SLA templates for quick creation
- Approval workflow for SLA changes
Story Status: Ready for Development
Estimated Completion: 6-8 days