Unit 9: Deployment & Security
Learning Objectives
- Compare deployment strategies (rolling, blue-green, canary) and select appropriately
- Explain security-by-design and the cost of retrofitting security
- Analyse the four types of software maintenance and their SDLC implications
- Evaluate the role of observability in production systems
- Apply threat modelling principles to the running case study
Deployment Strategies
How software reaches production is an engineering decision with direct consequences for availability, risk, and rollback capability. The right strategy depends on system criticality, team maturity, and infrastructure.
Why Deployment Strategy Matters
A deployment is not the end of development; it is the transition from a controlled environment to a live one. Production systems have real users, real data, and real consequences when they fail. The strategies below sit at different points on the speed vs. risk spectrum: big bang as the baseline, the three core strategies (rolling, blue-green, canary), and feature flags as a complementary release mechanism.
| Strategy | How it works | Rollback ease | Risk | Best for |
|---|---|---|---|---|
| Big Bang | Replace everything at once | Difficult — restore from backup | Highest | Small systems, offline deployments |
| Rolling | Replace instances progressively; old + new run simultaneously for a period | Stop the rollout; harder if schema changed | Medium | Stateless services, general web apps |
| Blue-Green | Run two identical environments; switch traffic from Blue (live) to Green (new) instantly | Instant — flip traffic back to Blue | Low | Critical systems where downtime is unacceptable |
| Canary | Route a small % of traffic to the new version; expand if metrics are healthy | Redirect 100% back to old version | Very low | High-traffic systems, major feature releases |
| Feature Flags | Deploy code but control activation via configuration; can target specific users | Toggle off instantly | Very low | A/B testing, gradual feature rollout |
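To make the Feature Flags row concrete, here is a minimal sketch of a flag check in Python. The flag name, the configuration layout (`enabled`, `percent`, `allow_users`) and the user IDs are all hypothetical; real flag services differ in detail. It illustrates the three behaviours the table mentions: a global off switch, targeting of specific users, and a gradual percentage rollout.

```python
import hashlib


def bucket(user_id: str, flag: str) -> int:
    """Map a user to a stable bucket in 0..99 for a given flag."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag_config: dict, flag: str, user_id: str) -> bool:
    """Decide whether a flag is active for one user (hypothetical config layout)."""
    cfg = flag_config.get(flag)
    if cfg is None or not cfg.get("enabled", False):
        return False  # unknown or switched-off flags are always off
    if user_id in cfg.get("allow_users", ()):
        return True  # explicitly targeted users see the feature first
    # Gradual rollout: the same user always lands in the same bucket.
    return bucket(user_id, flag) < cfg.get("percent", 0)
```

Setting `enabled` to false in configuration acts as the instant toggle-off; no redeployment is needed because the code path is already in production.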
Zero-Downtime Deployment
Most production deployments aim for zero downtime. Achieving this requires more than choosing a strategy — it requires designing for it:
- Database migrations must be backward-compatible while old instances still run. Adding a nullable column is safe; renaming a column is not.
- API versioning ensures new clients and old clients can coexist during transition periods.
- Health checks tell the load balancer whether an instance is ready to receive traffic.
- Graceful shutdown lets instances finish in-flight requests before terminating.
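The last two points (health checks and graceful shutdown) can be sketched together. The class below is an illustration under assumptions, not a framework API: the name `InstanceState` and its methods are hypothetical, and a real service would wire `health_status` to an HTTP endpoint and call `start_draining` from a termination-signal handler.

```python
import threading


class InstanceState:
    """Tracks readiness and in-flight work for one application instance."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._in_flight = 0
        self._draining = False

    def health_status(self) -> int:
        """Status code a health endpoint would return to the load balancer."""
        with self._lock:
            return 503 if self._draining else 200

    def begin_request(self) -> bool:
        """Accept a request unless the instance is draining."""
        with self._lock:
            if self._draining:
                return False
            self._in_flight += 1
            return True

    def end_request(self) -> None:
        with self._lock:
            self._in_flight -= 1

    def start_draining(self) -> None:
        """Stop accepting new work; existing requests are allowed to finish."""
        with self._lock:
            self._draining = True

    def safe_to_exit(self) -> bool:
        """True once draining has started and no request is still running."""
        with self._lock:
            return self._draining and self._in_flight == 0
```

The ordering is the design point: the health check starts failing first so the load balancer stops sending traffic, and the process exits only after in-flight requests have completed.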
Observability: Knowing What Is Happening
Once deployed, a system needs to be understood from the outside. Observability is the degree to which internal states can be inferred from external outputs. It rests on three pillars:
Logs
Timestamped records of events. Essential for post-incident diagnosis. A structured format (such as JSON) is needed for automated querying. Storage cost grows fast at scale.
Metrics
Numerical time-series data: CPU, memory, request rate, error rate, latency percentiles. Used for dashboard visualisation and alerting. Cheap to store relative to logs.
Traces
End-to-end records of a single request across services. Reveals latency hotspots in distributed systems. Requires instrumentation of every service boundary.
A system without observability is a black box in production. The CI/CD pipeline checks that the intended build was deployed; observability tells you whether the deployed system is behaving correctly.
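As an illustration of the structured logging mentioned under Logs, the sketch below uses Python's standard `logging` module to emit one JSON object per event. The logger name, the `context` convention for extra fields, and the example event are assumptions made for this sketch.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Fields passed via extra={"context": {...}} become queryable keys.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload)


def make_logger(stream, name: str = "booking") -> logging.Logger:
    """Build a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False
    return logger
```

A call such as `log.info("appointment_cancelled", extra={"context": {"appointment_id": "A-1"}})` then produces a line that a log platform can filter by `appointment_id` rather than by free-text search.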
Security-Aware Engineering
Security is not a phase of the SDLC — it is a property that must be designed in from the beginning. Security-by-design is consistently cheaper and more effective than retrofitting security after deployment.
The Cost Curve of Late Security
The same principle that applies to defect cost applies to security vulnerabilities: the later a security flaw is found, the more expensive it is to fix. A design flaw identified in architecture review may require minutes to correct. The same flaw discovered after deployment may require emergency patching, user notification, regulatory reporting, and potential litigation.
Key principle: threat modelling should happen during design, not as a pre-release security audit.
Threat Modelling
Threat modelling is a structured technique for identifying security threats before writing code. One of the most widely used frameworks is STRIDE, whose name is an acronym of six threat categories:
| Threat | What it means | Example (hospital system) |
|---|---|---|
| Spoofing | Claiming a false identity | Attacker logs in as a GP to view patient records |
| Tampering | Modifying data without authorisation | Appointment time altered in transit |
| Repudiation | Denying an action was performed | Consultant denies cancelling appointment; no audit log |
| Information Disclosure | Exposing data to unauthorised parties | Patient list visible without authentication |
| Denial of Service | Overwhelming the system to prevent legitimate access | Flood of booking requests prevents patients accessing portal |
| Elevation of Privilege | Gaining access beyond authorisation | Patient-level user accesses admin functions |
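One way to keep a STRIDE analysis honest is to record it as data and check that every category has at least one threat with a countermeasure. The sketch below assumes a hypothetical record layout (`category`, `threat`, `countermeasure`); it checks coverage only and says nothing about whether a countermeasure is adequate.

```python
from enum import Enum


class Stride(Enum):
    SPOOFING = "Spoofing"
    TAMPERING = "Tampering"
    REPUDIATION = "Repudiation"
    INFORMATION_DISCLOSURE = "Information Disclosure"
    DENIAL_OF_SERVICE = "Denial of Service"
    ELEVATION_OF_PRIVILEGE = "Elevation of Privilege"


def uncovered(threat_model: list) -> set:
    """Return the STRIDE categories with no threat that has a countermeasure."""
    covered = {
        entry["category"]
        for entry in threat_model
        if entry.get("countermeasure")  # empty or missing means not covered
    }
    return set(Stride) - covered
```

Run during design review, a non-empty result points at categories the team has not yet considered for the component under analysis.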
Security Requirements vs Security Controls
A security requirement specifies what the system must do (or not do): "All patient data in transit must be encrypted using TLS 1.3 or later."
A security control is the implementation mechanism: TLS certificates, enforced HTTPS redirects, HSTS headers.
Security requirements belong in the requirements traceability matrix (RTM) alongside functional requirements. They are testable, verifiable, and traceable to design decisions. Teams that treat security as a checklist at the end of the project tend to miss this connection.
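The distinction can be shown in a few lines. In this sketch the requirement is the TLS 1.3 statement quoted above, and the control is a server-side TLS configuration built with Python's standard `ssl` module. The function name and the HSTS max-age value are assumptions for illustration, and certificate loading is left out because the file paths would be deployment-specific.

```python
import ssl

# Control: response header instructing browsers to use HTTPS only.
# The one-year max-age is an illustrative value, not a mandated one.
HSTS_HEADER = ("Strict-Transport-Security", "max-age=31536000; includeSubDomains")


def make_server_context() -> ssl.SSLContext:
    """Build a server TLS context that refuses anything older than TLS 1.3."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    # A real deployment would also call ctx.load_cert_chain(...) here.
    return ctx
```

Because the requirement is stated as a threshold, a test case in the RTM can assert it directly, for example by checking that `make_server_context().minimum_version` equals `ssl.TLSVersion.TLSv1_3`.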
Common Vulnerability Categories (OWASP Top 10)
The OWASP Top 10 is the industry-standard reference for web application security risks. Recurring categories relevant to enterprise systems:
- Broken Access Control: users can perform actions beyond their permissions
- Injection (SQL, command): untrusted input executed as code
- Insecure Design: security not considered in architecture; threat model absent
- Security Misconfiguration: default credentials, open ports, verbose error messages in production
- Vulnerable and Outdated Components: third-party libraries with known CVEs
- Security Logging and Monitoring Failures: breaches go undetected and actions cannot be attributed, which leaves the STRIDE Repudiation threat open
Notice that Insecure Design appears in the list — a recognition that vulnerability is often architectural, not just implementation-level.
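The Injection category is the easiest to demonstrate in code. The sketch below uses an in-memory SQLite database with a hypothetical `patients` table and contrasts a query built by string concatenation with a parameterised one. The unsafe function is included only to show the failure mode.

```python
import sqlite3


def setup() -> sqlite3.Connection:
    """Create a throwaway in-memory database with two example rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO patients (name) VALUES (?)", [("Alice",), ("Bob",)])
    return conn


def find_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # Vulnerable: untrusted input becomes part of the SQL text itself.
    return conn.execute(f"SELECT name FROM patients WHERE name = '{name}'").fetchall()


def find_safe(conn: sqlite3.Connection, name: str) -> list:
    # Parameterised: the driver passes the input as data, never as SQL.
    return conn.execute("SELECT name FROM patients WHERE name = ?", (name,)).fetchall()
```

With the input `x' OR '1'='1`, the unsafe version returns every row in the table, while the parameterised version looks for a patient literally named that string and finds none.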
Security in the CI/CD Pipeline
Modern DevSecOps integrates security checks into the pipeline so that vulnerabilities are caught automatically:
- SAST (Static Application Security Testing) — scans source code for known vulnerability patterns (e.g. SQL injection sinks)
- SCA (Software Composition Analysis) — checks third-party dependencies against CVE databases
- DAST (Dynamic Application Security Testing) — sends attack probes to a running test instance
- Secrets scanning — detects API keys and credentials accidentally committed to version control
Each of these is a gate in the pipeline. A SAST failure should fail the build, not produce a warning. Security theatre — running scans but not acting on results — is worse than no scanning, as it creates false confidence.
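As a rough illustration of a gate that fails the build rather than warning, here is a toy secrets scanner. The two patterns (the AWS access key ID prefix format and a PEM private key header) are well-known shapes, but the pattern set is deliberately tiny and the function names are hypothetical; dedicated scanners cover far more cases.

```python
import re

# Illustrative patterns only; a real scanner ships many more.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
}


def scan(text: str) -> list:
    """Return (line_number, pattern_name) for every suspected secret."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


def gate(text: str) -> int:
    """Exit code for the pipeline step: non-zero fails the build."""
    return 1 if scan(text) else 0
```

Returning a non-zero exit code is what turns the scan into a gate; a step that only prints findings and exits with zero is the "security theatre" case described above.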
Maintenance & Evolution
In most organisations, maintenance consumes 60–80% of total software lifetime costs. Yet it is rarely covered in depth in development-focused curricula. A software engineer who understands only greenfield development is poorly equipped for professional practice.
The Four Types of Maintenance (Lientz & Swanson)
Corrective
Fixing defects discovered in production. Reactive by nature. Represents a failure of earlier SDLC phases — defects that escaped testing. High urgency, often high cost.
Adaptive
Modifying the system to accommodate environment changes: OS upgrades, API changes, regulatory updates, new hardware. Predictable if change is monitored; disruptive if deferred.
Perfective
Adding new capabilities or improving existing ones in response to user requests. Continuous delivery model treats this as ongoing development. Most common type by volume.
Preventive
Refactoring, updating dependencies, improving test coverage before problems occur. Pays down technical debt. Often deprioritised in favour of new features; the source of legacy system crises.
Legacy Systems and the Maintainability Debt
A legacy system is one that is difficult to change but too risky to replace. This situation arises when:
- Preventive maintenance was consistently deprioritised, accumulating structural debt
- Original developers have left; knowledge exists only in the code itself (often undocumented)
- The system is deeply embedded in organisational processes — it is not just software but institutional memory
- Test coverage is low or absent, making any change potentially catastrophic
The NHS National Programme for IT (NPfIT) offers a cautionary example at scale: an attempt to replace many local systems with centrally procured national ones proved so large and complex that it was largely dismantled without delivering its core goals, leaving organisations dependent on technology that could not easily be evolved.
Strategies for Legacy System Evolution
- Strangler Fig pattern: gradually replace the legacy system by routing functionality to new services (often, but not necessarily, microservices), one capability at a time, until the old system can be retired
- Anti-corruption layer: place a translation layer between the legacy system and new components so that the old data model does not infect new design
- Characterisation testing: write tests that document current (possibly unexpected) behaviour before refactoring — the tests capture what the system does rather than what it should do
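Characterisation testing is the least familiar of the three, so a small sketch may help. The legacy function below is hypothetical and contains a deliberate quirk (noon is labelled "AM"). The recorded table pins down what the code currently does, and any refactoring is checked against it.

```python
def legacy_slot_label(hour: int) -> str:
    """Hypothetical legacy code with a quirk: 12:00 is labelled AM."""
    if hour < 12:
        return "AM"
    if hour == 12:
        return "AM"  # surprising, but downstream reports may rely on it
    return "PM"


# Observed behaviour, recorded before touching the code.
GOLDEN = {0: "AM", 11: "AM", 12: "AM", 13: "PM", 23: "PM"}


def mismatches(fn) -> list:
    """Inputs where a candidate implementation departs from recorded behaviour."""
    return [hour for hour, expected in GOLDEN.items() if fn(hour) != expected]


def refactored_slot_label(hour: int) -> str:
    """Simplified version that deliberately preserves the noon quirk."""
    return "PM" if hour > 12 else "AM"
```

A well-meaning "fix" such as `"PM" if hour >= 12 else "AM"` would show up as a mismatch at 12. That is the point: changing behaviour becomes a deliberate decision instead of a side effect of refactoring.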
The SDLC Does Not End at Deployment
The waterfall model implied a clean handover from development to operations. Modern software engineering rejects this. The SDLC for a long-lived system is better understood as a continuous loop: monitor → analyse → plan → implement → test → deploy → monitor. The distinction between "development" and "maintenance" is largely artificial — both require requirements analysis, design decisions, testing, and deployment. The same engineering discipline applies throughout.
Key Concepts: Deployment and Observability
Blue-green deployment maintains two complete, identical environments. At any moment one is live (blue) and one is idle. A new release is deployed to the idle environment; after validation, traffic is switched instantly. Rollback is equally instant — switch back. The cost is maintaining double the infrastructure.
Canary deployment routes a small percentage of real traffic to the new version while the majority continues to the old version. If metrics remain healthy, the percentage increases progressively to 100%. Canary is better for detecting user-facing problems that only manifest at scale; blue-green is better for systems where any production exposure of a faulty release is unacceptable.
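The "expand if metrics are healthy" rule can be written down as a small decision function. This is a sketch under stated assumptions: the step sizes, the error-rate tolerance, and the minimum sample size are illustrative values, and error rate stands in for whatever health metrics a team actually monitors.

```python
from dataclasses import dataclass

STEPS = (5, 25, 50, 100)  # illustrative rollout percentages


@dataclass
class Window:
    """Request and error counts observed over one evaluation window."""
    requests: int
    errors: int

    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def next_canary_percent(current: int, canary: Window, baseline: Window,
                        max_extra_error_rate: float = 0.01,
                        min_requests: int = 100) -> int:
    """Return the traffic percentage the canary should receive next."""
    if canary.requests < min_requests:
        return current  # too little data to judge: hold
    if canary.error_rate() > baseline.error_rate() + max_extra_error_rate:
        return 0  # unhealthy: send all traffic back to the old version
    for step in STEPS:
        if step > current:
            return step  # healthy: expand to the next step
    return 100  # already fully rolled out
```

Comparing the canary against the baseline measured over the same window, rather than against a fixed threshold, avoids rolling back a release for a problem that affects both versions equally.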
During a rolling deployment, and during the switchover window of a blue-green deployment, old and new versions of the application code may run against the same database at the same time. If a migration renames or removes a column, the old code breaks immediately.
The solution is expand-contract migrations: first add the new column alongside the old (expand phase); deploy new code that reads and writes both, and backfill existing rows; then remove the old column once no running code references it (contract phase). Each step is backward-compatible with the code running at that moment, and every step before the contract can be rolled back without data loss.
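A minimal sketch of an expand-contract rename, using SQLite from Python's standard library. The table and column names (`patients`, `phone`, `contact_number`) are hypothetical, and the contract step assumes SQLite 3.35 or later, which introduced `ALTER TABLE ... DROP COLUMN`.

```python
import sqlite3


def create_initial_schema(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT, phone TEXT)")


def expand(conn: sqlite3.Connection) -> None:
    """Add the new column next to the old one and backfill existing rows."""
    conn.execute("ALTER TABLE patients ADD COLUMN contact_number TEXT")
    conn.execute("UPDATE patients SET contact_number = phone WHERE contact_number IS NULL")


def write_v1(conn: sqlite3.Connection, name: str, phone: str) -> None:
    """Old application code: knows only the old column."""
    conn.execute("INSERT INTO patients (name, phone) VALUES (?, ?)", (name, phone))


def write_v2(conn: sqlite3.Connection, name: str, number: str) -> None:
    """Transitional code: writes both columns so either version can read."""
    conn.execute(
        "INSERT INTO patients (name, phone, contact_number) VALUES (?, ?, ?)",
        (name, number, number),
    )


def contract(conn: sqlite3.Connection) -> None:
    """Run only once no deployed code reads or writes the old column."""
    # Final backfill catches rows written by old code after the expand step.
    conn.execute("UPDATE patients SET contact_number = phone WHERE contact_number IS NULL")
    conn.execute("ALTER TABLE patients DROP COLUMN phone")  # needs SQLite >= 3.35
```

Between `expand` and `contract`, both `write_v1` and `write_v2` succeed against the same schema, which is what allows old and new instances to coexist during the rollout.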
Observability is the ability to infer the internal state of a system from its external outputs (logs, metrics, traces). It is a design concern because it must be built into the system, not bolted on later. Retroactively adding structured logging to a system with ad hoc print statements, or adding distributed tracing to a system with no instrumentation, is expensive and disruptive.
Practically: observability requirements — what must be logged, at what granularity, with what retention — should appear in the requirements specification and the RTM, traceable through design to implementation.
Key Concepts: Security and Maintenance
Security auditing — reviewing a completed system for vulnerabilities — can identify implementation flaws (SQL injection sinks, weak encryption) but cannot easily identify design flaws: missing authentication on an entire subsystem, absence of audit logging, an architecture that cannot enforce data segregation. Design flaws require architectural rework, which is far more costly at deployment than at the design stage.
Security-by-design incorporates threat modelling during architecture and treats security requirements as first-class requirements, traceable and testable alongside functional requirements. The STRIDE framework applied during design identifies categories of threat before a single line of code is written.
The four types of maintenance reveal that most software work is not greenfield development. An SDLC practitioner must account for:
- Corrective: emphasises the importance of testing investment — defects not caught in QA become expensive corrective maintenance
- Adaptive: highlights why architecture should isolate external dependencies (third-party APIs, OS interfaces) behind abstraction layers
- Perfective: justifies continuous delivery practices — new features arrive as evolutionary maintenance, not discrete releases
- Preventive: makes the case for technical debt budgeting — allocating sprint capacity to refactoring and dependency updates before they become crises
Direct replacement ("big bang" migration) of a legacy system is high-risk: the new system must replicate all functionality before the old system can be retired, but the full functionality of a legacy system is often undocumented and discoverable only through use.
The Strangler Fig pattern reduces risk by migrating one capability at a time. A routing layer intercepts requests; migrated capabilities are handled by new services, unmigrated capabilities by the legacy system. Because the legacy system keeps serving whatever has not yet been migrated, there is no moment when all functionality must simultaneously be present in the new system, and risk is bounded at each migration step. The pattern is named after the strangler fig, a plant that gradually envelops and eventually replaces its host tree.
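The routing layer at the heart of the pattern can be very small. The sketch below assumes capabilities are identified by URL path prefix, which is one common choice but not the only one; the paths and the `"new"`/`"legacy"` labels are hypothetical.

```python
def route(path: str, migrated_prefixes: set) -> str:
    """Decide which system should handle a request path."""
    for prefix in migrated_prefixes:
        # Match the prefix itself or anything beneath it, but not
        # unrelated paths that merely start with the same characters.
        if path == prefix or path.startswith(prefix + "/"):
            return "new"
    return "legacy"  # default: anything not yet migrated stays put
```

Migrating a capability then amounts to adding its prefix to the set, and rolling that migration back amounts to removing it, which is how the pattern keeps the risk of each step bounded.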
Check Your Understanding
During a blue-green deployment of the hospital booking system, a critical bug is found in the new (Green) version shortly after traffic has been switched to it. What is the fastest recovery action?
A team discovers that a third-party payment library in the hospital system has a known SQL injection vulnerability (CVE published 6 months ago). Which pipeline gate should have caught this?
AI in deployment and security is one of the most commercially active areas of software tooling, and one where genuine capability and significant limitations sit side by side.
Where AI adds genuine value:
- Anomaly detection: ML models trained on normal system behaviour can identify deviations that rule-based alerting misses — unusual access patterns, latency spikes at unexpected times, combinations of events that individually appear normal but collectively indicate compromise
- Automated SAST/SCA triage: AI assistants can explain vulnerability findings, suggest remediation, and prioritise by exploitability rather than just severity score
- Incident summarisation: during a live incident, AI tools can rapidly summarise log streams and identify correlated events, accelerating mean time to resolution (MTTR)
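- Dependency update automation: tools like Dependabot automatically generate pull requests for dependency updates and can show a compatibility score based on how the same update fared in other projects' CI runs; AI-assisted analysis can build on this by summarising changelogs and flagging likely breaking changes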
Where caution is warranted:
- False positives in security scanning: AI-generated SAST findings frequently include false positives. A team that auto-fails builds on AI security findings without human review will either drown in noise or disable the gate
- False negatives in threat modelling: AI threat modelling tools generate comprehensive-looking STRIDE analyses but lack organisational context. A hospital system's threat model must reflect NHS Spine integration, CQC regulatory exposure, and clinical workflow constraints — none of which the AI knows without explicit briefing
- Security of AI-generated code: as covered in Unit 7, AI assistants generate code that passes tests but may contain subtle security flaws. The OWASP finding that Insecure Design is a top risk applies acutely to AI-generated code, which may be functionally correct but architecturally insecure
The practical rule: use AI to accelerate the detection and diagnosis of known vulnerability patterns; always apply human judgement to architectural security decisions and to threat models for systems with regulatory or safety implications.
Group Activity: Deployment Plan & Security Review
Your group is preparing to deploy the hospital appointment booking system to production for the first time, following 12 weeks of development and testing. The system handles patient-identifiable data and must comply with NHS data security standards.
Part A — Deployment strategy (25 min)
- Select a deployment strategy for the initial release. Justify your choice with reference to system criticality, data sensitivity, and rollback requirements.
- Identify two database migration steps required (e.g., creating tables, seeding reference data). Specify how you will make these backward-compatible with the existing test environment.
- Define your minimum viable observability: which three metrics and two log event types must be in place before go-live?
Part B — Security review (20 min)
- Apply STRIDE to the patient booking portal login flow. For each threat category, identify one specific threat and one countermeasure.
- Identify which pipeline gates (SAST, SCA, DAST, secrets scanning) you would implement and at what pipeline stage each should run.
- Add two security requirements to your RTM. These should be measurable (attribute + metric + threshold), have a test case ID, and map to an architectural component.
Part C — Post-deployment plan (10 min)
Three weeks after go-live, a consultant reports that cancelled appointments are not always removed from the patient's view. Classify this as one of the four maintenance types, estimate the SDLC phase where the defect originated, and describe your response plan.
Review
This unit examined the final stages of the SDLC where engineering decisions have immediate operational and security consequences. Deployment strategy selection is a risk-management decision: blue-green minimises production exposure; canary validates behaviour at scale; rolling is pragmatic for stateless services. Zero-downtime deployment requires coordinated design across code, schema, and infrastructure — it cannot be achieved by deployment tooling alone.
Security-by-design is not a slogan but an engineering practice: threat modelling during architecture, security requirements in the RTM, automated gates in the CI/CD pipeline, and observability designed to detect the Repudiation and Information Disclosure threats that STRIDE identifies. Security left to a pre-release audit catches only implementation-level vulnerabilities; design-level vulnerabilities require design-stage discovery.
The four maintenance types reframe the SDLC as a continuous loop rather than a one-time delivery process. Preventive maintenance — the most easily deferred — is the one most responsible for the creation of legacy systems. Systems designed for maintainability (modular, well-documented, well-tested, with external dependencies isolated) reduce the long-term cost of all four maintenance types.