Unit 10: Critical Evaluation

Learning Objectives

Critically evaluate SDLC decisions using evidence from real-world failure cases
Identify the root causes of software project failure across SDLC phases
Apply a risk management framework to software engineering decisions
Articulate the professional and ethical responsibilities of software engineers
Demonstrate integrative understanding of the full SDLC through structured reflection

Case Study Analysis: Lessons from SDLC Failure

Critical evaluation requires moving beyond description to explanation: not merely what happened, but why it happened, and what SDLC practice would have prevented it. Three cases recur throughout this course. This unit synthesises their lessons.

Therac-25 (1985–1987)

The Therac-25 was a radiation therapy machine involved in at least six accidents in which patients received massive radiation overdoses, several of them fatal, due to software defects. The root cause was not a single bug but a constellation of SDLC failures:

Requirements: Safety requirements were not specified with measurable thresholds; no formal hazard analysis was conducted
Architecture: The software reused code from the Therac-20, which had hardware interlocks; those interlocks were removed in the Therac-25, leaving software as the only safeguard, an architectural risk that was never documented (in today's terms, no ADR recorded it)
Testing: Race conditions in the UI control software were not covered by test cases; the test suite verified the happy path only
Quality assurance: Defect reports from operators were initially dismissed; no escape rate metric existed to flag systemic issues
Maintenance: Software updates were deployed without re-validation of safety-critical behaviour

The Therac-25 case established the principle that safety-critical systems require independent verification at each SDLC phase, with safety requirements treated as inviolable constraints rather than quality attributes to be traded off.

NHS NPfIT (2003–2011)

The National Programme for IT was the UK government's £12bn attempt to create a unified NHS IT infrastructure. It was abandoned in 2011 having delivered a fraction of its intended functionality.

Requirements: Centralised requirements specification ignored the heterogeneity of NHS trusts; clinicians — the primary users — were excluded from elicitation
Architecture: A monolithic national architecture was chosen for a problem that required federated solutions; the architectural decision was driven by procurement economics, not technical analysis
Methodologies: A waterfall model was applied to a requirements problem that was fundamentally undiscoverable upfront; no iterative validation with clinical users
Stakeholder management: The complexity of the stakeholder landscape (trusts, clinicians, patients, NHS England, commercial suppliers) was underestimated; requirements conflicts were unresolved at programme level

NPfIT illustrates that methodology selection is a strategic decision with consequences that cannot be corrected mid-programme at scale. It also demonstrates that technical excellence is insufficient when requirements are structurally flawed.

Knight Capital (2012)

Knight Capital lost $440m in 45 minutes due to a deployment failure. New trading code was deployed manually to only seven of its eight servers; on the eighth server, a repurposed flag inadvertently activated deprecated logic that had never been removed.

DevOps: No automated deployment verification; the deployment was manual and partial
Testing: The deprecated code path was not covered by tests and had not been removed — preventive maintenance deferred
Monitoring: Alerts fired within minutes, but the incident response was too slow to interpret and act on them; observability was present but incident management was not
Risk: A single deployment step could cause unlimited financial loss — no circuit breaker, no kill switch, no position limit enforcement in the new code
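The missing "automated deployment verification" can be sketched as a gate that runs after a deploy and before any new behaviour is switched on. This is a minimal illustration, not Knight Capital's tooling: the ServerReport type, host names, and build labels are all hypothetical, and it assumes each server can report the build it is actually running.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerReport:
    host: str
    build: str  # build identifier the server reports after the deploy (hypothetical)


def verify_deployment(expected_build, reports, fleet_size):
    """Return a list of problems; an empty list means the deployment gate passes."""
    problems = []
    if len(reports) != fleet_size:
        problems.append(f"expected {fleet_size} servers, got {len(reports)} reports")
    for report in reports:
        if report.build != expected_build:
            problems.append(f"{report.host} runs {report.build}, expected {expected_build}")
    return problems


# Seven servers received the new build; the manual deploy missed the eighth.
reports = [ServerReport(f"srv{i}", "v2") for i in range(1, 8)]
reports.append(ServerReport("srv8", "v1"))

issues = verify_deployment("v2", reports, fleet_size=8)
if issues:
    # Block the release: do not enable the new behaviour on any server.
    print("Deployment gate FAILED:", issues)
```

The gate checks both that every expected server reported and that each one runs the expected build, so a partial deploy fails loudly instead of leaving one server on old code.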

Knight Capital demonstrates that technical risk management must be proportional to business exposure. The SDLC for a financial trading system demands deployment gates and kill switches that are not standard for a content management system.
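A kill switch and a position limit are simple to express in code; the hard part is making them mandatory in the SDLC. The sketch below is hypothetical (TradingGuard and its limit are illustrative names and values) and assumes a single-threaded caller; a real trading system would need concurrency control and limits enforced independently of the strategy code.

```python
class TradingGuard:
    """Hypothetical pre-trade risk gate: a kill switch plus a hard position limit."""

    def __init__(self, max_abs_position):
        self.max_abs_position = max_abs_position
        self.position = 0
        self.killed = False

    def kill(self):
        """Kill switch: once tripped, every further order is rejected."""
        self.killed = True

    def submit(self, quantity):
        """Accept an order (positive = buy, negative = sell) only if it is safe to do so."""
        if self.killed:
            return False
        new_position = self.position + quantity
        if abs(new_position) > self.max_abs_position:
            self.kill()  # a limit breach trips the kill switch automatically
            return False
        self.position = new_position
        return True


guard = TradingGuard(max_abs_position=1_000)
print(guard.submit(600))   # True: position is now 600
print(guard.submit(600))   # False: 1,200 would breach the limit, so the switch trips
print(guard.submit(-100))  # False: the kill switch blocks everything afterwards
```

Tripping the kill switch on the first breach is a deliberately conservative design choice: it bounds the loss from any single faulty code path, at the cost of requiring a human decision to resume trading.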

Risk and Failure in Software Engineering

Software projects fail in recognisable patterns. Understanding these patterns enables proactive risk management rather than reactive crisis response.

Categories of Software Project Failure

Category	Typical manifestation	SDLC phase of origin
Requirements failure	System built correctly but solves wrong problem; users reject the system	Requirements engineering
Architectural failure	System cannot scale, cannot be maintained, or cannot evolve to meet new requirements	Architecture & Design
Estimation failure	Project runs significantly over time and budget; adding staff late worsens the overrun (Brooks' Law)	Planning / Methodologies
Communication failure	Misalignment between technical and non-technical stakeholders; decisions made without documentation	All phases
Testing failure	High coverage but high escape rate; defects reach production that testing should have caught	Quality Assurance
Deployment failure	Correct software fails in production due to environment, configuration, or data issues	DevOps / Deployment
Maintenance failure	System becomes progressively unmaintainable; change cost approaches replacement cost	Ongoing maintenance

Risk Management Framework

Risk management in software engineering follows a four-stage cycle:

Identification: systematically identify risks across all SDLC phases; use checklists, retrospectives, and threat modelling
Assessment: evaluate each risk by probability × impact; prioritise the high-probability/high-impact quadrant
Mitigation: select a strategy — avoid (don't take on the risky feature), transfer (insurance, SLA), mitigate (reduce probability or impact), accept (document and monitor)
Monitoring: risk profiles change as the project progresses; a risk that was low-probability in Week 2 may become high-probability in Week 10
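The assessment stage can be sketched as a small risk register scored by probability × impact. The 1 to 5 ordinal scales, the threshold of 12, and the example risks are assumptions for illustration; organisations calibrate their own scales.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    description: str
    probability: int  # assumed ordinal scale: 1 (rare) to 5 (almost certain)
    impact: int       # assumed ordinal scale: 1 (negligible) to 5 (severe)
    strategy: str     # "avoid", "transfer", "mitigate" or "accept"

    @property
    def exposure(self):
        return self.probability * self.impact


def prioritise(register, threshold=12):
    """Return risks at or above the threshold, highest exposure first."""
    high = [r for r in register if r.exposure >= threshold]
    return sorted(high, key=lambda r: r.exposure, reverse=True)


register = [
    Risk("Clinicians unavailable for requirements workshops", 4, 4, "mitigate"),
    Risk("Third-party SMS gateway outage", 2, 3, "transfer"),
    Risk("Manual, partial deployment to production", 3, 5, "avoid"),
]

for risk in prioritise(register):
    print(risk.exposure, risk.description, risk.strategy)
```

The monitoring stage amounts to re-scoring: if the SMS gateway's probability rises from 2 to 5 later in the project, its exposure becomes 15 and it enters the prioritised list.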

Brooks' Law and Its Implications

"Adding manpower to a late software project makes it later." — Fred Brooks, The Mythical Man-Month (1975)

The reasoning is precise: new team members require onboarding time from existing members, reducing their productive output. Communication overhead grows as O(n²) with team size. Knowledge transfer is lossy — tacit knowledge about requirements, decisions, and codebase structure cannot be fully transferred.
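The O(n²) claim is usually made concrete by counting pairwise communication channels, n(n − 1)/2. The sketch assumes every pair of team members may need to coordinate, so it is an upper bound; real teams reduce it through structure.

```python
def communication_channels(team_size):
    """Pairwise communication paths in a team: n(n - 1) / 2, which grows as O(n^2)."""
    return team_size * (team_size - 1) // 2


for n in (3, 5, 8, 12):
    print(n, communication_channels(n))
# 3 3
# 5 10
# 8 28
# 12 66
```

Growing a late team from 8 to 12 people (50% more staff) raises the potential channels from 28 to 66, more than doubling the coordination surface at exactly the moment the schedule is under most pressure.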

Brooks' Law implies that schedule risk must be addressed early: by descoping, by changing methodology, or by accepting delay. Attempting to recover a late project by adding resources is almost always counterproductive.

The Wicked Problem Problem

Horst Rittel and Melvin Webber's concept of wicked problems applies directly to large software projects. A wicked problem is one where:

The problem cannot be fully understood before a solution is attempted
Every solution changes the problem
There is no definitive test of whether a solution is correct
Each attempt is consequential and not fully reversible

Enterprise software requirements are often wicked. This is the deep justification for iterative methodologies: not merely organisational preference, but an epistemological acknowledgement that requirements are discovered through the process of trying to satisfy them. Waterfall assumes requirements are knowable upfront — a premise that is valid for some engineering problems and invalid for many software problems.

Professional and Ethical Dimensions

Software engineering affects the lives of people who have no direct relationship with the engineers who built the systems they use. This creates professional and ethical responsibilities that go beyond technical competence.

The ACM/IEEE Code of Ethics (Eight Principles)

The ACM/IEEE Software Engineering Code of Ethics (1999) identifies eight principles. The first is primary:

"Software engineers shall act consistently with the public interest."

The remaining seven principles (Client and Employer, Product, Judgement, Management, Profession, Colleagues, Self) are subordinate to this. When a client's instruction conflicts with the public interest, the Code is unambiguous: public interest prevails.

This has direct SDLC implications. A requirements engineer who accepts a specification that they believe will produce an unsafe system has an ethical obligation to raise this — formally, in writing, in a way that creates an audit trail.

Legal Obligations

Software engineers operate within a legal framework that varies by jurisdiction and domain. Recurrent legal obligations in enterprise software:

Data protection (UK GDPR / PECR): systems that process personal data must have a lawful basis; data minimisation, retention limits, and subject rights are legal requirements, not design preferences
Accessibility (Equality Act 2010, WCAG 2.2): public-facing systems must be accessible; inaccessible design is a legal liability
Contractual obligations: SLAs, uptime guarantees, and performance specifications are legally enforceable; NFRs documented in the requirements specification become contractual commitments
Product liability: in safety-critical domains, software defects that cause harm may create legal liability; professionally, the software engineer cannot disclaim responsibility for known defects by citing employer instruction

Reflective Practice

Professional development in software engineering requires systematic reflection on practice — not merely learning new tools, but developing the capacity to evaluate one's own decisions against evidence.

The SDLC provides a natural structure for reflective practice: after each phase, ask not only "what did we produce?" but "what decisions did we make, on what evidence, with what consequences, and what would we do differently?" The blameless post-mortem culture from DevOps (Unit 7) is a formalisation of this reflective stance applied to incidents — it is equally applicable to architectural decisions, requirements choices, and methodology selection.

A software engineer who cannot articulate the reasoning behind their decisions — not just what they did but why — is one who learns slowly. The SDLC practices taught in this course are not just process requirements; they are knowledge management tools that make reasoning transparent and revisable.

Key Concepts: Failure and Risk

The Therac-25 demonstrates three things about safety-critical requirements:

Safety requirements must be measurable and testable. "The system shall not deliver an unsafe dose" is not testable. "The beam shall not activate unless the treatment head is confirmed in the correct position via two independent sensor readings" is.
Hazard analysis must be conducted independently of implementation. The Therac-25 team assumed the software was safe because the Therac-20 was safe — a category error that confused the safety properties of hardware interlocks with the safety properties of software logic.
User-reported anomalies are requirements feedback. Operators reported strange behaviour months before fatalities occurred. A defect escape rate metric that tracked unexplained system behaviour would have triggered investigation.
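The measurable version of the safety requirement translates directly into code and an exhaustive test. This is a hypothetical sketch (HeadPosition and beam_permitted are illustrative names, not the Therac-25's actual design), and a software check like this should supplement, not replace, independent hardware interlocks.

```python
from enum import Enum
from itertools import product


class HeadPosition(Enum):
    ELECTRON = "electron"
    XRAY = "xray"
    UNKNOWN = "unknown"  # sensor fault, head still moving, etc.


def beam_permitted(requested, sensor_a, sensor_b):
    """Permit the beam only if two independent sensors both confirm the requested position."""
    if requested is HeadPosition.UNKNOWN:
        return False
    return sensor_a is requested and sensor_b is requested


def test_beam_never_fires_without_two_confirmations():
    # Exhaustive check over all 27 input combinations: the requirement is testable.
    for requested, a, b in product(HeadPosition, repeat=3):
        if beam_permitted(requested, a, b):
            assert a is requested and b is requested
            assert requested is not HeadPosition.UNKNOWN


def test_beam_fires_when_both_sensors_agree():
    assert beam_permitted(HeadPosition.XRAY, HeadPosition.XRAY, HeadPosition.XRAY)


test_beam_never_fires_without_two_confirmations()
test_beam_fires_when_both_sensors_agree()
```

Because the input space is small and fully enumerable, the test covers every combination rather than only the happy path.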

A complex problem is difficult but has a discoverable answer: the answer exists before the investigation begins. A wicked problem changes as you attempt to solve it: the act of specifying requirements changes what stakeholders want; building a prototype reveals requirements that could not be articulated without seeing them; deploying a system to users generates new requirements based on actual use.

NHS NPfIT illustrates this: the initial requirements were specified by central government, but deploying partial functionality revealed that clinical workflows varied radically between trusts, invalidating the centralised specification. No amount of upfront requirements work could have discovered this — it required deployment and feedback.

The implication is not that requirements engineering is impossible but that it must be treated as an ongoing activity, not a one-time phase.

Brooks' Law is often cited as a waterfall-specific problem, but it applies equally to Agile. Adding a developer to a Scrum team mid-sprint:

Requires onboarding time from senior team members (capacity loss)
Introduces communication overhead that grows with team size
Increases the cost of sprint ceremonies (planning, retrospectives, standups)
May require restructuring team composition and ownership boundaries

The Agile mitigation is not to add developers but to descope: reduce the sprint backlog, remove items from the release, or accept a delayed release. Brooks' Law counsels against responding to schedule risk with resource additions; the Agile principle of sustainable pace counsels the same.

Key Concepts: Ethics and Professionalism

The Code's primary principle — public interest — manifests in ordinary SDLC decisions:

A requirements engineer who accepts an NFR that is technically unachievable (and knows it) has misled the client
An architect who selects a technology stack they are unfamiliar with without disclosing this is not acting in the client's interest
A developer who ships code they know is not tested adequately, under schedule pressure, in a safety-critical context, has placed schedule above public interest
A tester who signs off a release knowing that key test cases were skipped has falsified evidence

The Code does not require perfection — it requires honesty, transparency, and the willingness to raise concerns formally when professional judgement conflicts with instruction.

UK GDPR creates specific requirements that must appear in the requirements specification and RTM:

Lawful basis: every personal data processing activity must have a documented lawful basis (consent, legitimate interest, contractual necessity, etc.). This is a functional requirement.
Data minimisation: only data necessary for the stated purpose may be collected. This constrains the data model — a requirements decision.
Retention limits: personal data must be deleted after the purpose expires. This requires a data lifecycle management function — an architectural decision.
Subject rights: right to access, right to erasure, right to portability. These are functional requirements with specific response time SLAs (measurable NFRs).
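Retention limits become a concrete data lifecycle function once each record carries its documented purpose. In the sketch below, the retention schedule, purposes, and record type are hypothetical; real retention periods come from legal and records-management guidance, not from code.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention schedule: documented purpose -> maximum retention period.
RETENTION = {
    "appointment_reminders": timedelta(days=90),
    "booking_history": timedelta(days=365 * 8),
}


@dataclass
class PersonalRecord:
    record_id: int
    purpose: str
    collected_on: date


def due_for_deletion(records, today):
    """Return the records held longer than the retention period for their purpose.

    A record whose purpose is not in RETENTION raises KeyError: data without a
    documented purpose should fail loudly rather than be kept silently.
    """
    return [r for r in records if today - r.collected_on > RETENTION[r.purpose]]


records = [
    PersonalRecord(1, "appointment_reminders", date(2024, 9, 1)),
    PersonalRecord(2, "appointment_reminders", date(2024, 12, 1)),
    PersonalRecord(3, "booking_history", date(2020, 1, 1)),
]
expired = due_for_deletion(records, today=date(2025, 1, 1))
print([r.record_id for r in expired])  # [1]
```

Keying retention on purpose ties the storage limit back to the lawful basis: a record cannot be retained without a purpose that the schedule recognises.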

Teams that treat GDPR as a legal checkbox after development are generating corrective and adaptive maintenance work that could have been designed out in the requirements phase.

Check Your Understanding

In the Knight Capital case, the primary SDLC failure was best described as:

A testing failure: the deprecated code path had no test coverage
A requirements failure: the kill switch was not specified as a requirement
A DevOps failure: the deployment was manual, partial, and unverified
A maintenance failure: deprecated code was not removed from the codebase

A software engineer is instructed by their client to disable the audit logging in a system handling medical records, to reduce storage costs. According to the ACM/IEEE Code of Ethics, the engineer should:

Comply: the client has the right to make decisions about their own system
Formally decline and explain the legal and safety implications in writing
Disable the logging but flag it in the release notes as a known risk
Implement a reduced logging system as a compromise

AI Dimension

AI and the SDLC: an integrative assessment

Across this course, we have examined AI's role at each SDLC phase. The picture that emerges is consistent: AI tools offer genuine acceleration for well-defined subtasks while being structurally limited in activities that require contextual judgement, ethical reasoning, or accountability.

The four aspects revisited:

Amplifier: as a role, AI consistently amplifies what engineers already do well. Strong requirements engineers use AI to surface ambiguities and generate test cases faster. Strong architects use AI to evaluate pattern tradeoffs against a wider literature. The amplification effect is bounded by the quality of human judgement directing the tool.
Assistant: as a role, AI reduces friction on well-understood, high-volume tasks — boilerplate code generation, documentation drafting, test case generation from specifications. The economic value is real but concentrated in execution, not in the decisions that govern execution.
Risk: AI introduces novel failure modes that did not exist in pre-AI development: hallucinated requirements, confidently wrong architectural advice, generated code that passes tests but is insecure, threat models that appear comprehensive but miss organisational context. Each SDLC phase has a characteristic AI risk that must be actively managed.
Subject: AI systems are increasingly the artefacts that software engineers are asked to build. Building AI systems requires applying SDLC discipline to a domain where the failure modes (bias, hallucination, adversarial inputs) are poorly captured by traditional testing frameworks. Requirements for AI systems must include fairness, robustness, and explainability constraints alongside functional requirements.

What AI does not change: the fundamental obligations of software engineering — to understand the problem before designing the solution, to design before building, to test against requirements rather than implementation, to document decisions so they can be revisited, and to act in the public interest when client instruction conflicts with it. AI tools operate inside the SDLC; they do not substitute for it.

Group Activity: Design Decision Debate

Return to the Architectural Decision Record (ADR) your group produced in Unit 5 for the hospital appointment booking system. You chose between a monolithic layered architecture and a microservices architecture.

Revisit your choice in light of everything covered since Unit 5: implementation challenges (Unit 7), testing implications (Unit 8), deployment strategy (Unit 9), and maintenance trajectory (Unit 9).
Would you make the same decision now? If yes, what evidence from later units reinforces your choice? If no, what has changed your assessment?
Identify the single most important piece of information you did not have at Unit 5 that would have strengthened your ADR.
Write a one-paragraph "ADR retrospective" that documents what you learned and would include in the original ADR if written today.

This activity models reflective practice: the ability to evaluate past decisions against accumulated evidence. Professional software engineers do this after every project; good organisations institutionalise it in retrospectives and post-mortems.

Group Activity: Ethics Scenario Analysis

Each group analyses one scenario. Identify: (a) which ACM/IEEE principles are engaged; (b) what the engineer's obligation is; (c) what SDLC artefact would provide the clearest evidence trail.

A lead developer discovers a data validation defect three days before a major release of a healthcare appointment booking system. The defect allows a malformed date input to silently overwrite appointment records. Fixing it requires four hours of development and full regression testing. The product manager says: "We'll fix it in the next sprint — it's not on the critical path."

Who bears ethical responsibility? What should the developer do? What SDLC process failure allowed this situation to arise?

A software engineering team provides a three-month estimate for a client project. The project manager adjusts this to two months in the proposal, believing the team will work faster under pressure and that the client will accept nothing longer. The team is not told their estimate was changed. Midway through the project, it is clear the original estimate was correct.

Which principles does the project manager's action violate? What should individual team members do when they discover the discrepancy? How does this relate to the MSc principle of evidence-based estimation?

A public-sector client asks the development team to deliver the minimum viable product without WCAG 2.2 accessibility compliance, intending to "add it later." The team knows that accessibility retrofitting is 10× more expensive than designing for it upfront, and that a significant proportion of the target users have visual impairments. The client has a legal obligation under the Equality Act 2010.

What is the engineer's obligation when the client's instruction is both legally non-compliant and likely to cause harm? What documentation should be created?

Question Bank

A revision and assessment resource covering all ten units. Questions are grouped by type. Use these for self-testing, peer quizzing, or formative assessment preparation.

Section A — Define the Term

Software engineering is the systematic application of engineering principles, methods, and tools to the development and maintenance of high-quality software systems, encompassing requirements, design, implementation, testing, deployment, and maintenance.

A non-functional requirement specifies a quality attribute of the system (how it must perform) rather than a specific behaviour (what it must do). Well-formed NFRs specify attribute + metric + threshold, e.g. "The system shall respond to 95% of booking requests within 2 seconds under a load of 1,000 concurrent users."
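The attribute + metric + threshold structure is what makes an NFR checkable. A minimal sketch, assuming latency measurements have already been collected from a load test at the stated 1,000 concurrent users (the sample values are invented):

```python
def meets_latency_nfr(latencies_s, threshold_s=2.0, required_fraction=0.95):
    """Check 'at least required_fraction of requests complete within threshold_s'."""
    if not latencies_s:
        raise ValueError("no measurements: the NFR cannot be verified")
    within = sum(1 for t in latencies_s if t <= threshold_s)
    return within / len(latencies_s) >= required_fraction


# Hypothetical measurements from a load test at 1,000 concurrent users.
passing_run = [1.2] * 96 + [3.5] * 4   # 96% within 2 s
failing_run = [1.2] * 90 + [3.5] * 10  # 90% within 2 s
print(meets_latency_nfr(passing_run))  # True
print(meets_latency_nfr(failing_run))  # False
```

The load condition belongs to the test setup rather than the check itself: the same function would give a misleading pass if fed measurements taken under light load.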

Technical debt is the implied future rework cost incurred when a faster but less robust solution is chosen in the present. Like financial debt, it accrues interest — the longer it is deferred, the more work is required to resolve it. It may be deliberate (explicit trade-off) or inadvertent (poor practice not recognised at the time).

A Requirements Traceability Matrix is a document that maps each requirement to the design elements, code modules, and test cases that implement and verify it. It demonstrates that every requirement has been addressed and provides impact analysis when requirements change.
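An RTM's two main uses, finding unverified requirements and analysing the impact of a change, can be sketched with a plain mapping. The requirement IDs, component names, and test IDs are hypothetical; real RTMs usually live in a requirements or test-management tool.

```python
# Hypothetical RTM: requirement -> design elements and test cases linked to it.
rtm = {
    "FR-01 Book appointment": {"design": ["BookingService"], "tests": ["TC-01", "TC-02"]},
    "FR-02 Cancel appointment": {"design": ["BookingService"], "tests": []},
    "NFR-01 95% of bookings within 2 s": {"design": ["BookingCache"], "tests": ["LT-01"]},
}


def untested_requirements(matrix):
    """Requirements with no verifying test case: gaps in coverage of the specification."""
    return [req for req, links in matrix.items() if not links["tests"]]


def impacted_by(matrix, component):
    """Impact analysis: which requirements trace to a given design element?"""
    return [req for req, links in matrix.items() if component in links["design"]]


print(untested_requirements(rtm))
print(impacted_by(rtm, "BookingService"))
```

Here the matrix immediately exposes that cancellation has no test, and that any change to BookingService must be re-verified against both booking and cancellation requirements.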

An Architectural Decision Record is a document that captures a significant architectural decision: the context that made it necessary, the options considered, the decision made, the rationale, and the expected consequences. ADRs provide an auditable record of why a system is designed the way it is.

Continuous Integration is a DevOps practice where developers frequently merge code changes (at least daily) into a shared repository, triggering an automated build and test pipeline. The goal is to detect integration conflicts and defects as quickly as possible, reducing the cost of resolution.

Acceptance testing verifies that a system meets stakeholder requirements and is fit for purpose from the user's perspective. It is typically the final testing phase before deployment and may be performed by the client, end users, or an independent testing team. BDD (Behaviour-Driven Development) formalises acceptance criteria as Given/When/Then specifications.

Canary deployment is a release strategy that routes a small, controlled percentage of live traffic to a new version of the software while the majority continues to the previous version. Metrics are monitored; if the new version performs well, the traffic percentage is progressively increased to 100%. If problems are detected, traffic is redirected back instantly.
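The routing and progression logic of a canary release can be sketched in a few lines. The rollout stages, the error-rate tolerance, and the hash-based bucketing are assumptions chosen for illustration; production systems typically implement this in a load balancer or service mesh rather than application code.

```python
import hashlib

STAGES = [1, 5, 25, 50, 100]  # assumed rollout schedule, percent of traffic


def routes_to_canary(user_id, canary_percent):
    """Deterministically bucket a user into 0..99; buckets below canary_percent get the new version."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < canary_percent


def next_percent(current, canary_error_rate, baseline_error_rate, tolerance=0.005):
    """Advance one stage if the canary is no worse than baseline (within tolerance); else roll back."""
    if canary_error_rate > baseline_error_rate + tolerance:
        return 0  # roll back: all traffic returns to the previous version
    later = [s for s in STAGES if s > current]
    return later[0] if later else current


print(next_percent(5, canary_error_rate=0.010, baseline_error_rate=0.009))   # 25
print(next_percent(25, canary_error_rate=0.050, baseline_error_rate=0.010))  # 0
```

Hashing the user ID, rather than choosing randomly per request, keeps each user on one version, and because a bucket below 5 is also below 25, users already on the canary stay on it as the percentage grows.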

Threat modelling is a structured technique for identifying security vulnerabilities in a system during the design phase. STRIDE is one of the most widely used frameworks: Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, Elevation of Privilege. Each category identifies a class of attack that the design must address.
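Each STRIDE category corresponds to a security property the attack violates, which makes it usable as a completeness checklist per design element. The mapping below is the standard one; the booking-API mitigations and function name are hypothetical.

```python
# Each STRIDE category and the security property an attack in that category violates.
STRIDE = {
    "Spoofing": "authentication",
    "Tampering": "integrity",
    "Repudiation": "non-repudiation",
    "Information Disclosure": "confidentiality",
    "Denial of Service": "availability",
    "Elevation of Privilege": "authorisation",
}


def unaddressed_threats(mitigations):
    """Return STRIDE categories with no recorded mitigation for a design element."""
    return [f"{threat} (threatens {prop})" for threat, prop in STRIDE.items()
            if threat not in mitigations]


# Hypothetical mitigations recorded for a booking API endpoint.
booking_api = {
    "Spoofing": "authenticated sessions for all booking calls",
    "Tampering": "TLS on every connection",
    "Information Disclosure": "role-based access to patient records",
}
for gap in unaddressed_threats(booking_api):
    print(gap)
```

A checklist only shows that each category was considered; whether a recorded mitigation is adequate remains a matter of design judgement.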

Preventive maintenance involves proactively improving software to prevent future failures: refactoring to reduce complexity, updating dependencies before they become security liabilities, improving test coverage, and addressing technical debt. It is the maintenance type most commonly deferred in favour of feature development, and its deferral is a major cause of legacy system crises.

Cyclomatic complexity is a quantitative measure of the number of independent paths through a piece of code. It is calculated as: E − N + 2P, where E = edges, N = nodes, P = connected components in the control flow graph. Higher complexity indicates harder-to-test, harder-to-maintain code. A common threshold for a single function is ≤ 10.
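The formula can be applied directly to a control flow graph given as node and edge lists. In this sketch, connected components (P) are found with a small union-find; the graph encoding and node names are illustrative assumptions.

```python
def cyclomatic_complexity(nodes, edges):
    """M = E - N + 2P for a control flow graph (P = connected components)."""
    parent = {n: n for n in nodes}

    def find(x):
        # Union-find with path halving to identify connected components.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)  # direction is ignored for connectivity
    components = len({find(n) for n in nodes})
    return len(edges) - len(nodes) + 2 * components


# A single if/else: entry -> decision -> (then | else) -> exit.
nodes = ["entry", "decision", "then", "else", "exit"]
edges = [("entry", "decision"), ("decision", "then"), ("decision", "else"),
         ("then", "exit"), ("else", "exit")]
print(cyclomatic_complexity(nodes, edges))  # 2: one decision, two independent paths
```

For a single function (P = 1) this reduces to the familiar "number of decision points + 1", which is why each additional if, loop, or case adds one to the count.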

In UML use case modelling, «extend» represents optional or conditional behaviour: the extending use case adds behaviour to the base use case under specific conditions, specified at an extension point. The arrow points from the extending use case to the base. Example: "Add to Waiting List" extends "Book Appointment" when no slots are available.

Section B — Identify the Term

Integration defect (specifically, an interface mismatch or contract violation). This is a classic example of what integration testing is designed to detect. The defect originates in the requirements or design phase — either the interface contract was not specified, or modules were implemented without referencing it. In testing terms, unit tests for each module passed because they tested in isolation; only integration testing reveals the incompatibility.

Ambiguous requirement. "User-friendly" is unmeasurable — there is no test that can verify or falsify it. It should be replaced with a measurable NFR: e.g., "New users shall be able to complete a booking without assistance within 5 minutes, measured by usability testing with a representative sample of 10 users." The original statement may be retained as a stakeholder goal in a use case description, but it cannot serve as a testable requirement.

Technical debt — specifically, deliberate technical debt that became inadvertent legacy debt. The team made an explicit trade-off (deliberate debt), which is defensible if tracked and resolved. The failure to track and prioritise resolution converted it to unmanaged legacy debt. This is the characteristic pattern through which deliberate tactical shortcuts accumulate into systemic architectural problems.

Coverage is necessary but not sufficient. This scenario illustrates that coverage measures which lines are executed by tests, not whether the tests verify correct behaviour. Tests may execute code with assertions that do not check meaningful outcomes, or may not cover edge cases that the code reaches in production. The finding also suggests tests are verifying implementation rather than requirements — a test design failure rather than a coverage failure.

Blameless post-mortem (also: blameless retrospective). This is a core DevOps cultural practice, associated with the Site Reliability Engineering (SRE) discipline developed at Google. The premise is that individuals operate within systems, so defects are analysed as products of system design rather than individual error. Blame-focused post-mortems produce defensive behaviour and suppress the honest reporting needed to understand and improve the system.

Strangler Fig pattern. Named after the strangler fig vine that gradually envelops and replaces its host tree. A routing layer (often an API gateway or reverse proxy) intercepts all requests and forwards them to either the legacy system or the new services depending on which capabilities have been migrated. The pattern bounds risk at each migration step and avoids the need for a "big bang" migration where all functionality must be replicated before any transition can occur.
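The routing layer at the heart of the Strangler Fig pattern can be sketched as a prefix table. The paths and service names are hypothetical, and in practice this logic usually lives in an API gateway or reverse proxy configuration rather than application code.

```python
# Hypothetical capabilities already migrated to new services; everything else stays legacy.
MIGRATED = {
    "/appointments": "appointments-service",
    "/notifications": "notifications-service",
}


def route(path):
    """Return the backend that should serve a request path."""
    for prefix, service in MIGRATED.items():
        if path == prefix or path.startswith(prefix + "/"):
            return service
    return "legacy-monolith"


print(route("/appointments/42"))  # appointments-service
print(route("/billing/7"))        # legacy-monolith
```

Each migration step is one entry added to the table, and rolling a step back is one entry removed, which is what bounds the risk of each transition.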

Brooks' Law: "Adding manpower to a late software project makes it later." The new developers require onboarding time from existing team members (reducing their output), increase communication overhead (which grows as O(n²) with team size), and lack the tacit knowledge of existing team members. The correct response to schedule pressure is to descope, change methodology, or accept delay — not to add resources.

Wicked problem (Rittel and Webber). The requirements could not be fully specified upfront because the users could not articulate what they needed without seeing attempts to satisfy their needs. This is the deep justification for iterative and user-centred methodologies in enterprise software: requirements are discovered through the process of trying to satisfy them, not enumerated in advance. The correct methodological response is rapid prototyping and iterative validation, not a heavier requirements engineering effort.

Section C — Explain the Difference

Functional requirements describe what the system must do — specific behaviours or functions. Non-functional requirements describe how the system must perform — quality attributes that apply across functions.

FR example: "The system shall allow a GP to view a patient's appointment history for the preceding 12 months." This specifies a behaviour.

NFR example: "The system shall return appointment history queries within 1.5 seconds for 99% of requests under normal load (up to 500 concurrent users)." This specifies a performance constraint on the behaviour.

The distinction matters for architecture: NFRs such as performance, availability, and security often drive architectural decisions more profoundly than any single functional requirement.

Waterfall is a sequential, phase-gated methodology: requirements are fully specified before design begins; design is complete before implementation begins; each phase produces a baseline artefact that constrains the next. It assumes requirements are knowable upfront and stable. Changes after baseline are costly.

Scrum is an iterative Agile framework operating in fixed-length sprints (typically 2 weeks). Requirements (user stories) are maintained in a prioritised backlog; scope and priority are adjusted between sprints. It assumes requirements will evolve and embraces change.

Waterfall is more appropriate for: safety-critical embedded systems with stable, formally verified requirements (e.g., flight control software); contractually fixed-scope projects with clear acceptance criteria; systems where iterations would be prohibitively expensive (nuclear plant control systems).

Scrum is more appropriate for: enterprise software with evolving user requirements; products where user feedback drives development direction; teams with direct access to stakeholders; digital services expected to evolve post-launch.

Both are use case relationships in UML, but they represent different dependency structures.

«include» represents mandatory shared behaviour: the base use case always invokes the included use case. Arrow points from base to included. Use when behaviour must be reused across multiple use cases to avoid duplication. Example: "Book Appointment" and "Cancel Appointment" both «include» "Authenticate User."

«extend» represents optional or conditional behaviour: the extending use case adds to the base under specific conditions, at a defined extension point. Arrow points from extending to base. Use when behaviour only occurs in certain scenarios. Example: "Add to Waiting List" «extends» "Book Appointment" when no slots are available.

A common mistake: reversing the «extend» arrow. The arrow runs from the optional behaviour to the core behaviour (the extending use case depends on the base, not the other way around).

SAST (Static Application Security Testing) analyses source code or compiled binaries without executing them. It detects vulnerability patterns (SQL injection sinks, unsafe deserialization, hardcoded credentials) by inspecting code structure. It runs early in the pipeline — on commit or pull request. It cannot detect vulnerabilities that only appear in a running system.

DAST (Dynamic Application Security Testing) sends attack probes (malformed inputs, injection attempts, authentication bypass requests) to a running application and analyses responses. It detects runtime vulnerabilities that SAST misses (race conditions, authentication logic flaws, server configuration issues). It requires a deployed test environment.

SAST examines how the code is written; DAST examines how the running system behaves. Both are needed: SAST cannot catch environment-dependent vulnerabilities; DAST cannot analyse code paths that its probes do not trigger.
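The defining property of SAST, analysing code without executing it, can be illustrated with Python's standard ast module. This toy rule flags string literals assigned to secret-looking variable names; the name pattern is an assumption, and dedicated tools such as Bandit apply many more rules with far fewer false positives.

```python
import ast
import re

SECRET_NAME = re.compile(r"password|passwd|secret|api_key|token", re.IGNORECASE)


def find_hardcoded_secrets(source):
    """Flag string literals assigned to secret-looking names, without executing the code."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Assign)
                and isinstance(node.value, ast.Constant)
                and isinstance(node.value.value, str)):
            for target in node.targets:
                if isinstance(target, ast.Name) and SECRET_NAME.search(target.id):
                    findings.append((node.lineno, target.id))
    return sorted(findings)


sample = (
    'db_host = "localhost"\n'
    'DB_PASSWORD = "hunter2"\n'
    'token = fetch_token()\n'  # never executed: fetch_token need not exist
)
print(find_hardcoded_secrets(sample))  # [(2, 'DB_PASSWORD')]
```

The third line is not flagged because its value is a function call rather than a literal; whether that token is safe at runtime is exactly the kind of question only DAST or runtime inspection can answer.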

Corrective maintenance is reactive: it fixes defects discovered in a production system. It is triggered by failure and carries urgency.

Preventive maintenance is proactive: it improves the system before problems occur — refactoring, updating dependencies, improving tests, resolving technical debt. It is triggered by professional judgement, not immediate failure.

Why preventive maintenance is underprioritised: neglecting it has no visible immediate cost. A failing system demands attention; a healthy system with accumulating technical debt does not. Product stakeholders who control the backlog prioritise user-visible features (perfective maintenance) and urgent defects (corrective maintenance) over improvements whose benefit is the absence of future problems. The cost of this prioritisation is paid later, in accelerating maintenance effort and, eventually, legacy system crises. Organisations that explicitly budget sprint capacity for preventive maintenance, treating it as a first-class cost category, are much better placed to avoid this crisis pattern.
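One way to make preventive maintenance a first-class cost category is to ring-fence part of each sprint's capacity before filling the rest of the sprint by urgency. The sketch below assumes a 20% share, story-point estimates and a simple greedy fill. All three, and the backlog items, are hypothetical choices for illustration rather than a recommended policy.

```python
from dataclasses import dataclass

@dataclass
class Item:
    title: str
    kind: str    # "corrective", "perfective" or "preventive"
    points: int

def fill(items: list[Item], budget: int) -> list[Item]:
    """Greedily take items in order while they still fit the budget."""
    chosen, used = [], 0
    for item in items:
        if used + item.points <= budget:
            chosen.append(item)
            used += item.points
    return chosen

def plan_sprint(backlog: list[Item], capacity: int, preventive_share: float = 0.2) -> list[Item]:
    """Ring-fence part of capacity for preventive work, then fill the rest by urgency."""
    reserved = int(capacity * preventive_share)
    preventive = [i for i in backlog if i.kind == "preventive"]
    ring_fenced = fill(preventive, reserved)
    # Everything else competes for the remaining capacity: urgent defects first,
    # then features, then any preventive items that did not fit the ring-fence
    rest = ([i for i in backlog if i.kind == "corrective"]
            + [i for i in backlog if i.kind == "perfective"]
            + [i for i in preventive if i not in ring_fenced])
    return ring_fenced + fill(rest, capacity - reserved)

BACKLOG = [
    Item("Fix double-booking bug", "corrective", 8),
    Item("SMS reminders", "perfective", 13),
    Item("Clinic dashboard", "perfective", 13),
    Item("Portal layout tweaks", "perfective", 5),
    Item("Upgrade web framework", "preventive", 5),
    Item("Refactor slot scheduler", "preventive", 8),
]
```

With 40 points of capacity, `plan_sprint(BACKLOG, 40)` schedules the framework upgrade alongside the bug fix and two features, and defers the clinic dashboard. With `preventive_share=0.0`, the corrective and perfective items use up the whole sprint and no preventive item is scheduled. That is the underprioritisation pattern described above, in miniature.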

Monitoring is the practice of watching known, predefined signals: dashboards of specific metrics (CPU, error rate), alerts triggered by threshold breaches. It answers the question: "Is this specific thing wrong?"

Observability is a system property: the degree to which the internal state of a system can be inferred from its external outputs (logs, metrics, traces). An observable system allows engineers to answer questions they did not anticipate when they set up monitoring: "Why is this specific user's request slow?" or "What was the state of the database connection pool at the moment this error occurred?"

Monitoring is necessary but not sufficient for production operations. A system can be heavily monitored but poorly observable — you know something is wrong (alert fires) but cannot determine what or why (no traces, unstructured logs). Observability supports diagnosis; monitoring supports alerting. Both are required.
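The sketch below illustrates the distinction with a handful of structured request events. The alert is monitoring: it compares one predefined signal against a fixed threshold. The query is observability: it answers a question nobody anticipated, and it can do so only because each event carries a trace ID and contextual fields. The event fields, values and 5% threshold are made up for illustration.

```python
# Stand-in for a log/trace store: one structured event per request.
# Field names and values are hypothetical.
EVENTS = [
    {"trace_id": "t1", "user": "alice", "endpoint": "/book",   "ms": 120,  "status": 200, "db_pool_in_use": 3},
    {"trace_id": "t2", "user": "bob",   "endpoint": "/book",   "ms": 4100, "status": 200, "db_pool_in_use": 20},
    {"trace_id": "t3", "user": "carol", "endpoint": "/cancel", "ms": 95,   "status": 500, "db_pool_in_use": 4},
    {"trace_id": "t4", "user": "bob",   "endpoint": "/book",   "ms": 3900, "status": 200, "db_pool_in_use": 20},
]

def error_rate_alert(events: list[dict], threshold: float = 0.05) -> bool:
    """Monitoring: a predefined signal compared against a fixed threshold."""
    errors = sum(1 for e in events if e["status"] >= 500)
    return errors / len(events) > threshold

def slow_requests(events: list[dict], user: str, min_ms: int = 1000) -> list[tuple[str, int, int]]:
    """Observability: an ad hoc question answered from rich, structured events."""
    return [(e["trace_id"], e["ms"], e["db_pool_in_use"])
            for e in events if e["user"] == user and e["ms"] >= min_ms]
```

In this example the alert fires because of one server error on `/cancel`. It says nothing about why bob's bookings take around four seconds. Querying the same events shows that both of his slow requests coincide with 20 database connections in use. That lead would be much harder to find from free-text log lines or a dashboard of averages.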

Coupling measures the degree of interdependence between modules. High coupling means changes to one module require changes to others; the system is fragile. Low coupling means modules can change independently; the system is maintainable.

Cohesion measures how strongly related the responsibilities within a single module are. High cohesion means a module does one thing well; it is understandable and testable. Low cohesion (a "God class" or utility module) means a module has unrelated responsibilities; it is difficult to test in isolation and attracts further unrelated additions.

The design goal is high cohesion and low coupling. The two properties are related: a highly cohesive module has a clear, bounded interface, which tends to reduce its coupling to other modules. Both are measurable: coupling through dependency analysis (for example, counting each module's incoming and outgoing dependencies), cohesion through LCOM (Lack of Cohesion in Methods) metrics.
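Here is a minimal sketch of both measurements. It assumes each class is described by the instance attributes its methods touch, and each module by the modules it depends on. `lcom` follows the original Chidamber and Kemerer definition; later variants exist and weight things differently. `coupling` counts efferent (outgoing) and afferent (incoming) dependencies. The classes and dependency graph are hypothetical.

```python
from itertools import combinations

def lcom(method_attrs: dict[str, set[str]]) -> int:
    """Chidamber and Kemerer LCOM: P - Q, floored at 0, where P counts method pairs
    sharing no instance attributes and Q counts pairs sharing at least one."""
    p = q = 0
    for a, b in combinations(method_attrs.values(), 2):
        if a & b:
            q += 1
        else:
            p += 1
    return max(p - q, 0)

def coupling(deps: dict[str, set[str]]) -> dict[str, tuple[int, int]]:
    """Per module: (efferent, afferent) = (modules it depends on, modules that depend on it)."""
    return {
        m: (len(targets), sum(1 for others in deps.values() if m in others))
        for m, targets in deps.items()
    }

# Hypothetical classes: which instance attributes each method reads or writes
SLOT = {"book": {"status", "patient"}, "cancel": {"status", "patient"}, "is_free": {"status"}}
UTILS = {"send_email": {"smtp"}, "hash_password": {"salt"}, "format_date": {"locale"}, "book": {"status"}}

# Hypothetical module dependency graph
DEPS = {"booking": {"auth", "db"}, "auth": {"db"}, "db": set(), "reports": {"booking", "db"}}
```

The cohesive slot class scores 0. The utility grab-bag scores 6 because none of its four methods share state. In `DEPS`, `db` has three dependents, so changes to it ripple furthest; `reports` has none, so it can change freely.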

A security requirement specifies what the system must (or must not) do with respect to security, stated independently of the mechanism that implements it and in testable terms: "All data in transit between client and server shall be encrypted using TLS 1.3 or later." It belongs in the requirements specification and RTM.

A security control is the implementation mechanism that satisfies the requirement: TLS certificate configuration, enforced HTTPS redirects, HSTS headers. It belongs in the design and implementation documentation.

The distinction matters because: (1) requirements must be defined before controls are selected — choosing a control before the requirement is understood risks solving the wrong problem; (2) multiple controls may satisfy the same requirement — the requirement creates the test, and the control must pass it; (3) security audits test compliance with requirements, not the presence of specific controls.
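Point (2) can be made concrete. A requirement-level test checks observable behaviour, so any control that produces that behaviour passes. The sketch below uses Python's standard `ssl` module and assumes a reachable endpoint; the host is whatever you pass in, such as a hypothetical staging server. It treats the requirement as two probes: a default client must negotiate TLS 1.3, and a client capped at TLS 1.2 must be refused. It never inspects certificate files, redirect rules or HSTS headers. The test does not care whether the behaviour comes from server configuration or a load balancer.

```python
from __future__ import annotations

import socket
import ssl

def negotiated_version(host: str, port: int = 443,
                       max_version: ssl.TLSVersion | None = None) -> str | None:
    """Handshake with the running endpoint; return the protocol version, or None if it fails."""
    ctx = ssl.create_default_context()
    if max_version is not None:
        ctx.maximum_version = max_version  # behave like an older client
    try:
        with socket.create_connection((host, port), timeout=5) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                return tls.version()
    except ssl.SSLError:
        return None  # handshake failed (certificate errors land here too and fail the first probe)

def meets_tls_requirement(host: str, port: int = 443) -> bool:
    """Requirement-level check: TLS 1.3 is negotiated and a TLS 1.2-only client is refused.
    Update the expected version if a later TLS version is standardised."""
    return (negotiated_version(host, port) == "TLSv1.3"
            and negotiated_version(host, port, ssl.TLSVersion.TLSv1_2) is None)
```

The same check would pass or fail unchanged if the team swapped one control for another, which is the sense in which the requirement creates the test.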

Review

This final unit synthesised the SDLC through the lens of critical evaluation, examining not what the SDLC recommends but why those recommendations exist, what happens when they are violated, and what obligations they create. Case studies demonstrate that software failures are not random events but the predictable outcome of identifiable SDLC failures: unspecified safety requirements, the wrong methodology for a wicked problem, unverified deployment. Forensic analysis of these cases is more valuable than additional prescriptive process guidance.

Risk management, professional ethics, and legal obligation are not separate from software engineering — they are constitutive of it. The ACM/IEEE Code's primary principle (public interest) is not an abstract aspiration but a practical guide: when instructions conflict with it, the engineer has an obligation that transcends employment. Legal frameworks like UK GDPR translate that obligation into concrete, testable requirements.

The course leaves you with the hospital appointment booking system as a running artefact — requirements in the RTM, architectural decisions in ADRs, test strategy defined, deployment plan designed, ethics scenarios interrogated. A software system is not finished when it is deployed; it is a living artefact that reflects the decisions made throughout its SDLC. Those decisions are your professional responsibility.
