Unit 8: Software Testing, Reliability, and Quality Assurance

Learning Objectives

Distinguish the four levels of testing and explain what each verifies
Design a testing strategy that matches test types to risk profile
Explain what test automation adds to a CI/CD pipeline and where its limits lie
Apply quality metrics to assess the adequacy of a test suite

Core Input

Read through each tab before working through the key concepts.

Software testing is organised into levels that correspond to different scales of system integration and different categories of defect. The testing pyramid — coined by Mike Cohn — captures the principle that lower-level tests should be more numerous, faster, and cheaper than higher-level tests.

Unit testing — verifies the behaviour of the smallest testable units of code (functions, methods, classes) in isolation, with dependencies replaced by test doubles (mocks, stubs). Fast to run; pinpoints failures precisely. A good unit test is: independent, repeatable, focused on one behaviour, and readable as documentation.
Integration testing — verifies that components work correctly together with real or near-real dependencies. Catches defects at component boundaries that unit tests cannot detect — incorrect API contracts, database schema mismatches, network communication failures. Slower and more complex to set up than unit tests.
System testing — verifies the complete, integrated system against its requirements. End-to-end tests simulate real user journeys through the full technology stack. Catches defects that only emerge when all components are combined; slow, brittle, and expensive to maintain.
Acceptance testing — verifies that the system meets the stakeholders' needs. Performed against acceptance criteria defined in the requirements; often involves real users or their representatives. The final verification before release. Automated acceptance tests (e.g. using Cucumber or SpecFlow) connect requirements to verifiable scenarios.

Test automation is the practice of encoding test cases as code that can be executed automatically, producing a pass/fail result without human intervention. Automated tests are essential for CI/CD: a pipeline that requires manual testing at every stage cannot deliver continuously.

Where automation adds the most value:

Regression testing — verifying that existing behaviour has not been broken by a change
Repeated execution — the same test suite runs on every commit, across all branches
Parallel execution — large test suites can be split across multiple runners
Test types that are difficult to run manually at speed or scale (performance, security, load)

Where automation has limits:

Usability testing — whether the system is easy to use requires human judgment, not scripted assertions
Exploratory testing — a skilled tester exploring a system to find unexpected defects cannot be automated
Visual testing — pixel-perfect rendering differences are difficult for automated tools to assess meaningfully
New behaviour — automated tests can only verify what was specified; they cannot discover that a requirement was never specified

Test-driven development (TDD) inverts the usual order: tests are written before the implementation they verify. The discipline ensures that every implemented behaviour has at least one test; it also encourages better design, as code that is hard to test usually reflects poor separation of concerns.

Quality assurance (QA) is not the same as testing. Testing detects defects; QA is the broader discipline of ensuring that processes produce quality throughout — preventing defects as well as detecting them.

Key quality metrics:

Test coverage — the percentage of code exercised by the test suite. A useful indicator but not a quality guarantee: 100% coverage can coexist with a test suite that makes no meaningful assertions. Coverage is a necessary but not sufficient condition for confidence in the suite.
Defect density — the number of defects per unit of code (e.g. per thousand lines). Tracks quality trends over time; a rising defect density signals declining code quality.
Defect escape rate — the proportion of defects that reach production rather than being caught before release. A high escape rate indicates inadequate pre-release testing.
Mean Time to Failure (MTTF) — the average time a system operates correctly before failing. A reliability metric.
Mean Time to Recovery (MTTR) — the average time to restore service after a failure. A resilience metric. A system with a low MTTF but a very low MTTR may be more acceptable in practice than one with a higher MTTF but very slow recovery.
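
The metrics above reduce to simple ratios. The sketch below uses invented figures and hypothetical function names, not data from any real project. It also assumes the standard steady-state availability formula, MTTF / (MTTF + MTTR), which is not stated in the text above, to illustrate why a system with a lower MTTF can still be the more acceptable one.

```python
def defect_density(defects: int, lines_of_code: int) -> float:
    """Defects per thousand lines of code (KLOC)."""
    return defects / (lines_of_code / 1000)


def defect_escape_rate(found_in_production: int, found_before_release: int) -> float:
    """Proportion of all known defects that reached production."""
    total = found_in_production + found_before_release
    return found_in_production / total if total else 0.0


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Assumed formula: fraction of time the system is up, MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


# Illustrative figures only.
density = defect_density(defects=30, lines_of_code=20_000)
escape = defect_escape_rate(found_in_production=8, found_before_release=32)

# System A fails often but recovers in about a minute.
# System B fails ten times less often but takes two days to recover.
system_a = availability(mttf_hours=100, mttr_hours=0.02)
system_b = availability(mttf_hours=1000, mttr_hours=48)
```

With these figures, System A is available for a larger fraction of the time than System B despite failing ten times as often, because its recovery time is so short.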

Key Concepts: Testing Levels

Each level of testing answers a different question about the system. Work through these items to understand the distinctions.

A well-designed unit test has the following properties:

Independent — it does not depend on the state left by other tests. Each test sets up its own context and tears it down.
Fast — runs in milliseconds. A test suite of thousands of tests must run quickly to be useful in a CI pipeline.
Focused — tests one behaviour. A test that tests multiple things in sequence produces ambiguous failure messages.
Readable as documentation — a good unit test's name and structure describe what the function does and under what conditions. test_booking_fails_when_slot_already_taken() is documentation as well as a test.
Deterministic — produces the same result every time, regardless of time, environment, or execution order.

A test that relies on a specific database state, makes a real HTTP call, or depends on the current time is not a unit test — it is an integration test in disguise. These characteristics make it unreliable and slow.
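
As a minimal sketch of these properties, the example below replaces the database with an in-memory stub and the system clock with an injected function, so the tests are independent, fast, and deterministic. `BookingService`, `StubRepository`, and `SlotTakenError` are hypothetical names invented for illustration; they are not part of any real booking system.

```python
from datetime import datetime


class SlotTakenError(Exception):
    """Raised when a booking is attempted on an occupied slot."""


class BookingService:
    """Hypothetical unit under test; dependencies are injected so tests can replace them."""

    def __init__(self, repository, clock):
        self._repository = repository
        self._clock = clock  # callable returning the current datetime

    def book(self, slot_id, patient_id):
        if self._repository.is_taken(slot_id):
            raise SlotTakenError(slot_id)
        booked_at = self._clock()
        self._repository.save(slot_id, patient_id, booked_at)
        return booked_at


class StubRepository:
    """In-memory test double standing in for the database."""

    def __init__(self, taken=()):
        self._taken = set(taken)
        self.saved = []

    def is_taken(self, slot_id):
        return slot_id in self._taken

    def save(self, slot_id, patient_id, booked_at):
        self.saved.append((slot_id, patient_id, booked_at))


FIXED_NOW = datetime(2024, 1, 15, 9, 0)  # arbitrary fixed time for determinism


def test_booking_fails_when_slot_already_taken():
    repository = StubRepository(taken={"slot-1"})
    service = BookingService(repository, clock=lambda: FIXED_NOW)
    try:
        service.book("slot-1", "patient-7")
    except SlotTakenError:
        pass
    else:
        raise AssertionError("expected SlotTakenError")
    assert repository.saved == []  # nothing was written


def test_booking_records_time_from_injected_clock():
    repository = StubRepository()
    service = BookingService(repository, clock=lambda: FIXED_NOW)
    assert service.book("slot-2", "patient-7") == FIXED_NOW
    assert repository.saved == [("slot-2", "patient-7", FIXED_NOW)]
```

Each test builds its own stub and service, so neither depends on state left by the other, and each checks a single behaviour.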

Unit tests verify components in isolation. Integration tests verify that components work correctly when assembled. The category of defects that only integration tests can catch:

API contract violations — service A calls service B with a parameter format that service B does not accept. Both pass their unit tests; the integration fails.
Database schema mismatches — the application code assumes a column exists that was renamed in a migration. Unit tests mock the database; integration tests use a real one.
Authentication and authorisation failures — the security middleware blocks a request that the business logic assumes is permitted.
Data serialisation problems — objects serialise correctly in isolation but round-trip incorrectly through the network layer.

This is why a test suite with only unit tests is insufficient. Unit tests are fast and precise; integration tests are slower but address a different category of failure.

Acceptance testing verifies that the system meets the stakeholders' stated needs. Its defining characteristic is that it is specified from the stakeholder's perspective, not the engineer's.

Acceptance tests should be defined by — or in collaboration with — the stakeholders who will accept or reject the system at release. In Agile contexts, this is often done through Behaviour-Driven Development (BDD): acceptance criteria are written in a structured natural language format (Given-When-Then) that is both human-readable and automatable.

The critical link: acceptance tests verify the same acceptance criteria written in the requirements baseline (Unit 3). If an acceptance criterion has no corresponding automated acceptance test, it will not be verified before release. This is precisely what the requirements traceability matrix (RTM) is designed to make visible.
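
To show how a Given-When-Then criterion maps onto an automated check, the sketch below writes the scenario as a docstring and mirrors each clause in the test body. It uses plain Python rather than a BDD tool such as Cucumber, and `AppointmentSystem` is a hypothetical in-memory stand-in; a real acceptance test would run against the deployed system. The scenario wording and messages are invented for illustration.

```python
class AppointmentSystem:
    """Hypothetical in-memory stand-in for the deployed booking system."""

    def __init__(self):
        self._bookings = {}  # slot -> patient

    def book(self, patient, slot):
        if slot in self._bookings:
            return "This slot is no longer available"
        self._bookings[slot] = patient
        return "Booking confirmed"

    def holder_of(self, slot):
        return self._bookings.get(slot)


def test_patient_cannot_book_a_taken_slot():
    """
    Given the 09:00 slot is already booked by another patient
    When a second patient tries to book the 09:00 slot
    Then the booking is refused with a clear message
    And the original booking is unchanged
    """
    # Given
    system = AppointmentSystem()
    system.book("patient-A", "09:00")

    # When
    message = system.book("patient-B", "09:00")

    # Then
    assert message == "This slot is no longer available"
    # And
    assert system.holder_of("09:00") == "patient-A"
```

Because the scenario is phrased in the stakeholder's terms, a non-engineer can confirm that the docstring says what they meant before the test body is trusted.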

Key Concepts: Automation & Quality

Test automation is a practice, not a destination. Quality assurance requires both the right tests and the discipline to maintain them.

Test coverage tells you what code was executed during testing — not whether the tests made meaningful assertions about that code's behaviour.

The pathological example:

A test that calls a function and asserts nothing achieves 100% coverage of that function while testing nothing at all.

High coverage is necessary because code that is never executed by tests cannot be known to work. But high coverage without strong assertions is a false indicator of quality. Coverage should be evaluated alongside defect escape rate and mutation testing results (whether tests detect artificially introduced defects).
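
A small sketch of the difference between executing code and verifying it: both tests below run every line of the function, but only the one with assertions notices an off-by-one defect. The mutant is written by hand here purely for illustration; real mutation-testing tools generate such variants automatically. All names are hypothetical.

```python
def is_slot_available(booked, capacity):
    """Function under test: a slot is available while bookings are below capacity."""
    return booked < capacity


def is_slot_available_mutant(booked, capacity):
    """Hand-made mutant: '<' changed to '<=' (an off-by-one defect)."""
    return booked <= capacity


def weak_test(fn):
    # Executes every line of fn, so coverage is 100%, but checks nothing.
    fn(3, 5)
    fn(5, 5)


def strong_test(fn):
    # Same calls, now with assertions, including the boundary case.
    assert fn(3, 5) is True
    assert fn(5, 5) is False


def detects_defect(test, fn):
    """Return True if the test fails when run against fn (the mutant is 'killed')."""
    try:
        test(fn)
    except AssertionError:
        return True
    return False


weak_kills_mutant = detects_defect(weak_test, is_slot_available_mutant)
strong_kills_mutant = detects_defect(strong_test, is_slot_available_mutant)
```

Here `weak_kills_mutant` is `False` and `strong_kills_mutant` is `True`: identical coverage, very different ability to detect a defect.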

Fault tolerance is a system's ability to continue operating correctly when one or more of its components fail. A fault-tolerant system degrades gracefully: it may operate with reduced functionality, but it does not catastrophically fail.

Fault tolerance is verified by testing under failure conditions — not under ideal conditions. Techniques include:

Chaos engineering — deliberately introducing failures into production or pre-production systems to verify that redundancy and failover mechanisms work as designed. Pioneered by Netflix's Chaos Monkey tool.
Circuit breaker testing — verifying that a circuit breaker correctly prevents cascading failures when a downstream service is unavailable
Load testing and stress testing — verifying performance under realistic and extreme load conditions; identifying the failure point

A system whose fault tolerance has never been tested has unverified fault tolerance — which is not the same as reliable fault tolerance.
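
As one possible illustration of circuit breaker testing, the sketch below defines a deliberately simplified breaker (it opens after a number of consecutive failures and has no half-open or reset state) and a test that simulates an unavailable downstream service. The class names and the threshold are assumptions made for this example, not a reference implementation.

```python
class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without contacting the service."""


class CircuitBreaker:
    """Simplified breaker: opens after N consecutive failures (no half-open state)."""

    def __init__(self, call, failure_threshold=3):
        self._call = call
        self._failure_threshold = failure_threshold
        self._consecutive_failures = 0

    @property
    def is_open(self):
        return self._consecutive_failures >= self._failure_threshold

    def request(self, *args):
        if self.is_open:
            raise CircuitOpenError("downstream service presumed unavailable")
        try:
            result = self._call(*args)
        except ConnectionError:
            self._consecutive_failures += 1
            raise
        self._consecutive_failures = 0
        return result


class FailingService:
    """Test double simulating an unavailable downstream service."""

    def __init__(self):
        self.calls = 0

    def __call__(self, *args):
        self.calls += 1
        raise ConnectionError("service down")


def test_breaker_opens_and_stops_calling_downstream():
    service = FailingService()
    breaker = CircuitBreaker(service, failure_threshold=3)

    # Three failures reach the service and trip the breaker.
    for _ in range(3):
        try:
            breaker.request("ping")
        except ConnectionError:
            pass
    assert breaker.is_open

    # Further requests fail fast without touching the service.
    try:
        breaker.request("ping")
    except CircuitOpenError:
        pass
    else:
        raise AssertionError("expected CircuitOpenError")
    assert service.calls == 3
```

The final assertion is the one that matters for cascading failures: once the breaker is open, the struggling service receives no further load.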

Tests should be derived from requirements. A test that has no corresponding requirement is testing behaviour that was never specified — it may be testing the right thing, or it may be testing gold-plated behaviour that no stakeholder requested. A requirement that has no corresponding test will not be verified before release.

The RTM makes this relationship visible. At the testing stage, the RTM should have no empty "Test Case ID" cells for any Must-have or Should-have requirement. If it does, the team has not finished writing tests — regardless of what the coverage metric says.

This also means that when requirements change (Unit 3), the test suite must be reviewed. Tests written against a requirement that no longer exists may need to be removed or updated. Tests written against a new or changed requirement need to be written.
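
The two gaps described above (requirements without tests, and tests without requirements) can be checked mechanically once the RTM is held as data. The sketch below assumes a hypothetical row structure; the requirement IDs, test case IDs, and priorities are invented for illustration.

```python
# Hypothetical RTM rows; IDs and priorities are invented for illustration.
rtm = [
    {"req_id": "REQ-001", "priority": "Must", "test_case_ids": ["TC-001", "TC-002"]},
    {"req_id": "REQ-002", "priority": "Must", "test_case_ids": []},
    {"req_id": "REQ-003", "priority": "Should", "test_case_ids": ["TC-003"]},
    {"req_id": "REQ-004", "priority": "Could", "test_case_ids": []},
]

# Hypothetical mapping of each test case to the requirement it claims to verify.
test_cases = {
    "TC-001": "REQ-001",
    "TC-002": "REQ-001",
    "TC-003": "REQ-003",
    "TC-009": "REQ-017",  # points at a requirement that is not in the baseline
}


def untested_requirements(rows, priorities=("Must", "Should")):
    """Requirements at the given priorities with an empty Test Case ID cell."""
    return [r["req_id"] for r in rows
            if r["priority"] in priorities and not r["test_case_ids"]]


def orphan_tests(cases, rows):
    """Test cases whose requirement does not appear in the RTM."""
    known = {r["req_id"] for r in rows}
    return sorted(tc for tc, req in cases.items() if req not in known)


release_risks = untested_requirements(rtm)
tests_to_review = orphan_tests(test_cases, rtm)
```

With this data, `release_risks` contains `REQ-002` and `tests_to_review` contains `TC-009`. Rerunning such a check whenever the requirements baseline changes is one way to prompt the review described above.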

Check Your Understanding

Select the best answer for each question.

Service A calls Service B via an API. Both services have 100% unit test coverage and all unit tests pass. A defect is discovered where Service A sends a date in DD/MM/YYYY format but Service B expects YYYY-MM-DD. Which type of test would have caught this defect?

A more comprehensive unit test of Service A's date-formatting function
An integration test verifying the actual communication between Service A and Service B
A code review of Service B's input parsing code
A static analysis tool scanning Service A's code for date format issues

A test suite achieves 95% code coverage. The team's defect escape rate (proportion of defects reaching production) is 40%. What does this combination of metrics most suggest?

The test suite is high quality; the defects are in the untested 5%
The team should increase coverage to 100% to eliminate defect escapes
Tests are executing code but not making meaningful assertions; coverage is misleading
The defect escape rate is unrelated to the test suite and is caused by deployment errors

AI Dimension

AI tools can generate test cases, suggest edge cases for a given function, produce property-based test inputs, and identify code paths that appear to lack test coverage.

Assist: AI can draft an initial test suite for a function, suggest boundary values and edge cases, and generate test data that exercises unusual combinations of inputs — tasks that are tedious and prone to human oversight.
Risk: AI generates tests based on the code it can see — not the requirements the code was meant to satisfy. If the code is wrong, AI-generated tests may verify the wrong behaviour correctly. AI-generated tests also tend toward the happy path; they are unlikely to suggest chaos engineering scenarios, concurrency edge cases, or the specific failure modes identified in the system's risk analysis.
Principle: A test suite generated entirely by AI verifies that the code does what the code does — not that it does what the requirements say. Acceptance tests derived from requirements, written with stakeholder input, cannot be replaced by AI-generated tests. The RTM is the accountability mechanism: every requirement must have a human-authored acceptance criterion that is independently verified.

Activities

Individual task

Write five test cases for the hospital appointment booking system, covering at least two different testing levels. For each test case, specify:

Test Case ID — for linking to the RTM
Requirement ID — which requirement from Unit 3 does this verify?
Testing level — unit, integration, system, or acceptance
Preconditions — what must be true before the test runs?
Test steps — the specific actions taken
Expected outcome — what a passing result looks like

Include at least one test case for an exception or alternative flow — not only the happy path.

Pair task

Exchange test cases with a partner and assess coverage:

Do the test cases address Must-have requirements from the Unit 3 baseline? Are there any Must-have requirements with no test case?
Is there a test for at least one exception or failure scenario?
Are the expected outcomes specific enough to produce a clear pass/fail result? Vague expected outcomes are not testable.
Could any of these tests be automated? At which stage of the CI/CD pipeline designed in Unit 7 would each one run?

Group task — testing strategy and RTM completion

As a group, produce a testing strategy for the hospital appointment booking system and complete the RTM. The testing strategy should cover:

The proportion of tests at each level (unit, integration, system, acceptance) and the justification for that distribution
Which tests will be automated and at which pipeline stage they run
Which tests require manual execution and why
The minimum coverage threshold and the other quality metrics you will track
How you will test fault tolerance — at least one technique described in sufficient detail to be actionable

Then update the group RTM from Unit 3, filling in the Test Case ID column for all Must-have requirements. Flag any Must-have requirement that has no corresponding test case — these are release risks that need to be addressed before a production deployment.

Review

Unit — individual functions/classes in isolation; fast; pinpoints failures precisely
Integration — components working together with real dependencies; catches API contract violations and schema mismatches
System — the complete integrated system end-to-end; catches emergent defects; slow and costly
Acceptance — the system against stakeholder needs; derived from acceptance criteria in requirements; the final gate before release

Coverage measures code execution — whether lines of code were reached during testing. It does not measure assertion quality — whether those tests actually verified anything meaningful about the behaviour. High coverage with weak assertions produces a test suite that runs green while defects remain undetected.

Coverage should be tracked alongside defect escape rate, mutation test scores, and direct review of test assertion quality. The RTM provides the accountability mechanism: every requirement must have a test case with a specific expected outcome.

Proceed to Unit 9: Deployment & Security when ready.
