IB DP · Thinka-original Practice Paper

2023 IB DP Computer Science Practice Paper with Answers

Thinka Nov 2023 HL IB Diploma Programme-Style Mock — Computer Science

195 marks270 mins2023
An original Thinka practice paper modelled on the structure and difficulty of the Nov 2023 HL IB Diploma Programme Computer Science paper. Not affiliated with or reproduced from IB.

Paper 1 Section A

Answer all questions.
9 Question · 24.93 marks
Question 1 · short_answer
2.77 marks
State one advantage and one disadvantage of using a direct changeover strategy compared to a parallel running strategy when implementing a new computer system in a school.
Show answer & marking scheme

Worked solution

A direct changeover strategy involves shutting down the old system and immediately starting the new system.

Advantage: It minimizes cost, time, and duplication of effort because staff only use one system and data does not need to be entered twice (unlike parallel running).

Disadvantage: It is highly risky. If the new system fails or contains critical bugs, the school may be left without any operational system, disrupting classes and administrative tasks.

Marking scheme

Award up to 2.77 marks as follows:
- 1 mark: Valid advantage clearly stated (e.g., lower cost, less time-consuming, no duplicate data entry).
- 1 mark: Valid disadvantage clearly stated (e.g., high risk, lack of fallback system, high pressure on staff).
- 0.77 marks: Explicit link to the school context or clarity of comparison.
Question 2 · short_answer
2.77 marks
Explain how the use of cache memory can increase the performance of the Central Processing Unit (CPU).
Show answer & marking scheme

Worked solution

Cache memory acts as a high-speed intermediary between the CPU registers and the primary RAM. Because static RAM (SRAM) used in cache is much faster than dynamic RAM (DRAM) used in main memory, and because it is physically closer to the CPU, accessing data from cache takes significantly fewer clock cycles. By pre-fetching and storing frequently or recently used instructions/data in the cache, the CPU minimizes 'wait states' and processes instructions much faster.

Marking scheme

Award up to 2.77 marks as follows:
- 1 mark: Stating that cache memory is much faster than main memory (RAM) and located close to/on the CPU.
- 1 mark: Explaining that cache stores frequently, recently, or anticipated instructions/data.
- 0.77 marks: Detailing how this reduces the CPU idle time or latency by avoiding slower RAM lookups.
Question 3 · short_answer
2.77 marks
Describe how packets are routed from a sender to a receiver in a packet-switched network.
Show answer & marking scheme

Worked solution

In a packet-switched network, the communication process follows these steps:
1. The original message is divided into smaller packets.
2. Each packet is given a header containing control information, including the sender's IP, receiver's IP, and a sequence number.
3. Packets are sent into the network, where intermediate nodes (routers) inspect the destination address and forward them along the optimal path available at that moment. Packets may take completely different physical routes.
4. Upon reaching the destination, the receiver uses the sequence numbers to reorder and reassemble the packets into the original message, requesting retransmission for any missing packets.

Marking scheme

Award up to 2.77 marks as follows:
- 1 mark: Stating that data is split into packets containing headers (with source, destination, and sequence information).
- 1 mark: Describing how routers dynamically route packets independently along different paths depending on network traffic.
- 0.77 marks: Describing the reconstruction process at the destination using sequence numbers.
Question 4 · short_answer
2.77 marks
Trace the following algorithm and state the final value returned by \( \text{findResult}(5) \):

```
function findResult(N)
VAL = 1
loop I from 1 to N
VAL = VAL * I
if VAL > 10 then
VAL = VAL - 5
end if
end loop
return VAL
end function
```
Show answer & marking scheme

Worked solution

Let's trace the algorithm step-by-step with \( N = 5 \):
- Initially, \( \text{VAL} = 1 \).
- **Iteration 1 (\( I = 1 \))**:
\( \text{VAL} = 1 \times 1 = 1 \).
Is \( 1 > 10 \)? No. So \( \text{VAL} \) remains 1.
- **Iteration 2 (\( I = 2 \))**:
\( \text{VAL} = 1 \times 2 = 2 \).
Is \( 2 > 10 \)? No. So \( \text{VAL} \) remains 2.
- **Iteration 3 (\( I = 3 \))**:
\( \text{VAL} = 2 \times 3 = 6 \).
Is \( 6 > 10 \)? No. So \( \text{VAL} \) remains 6.
- **Iteration 4 (\( I = 4 \))**:
\( \text{VAL} = 6 \times 4 = 24 \).
Is \( 24 > 10 \)? Yes.
\( \text{VAL} = 24 - 5 = 19 \).
- **Iteration 5 (\( I = 5 \))**:
\( \text{VAL} = 19 \times 5 = 95 \).
Is \( 95 > 10 \)? Yes.
\( \text{VAL} = 95 - 5 = 90 \).

The loop finishes and the function returns 90.

Marking scheme

Award up to 2.77 marks as follows:
- 1 mark: Evidence of tracing correctly through \( I = 3 \) (i.e. showing VAL values of 1, 2, 6).
- 1 mark: Correctly computing the branch subtraction for \( I = 4 \) (showing VAL becomes 19).
- 0.77 marks: Correctly calculating the final returned value of 90.
Question 5 · short_answer
2.77 marks
A circular queue is implemented using a static array of size 5 (indices 0 to 4). State the index positions of the `head` and `tail` pointers after the following sequence of operations, starting from an empty queue where both `head` and `tail` are initially -1, and enqueueing increases `tail` modulo 5:

1. Enqueue(A)
2. Enqueue(B)
3. Enqueue(C)
4. Dequeue()
5. Enqueue(D)
6. Dequeue()
Show answer & marking scheme

Worked solution

Let's trace the state of the queue:
- **Initial State**: `head = -1`, `tail = -1`. Array: `[_, _, _, _, _]`
- **1. Enqueue(A)**: First element added. Both `head` and `tail` become 0. Array: `[A, _, _, _, _]`
- **2. Enqueue(B)**: `tail` increments to 1. `head` remains 0. Array: `[A, B, _, _, _]`
- **3. Enqueue(C)**: `tail` increments to 2. `head` remains 0. Array: `[A, B, C, _, _]`
- **4. Dequeue()**: Removes 'A' from index 0. `head` increments to 1. `tail` remains 2. Array: `[_, B, C, _, _]`
- **5. Enqueue(D)**: `tail` increments to 3. `head` remains 1. Array: `[_, B, C, D, _]`
- **6. Dequeue()**: Removes 'B' from index 1. `head` increments to 2. `tail` remains 3. Array: `[_, _, C, D, _]`

Final pointer values: `head = 2`, `tail = 3`.

Marking scheme

Award up to 2.77 marks as follows:
- 1 mark: Correctly tracing the pointers up to step 3 (`head = 0`, `tail = 2`).
- 1 mark: Correctly adjusting pointers after the first Dequeue and Enqueue(D) (`head = 1`, `tail = 3`).
- 0.77 marks: Providing both correct final positions (`head = 2` and `tail = 3`).
Question 6 · short_answer
2.77 marks
Outline how the operating system manages virtual memory when physical RAM is fully occupied.
Show answer & marking scheme

Worked solution

Virtual memory is a management technique where secondary storage acts as an extension of physical RAM. When RAM is full:
1. The OS identifies inactive chunks of memory, called pages.
2. It moves (swaps) these pages out of RAM and writes them to a dedicated area on the hard disk or SSD (swap file/page file).
3. The OS updates its page table to track where these virtual pages are currently located.
4. When the CPU requests data located on a swapped-out page, a page fault occurs, and the OS suspends the process to swap the requested page back into RAM, potentially swapping another page out to make space.

Marking scheme

Award up to 2.77 marks as follows:
- 1 mark: Mentioning the division of memory into pages and the use of secondary storage (swap space) as an extension of RAM.
- 1 mark: Explaining the process of swapping/paging out inactive data from RAM to disk when RAM is full.
- 0.77 marks: Outlining the retrieval process (handling page faults/address translation using page tables).
Question 7 · short_answer
2.77 marks
An automated greenhouse uses temperature sensors and ventilation actuators. Describe the role of the processor within this control loop.
Show answer & marking scheme

Worked solution

In an automated feedback control loop:
1. The processor continuously receives readings from the temperature sensors (often converted from analog to digital via an ADC).
2. The processor compares these incoming values against predefined setpoints (e.g., target temperature range of \( 18^{\circ}\text{C} \) to \( 24^{\circ}\text{C} \)).
3. If the temperature is too high, the processor executes preprogrammed logic to send a control signal (often converted via a DAC) to open the ventilation actuators.
4. If the temperature is within bounds, no action or a counter-action is taken. This process runs continuously to maintain equilibrium.

Marking scheme

Award up to 2.77 marks as follows:
- 1 mark: Describing the input phase (receiving data from sensors, comparing with threshold/target values).
- 1 mark: Describing the decision-making phase (using logical parameters to determine if adjustments are required).
- 0.77 marks: Describing the output phase (sending signals to actuators to adjust ventilation/restore the target temperature).
Question 8 · short_answer
2.77 marks
Outline the concept of encapsulation in Object-Oriented Programming (OOP) and explain how it is implemented.
Show answer & marking scheme

Worked solution

Encapsulation is a core concept of OOP that combines an object's state (variables) and behaviors (methods) into a single structural unit called a class. It prevents unauthorized outer code from directly modifying an object's internal state (data hiding).

Implementation is achieved by:
1. Marking class variables with the `private` or `protected` access modifiers.
2. Creating public methods (getters/accessors and setters/mutators) to provide controlled, validated access to these variables.

Marking scheme

Award up to 2.77 marks as follows:
- 1 mark: Defining encapsulation as bundling data and methods into a single class unit.
- 1 mark: Explaining the protection aspect (data hiding/restricting external direct access using private modifiers).
- 0.77 marks: Outlining the implementation using public getter and setter methods to control state access safely.
Question 9 · Short Answer
2.77 marks
An array of integers is defined as: A = [4, 9, 3, 11, 6]. Consider the following algorithm:

VAL = 0
loop I from 0 to 3
if A[I] < A[I+1] then
VAL = VAL + (A[I+1] - A[I])
end if
end loop

Construct a trace table to show the values of I, A[I], A[I+1], and VAL at each step of the algorithm, and state the final value of VAL.
Show answer & marking scheme

Worked solution

Trace table details:
- Initial state: VAL = 0
- Iteration I = 0: A[0] (4) < A[1] (9) is True. VAL = 0 + (9 - 4) = 5
- Iteration I = 1: A[1] (9) < A[2] (3) is False. VAL remains 5
- Iteration I = 2: A[2] (3) < A[3] (11) is True. VAL = 5 + (11 - 3) = 13
- Iteration I = 3: A[3] (11) < A[4] (6) is False. VAL remains 13

Final value of VAL is 13.

Marking scheme

Award marks as follows:
- 1 mark: Correctly identifying the condition evaluations for all iterations (True for I=0 and I=2, False for I=1 and I=3).
- 1 mark: Correctly calculating intermediate values of VAL (VAL = 5 at I = 0).
- 0.77 marks: Providing the correct final value of VAL (13) with a corresponding trace table/explanation.

Paper 1 Section B

Answer all questions.
5 Question · 75 marks
Question 1 · structured
15 marks
A large hospital is upgrading its patient record management system from an older legacy system to a modern, cloud-based electronic health record (EHR) system.

(a) Identify two reasons why the hospital management might decide to use a parallel running implementation method instead of a direct changeover. [4 marks]

(b) Describe one distinct disadvantage of using a parallel running implementation in this hospital context. [2 marks]

(c) Outline the importance of user documentation and explain why online interactive tutorials could be more effective than a printed user manual for training the hospital staff. [4 marks]

(d) Explain how beta testing, involving a selected group of medical staff, can help identify usability issues before the final nationwide release of the EHR system. [5 marks]
Show answer & marking scheme

Worked solution

(a) Two reasons to use parallel running:
1. Risk reduction: Since hospital data is critical, if the new cloud system has bugs or fails, the legacy system is still fully active, ensuring patient safety and uninterrupted service.
2. Validation: It allows outputs (e.g., patient prescriptions or test results) from both systems to be cross-checked side-by-side to guarantee that the new system is accurate and functioning correctly.

(b) Disadvantage of parallel running in a hospital:
- High duplication of labor: Hospital staff (doctors, nurses) must enter patient records into both systems. This increases workload in an already high-stress environment, potentially leading to data entry errors or fatigue.

(c) Importance of documentation and benefits of online tutorials:
- User documentation is crucial because it reduces downtime, minimizes support costs, and helps users understand how to navigate the new software correctly.
- Online interactive tutorials are superior because they provide a hands-on learning experience where staff can simulate tasks, which improves retention. They can also be updated immediately if a software patch is applied, unlike a printed manual.

(d) Use of beta testing:
- Beta testing involves actual users (e.g., doctors and nurses) testing the product in their real work environment under normal stress levels.
- It uncovers unexpected interface issues, awkward navigation flows, or screen-layout challenges that developers did not anticipate.
- Feedback can be gathered directly regarding user satisfaction, accessibility issues, and compatibility with daily clinical tasks, allowing the development team to make crucial improvements before the final release.

Marking scheme

(a) Award up to [4 marks] as follows:
- Award [1 mark] for each reason identified (up to 2 reasons).
- Award [1 mark] for each development/explanation in the context of a hospital.
- Example: Low risk [1 mark]; because if the new system fails, the old system is still active to protect patient lives [1 mark].
- Example: Comparison of outputs [1 mark]; so medical staff can verify that data is transferred correctly without errors [1 mark].

(b) Award up to [2 marks] as follows:
- Award [1 mark] for identifying the disadvantage (extra workload/cost/time).
- Award [1 mark] for linking it to the hospital scenario (e.g., busy staff entering double data).

(c) Award up to [4 marks] as follows:
- Award [1 mark] for explaining why documentation is important (reduces errors/support costs).
- Award [1 mark] for identifying a benefit of interactive tutorials (hands-on/self-paced).
- Award [1 mark] for identifying a benefit of updates (can be modified in real-time).
- Award [1 mark] for a comparison with printed manuals (printed manuals quickly become obsolete).

(d) Award up to [5 marks] as follows:
- Award [1 mark] for defining beta testing (released to end-users in a real-world setting).
- Award [1 mark] for explaining that real users (doctors/nurses) will test the system under normal working conditions.
- Award [1 mark] for mentioning that usability issues (e.g., unintuitive navigation, font size, button placement) are identified.
- Award [1 mark] for noting that feedback is collected before the final nationwide launch.
- Award [1 mark] for stating that this leads to system refinements and higher final user acceptance.
Question 2 · structured
15 marks
A weather monitoring station collects hourly temperature readings over a 24-hour period. These readings are stored in a one-dimensional array named `TEMPS` of size 24 (with indices 0 to 23).

(a) Construct a pseudocode algorithm that traverses the `TEMPS` array and calculates:
(i) The maximum absolute difference between any two consecutive hours (e.g., the absolute difference between `TEMPS[i]` and `TEMPS[i+1]`). [5 marks]
(ii) The total number of hours during which the temperature fell below 0 degrees Celsius. [3 marks]

(b) Explain the difference between 'selection' and 'iteration' as control structures, identifying one specific example of each from your algorithm in Part (a). [4 marks]

(c) State the Big O efficiency of this traversal algorithm and justify your answer based on the array size and loops. [3 marks]
Show answer & marking scheme

Worked solution

(a) Sample pseudocode:
```
maxDiff = 0
belowFreezing = 0
for i = 0 to 23
if TEMPS[i] < 0 then
belowFreezing = belowFreezing + 1
end if
end for
for i = 0 to 22
diff = TEMPS[i] - TEMPS[i+1]
if diff < 0 then
diff = diff * -1
end if
if diff > maxDiff then
maxDiff = diff
end if
end for
output "Hours below freezing: ", belowFreezing
output "Max consecutive difference: ", maxDiff
```
*(Note: A single loop combining both functions is also highly acceptable and more efficient.)*

(b) Differences:
- Selection is a decision-making control structure that branches execution path based on a Boolean condition (e.g., `if...then`). In the algorithm, `if diff > maxDiff then` is an example of selection, which only executes the assignment statement when the condition is true.
- Iteration is a loop control structure that repeats a block of instructions multiple times (e.g., `for...to`). In the algorithm, `for i = 0 to 22` is an iteration that repeats the difference calculation 23 times.

(c) Time Complexity:
- The algorithm has a time complexity of \(O(N)\), where \(N\) is the number of elements in the array (24 in this case).
- This is because the algorithm uses a simple, non-nested loop to traverse the array from start to end.
- Each element is visited a constant number of times, resulting in a linear scaling relationship.

Marking scheme

(a) Award up to [8 marks] in total:
(i) Max Difference algorithm [5 marks]:
- [1 mark] Correctly initializing a variable `maxDiff` to 0.
- [1 mark] Setting up a loop that successfully iterates through indices 0 to 22 (avoiding out-of-bounds errors).
- [1 mark] Calculating the difference and converting it to an absolute value (either using an absolute function or checking if negative and multiplying by -1).
- [1 mark] Correctly checking if the current difference is greater than `maxDiff` and updating `maxDiff`.
- [1 mark] Outputting the final `maxDiff` correctly.

(ii) Below zero counter [3 marks]:
- [1 mark] Correctly initializing a counter variable `belowFreezing` to 0.
- [1 mark] Correctly setting up a conditional check (`TEMPS[i] < 0`) inside a loop.
- [1 mark] Correctly incrementing the counter and outputting the result.

(b) Award up to [4 marks] as follows:
- [1 mark] Correct definition of selection.
- [1 mark] Correctly identifying a selection example from their pseudocode.
- [1 mark] Correct definition of iteration.
- [1 mark] Correctly identifying an iteration example from their pseudocode.

(c) Award up to [3 marks] as follows:
- [1 mark] Correctly stating the time complexity as \(O(N)\) (or \(O(1)\) if argued that the array size is strictly fixed at 24).
- [2 marks] Solid justification mentioning that there are linear non-nested loops that scale directly with the array size.
Question 3 · structured
15 marks
A local logistics center manages incoming packages using a circular queue implemented via a fixed-size array named `PACKAGES` of size 5 (indices 0 to 4). Two integer variables are used: `head` (index of the front element) and `tail` (index where the next element will be inserted). A variable `count` stores the current number of elements in the queue.

Initially, the queue is empty: `head = 0`, `tail = 0`, and `count = 0`.

(a) State the benefit of using a circular queue over a standard linear queue implemented within a fixed-size array. [4 marks]

(b) Show the state of the `PACKAGES` array and the final values of `head`, `tail`, and `count` after executing the following sequence of operations:
1. enqueue("Pkg_A")
2. enqueue("Pkg_B")
3. dequeue()
4. enqueue("Pkg_C")
5. enqueue("Pkg_D")
6. dequeue()
7. enqueue("Pkg_E")
8. enqueue("Pkg_F")
[5 marks]

(c) Construct a pseudocode algorithm for the `enqueue(newPackage)` method. The method should output an error message if the queue is full, and otherwise insert the package and update all necessary pointers and counters. [6 marks]
Show answer & marking scheme

Worked solution

(a) Benefits of a circular queue:
- In a fixed-size linear queue, as elements are dequeued, the space at the front of the array becomes unusable because the pointers only move forward. Eventually, no more elements can be enqueued even if there is free space at the beginning of the array.
- A circular queue overcomes this limitation by "wrapping around" pointers (using the modulus operator) to the beginning of the array. This allows the reuse of freed memory slots, achieving \(O(1)\) time complexity for enqueue and dequeue without requiring expensive shifting operations.

(b) Step-by-step trace of operations:
- Start: `PACKAGES = [null, null, null, null, null]`, `head = 0`, `tail = 0`, `count = 0`
- 1. `enqueue("Pkg_A")`: `PACKAGES[0] = "Pkg_A"`, `tail = 1`, `count = 1`
- 2. `enqueue("Pkg_B")`: `PACKAGES[1] = "Pkg_B"`, `tail = 2`, `count = 2`
- 3. `dequeue()`: removes "Pkg_A" (from index 0), `head = 1`, `count = 1`
- 4. `enqueue("Pkg_C")`: `PACKAGES[2] = "Pkg_C"`, `tail = 3`, `count = 2`
- 5. `enqueue("Pkg_D")`: `PACKAGES[3] = "Pkg_D"`, `tail = 4`, `count = 3`
- 6. `dequeue()`: removes "Pkg_B" (from index 1), `head = 2`, `count = 2`
- 7. `enqueue("Pkg_E")`: `PACKAGES[4] = "Pkg_E"`, `tail = 0` (wraps around), `count = 3`
- 8. `enqueue("Pkg_F")`: `PACKAGES[0] = "Pkg_F"`, `tail = 1` (wraps around), `count = 4`

Final State:
- `PACKAGES[0] = "Pkg_F"`
- `PACKAGES[1] = null` (or logically empty)
- `PACKAGES[2] = "Pkg_C"`
- `PACKAGES[3] = "Pkg_D"`
- `PACKAGES[4] = "Pkg_E"`
- `head = 2`
- `tail = 1`
- `count = 4`

(c) Pseudocode for `enqueue(newPackage)`:
```
method enqueue(newPackage)
if count == 5 then
output "Error: Queue Overflow"
else
PACKAGES[tail] = newPackage
tail = (tail + 1) mod 5
count = count + 1
end if
end method
```

Marking scheme

(a) Award up to [4 marks] as follows:
- [1 mark] Identifying the limitation of a standard linear queue (starvation of array slots / unusable space at front).
- [1 mark] Stating that a circular queue resolves this by wrapping indices around to index 0.
- [1 mark] Explaining that this maximizes memory reuse within a fixed-size space.
- [1 mark] Explaining that it avoids the need to shift remaining elements to the front, which maintains \(O(1)\) time complexity.

(b) Award up to [5 marks] as follows:
- [2 marks] for showing the correct contents of the final `PACKAGES` array (award 1 mark if only 1-2 elements are incorrect).
- [1 mark] for correct final value of `head` (2).
- [1 mark] for correct final value of `tail` (1).
- [1 mark] for correct final value of `count` (4).

(c) Award up to [6 marks] as follows:
- [1 mark] Correct method signature and structure.
- [2 marks] Correct check for queue overflow using the variable `count` (or by comparing `head` and `tail` relative to size).
- [1 mark] Correct insertion of `newPackage` at index `tail`.
- [1 mark] Correct update of pointer `tail` using modulo arithmetic (e.g., `(tail + 1) mod 5` or equivalent `if` statement).
- [1 mark] Correct increment of the `count` variable.
Question 4 · structured
15 marks
A user is streaming an online lecture video on a smartphone connected to a home wireless local area network (WLAN).

(a) Describe how packet switching is used to transmit the video data from the streaming server to the smartphone. [6 marks]

(b) Distinguish between the functions of the Transmission Control Protocol (TCP) and the Internet Protocol (IP) in this scenario. [5 marks]

(c) Explain why lossy compression, rather than lossless compression, is used to stream video, and why lossy compression would be inappropriate for transferring a software installation file. [4 marks]
Show answer & marking scheme

Worked solution

(a) Packet switching process:
1. The video file on the server is split up into smaller units of data called packets.
2. Each packet is appended with a header containing routing information, including the sender's IP address, the destination IP address, and packet sequence number.
3. The packets are transmitted onto the network independently.
4. Routers across the internet dynamically direct packets along different paths depending on network congestion and availability.
5. Because they take different routes, packets may arrive at the smartphone out of order or with varying delays.
6. The smartphone's network software uses the sequence numbers in the packet headers to reassemble them back into the original continuous video stream.

(b) Distinguishing TCP and IP:
- TCP (Transmission Control Protocol) operates at the Transport Layer and is connection-oriented. It manages the breaking of the message into packets, ensures reliable delivery via handshaking, verifies arrival using acknowledgments, detects errors, and requests retransmission of lost packets. It also puts packets back in order.
- IP (Internet Protocol) operates at the Network Layer and is connectionless. Its primary function is addressing and routing individual packets from the sender to the receiver using IP addresses. It does not guarantee delivery or correct ordering; it simply handles the path delivery.

(c) Lossy compression vs. Lossless compression:
- For video streaming: Video contains vast amounts of data. Lossy compression removes redundant or imperceptible details (such as minute color changes) to drastically reduce the file size. This allows smooth real-time playback over limited wireless bandwidth without noticeable loss of quality to the human eye.
- For software installer: Software consists of precise machine code instructions. If lossy compression is used, arbitrary bits of data will be permanently discarded. This corrupts the executable file, making it impossible to install or run, as every single bit of a software file must be exact.

Marking scheme

(a) Award up to [6 marks] as follows:
- [1 mark] Data is segmented into smaller units (packets).
- [1 mark] Headers containing destination address and sequence numbers are added.
- [1 mark] Packets are sent independently through the network.
- [1 mark] Routers dynamically determine the best path for each packet.
- [1 mark] Packets can arrive out of order.
- [1 mark] Reassembly of packets occurs at the destination based on sequence numbers.

(b) Award up to [5 marks] as follows:
- [1 mark] Stating TCP is connection-oriented and IP is connectionless.
- [1 mark] Explaining TCP's role in reliable data delivery (checking for missing/corrupted packets).
- [1 mark] Explaining TCP's role in reordering packets at the destination.
- [1 mark] Explaining IP's role in routing individual packets using source and destination IP addresses.
- [1 mark] Contrast: TCP manages session integrity, while IP handles physical packet delivery.

(c) Award up to [4 marks] as follows:
- [1 mark] Explaining that video files are large and lossy compression dramatically reduces bandwidth requirement.
- [1 mark] Noting that the human eye cannot easily perceive the discarded visual information.
- [1 mark] Stating that software executables require absolute accuracy / cannot tolerate missing data.
- [1 mark] Explaining that discarding even a single bit corrupts program execution, making the installer useless.
Question 5 · structured
15 marks
A graphics design workstation contains a CPU with a control unit (CU), arithmetic logic unit (ALU), three levels of cache memory (L1, L2, L3), and primary storage (RAM).

(a) Outline the role of the following CPU registers during the Fetch-Execute cycle:
(i) Program Counter (PC) [2 marks]
(ii) Memory Address Register (MAR) [2 marks]

(b) Explain the process of a memory write operation, referencing the data bus, control bus, address bus, and RAM. [5 marks]

(c) Distinguish between RAM and Cache memory in terms of physical location, access speed, and cost per megabyte. Discuss how the use of a three-level cache architecture improves CPU performance. [6 marks]
Show answer & marking scheme

Worked solution

(a) Roles of registers:
(i) Program Counter (PC): Holds the memory address of the next instruction to be fetched from RAM. It automatically increments at each cycle to point to the subsequent instruction.
(ii) Memory Address Register (MAR): Holds the physical memory address (either for an instruction or data) currently being read from or written to by the CPU.

(b) Memory Write Operation:
1. The CPU places the target memory address (where data needs to be stored) onto the Address Bus.
2. The CPU places the data to be written onto the Data Bus.
3. The Control Unit (CU) sends a 'write' signal along the Control Bus to RAM.
4. RAM receives the signal, decodes the address on the Address Bus, and transfers the data from the Data Bus into that specific memory cell.

(c) Comparison and Cache architecture:
- RAM vs Cache:
- Cache is located directly on or immediately adjacent to the CPU chip, whereas RAM is located further away on the motherboard.
- Cache is vastly faster than RAM (nanosecond response times vs tens of nanoseconds).
- Cache (using SRAM) is significantly more expensive per megabyte than RAM (using DRAM), limiting its capacity.
- Three-level Cache Architecture:
- L1 is the smallest and fastest, checked first by the CPU. L2 is slightly larger and slower, checked next. L3 is the largest and slowest of the caches, checked before querying RAM.
- By holding frequently used data and instructions in L1/L2/L3, the CPU maximizes the "cache hit" rate. This prevents the CPU from having to access the much slower RAM bus, minimizing "wait states" and drastically increasing overall execution speed.

Marking scheme

(a) Award up to [4 marks] as follows:
(i) Program Counter (PC) [2 marks]:
- [1 mark] Stating that it holds the address of the next instruction.
- [1 mark] Explaining that it increments to point to the next instruction in sequence.
(ii) Memory Address Register (MAR) [2 marks]:
- [1 mark] Stating that it holds the current address of data/instruction being accessed.
- [1 mark] Explaining that it acts as the interface connection to the Address Bus.

(b) Award up to [5 marks] as follows:
- [1 mark] Address is placed on the Address Bus.
- [1 mark] Data is placed on the Data Bus.
- [1 mark] Control signal (write) is sent via the Control Bus.
- [1 mark] RAM decodes the target address.
- [1 mark] Data is moved from the Data Bus into the selected RAM memory cells.

(c) Award up to [6 marks] as follows:
- [1 mark] Distinguishing physical location (Cache on-chip vs RAM on motherboard).
- [1 mark] Distinguishing access speed (Cache is much faster than RAM).
- [1 mark] Distinguishing cost (Cache/SRAM is much more expensive per MB than RAM/DRAM).
- [1 mark] Explaining L1, L2, and L3 hierarchy (L1 is fastest/smallest, L3 is largest/slowest of the cache levels).
- [1 mark] Explaining the concept of "cache hits" and keeping active data close to the CPU.
- [1 mark] Linking this to increased performance by reducing CPU idle time while waiting for data from slow primary memory.

Paper 2 Option D: OOP

Answer all questions of this option.
5 Question · 65 marks
Question 1 · Structured Programming & OOP Questions
13 marks
An educational institution uses an object-oriented program to manage its digital library. The base class is MediaItem and has a subclass Book.

The class MediaItem is defined as follows:

public class MediaItem {
private String title;
private String id;
private boolean isBorrowed;

public MediaItem(String title, String id) {
this.title = title;
this.id = id;
this.isBorrowed = false;
}

public String getTitle() { return title; }
public String getId() { return id; }
public boolean getIsBorrowed() { return isBorrowed; }
public void setIsBorrowed(boolean status) { this.isBorrowed = status; }
}

(a) Outline one advantage of using inheritance to create the Book subclass from the MediaItem class. [2]

(b) Construct the class definition for Book. It must inherit from MediaItem and include:
- private instance variables: author (String) and pageCount (int).
- a constructor that initializes all fields (including those inherited).
- accessor methods for the new instance variables. [4]

(c) A class Library contains an array of MediaItem objects named catalog and an integer itemCount representing the actual number of items in the catalog.
Write a method borrowItem(String targetTitle) for the Library class that searches the catalog for a MediaItem with a matching title. If the item is found and is not currently borrowed, the method should set its borrowed status to true and return true. Otherwise, it should return false. [4]

(d) Explain the concept of encapsulation and how it is implemented in the MediaItem class to ensure data integrity. [3]
Show answer & marking scheme

Worked solution

(a) One advantage of using inheritance is code reusability. Instead of redefining common fields like title, id, and isBorrowed inside the Book class, Book automatically inherits these properties and their associated methods from MediaItem, reducing redundancy and minimizing potential errors in class definitions.

(b) Java Class Definition for Book:
public class Book extends MediaItem {
private String author;
private int pageCount;

public Book(String title, String id, String author, int pageCount) {
super(title, id);
this.author = author;
this.pageCount = pageCount;
}

public String getAuthor() { return author; }
public int getPageCount() { return pageCount; }
}

(c) Java Method implementation for Library:
public boolean borrowItem(String targetTitle) {
for (int i = 0; i < itemCount; i++) {
if (catalog[i] != null && catalog[i].getTitle().equals(targetTitle)) {
if (!catalog[i].getIsBorrowed()) {
catalog[i].setIsBorrowed(true);
return true;
}
}
}
return false;
}

(d) Encapsulation is the practice of bundling data (variables) and the methods that operate on that data into a single unit (class) while restricting direct access to the state. In MediaItem, encapsulation is implemented by declaring the instance variables (title, id, isBorrowed) as private, and exposing public accessor (getters) and mutator (setters) methods. This protects the data from unauthorized or invalid modification by external classes.

Marking scheme

(a) [2 marks]
- 1 mark: Identify a valid advantage of inheritance (e.g., code reuse, single point of maintenance, extensibility).
- 1 mark: Apply the advantage to the context (e.g., Book subclass reuse title and id without rewriting them).

(b) [4 marks]
- 1 mark: Correct subclass declaration with class Book extends MediaItem.
- 1 mark: Correct declaration of private attributes author and pageCount.
- 1 mark: Correct constructor using super(title, id) and initializing subclass variables.
- 1 mark: Correct accessor methods (getAuthor and getPageCount).

(c) [4 marks]
- 1 mark: Correct loop through catalog up to itemCount.
- 1 mark: Correct comparison of titles using string comparison methods (e.g., .equals()).
- 1 mark: Correct condition checking if the item is not borrowed and setting isBorrowed to true.
- 1 mark: Correct return statements (returning true upon success and false if not found or already borrowed).

(d) [3 marks]
- 1 mark: Define encapsulation (bundling data and methods and hiding internal state).
- 1 mark: Detail how it is implemented in MediaItem (private variables, public getters/setters).
- 1 mark: Explain the benefit of data integrity (prevents direct external tampering of variable states, ensuring controlled modifications).
Question 2 · Structured Programming & OOP Questions
13 marks
An Event Management company uses an object-oriented system to track events. The abstract class Event defines the common properties of all events.

public abstract class Event {
private String eventName;
private double basePrice;

public Event(String eventName, double basePrice) {
this.eventName = eventName;
this.basePrice = basePrice;
}

public String getEventName() { return eventName; }
public double getBasePrice() { return basePrice; }

public abstract double calculatePrice();
}

(a) Distinguish between an abstract class and a concrete class. [3]

(b) Construct the subclass Concert which inherits from Event. The class Concert must include:
- a private instance variable isVIP (boolean).
- a constructor that initializes all fields (including inherited ones).
- an implementation of the calculatePrice() method. If the concert is VIP, the price is the basePrice plus a 50.0 surcharge; otherwise, it is just the basePrice. [5]

(c) A controller class EventPlanner has an array of Event objects called scheduledEvents of size 100.
Write a method calculateTotalRevenue(int activeEvents) in the EventPlanner class that returns the total combined price of all events currently scheduled (where activeEvents represents the actual number of valid event objects stored in the array). [3]

(d) Outline why polymorphism is demonstrated when calculateTotalRevenue() is executed. [2]
Show answer & marking scheme

Worked solution

(a) An abstract class is a conceptual template that cannot be instantiated directly and may contain abstract methods (methods without an implementation). A concrete class is a fully implemented class that can be instantiated directly into objects and must provide concrete implementations for any inherited abstract methods.

(b) Java Class Definition for Concert:
public class Concert extends Event {
private boolean isVIP;

public Concert(String eventName, double basePrice, boolean isVIP) {
super(eventName, basePrice);
this.isVIP = isVIP;
}

@Override
public double calculatePrice() {
if (this.isVIP) {
return getBasePrice() + 50.0;
} else {
return getBasePrice();
}
}
}

(c) Java Method implementation for EventPlanner:
public double calculateTotalRevenue(int activeEvents) {
double total = 0.0;
for (int i = 0; i < activeEvents; i++) {
if (scheduledEvents[i] != null) {
total += scheduledEvents[i].calculatePrice();
}
}
return total;
}

(d) Polymorphism is demonstrated because the scheduledEvents array stores general Event references, but at runtime, the JVM dynamically binds the calculatePrice() call to the specific overridden implementation of the actual object type (e.g., Concert, or other subclasses of Event). This dynamic method lookup allows the total revenue to be calculated correctly without knowing the exact subclass type of each element beforehand.

Marking scheme

(a) [3 marks]
- 1 mark: Abstract class cannot be instantiated directly, whereas a concrete class can.
- 1 mark: Abstract class can contain abstract methods (signatures without body), while concrete classes must have implementations for all methods.
- 1 mark: Abstract classes serve as high-level blueprints, forcing a design contract on subclasses.

(b) [5 marks]
- 1 mark: Correct inheritance syntax (class Concert extends Event).
- 1 mark: Declaring private boolean isVIP.
- 1 mark: Constructor accepting parameters and using super(eventName, basePrice).
- 1 mark: Correct method signature for public double calculatePrice().
- 1 mark: Correct conditional expression checking isVIP and returning the basePrice with or without 50.0.

(c) [3 marks]
- 1 mark: Correct loop structure running up to activeEvents.
- 1 mark: Call to calculatePrice() on scheduledEvents[i] and accumulating the values.
- 1 mark: Correct return statement returning the accumulated double sum.

(d) [2 marks]
- 1 mark: Explaining that the exact method executed is determined at runtime based on the actual object type (dynamic/late binding).
- 1 mark: Highlighting that calling the abstract calculatePrice() dynamically executes the subclass specific behavior without manual type casting.
Question 3 · Structured Programming & OOP Questions
13 marks
A Smart Home controller manages various IoT devices. The class Device represents a generic smart device.

public class Device {
private String deviceId;
private boolean active;

public Device(String deviceId) {
this.deviceId = deviceId;
this.active = false;
}

public String getDeviceId() { return deviceId; }
public boolean isActive() { return active; }
public void setActive(boolean active) { this.active = active; }
}

The system uses a coordinator class SmartHome that contains an array of Device objects:

public class SmartHome {
private Device[] devices;
private int deviceCount;

public SmartHome(int capacity) {
devices = new Device[capacity];
deviceCount = 0;
}
}

(a) Define the relationship between SmartHome and Device. Justify whether this relationship represents aggregation or composition. [3]

(b) Write a method deactivateAll() for the SmartHome class that iterates through all added devices and sets their active status to false. [3]

(c) Construct a method getDeviceById(String targetId) in the SmartHome class that searches the array for a device with the specified deviceId. If found, it returns the Device object; otherwise, it returns null. [4]

(d) Explain one advantage and one disadvantage of using a fixed-size array instead of a dynamic data structure (like an ArrayList) to store the Device objects within the SmartHome class. [3]
Show answer & marking scheme

Worked solution

(a) The relationship is aggregation (has-a relationship). This is because Device objects can exist independently of the SmartHome system (e.g., they can exist as entities before being registered, or if the SmartHome system is deleted, the physical/logical devices still exist). It is not composition because the lifecycle of Device is not bound to the lifecycle of SmartHome.

(b) Java Method implementation for deactivateAll():
public void deactivateAll() {
for (int i = 0; i < deviceCount; i++) {
if (devices[i] != null) {
devices[i].setActive(false);
}
}
}

(c) Java Method implementation for getDeviceById():
public Device getDeviceById(String targetId) {
for (int i = 0; i < deviceCount; i++) {
if (devices[i] != null && devices[i].getDeviceId().equals(targetId)) {
return devices[i];
}
}
return null;
}

(d) Advantage: Fixed-size arrays have lower overhead and predictable memory usage, which is critical in hardware-constrained IoT environments where maximum capacity can be matched directly to physical constraints.
Disadvantage: A fixed-size array is inflexible; if the user attempts to add more devices than the initial capacity, the system will crash (ArrayIndexOutOfBoundsException) unless complex manual resizing code is implemented, whereas an ArrayList handles resizing dynamically.

Marking scheme

(a) [3 marks]
- 1 mark: Identify relationship as aggregation.
- 1 mark: State that Device can exist independently of SmartHome.
- 1 mark: Note that the deletion of SmartHome does not destroy Device instances (unlinked lifecycles).

(b) [3 marks]
- 1 mark: Correct loop bounds (0 up to deviceCount or devices.length with null-check).
- 1 mark: Retrieve element devices[i].
- 1 mark: Correct call to .setActive(false).

(c) [4 marks]
- 1 mark: Loop through active elements up to deviceCount.
- 1 mark: Use .equals() for string comparison of deviceId (e.g., devices[i].getDeviceId().equals(targetId)).
- 1 mark: Return devices[i] inside the conditional block on match.
- 1 mark: Return null after loop execution.

(d) [3 marks]
- 1 mark: Identify/Explain a valid advantage (e.g., efficient memory usage, predictable array performance, simpler indexing syntax).
- 1 mark: Identify/Explain a valid disadvantage (e.g., fixed size limits scaling, requires manual tracking of empty elements or resizing).
- 1 mark: Clear comparison or contextual link to SmartHome constraints.
Question 4 · Structured Programming & OOP Questions
13 marks
A fitness tracking application records workout activities. The base class is Activity and represents generic workouts.

public class Activity {
private int duration; // in minutes
private String intensity; // \"Low\", \"Medium\", \"High\"

public Activity(int duration, String intensity) {
this.duration = duration;
this.intensity = intensity;
}

public int getDuration() { return duration; }
public String getIntensity() { return intensity; }

public int calculateCalories() {
int rate = 0;
if (intensity.equals(\"Low\")) {
rate = 5;
} else if (intensity.equals(\"Medium\")) {
rate = 8;
} else if (intensity.equals(\"High\")) {
rate = 12;
}
return duration * rate;
}
}

A specialized class Running inherits from Activity:

public class Running extends Activity {
private double distance; // in kilometers

// constructor and methods to be implemented
}

(a) Define the constructor for the Running class that takes duration (int), intensity (String), and distance (double) as parameters, initializing all instance variables appropriately. [3]

(b) Construct the overridden method calculateCalories() in the Running class. The method should call the superclass's calculateCalories() method to get the base calories burned and then add an extra bonus of 5 calories for every full kilometer run (i.e., use integer division of distance). [4]

(c) Distinguish between method overloading and method overriding in object-oriented programming. [3]

(d) Explain the risk of declaring instance variables with public accessibility, and state how this risk is mitigated. [3]
Show answer & marking scheme

Worked solution

(a) Constructor for Running:
public Running(int duration, String intensity, double distance) {
super(duration, intensity);
this.distance = distance;
}

(b) Java Method for calculateCalories():
@Override
public int calculateCalories() {
int baseCalories = super.calculateCalories();
int bonus = ((int) this.distance) * 5;
return baseCalories + bonus;
}

(c) Overloading occurs within a single class where multiple methods share the same name but have different parameter lists (signatures). Overriding occurs in a inheritance relationship where a subclass provides a specific implementation of a method that is already defined in its parent class (with the identical name, parameters, and return type).

(d) Risk: Declaring instance variables public allows external classes to directly access and modify the state of an object without control, which can violate invariants (e.g. setting duration or distance to negative values).
Mitigation: Declaring variables as private (encapsulation) and forcing external actors to use public accessor (getter) and mutator (setter) methods which can perform boundary checking and validation before making alterations.

Marking scheme

(a) [3 marks]
- 1 mark: Correct method header public Running(int duration, String intensity, double distance).
- 1 mark: Correct use of super(duration, intensity) to call superclass constructor.
- 1 mark: Correct assignment of this.distance = distance.

(b) [4 marks]
- 1 mark: Correct method declaration overriding calculateCalories().
- 1 mark: Proper invocation of super.calculateCalories() to capture the base value.
- 1 mark: Correct conversion of distance to integer to find full kilometers run and multiplying by 5.
- 1 mark: Returning the sum of base and bonus.

(c) [3 marks]
- 1 mark: Correct explanation of overloading (same class, same name, different signatures).
- 1 mark: Correct explanation of overriding (subclass, same name, same signatures, replacing parent behavior).
- 1 mark: Correct distinguishing factor (e.g., compile-time/static binding for overloading vs run-time/dynamic binding for overriding).

(d) [3 marks]
- 1 mark: Risk explanation: loss of control over state, direct manipulation, breaking validation.
- 1 mark: Mitigation step: change access modifier to private (encapsulation).
- 1 mark: Explain role of getters/setters in enforcing business logic constraints (validation checking).
Question 5 · Structured Programming & OOP Questions
13 marks
An online commerce system utilizes a ShoppingCart class to manage a collection of purchase Item objects.

The Item class is defined as follows:

public class Item {
private String name;
private double price;

public Item(String name, double price) {
this.name = name;
this.price = price;
}

public String getName() { return name; }
public double getPrice() { return price; }
}

The ShoppingCart class has an array of Item objects:

public class ShoppingCart {
private Item[] cartItems;
private int itemCount;

public ShoppingCart(int capacity) {
this.cartItems = new Item[capacity];
this.itemCount = 0;
}
}

(a) Construct a method addItem(Item newItem) for the ShoppingCart class that adds newItem to the array if there is room. It should return true if the item was successfully added, and false if the cart is full. [3]

(b) Construct a method removeItemByName(String targetName) for the ShoppingCart class. This method searches for an item with a matching name (ignoring case). If found, it removes the item, shifts all subsequent items to the left to close the gap, decrements itemCount, and returns true. If the item is not found, it returns false. [6]

(c) Explain the purpose and use of the keyword this in the constructors and methods of the classes above. [2]

(d) Outline one scenario where a static variable would be appropriate to use inside the Item class. [2]
Show answer & marking scheme

Worked solution

(a) Java Method implementation for addItem():
public boolean addItem(Item newItem) {
if (itemCount < cartItems.length) {
cartItems[itemCount] = newItem;
itemCount++;
return true;
}
return false;
}

(b) Java Method implementation for removeItemByName():
public boolean removeItemByName(String targetName) {
for (int i = 0; i < itemCount; i++) {
if (cartItems[i] != null && cartItems[i].getName().equalsIgnoreCase(targetName)) {
// Shift remaining elements left
for (int j = i; j < itemCount - 1; j++) {
cartItems[j] = cartItems[j + 1];
}
cartItems[itemCount - 1] = null; // Clear duplicate last reference
itemCount--;
return true;
}
}
return false;
}

(c) The keyword 'this' refers to the current instance of the class. It is used in constructors and methods to distinguish between instance variables and parameters that share the same name, ensuring that assignment (e.g., this.name = name) correctly assigns the parameter value to the object's instance variable.

(d) A static variable belongs to the class rather than individual instances. An appropriate scenario is to store a shared property, such as totalItemCount (to keep track of the total number of items created globally across all shopping carts) or baseTaxRate (a tax percentage applied to all items).

Marking scheme

(a) [3 marks]
- 1 mark: Check if itemCount is less than cartItems.length.
- 1 mark: Assign newItem to cartItems[itemCount] and increment itemCount.
- 1 mark: Return true on success, and false on failure.

(b) [6 marks]
- 1 mark: Loop through active items (0 to itemCount - 1).
- 1 mark: Null-safe string comparison using .equalsIgnoreCase() or equivalent on item names.
- 2 marks: Correct shift loop starting from matching index up to itemCount - 2 to shift items left (e.g. cartItems[j] = cartItems[j+1]).
- 1 mark: Clear final element cartItems[itemCount - 1] to null and decrement itemCount.
- 1 mark: Return true upon successful removal, and return false after outer loop completes if not found.

(c) [2 marks]
- 1 mark: State that 'this' references the current object instance.
- 1 mark: Describe its role in resolving variable name collisions/shadowing (parameter name vs field name).

(d) [2 marks]
- 1 mark: Identify that static variables are shared class-wide.
- 1 mark: Provide a correct context-related example (e.g., class-wide taxRate, item counter).

Paper 3 Case Study

Answer all questions.
4 Question · 30 marks
Question 1 · Case Study Analysis & Essay Questions
7.5 marks
Explain the 'cold-start problem' for both new users and new items in a collaborative filtering recommender system, and analyze how a hybrid recommendation approach can be used to mitigate these difficulties.
Show answer & marking scheme

Worked solution

The student must address both aspects of the cold-start problem and explain the hybrid approach.

1. User Cold-Start:
- Occurs when a new user joins the platform.
- There is no historical data (clicks, purchases, ratings) for this user.
- Therefore, similarity-based metrics (such as cosine similarity or Pearson correlation) cannot find 'neighbors' with similar tastes to make recommendations.

2. Item Cold-Start:
- Occurs when a new item is added to the catalog.
- There are no initial interactions or ratings from any users.
- The collaborative algorithm cannot calculate item-to-item similarities or place the item in the collaborative space, resulting in the item never being recommended (the 'long-tail' isolation).

3. Hybrid Recommendation Solution:
- A hybrid recommender system integrates collaborative filtering with other methods, such as content-based filtering or demographic analysis.
- For new items, the system utilizes metadata (e.g., actors, genre, price) to find similar items in the existing database and recommends the new item to users who like similar content.
- For new users, the system uses an active learning approach (e.g., onboarding questionnaires) or geographic/demographic profiles to seed a content-based recommendation loop until enough implicit/explicit feedback is gathered for collaborative computations.

Marking scheme

Max 7.5 marks:
- Up to 3 marks for a clear explanation of the cold-start problem (1.5 marks for user cold-start details, 1.5 marks for item cold-start details).
- Up to 3 marks for analyzing how hybrid systems resolve these issues (1.5 marks for content-based / metadata-driven mitigation of item cold-start, 1.5 marks for onboarding / demographic mitigation of user cold-start).
- 1.5 marks for clarity of technical vocabulary (such as similarity metrics, collaborative filtering, metadata, and active learning).
Question 2 · Case Study Analysis & Essay Questions
7.5 marks
Recommender systems rely heavily on explicit and implicit feedback. Distinguish between these two types of data collection, and discuss how the implementation of Differential Privacy can protect user data while maintaining recommendation utility.
Show answer & marking scheme

Worked solution

1. Distinction between Explicit and Implicit Feedback:
- Explicit feedback involves direct user input (e.g., rating a movie 5 stars). It represents unambiguous user preferences but suffers from data sparsity and self-selection bias.
- Implicit feedback involves passive monitoring of user behavior (e.g., dwell time, click logs, purchase history). It provides rich, continuous streams of data but introduces noise (e.g., a click does not necessarily mean the user liked the product).

2. Implementation of Differential Privacy (DP):
- Differential privacy injects mathematically calibrated noise (e.g., via Laplacian or Gaussian mechanisms) into the recommendation algorithms (such as Matrix Factorization gradient updates or user similarity matrices).
- DP ensures that any query or model output cannot be used to infer whether a specific user's record was part of the dataset, protecting users from membership inference attacks.

3. Privacy-Utility Trade-off:
- The parameter \(\epsilon\) controls the privacy-loss budget. A smaller \(\epsilon\) means more noise and greater privacy, but reduces the accuracy (utility) of recommendations because the subtle patterns in collaborative behavior get masked by noise.
- A larger \(\epsilon\) preserves utility but provides weaker privacy guarantees.

Marking scheme

Max 7.5 marks:
- Up to 2 marks for distinguishing explicit and implicit feedback (1 mark each with examples).
- Up to 3 marks for explaining how Differential Privacy works in the context of recommender databases (1.5 marks for mathematical noise/perturbation concept, 1.5 marks for preventing user reconstruction/membership inference).
- Up to 2.5 marks for evaluating the privacy-utility trade-off (specifically referencing the effect of noise and the selection of \(\epsilon\) on recommendation performance).
Question 3 · Case Study Analysis & Essay Questions
7.5 marks
Content-based filtering algorithms risk creating a 'filter bubble' or 'echo chamber' for users. Explain this phenomenon and evaluate how a system developer could modify the recommendation algorithm to introduce 'serendipity' or 'novelty' to counter this issue.
Show answer & marking scheme

Worked solution

1. Explanation of the Filter Bubble / Echo Chamber:
- Content-based filtering profiles users based on the features of items they liked. It recommends new items with similar feature vectors.
- This creates a feedback loop: the user only sees similar items, only interacts with those items, which further narrows their profile.
- It limits user exploration, reduces engagement over time (due to boredom), and can isolate users in ideologically homogeneous spaces.

2. Strategies to Introduce Serendipity and Novelty:
- Exploration vs. Exploitation (\(\epsilon\)-greedy): The developer allocates a fraction \(\epsilon\) of the recommendation slots to randomly selected or highly diverse items from distant categories.
- Hybridization: Combining collaborative features allows the system to cross-reference other users' behaviors, finding non-obvious pathways to new content domains.
- Re-ranking with Diversity Metrics (Maximal Marginal Relevance - MMR): MMR balances relevance and diversity by selecting items that are relevant to the query/user profile but dissimilar to items already selected in the top-K recommendation list.
- Graph-based random walks: Utilizing knowledge graphs to navigate to conceptually distant nodes that still share subtle, high-level semantic links.

Marking scheme

Max 7.5 marks:
- Up to 2.5 marks for explaining the mechanics and consequences of the filter bubble / echo chamber in content-based filtering.
- Up to 3.5 marks for evaluating technical strategies to introduce serendipity/novelty (1.5 marks for explaining a diverse re-ranking strategy like MMR, 1.5 marks for an exploration strategy like \(\epsilon\)-greedy, 0.5 marks for hybrid/graph alternatives).
- 1.5 marks for structured and analytical presentation of the tension between relevance and diversity.
Question 4 · Case Study Analysis & Essay Questions
7.5 marks
Analyze the differences in computational complexity and scalability between user-based collaborative filtering and item-based collaborative filtering when deployed in a system with millions of active users and a relatively static catalog of products.
Show answer & marking scheme

Worked solution

1. User-Based Collaborative Filtering Scalability:
- Relies on finding users with similar rating histories (neighbors).
- Computational complexity for the user similarity matrix is \(O(U^2 \cdot d)\) where \(U\) is the number of users and \(d\) is the average number of co-rated items.
- When \(U\) is in the millions, storing and updating an \(O(U^2)\) matrix in memory becomes a critical bottleneck.
- Because users actively browse and rate items, their profiles change dynamically, requiring continuous real-time updates of their similarities to keep recommendations fresh.

2. Item-Based Collaborative Filtering Scalability:
- Relies on finding items that are rated similarly by the same users.
- Computational complexity for the item similarity matrix is \(O(I^2 \cdot u)\) where \(I\) is the number of items and \(u\) is the average number of users who rated an item.
- In e-commerce or streaming catalogs, the number of items is usually far smaller than active users (\(I \ll U\)).
- Item similarities are far more stable than user profiles (e.g., a book's target audience or style does not change instantly). Thus, the item similarity matrix can be computed entirely offline in batch processes (e.g., daily) and stored in a cache/database for instant online lookup, reducing runtime latency to \(O(1)\) relative to user scaling.

Marking scheme

Max 7.5 marks:
- Up to 2.5 marks for analyzing user-based collaborative filtering (addressing user-scale complexity \(O(U^2)\), dynamic user profiles, and computational challenges).
- Up to 2.5 marks for analyzing item-based collaborative filtering (addressing item-scale complexity \(O(I^2)\), catalog stability, and static nature of items).
- Up to 2.5 marks for a comparative evaluation of scalability benefits (explicitly highlighting why \(I \ll U\) makes item-based systems superior for offline precomputation and low-latency online recommendation).

Wondering how well you actually know this?

Thinka is an AI practice app for DSE students — unlimited questions, instant auto-marking, and detailed step-by-step solutions. 100,000+ students use it to confirm they actually know it, not just think they do.

Want more questions like this? Practice unlimited on Thinka — instant answers included.

Start Practising Free