What is the fundamental difference between lossy and lossless compression regarding the original data?

Lossy compression permanently removes data to reduce size and is irreversible, whereas lossless compression only re-encodes data so it can be perfectly reconstructed to its original state.

Why is lossy compression unsuitable for a computer program or a text document?

Computer programs and text require $100\%$ accuracy to function or be read correctly; losing even a single bit of data in these formats would result in errors, corruption, or loss of meaning.

How does the file size reduction of lossy compression compare to lossless compression?

Lossy compression typically achieves much higher compression ratios, and therefore significantly smaller files, because it can discard large amounts of 'unnecessary' data, whereas lossless compression can only remove the redundancy (repeated patterns) actually present in the data.

What is a common mistake when applying Run-Length Encoding (RLE) to highly varied data?

If the data has no repeating consecutive elements, RLE can actually increase the file size because it adds a 'count' value for every single unique element (e.g., 'ABC' becomes '1A1B1C').

What happens to the original file when a JPEG compression algorithm is applied?

A new, smaller compressed file is created in which some of the original image data has been discarded. Because the process is irreversible, the original high-quality bits cannot be recovered from the JPEG; they survive only if the uncompressed original is kept separately.

Why might a user be disappointed if they compress a file multiple times using lossy methods?

Each time lossy compression is applied, more data is discarded, leading to 'generation loss' where the quality of the image or audio noticeably degrades further with every save.

Define 'Perceptual Music Shaping' in the context of MP3 compression.

It is a lossy compression technique that removes audio frequencies humans cannot hear and quiet sounds that are masked by louder ones to reduce file size without a perceived loss in quality.

What is Run-Length Encoding (RLE)?

RLE is a lossless compression method that simplifies data by representing consecutive identical elements as a single data value paired with a count of its repetitions.

What is the purpose of a 'count' in an RLE binary representation?

The count is a fixed-size binary integer that tells the decompressor how many times the following data value should be repeated in the reconstructed file.

How does compression facilitate the streaming of high-quality video over the internet?

By reducing the file size, compression lowers the required bandwidth, allowing the data to be transmitted fast enough to play in real-time without constant buffering.

Revision Notes: Cambridge International Examinations AS-Level Computer Science, 1. Information representation, Compression

Data Compression

Summary

Data compression is the process of encoding information using fewer bits than the original representation to reduce file size. This optimization is essential for maximizing storage capacity on secondary devices and minimizing the time required to transfer data across networks. Compression is broadly categorized into lossy and lossless methods, each suited for different data types based on the tolerance for quality degradation.

1. Definition & Core Concepts

Compression is the reduction of file size to optimize the use of secondary storage and improve network transmission speeds. By reducing the number of bits required to represent data, files take up less space and can be uploaded or downloaded more efficiently.

The process uses specific algorithms to identify and remove redundancy or unnecessary detail from the data. The effectiveness of compression is often measured by the compression ratio: the size of the original file divided by the size of the compressed version, so a 10 MB file compressed to 2 MB has a ratio of 5:1.
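As a rough illustration of how a compression ratio might be calculated, the sketch below uses two helper functions. The function names and the file sizes are assumptions chosen for this example only; they do not come from the syllabus.

```python
def compression_ratio(original_size: int, compressed_size: int) -> float:
    """Return original size divided by compressed size (higher means more compression)."""
    if compressed_size <= 0:
        raise ValueError("compressed_size must be positive")
    return original_size / compressed_size


def percentage_saving(original_size: int, compressed_size: int) -> float:
    """Return the reduction in size as a percentage of the original size."""
    if original_size <= 0:
        raise ValueError("original_size must be positive")
    return (original_size - compressed_size) / original_size * 100


# Hypothetical sizes in kilobytes, chosen only for illustration.
print(compression_ratio(2000, 500))   # 4.0, often written as 4:1
print(percentage_saving(2000, 500))   # 75.0
```

Both measures describe the same result: a 4:1 ratio means the compressed file is a quarter of the original size, which is a 75% saving.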

Compression is vital in modern computing for streaming high-definition media, storing large databases on mobile devices, and ensuring fast web page loading times.

[Figure: a compression algorithm transforms a large original data block into a smaller compressed block.]

2. Lossy Compression

Lossy compression achieves significant size reduction by permanently removing data that is deemed less important or unnoticeable to human perception. Because data is discarded, the process is irreversible, meaning the original file cannot be perfectly reconstructed from the compressed version.

In audio files like MP3, lossy compression uses perceptual music shaping. This technique removes frequencies that are outside the range of human hearing or sounds that are 'masked' (hidden) by louder, simultaneous sounds.

In visual media like JPEG images, the algorithm discards fine detail that the eye is unlikely to notice, for example by treating similar neighbouring colors as the same or by reducing color depth. This causes some loss of quality (visible as artifacts at high compression), but it is often acceptable for photographs and videos where the human eye may not detect minor discrepancies.
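The irreversibility of lossy compression can be sketched with a toy example that reduces color depth from 8 bits to 4 bits per sample. This is not the JPEG algorithm, only a simplified stand-in; the function names and sample values are assumptions made for illustration.

```python
def reduce_depth(samples: list[int], keep_bits: int = 4) -> list[int]:
    """Keep only the top `keep_bits` bits of each 8-bit sample (the lossy step)."""
    shift = 8 - keep_bits
    return [s >> shift for s in samples]


def restore_depth(reduced: list[int], keep_bits: int = 4) -> list[int]:
    """Scale reduced samples back to the 8-bit range; the discarded bits stay lost."""
    shift = 8 - keep_bits
    return [r << shift for r in reduced]


# Hypothetical 8-bit brightness values.
original = [200, 201, 203, 17, 18, 255]
reduced = reduce_depth(original)
restored = restore_depth(reduced)

print(reduced)               # [12, 12, 12, 1, 1, 15]
print(restored)              # [192, 192, 192, 16, 16, 240]
print(restored == original)  # False
```

Each sample now needs half as many bits, and similar values such as 200, 201 and 203 have collapsed into one value. The restored data is close to the original but not identical, which is why the process cannot be reversed.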

3. Lossless Compression

Lossless compression reduces file size without losing any information, ensuring that the original data can be reconstructed bit-for-bit upon decompression. This is achieved by identifying and encoding patterns and repetitions within the data more efficiently.
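A bit-for-bit round trip can be demonstrated with Python's standard `zlib` module, which implements DEFLATE, the lossless method commonly used inside ZIP archives. The sample text below is an assumption chosen because it is highly repetitive.

```python
import zlib

# Repetitive sample text, encoded to bytes (hypothetical data for illustration).
text = ("the quick brown fox jumps over the lazy dog. " * 50).encode("ascii")

compressed = zlib.compress(text)
restored = zlib.decompress(compressed)

print(len(text), len(compressed))  # original size, then the smaller compressed size
print(restored == text)            # True
```

The final comparison is the defining property of lossless compression: decompression returns exactly the bytes that went in.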

This method is mandatory for files where any data loss would render the file useless, such as text documents, executable programs, or source code. If a single character in a legal document or a single instruction in a program were lost, the file would be corrupted.

Common formats include PNG for images and ZIP for general file archiving. Lossless compression is safer, but it typically produces larger files than lossy methods because nothing can be discarded.

4. Run-Length Encoding (RLE)

Run-Length Encoding (RLE) is a simple lossless compression technique that replaces sequences of identical data elements (runs) with a single data value and a count of how many times it repeats. For example, the string 'AAAAABBB' would be stored as '5A3B'.
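The text form of RLE described above can be sketched as follows. The function names are assumptions, and the decoder assumes the data itself contains no digit characters, since digits are used for the counts.

```python
from itertools import groupby


def rle_encode(text: str) -> str:
    """Encode each run as <count><char>, e.g. 'AAAAABBB' -> '5A3B'."""
    return "".join(f"{len(list(run))}{char}" for char, run in groupby(text))


def rle_decode(encoded: str) -> str:
    """Rebuild the original text from <count><char> pairs."""
    out = []
    count = ""
    for ch in encoded:
        if ch.isdigit():
            count += ch            # counts may have more than one digit
        else:
            out.append(ch * int(count))
            count = ""
    return "".join(out)


print(rle_encode("AAAAABBB"))              # 5A3B
print(rle_decode("5A3B"))                  # AAAAABBB
print(rle_decode(rle_encode("AAAAABBB")) == "AAAAABBB")  # True
```

Because decoding restores the exact input, this confirms that RLE is lossless.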

In binary RLE, each run is stored as a fixed-size binary count followed by the data value. If 8 bits are used for the count, a run length of 4 is stored as $00000100$, followed by the ASCII or binary code for the character or pixel color. An 8-bit count can represent at most 255, so a longer run must be split into several count-value pairs.
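A minimal sketch of the binary form, assuming one 8-bit count followed by one 8-bit value for every run, is shown below. The function names are hypothetical, and splitting runs longer than 255 into several pairs is one possible design choice rather than a fixed rule.

```python
def rle_encode_bytes(data: bytes) -> bytes:
    """Encode data as (count, value) byte pairs; each count is 1 to 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        value = data[i]
        run = 1
        # Extend the run while the value repeats and the count still fits in 8 bits.
        while i + run < len(data) and data[i + run] == value and run < 255:
            run += 1
        out += bytes([run, value])
        i += run
    return bytes(out)


def rle_decode_bytes(encoded: bytes) -> bytes:
    """Expand (count, value) byte pairs back into the original data."""
    out = bytearray()
    for j in range(0, len(encoded), 2):
        count, value = encoded[j], encoded[j + 1]
        out += bytes([value]) * count
    return bytes(out)


packed = rle_encode_bytes(b"AAAA")
print(" ".join(f"{b:08b}" for b in packed))  # 00000100 01000001
```

The first byte is the count 4 and the second is the ASCII code for 'A' (65), matching the layout described above.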

RLE is highly effective for data with many consecutive identical values, such as simple icons, line drawings, or documents with large areas of white space, but it can actually increase file size if the data contains few or no runs of consecutive identical values, because every value then carries its own count.

5. Key Distinctions

Choosing between lossy and lossless compression depends on the required fidelity and the available storage or bandwidth.

Feature	Lossy Compression	Lossless Compression
Reversibility	Irreversible (Data lost)	Reversible (Original restored)
File Size	Very small	Moderately reduced
Quality	Reduced (Degradation)	Identical to original
Use Case	Photos, Video, Audio	Text, Code, Spreadsheets

6. Exam Strategy & Tips

Identify the Data Type: If an exam question asks which compression to use for a spreadsheet or a program, always choose lossless. If it involves streaming video or background music, lossy is usually the correct answer.
RLE Calculations: When calculating RLE savings, remember to account for the size of the 'count' byte. If the original data is 100 bits and the RLE version uses 20 bits, the saving is $100 - 20 = 80$ bits (an $80\%$ reduction).
Terminology Precision: Use terms like 'perceptual music shaping' for audio and 'irreversible' for lossy compression to gain full marks in descriptive questions.
Check for Patterns: Before suggesting RLE, look at the data. If the data is 'ABCABC', RLE would result in '1A1B1C1A1B1C', which is longer than the original; in this case, RLE is inefficient.
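
The RLE calculation and pattern check in the tips above can be sketched as a quick size comparison. This assumes every value takes 8 bits, every count takes 8 bits, and no run is longer than 255; the function names and sample data are assumptions for illustration.

```python
from itertools import groupby


def original_size_bits(data: str, value_bits: int = 8) -> int:
    """Bits needed to store every value individually."""
    return len(data) * value_bits


def rle_size_bits(data: str, count_bits: int = 8, value_bits: int = 8) -> int:
    """Bits needed if every run is stored as one count plus one value."""
    runs = sum(1 for _ in groupby(data))
    return runs * (count_bits + value_bits)


# One sample with long runs and one with no runs at all.
for sample in ("W" * 12 + "B" * 4, "ABCABC"):
    before = original_size_bits(sample)
    after = rle_size_bits(sample)
    print(before, after, before - after)
# 128 32 96
# 48 96 -48
```

A negative saving, as in the second case, shows that RLE has made the data larger, so it should not be recommended for that data.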