Bit Depth and Capacity: The capacity of a character set grows exponentially with its bit depth: with $n$ bits there are $2^n$ possible codes. A 7-bit system provides $2^7 = 128$ unique codes, while an 8-bit system provides $2^8 = 256$ codes.
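A minimal Python sketch of the $2^n$ relationship. The function name capacity is hypothetical, chosen only for illustration.

```python
def capacity(bits: int) -> int:
    """Number of distinct codes available with a fixed number of bits (2 ** bits)."""
    return 2 ** bits

for n in (7, 8, 16):
    print(f"{n}-bit: {capacity(n)} codes")
# 7-bit: 128 codes
# 8-bit: 256 codes
# 16-bit: 65536 codes
```

Each extra bit doubles the number of available codes, which is why the growth is exponential rather than linear.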
Logical Ordering: Character sets are typically organized sequentially. For example, if 'A' is represented by 65, 'B' will be 66, and 'C' will be 67. Because characters are stored as numbers, text can be sorted and compared efficiently by comparing those numbers.
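A short illustration in Python, where ord() returns a character's Unicode code point; for these letters that is the same value as the ASCII code. The word list is a made-up example.

```python
# Consecutive letters have consecutive codes.
for letter in "ABC":
    print(letter, ord(letter))  # A 65, B 66, C 67

# Python compares strings character by character using these codes,
# so comparison and sorting work directly on the numbers.
print("A" < "B")                        # True, because 65 < 66
words = sorted(["Cat", "Bat", "Ant"])
print(words)                            # ['Ant', 'Bat', 'Cat']
```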
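Control Characters: Not all codes represent visible symbols. In ASCII, the first 32 codes (0 to 31) are reserved for 'control characters' such as 'Line Feed' (10) or 'Escape' (27), which control hardware or text layout rather than printing a symbol; code 127 (Delete) is also a control character.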
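A small Python check for ASCII control codes, assuming the ranges stated above (0 to 31, plus 127). The helper name is_ascii_control is hypothetical.

```python
def is_ascii_control(ch: str) -> bool:
    """True for ASCII control codes: 0-31, plus 127 (Delete)."""
    code = ord(ch)
    return code < 32 or code == 127

print(ord("\n"), is_ascii_control("\n"))      # 10 True   (Line Feed)
print(ord("\x1b"), is_ascii_control("\x1b"))  # 27 True   (Escape)
print(ord("A"), is_ascii_control("A"))        # 65 False  (printable letter)
```

Note that the space character (code 32) is the first printable code, not a control character.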
ASCII (American Standard Code for Information Interchange): A 7-bit standard originally designed for English text. It includes uppercase and lowercase letters, the digits 0 to 9, and basic punctuation.
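Extended ASCII: An 8-bit extension that adds 128 additional slots (256 in total). These are used for mathematical symbols, graphical characters, and accented letters for Western European languages; exactly which characters fill the extra slots depends on the code page in use (for example, ISO 8859-1).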
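Unicode: A universal standard designed to represent every character from every language, including ancient scripts and modern emojis. Each character is given a numeric code point. Unicode is often described as using 16 bits per character ($2^{16} = 65536$ combinations), but it defines more code points than 16 bits can hold, so they are stored using encodings such as UTF-8 (1 to 4 bytes per character) or UTF-16 (2 or 4 bytes).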
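A Python sketch showing which code points fit in 16 bits. ord() returns the Unicode code point; the helper fits_in_16_bits and the sample characters are illustrative choices, not part of any standard API.

```python
def fits_in_16_bits(ch: str) -> bool:
    """True if the character's code point is below 2 ** 16 (65536)."""
    return ord(ch) < 2 ** 16

# 'A', 'é', '€' and the grinning-face emoji, written as escapes.
for ch in ("A", "\u00e9", "\u20ac", "\U0001F600"):
    print(f"{ch!r}: U+{ord(ch):04X}, fits in 16 bits: {fits_in_16_bits(ch)}")
# 'A': U+0041, fits in 16 bits: True
# 'é': U+00E9, fits in 16 bits: True
# '€': U+20AC, fits in 16 bits: True
# '😀': U+1F600, fits in 16 bits: False
```

The emoji's code point (U+1F600) is above 65535, which is why 16 bits alone cannot cover all of Unicode.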
Storage Calculation: To find the storage size of a text string in a fixed-width encoding, multiply the number of characters by the bits per character. For example, a 10-character string in 8-bit ASCII requires $10 \times 8 = 80$ bits (10 bytes).
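The calculation above as a Python sketch, assuming a fixed number of bits per character (it does not apply to variable-width encodings such as UTF-8). The function storage_size and the string "HELLOWORLD" are hypothetical examples.

```python
def storage_size(text: str, bits_per_char: int) -> tuple[int, float]:
    """Return (bits, bytes) needed to store text at a fixed bits_per_char."""
    bits = len(text) * bits_per_char
    return bits, bits / 8  # 8 bits per byte

bits, nbytes = storage_size("HELLOWORLD", 8)  # a 10-character string
print(bits, "bits =", nbytes, "bytes")        # 80 bits = 10.0 bytes
```

Returning the byte count as a float makes it visible when a 7-bit total does not divide evenly into whole bytes (for example, 70 bits is 8.75 bytes).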
The $2^n$ Rule: Always use $2^n$ to determine how many characters $n$ bits can represent. If an exam asks how many bits are needed for 500 characters, you must find the smallest $n$ where $2^n \geq 500$ (which is $n = 9$, since $2^8 = 256$ is too few and $2^9 = 512$ is enough).
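The same search written as a Python sketch: keep increasing $n$ until $2^n$ reaches the required number of characters. The function name bits_needed is hypothetical.

```python
def bits_needed(num_characters: int) -> int:
    """Smallest n such that 2 ** n >= num_characters."""
    n = 0
    while 2 ** n < num_characters:
        n += 1
    return n

print(bits_needed(500))  # 9, since 2 ** 8 = 256 < 500 <= 512 = 2 ** 9
```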
Case Sensitivity: Remember that 'A' and 'a' have different binary codes. In ASCII, uppercase letters always have lower denary values than their lowercase counterparts.
Sequential Logic: If you are given the code for one letter, you can find the code for the next by adding 1. Do not waste time memorizing the whole table; just know the starting points (e.g., 'A' = 65, 'a' = 97).
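A Python sketch of counting on from the two starting points, 'A' = 65 and 'a' = 97. The helpers uppercase_code and lowercase_code are hypothetical and take a letter's position in the alphabet (1 = 'A').

```python
def uppercase_code(position: int) -> int:
    """ASCII code of the nth uppercase letter, counting on from 'A' = 65."""
    return 65 + (position - 1)

def lowercase_code(position: int) -> int:
    """ASCII code of the nth lowercase letter, counting on from 'a' = 97."""
    return 97 + (position - 1)

print(uppercase_code(5), chr(uppercase_code(5)))  # 69 E
print(lowercase_code(5), chr(lowercase_code(5)))  # 101 e
```

Because both sequences step by 1, each lowercase letter's code is exactly 32 higher than its uppercase counterpart.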
Units Check: Pay attention to whether the question asks for the answer in bits or bytes. Divide bits by 8 to get bytes.
Confusing Bits and Characters: Students often mistake the number of bits for the number of characters. 8 bits does NOT mean 8 characters; it means 1 character represented by 8 bits, allowing for 256 possible variations.
ASCII and Emojis: A common error is assuming ASCII can store emojis. ASCII is strictly limited to basic English text within its 128 codes; of the character sets covered here, only Unicode has the capacity for complex symbols and emojis.
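A quick Python demonstration: encoding an emoji with the "ascii" codec fails, while UTF-8 succeeds. The helper ascii_can_encode is a hypothetical name used only for this illustration.

```python
def ascii_can_encode(text: str) -> bool:
    """True if every character in text has an ASCII code."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False

message = "Hi \U0001F600"          # "Hi " followed by a grinning-face emoji
print(ascii_can_encode("Hello"))   # True
print(ascii_can_encode(message))   # False
print(message.encode("utf-8"))     # b'Hi \xf0\x9f\x98\x80' (emoji takes 4 bytes)
```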
Fixed vs. Variable Width: ASCII is always fixed-width (7 bits per character, usually stored in one 8-bit byte), whereas some Unicode encodings, such as UTF-8, are variable-width, meaning different characters can take up different amounts of space (1 to 4 bytes in UTF-8).
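A Python sketch comparing byte lengths for a few sample characters under three Unicode encodings. The little-endian codec names ("utf-16-le", "utf-32-le") are used so that no byte-order mark is added; the helper encoded_lengths is hypothetical.

```python
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")

def encoded_lengths(ch: str) -> dict[str, int]:
    """Number of bytes ch occupies in each encoding."""
    return {enc: len(ch.encode(enc)) for enc in ENCODINGS}

# 'A', 'é', '€' and the grinning-face emoji.
for ch in ("A", "\u00e9", "\u20ac", "\U0001F600"):
    print(repr(ch), encoded_lengths(ch))
# UTF-8 grows from 1 to 4 bytes; UTF-16 uses 2 or 4; UTF-32 is always 4.
```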
Data Compression: Understanding character encoding is vital for text compression (like Huffman coding), which reduces file size by using fewer bits for frequently occurring characters.
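To make the Huffman idea concrete, here is a minimal Python sketch that computes only the code length for each character (not the bit patterns themselves) by repeatedly merging the two least frequent groups. The function name huffman_code_lengths and the sample string "ABRACADABRA" are illustrative assumptions.

```python
import heapq
from collections import Counter

def huffman_code_lengths(text: str) -> dict[str, int]:
    """Return the Huffman code length (in bits) for each distinct character."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs 1 bit per occurrence.
        return {ch: 1 for ch in freq}
    # Heap entries: (combined frequency, unique tie-breaker, {char: depth so far}).
    heap = [(count, i, {ch: 0}) for i, (ch, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)  # least frequent group
        f2, _, d2 = heapq.heappop(heap)  # second least frequent group
        # Merging two groups pushes every character in them one level deeper.
        merged = {ch: depth + 1 for ch, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

text = "ABRACADABRA"
lengths = huffman_code_lengths(text)
huffman_bits = sum(lengths[ch] for ch in text)
fixed_bits = len(text) * 8
print(huffman_bits, "bits vs", fixed_bits, "bits in 8-bit ASCII")  # 23 bits vs 88 bits
```

The frequent 'A' gets a 1-bit code while the rare 'C' and 'D' get longer ones, which is where the saving over a fixed 8 bits per character comes from.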
Network Protocols: When two computers communicate, they must agree on the character encoding, which is often declared in a protocol header (for example, the charset parameter of an HTTP Content-Type header); otherwise the text will appear as 'mojibake' (garbled symbols).
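A Python sketch of how mojibake arises: the sender encodes text as UTF-8, but the receiver decodes the bytes as Latin-1. The variable names and the word "café" are illustrative only.

```python
original = "caf\u00e9"                  # "café"
sent_bytes = original.encode("utf-8")   # sender encodes as UTF-8: b'caf\xc3\xa9'
garbled = sent_bytes.decode("latin-1")  # receiver wrongly assumes Latin-1
print(garbled)                          # cafÃ©
print(sent_bytes.decode("utf-8"))       # café, once both sides agree on UTF-8
```

The two UTF-8 bytes for 'é' (0xC3, 0xA9) are read as two separate Latin-1 characters, 'Ã' and '©'.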