Coding Data

Summary

Coding is a statistical technique used to simplify awkward datasets by applying a linear transformation. By adding or subtracting a constant, or by multiplying or dividing every value by a constant, you can work with more manageable numbers without losing the underlying statistical relationships. Understanding how these transformations affect measures of location (like the mean) versus measures of spread (like standard deviation) is fundamental to accurate data analysis.

1. Definition & Core Concepts

Coding is the process of transforming a dataset using a mathematical formula to make calculations easier or to standardize data for comparison.

The most common form of coding is a linear transformation, expressed by the formula $y = ax + b$, where $x$ represents the original data and $y$ represents the coded data.

An assumed mean is a specific type of coding where a constant value is subtracted from every data point to center the data around zero or a smaller number, simplifying the calculation of the actual mean.

[Figure: a distribution curve shifting along the x-axis under additive coding, from the original data (x) to the coded data (x + b). The shape and spread remain identical while the location changes.]
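As a minimal sketch of what coding looks like in practice, the snippet below applies $y = ax + b$ to a small list of values. The helper name `code_data` and the measurements are hypothetical, chosen only for illustration.

```python
def code_data(values, a=1, b=0):
    """Apply the linear coding y = a*x + b to every value in the list."""
    return [a * x + b for x in values]


# Hypothetical measurements clustered around 1000.
original = [1002, 1005, 1009, 1004]

# Assumed-mean style coding: subtract 1000 from every value (a = 1, b = -1000).
coded = code_data(original, a=1, b=-1000)
print(coded)  # [2, 5, 9, 4]
```

The coded values are much smaller, so sums and squares are easier to compute by hand.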

2. Impact on Measures of Location

Measures of location, such as the mean, median, and mode, are directly affected by every part of the coding formula.

If the data is coded as $y = ax + b$, the new mean $\bar{y}$ is found by applying the same operations to the original mean: $\bar{y} = a\bar{x} + b$.

To retrieve the original mean from coded results, undo the operations in reverse order (subtract $b$ first, then divide by $a$, which requires $a \neq 0$): $\bar{x} = \frac{\bar{y} - b}{a}$.
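The two mean formulas above can be checked numerically. This is a small sketch using Python's standard `statistics` module; the dataset and the constants `a = 2`, `b = 3` are made up for the example.

```python
from statistics import mean

# Hypothetical original data; its mean is 14.
original = [12, 15, 11, 18, 14]

# Code the data with y = a*x + b.
a, b = 2, 3
coded = [a * x + b for x in original]

x_bar = mean(original)
y_bar = mean(coded)

# The coded mean matches a * x_bar + b.
print(y_bar, a * x_bar + b)  # 31 31

# Decode: subtract b first, then divide by a.
recovered = (y_bar - b) / a
print(recovered)  # 14.0
```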

3. Impact on Measures of Spread

Measures of spread, such as standard deviation and range, describe the distance between data points and are only affected by multiplication or division.

Adding or subtracting a constant ($b$) does not change the spread because every point moves by the same amount, keeping the distances between them identical.

When multiplying by a constant ($a$), the standard deviation scales by the modulus (absolute value) of that constant: $\sigma_y = |a|\sigma_x$.

The variance is affected by the square of the multiplier, meaning $\sigma_y^2 = a^2 \sigma_x^2$.
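The sketch below illustrates these spread rules with a negative multiplier, so the role of the modulus is visible. It assumes the population formulas (dividing by $n$), which is why `pstdev` and `pvariance` are used; the data and the constants `a = -3`, `b = 10` are invented for the example.

```python
import math
from statistics import pstdev, pvariance

# Hypothetical data with population standard deviation 2 and variance 4.
original = [2, 4, 4, 4, 5, 5, 7, 9]

# Code with a negative multiplier and a shift: y = -3x + 10.
a, b = -3, 10
coded = [a * x + b for x in original]

sd_x, sd_y = pstdev(original), pstdev(coded)
var_x, var_y = pvariance(original), pvariance(coded)

# Standard deviation scales by |a|; the shift b has no effect.
assert math.isclose(sd_y, abs(a) * sd_x)

# Variance scales by a squared.
assert math.isclose(var_y, a**2 * var_x)

print(sd_x, sd_y, var_x, var_y)
```

Using `a * sd_x` without the modulus would give a negative standard deviation here, which is impossible.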

4. Key Distinctions

It is vital to distinguish between how location and spread respond to additive versus multiplicative changes.

| Statistic | Addition/Subtraction ($+b$) | Multiplication/Division ($\times a$) |
| --- | --- | --- |
| Mean / Median | Changes by $+b$ | Changes by $\times a$ |
| Std. Deviation | No change | Changes by $\times \lvert a \rvert$ |
| Variance | No change | Changes by $\times a^2$ |

5. Summary Statistics with Assumed Mean

When using an assumed mean $a$ (here $a$ is the constant subtracted from every value, not the multiplier in $y = ax + b$), the coded data is written as $(x - a)$. The sum of this coded data is $\sum (x - a)$.

The variance of the original data can be calculated directly from the coded summary statistics: $\mathrm{Var}(x) = \frac{\sum (x - a)^2}{n} - \left(\frac{\sum (x - a)}{n}\right)^2$.

Note that because subtraction does not affect spread, the variance of $(x - a)$ is exactly the same as the variance of $x$.
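As an illustration, the sketch below recovers the mean and variance of the original data from the coded totals $n$, $\sum (x - a)$ and $\sum (x - a)^2$ alone. The function name `stats_from_coded_sums` and the data are hypothetical. The mean uses $\bar{x} = a + \frac{\sum (x - a)}{n}$, which follows from the rule for coded means.

```python
import math
from statistics import pvariance


def stats_from_coded_sums(n, sum_d, sum_d2, assumed_mean):
    """Return (mean, variance) of x from sums of d = x - assumed_mean."""
    mean_d = sum_d / n
    mean_x = assumed_mean + mean_d       # undo the subtraction for the mean
    var_x = sum_d2 / n - mean_d ** 2     # variance is unchanged by the shift
    return mean_x, var_x


# Hypothetical raw data, used here only to build the summary statistics.
original = [1002, 1005, 1009, 1004]
assumed_mean = 1000
d = [x - assumed_mean for x in original]  # [2, 5, 9, 4]

mean_x, var_x = stats_from_coded_sums(
    n=len(d),
    sum_d=sum(d),                  # 20
    sum_d2=sum(v * v for v in d),  # 126
    assumed_mean=assumed_mean,
)
print(mean_x, var_x)  # 1005.0 6.5

# Cross-check against the variance computed directly from the raw data.
assert math.isclose(var_x, pvariance(original))
```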

6. Exam Strategy & Tips

  • Check the Multiplier: Always use the absolute value of the multiplier when adjusting the standard deviation; spread can never be negative.

  • Reverse Operations: When finding the original mean, ensure you solve the equation $\bar{y} = a\bar{x} + b$ for $\bar{x}$ correctly by subtracting $b$ before dividing by $a$.

  • Units Matter: Standard deviation is measured in the same units as the data, while variance is in squared units. This is why a multiplier $a$ scales the standard deviation by $|a|$ but the variance by $a^2$.

  • Sanity Check: If you add 10 to every score in a test, the average should go up by 10, but the 'gap' between the highest and lowest student should remain the same.
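The sanity check in the last tip can be run as a quick sketch. The test scores below are invented for the example.

```python
import math
from statistics import mean

# Hypothetical test scores.
scores = [45, 62, 71, 58, 90]

# Add 10 to every score.
shifted = [s + 10 for s in scores]

# The average goes up by 10 ...
mean_change = mean(shifted) - mean(scores)
assert math.isclose(mean_change, 10)

# ... but the gap between the highest and lowest score is unchanged.
range_before = max(scores) - min(scores)
range_after = max(shifted) - min(shifted)
print(range_before, range_after)  # 45 45
```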