Probability Estimation: Many classifiers do not just output a label but calculate a probability P(y = c | x), representing the likelihood that input x belongs to class c.
Loss Functions: Models are optimized using loss functions like Cross-Entropy Loss, which penalizes the difference between the predicted probability distribution and the actual labels.
The Sigmoid Function: In binary classification, the Sigmoid function is used to map any real-valued number into a probability range between 0 and 1.
Maximum Likelihood Estimation (MLE): This statistical principle is often used to find the parameter values that make the observed data most probable under the chosen model.
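The three ideas above fit together: the Sigmoid turns a raw score into a probability, and minimizing cross-entropy loss is exactly maximum likelihood estimation (it is the average negative log-likelihood of the labels). A minimal sketch in plain Python (function names are illustrative, not from any particular library):

```python
import math

def sigmoid(z):
    """Map any real-valued score into the (0, 1) probability range."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Average negative log-likelihood of the labels under the predicted
    probabilities -- minimizing this is MLE for the model's parameters."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Raw scores -> probabilities -> loss against the true labels
probs = [sigmoid(z) for z in [-2.0, 0.0, 3.0]]
loss = binary_cross_entropy([0, 0, 1], probs)
```

Note how a confident correct prediction contributes almost nothing to the loss, while a confident wrong one is penalized heavily; that asymmetry is what drives the parameters toward the maximum-likelihood fit.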
| Feature | Binary Classification | Multiclass Classification |
|---|---|---|
| Number of Classes | Exactly two (e.g., Yes/No) | Three or more (e.g., Red/Blue/Green) |
| Output Function | Sigmoid | Softmax |
| Complexity | Lower; single decision boundary | Higher; multiple boundaries or one-vs-all |
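Softmax generalizes the Sigmoid to three or more classes: it turns a vector of raw scores into a probability distribution that sums to 1. A small sketch (the example scores are made up):

```python
import math

def softmax(logits):
    """Convert a vector of real-valued scores into class probabilities.
    Subtracting the max score first is the standard numerical-stability trick."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three classes, e.g. Red/Blue/Green
scores = [2.0, 1.0, 0.1]
probs = softmax(scores)  # one probability per class, summing to 1
```

The highest-scoring class gets the largest probability, so the predicted label is simply the argmax of the output.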
Generative vs. Discriminative: Generative models (like Naive Bayes) model the distribution of individual classes, while discriminative models (like Logistic Regression) learn the boundary directly between classes.
Parametric vs. Non-parametric: Parametric models (Logistic Regression) assume a specific functional form for the boundary, whereas non-parametric models (k-NN) grow in complexity with the size of the data.
Accuracy: The ratio of correctly predicted observations to the total observations; however, it can be misleading if classes are highly imbalanced.
Precision and Recall: Precision measures the accuracy of positive predictions (avoiding false positives), while Recall (Sensitivity) measures the ability to find all positive instances (avoiding false negatives).
F1-Score: The harmonic mean of Precision and Recall, providing a single metric that balances both concerns, especially useful when class distributions are uneven.
Confusion Matrix: A table used to describe the performance of a classification model by showing the counts of True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN).
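All four metrics above can be read off the confusion-matrix counts. A minimal sketch computing them by hand on a tiny made-up label set (the function name is illustrative):

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Build the binary confusion-matrix counts and derive
    accuracy, precision, recall, and F1 from them."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}

# Toy example: six observations, one false positive and one false negative
m = classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
```

Here precision, recall, and F1 all come out to 2/3, which matches the formulas: TP = 2, FP = 1, FN = 1, and F1 is the harmonic mean of the first two.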
Check for Class Imbalance: Always look at the distribution of labels; if 99% of data is Class A, a model with 99% accuracy might just be predicting Class A every time without learning anything.
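This failure mode is easy to demonstrate: on a made-up 99%-imbalanced dataset, a "model" that always predicts the majority class scores 99% accuracy while finding zero positives.

```python
# Hypothetical dataset: 990 negatives, 10 positives (99% imbalance)
y_true = [0] * 990 + [1] * 10

# A degenerate model that always predicts the majority class
y_pred = [0] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred)) / 10
```

Accuracy is 0.99 but recall on the positive class is 0.0, which is why the metrics above (recall, F1) matter on imbalanced data.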
Verify Feature Scaling: For distance-based algorithms like k-NN or SVM, ensure features are scaled (e.g., normalized) so that features with larger numerical ranges do not dominate the distance calculation.
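A minimal sketch of min-max scaling on two hypothetical feature columns, one with a much larger numeric range than the other:

```python
def min_max_scale(values):
    """Rescale one feature column to the [0, 1] range so its raw
    magnitude no longer dominates distance calculations."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)  # constant feature carries no distance info
    return [(v - lo) / (hi - lo) for v in values]

incomes = [20_000, 55_000, 120_000]  # large numeric range (made-up values)
ages = [25, 40, 60]                  # small numeric range
scaled_incomes = min_max_scale(incomes)
scaled_ages = min_max_scale(ages)
```

Before scaling, a Euclidean distance between two people would be driven almost entirely by income; after scaling, both features span [0, 1] and contribute comparably.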
Analyze the Confusion Matrix: When asked to evaluate a model, don't just look at accuracy; identify if the model is failing specifically on one class or confusing two specific categories.
Select the Right 'k': In k-NN, remember that a small k (e.g., k = 1) makes the model sensitive to noise (overfitting), while a very large k makes the decision boundaries too smooth (underfitting).
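The effect of k can be seen on a toy 1-D dataset containing one noisy label (all data and function names here are illustrative):

```python
from collections import Counter

def knn_predict(train, query, k):
    """Classify a 1-D query point by majority vote among its k nearest neighbors."""
    neighbors = sorted(train, key=lambda xy: abs(xy[0] - query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy dataset: class 'A' on the left, 'B' on the right,
# plus one noisy 'B' point at x = 1.1 inside the 'A' region
train = [(0.0, "A"), (0.5, "A"), (1.1, "B"),
         (2.0, "B"), (2.5, "B"), (3.0, "B")]

pred_k1 = knn_predict(train, 1.0, k=1)  # follows the single noisy neighbor
pred_k3 = knn_predict(train, 1.0, k=3)  # majority vote smooths the noise out
```

With k = 1 the query at x = 1.0 is assigned 'B' purely because of the noisy point, while k = 3 outvotes it and returns 'A'; pushing k toward the full dataset size would instead predict the global majority class everywhere.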