Unit 5

Personal Notes →

Data Mining: Outlier Detection

In data mining, an outlier is an observation that deviates so significantly from other observations as to arouse suspicion that it was generated by a different mechanism. Detecting outliers is critical in applications such as fraud detection, network security, and fault detection.
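
As a rough illustration of what "deviates significantly from other observations" can mean in practice, the sketch below flags values using a modified z-score built on the median and the median absolute deviation (MAD). This is only one simple univariate rule chosen for illustration, not a method taken from these notes. The function name `flag_outliers`, the sample readings, and the 3.5 cutoff (a commonly used convention) are all assumptions.

```python
import statistics


def flag_outliers(values, cutoff=3.5):
    """Return the values whose modified z-score exceeds `cutoff`.

    `cutoff=3.5` is an assumed, commonly used default, not a fixed rule.
    """
    med = statistics.median(values)
    # Median absolute deviation: a spread estimate that a single
    # extreme value cannot inflate the way it inflates the std. dev.
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread around the median, so the score is undefined
    # 0.6745 rescales the MAD so the score is comparable to an ordinary
    # z-score when the data is roughly normal.
    return [v for v in values if 0.6745 * abs(v - med) / mad > cutoff]


readings = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 25.0]  # made-up sample data
print(flag_outliers(readings))
```

With these made-up readings, only 25.0 is flagged. The median and MAD are used instead of the mean and standard deviation because one extreme value can pull the mean and inflate the standard deviation enough to hide itself.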

Challenges of Outlier Detection

Definition of Normality: Establishing a clear definition of what constitutes normal behavior or data is challenging, as it varies from dataset to dataset.
High-Dimensional Spaces: Outlier detection becomes increasingly difficult as the dimensionality of the data increases. This is due to the "curse of dimensionality": the data becomes sparse and the distances between points become nearly equal, so traditional distance measures lose their ability to separate outliers from normal points.
Type I and II Errors: Distinguishing true outliers from noise is difficult. A detector can flag normal points as outliers (false positives, Type I errors) or miss real outliers (false negatives, Type II errors).
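Scalability: On large datasets, finding outliers becomes computationally expensive. For example, a naive distance-based method compares every pair of points, so its cost grows quadratically with the number of points. Scalable algorithms are necessary to handle big data scenarios.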
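
The high-dimensionality challenge above can be seen in a small experiment. The sketch below draws uniform random points and measures the "relative contrast", (farthest - nearest) / nearest, of distances from one query point to the rest. The function name `relative_contrast`, the uniform data, the point count, and the seed are assumptions made for illustration. Real datasets may behave differently.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """(max - min) / min of Euclidean distances from one point to the others.

    Points are drawn uniformly from the unit hypercube [0, 1]^dim
    (an assumption made only for this demonstration).
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    query, others = points[0], points[1:]
    distances = [math.dist(query, p) for p in others]
    nearest, farthest = min(distances), max(distances)
    return (farthest - nearest) / nearest


for dim in (2, 10, 100, 500):
    print(dim, round(relative_contrast(dim), 3))
```

As the dimension grows, the contrast shrinks: the nearest and farthest neighbours end up at almost the same distance, which is why a distance threshold that separates outliers well in 2 dimensions tends to stop working in 500.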