Syllabus →

Screenshot 2024-03-23 at 10.24.31 AM.png


Screenshot 2024-05-25 at 2.51.04 AM.png


Endsem pyqs →

Screenshot 2024-04-29 at 12.03.11 PM.png

Screenshot 2024-04-29 at 12.04.48 PM.png

Screenshot 2024-04-29 at 12.05.17 PM.png


Personal Notes →

Cluster Analysis →

Cluster analysis in Data Mining and Analysis (DMA) is a technique used to group a set of objects in such a way that objects in the same group (or cluster) are more similar to each other than to those in other groups. It's a form of unsupervised learning, which means it does not rely on predefined classes and labels. Here are some key points about cluster analysis in the context of DMA:

  1. Objective: The main goal is to discover the inherent groupings in the data, such as grouping customers by purchasing behavior or grouping documents by similar topics.
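  2. Methods: There are several clustering algorithms, each suited to different kinds of data and applications. Common ones include k-means (partitioning), hierarchical clustering (agglomerative or divisive), and DBSCAN (density-based).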
  3. Distance Measures: The choice of distance or similarity measure can significantly affect the outcome of the clustering. Commonly used measures include Euclidean distance, Manhattan distance, and cosine similarity (a similarity rather than a distance, usually converted as 1 - similarity).
  4. Applications: Cluster analysis is widely used in various fields such as marketing (to segment customers), biology (to group genes with similar expression patterns), and document clustering for information organization.
  5. Evaluation: Evaluating the quality of clustering results can be challenging because ground truth labels are usually not known. Internal measures such as the Silhouette Score or the Dunn index assess cohesion and separation using only the data and the cluster assignments; external measures compare the clustering against reference labels when those are available.
  6. Challenges: Some challenges include choosing the right number of clusters, handling different data types, scaling to large datasets, and sensitivity to initial settings in some algorithms (for example, the starting centroids in k-means).
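
To make the points on methods, distance measures, and evaluation concrete, here is a minimal sketch in plain Python (standard library only). It is not from the course material: the function names, the toy data, and the choice to seed k-means with the first k points are all assumptions made for illustration, and a real project would more likely use a library implementation.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def kmeans(points, k, max_iter=100):
    """Basic k-means (Lloyd's algorithm) with Euclidean distance.

    Assumption for illustration: centroids are seeded with the first k
    points so the result is deterministic. Real implementations usually
    use random or k-means++ seeding.
    """
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def silhouette_score(points, labels):
    """Mean silhouette over all points; assumes at least two clusters."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        own = [euclidean(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = sum(own) / len(own)  # mean distance within the point's cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(euclidean(p, q) for j, q in enumerate(points) if labels[j] == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Toy data (made up): two visibly separated groups in 2-D.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
labels, centroids = kmeans(points, k=2)
score = silhouette_score(points, labels)
print(labels, round(score, 3))
```

A silhouette value close to 1 suggests compact, well-separated clusters, while values near 0 suggest overlapping ones. Rerunning with a different k and comparing scores is one simple way to approach the "right number of clusters" problem from point 6.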