Syllabus →


Endsem PYQ →


KDD stands for Knowledge Discovery in Databases. It is the interdisciplinary process of extracting useful patterns, insights, and knowledge from large datasets, carried out as a sequence of stages. These stages typically include:

  1. Understanding the Domain and Goals: In this initial stage, domain experts work closely with data scientists to understand the domain-specific objectives, requirements, and constraints of the knowledge discovery process. This involves defining the problem statement, specifying the goals of the analysis, and identifying the relevant data sources.
  2. Data Selection: Once the objectives are defined, the next stage involves selecting the appropriate datasets for analysis. This may involve collecting data from various sources, such as databases, data warehouses, external repositories, or the internet. The selection of data depends on factors such as relevance, quality, completeness, and availability.
  3. Data Preprocessing: A critical stage in the KDD process, in which the raw data is cleaned, transformed, and prepared for analysis. This may include data cleaning (removing noise, errors, and inconsistencies), data integration (merging data from multiple sources), data transformation (converting data into a suitable format), and data reduction (reducing the number of attributes or records, for example through dimensionality reduction or sampling).
  4. Data Mining: The data mining stage involves applying various data mining techniques and algorithms to the preprocessed data to extract patterns, insights, and knowledge. This may include techniques such as classification, clustering, regression, association rule mining, anomaly detection, and sequence mining, depending on the objectives of the analysis.
  5. Pattern Evaluation: Once patterns are discovered using data mining techniques, they need to be evaluated for their quality, significance, and usefulness. This involves assessing the patterns against domain-specific criteria, validating them using statistical measures or validation techniques, and interpreting the results to extract actionable insights.
  6. Knowledge Presentation: The knowledge presentation stage involves communicating the results of the analysis to stakeholders in a clear, understandable, and actionable format. This may involve visualizations, reports, dashboards, summaries, or presentations that convey the insights and implications of the analysis to decision-makers, domain experts, or end-users.
  7. Knowledge Utilization: The final stage of the KDD process involves utilizing the extracted knowledge to make informed decisions, drive actions, and achieve the objectives defined at the beginning of the process. This may involve integrating the knowledge into business processes, systems, or applications to improve decision-making, optimize operations, or generate value for the organization.

Overall, KDD is an iterative process: results from a later stage often send the analysis back to an earlier one. It relies on collaboration between domain experts, data scientists, and stakeholders to extract actionable knowledge from data and support informed decision-making across domains and industries.
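
The selection, preprocessing, mining, and evaluation stages above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy basket data, the function names (`select`, `preprocess`, `mine`, `evaluate`), and the thresholds are hypothetical, and the mining step is a simple pair-counting form of association rule mining rather than a full algorithm such as Apriori.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each record is one shopping basket.
RAW_RECORDS = [
    {"id": 1, "items": ["bread", "milk"]},
    {"id": 2, "items": ["bread", "butter", "milk"]},
    {"id": 3, "items": None},                 # missing value
    {"id": 4, "items": ["bread", "butter"]},
    {"id": 5, "items": ["milk", "Bread "]},   # inconsistent formatting
]


def select(records):
    """Data selection: keep only the field relevant to the goal."""
    return [record["items"] for record in records]


def preprocess(baskets):
    """Data preprocessing: drop missing baskets, normalise item names."""
    cleaned = []
    for basket in baskets:
        if not basket:
            continue  # skip missing or empty baskets
        cleaned.append(frozenset(item.strip().lower() for item in basket))
    return cleaned


def mine(baskets):
    """Data mining: count occurrences of single items and item pairs."""
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        item_counts.update(basket)
        pair_counts.update(combinations(sorted(basket), 2))
    return item_counts, pair_counts


def evaluate(item_counts, pair_counts, n, min_support=0.5, min_confidence=0.7):
    """Pattern evaluation: keep rules lhs -> rhs that pass both thresholds."""
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


if __name__ == "__main__":
    baskets = preprocess(select(RAW_RECORDS))
    item_counts, pair_counts = mine(baskets)
    # Knowledge presentation: report the surviving rules in readable form.
    for lhs, rhs, support, confidence in evaluate(item_counts, pair_counts, len(baskets)):
        print(f"{lhs} -> {rhs}  support={support:.2f}  confidence={confidence:.2f}")
```

Each function corresponds to one stage, so a disappointing result in `evaluate` can be addressed by revisiting `preprocess` or `select`, which reflects the iterative nature of the process.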


(i) Handling Negative and Missing Values in the Dataset

To handle the negative and missing values in the given shopping mall customer details dataset, we will follow these steps:

Identifying Issues