Unit 3

Pruning Strategies for Data Mining

Pruning in data mining refers to the process of reducing the size of a dataset or model by removing unnecessary or less significant parts. This helps improve efficiency, reduce complexity, and enhance the performance of data mining algorithms. Here are some common pruning strategies:

1. Pre-pruning (Early Stopping):

Description: Pre-pruning stops the model from growing further once a stopping criterion is met, before it becomes too complex. It is most often used when building decision trees.

Example:

In decision tree construction, the tree growth can be halted if:
- The number of instances in a node is below a threshold.
- The depth of the tree reaches a predefined limit.
- Further splitting does not significantly improve the model accuracy.
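The three stopping criteria above can be sketched as a single check that a tree builder would call before splitting a node. This is a minimal illustration, not taken from any particular library: the names `StopConfig` and `should_stop` and the default threshold values are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StopConfig:
    # All thresholds are hypothetical defaults chosen for illustration.
    min_samples: int = 5     # minimum instances a node needs to be split
    max_depth: int = 3       # maximum allowed depth of the tree
    min_gain: float = 0.01   # minimum improvement a split must deliver


def should_stop(n_samples: int, depth: int, best_gain: float,
                cfg: StopConfig) -> Optional[str]:
    """Return the reason to stop splitting this node, or None to keep growing."""
    if n_samples < cfg.min_samples:
        return "too few instances"
    if depth >= cfg.max_depth:
        return "depth limit reached"
    if best_gain < cfg.min_gain:
        return "split does not improve the model enough"
    return None


cfg = StopConfig()
reason_small_node = should_stop(n_samples=3, depth=1, best_gain=0.20, cfg=cfg)
reason_keep_going = should_stop(n_samples=50, depth=1, best_gain=0.20, cfg=cfg)
```

Here `best_gain` stands for whatever improvement measure the builder uses (for example information gain or a drop in Gini impurity); the check itself does not depend on which one is chosen.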

Advantages:

- Reduces overfitting.
- Lowers computational cost.

Disadvantages:

- May lead to underfitting if the stopping criteria are too stringent.

2. Post-pruning:

Description: Post-pruning first lets the algorithm grow a full model, then removes the parts that contribute little, which reduces complexity.
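One way to make the grow-then-prune idea concrete is reduced-error pruning, sketched below. This is only one possible post-pruning technique, chosen here for illustration. The `Node` structure, the helper names, and the tiny validation set are all assumptions for the example: working bottom-up, each internal node is collapsed into a leaf, and the collapse is kept only if accuracy on a held-out validation set does not drop.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    label: int                      # majority class of the training data at this node
    feature: Optional[int] = None   # feature index tested here; None means leaf
    threshold: float = 0.0
    left: Optional["Node"] = None   # branch taken when x[feature] <= threshold
    right: Optional["Node"] = None  # branch taken otherwise


def predict(node: Node, x) -> int:
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label


def accuracy(root: Node, X, y) -> float:
    return sum(predict(root, x) == t for x, t in zip(X, y)) / len(y)


def prune(root: Node, node: Node, X_val, y_val) -> None:
    """Collapse subtrees bottom-up when validation accuracy does not drop."""
    if node.feature is None:
        return
    prune(root, node.left, X_val, y_val)
    prune(root, node.right, X_val, y_val)
    before = accuracy(root, X_val, y_val)
    saved = (node.feature, node.left, node.right)
    node.feature, node.left, node.right = None, None, None  # try it as a leaf
    if accuracy(root, X_val, y_val) < before:
        node.feature, node.left, node.right = saved          # undo the collapse


# A fully grown toy tree whose right subtree has fit noise (made-up example).
noisy = Node(label=1, feature=0, threshold=8.0,
             left=Node(label=1), right=Node(label=0))
root = Node(label=1, feature=0, threshold=5.0, left=Node(label=0), right=noisy)

X_val, y_val = [[1.0], [6.0], [7.0], [9.0]], [0, 1, 1, 1]
acc_before = accuracy(root, X_val, y_val)
prune(root, root, X_val, y_val)
acc_after = accuracy(root, X_val, y_val)
```

In this toy case the noisy right subtree is replaced by a single leaf, while the root split is kept because removing it would lower validation accuracy.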