Publication Details
Keywords: Explainable deep learning, model transparency, feature attribution, faithfulness optimization, convolutional neural networks, interpretability framework, trustworthy machine learning.
Abstract
Deep neural networks now make decisions in industrial fault detection, clinical screening, and credit adjudication, domains where mistakes are expensive. Although these models are accurate, the reasoning behind each prediction is concealed, leaving practitioners with a system that works well but cannot explain itself. This paper proposes a unified explainability framework that uses fidelity-driven optimisation to reconcile the divergent outputs of three complementary attribution engines: gradient integration, Shapley-based perturbation, and class-discriminative activation mapping. The design weights each attribution source according to model behaviour and incorporates explanation generation into the inference pipeline. We evaluate the framework using four attribution techniques on structured tabular records, natural images, and medical radiographs. The framework improves explanation faithfulness by 8.4% over the strongest single-method baseline, at the cost of a slight increase in inference latency. Average classification accuracy is 95.8%. Beyond these numerical improvements, the aggregated explanations are more stable across runs, which is crucial for stakeholders who require consistency. We also discuss implementation trade-offs and practical deployment routes. Overall, assembling attribution methods under a faithfulness objective yields explanations that are more dependable and repeatable than those of any constituent technique alone.