A multi-criteria scoring metric for evaluating deep learning models in bitcoin price forecasting
Abstract
The increasing computational demands of deep learning have raised concerns about the environmental sustainability of artificial intelligence (AI) applications, particularly in high-frequency domains such as financial forecasting. This paper addresses the need for more holistic evaluation criteria by proposing a multi-criteria scoring metric for deep learning models used in Bitcoin price forecasting. The aim is a performance metric that balances predictive accuracy against computational efficiency and environmental impact. The method combines traditional accuracy measures with training time, energy consumption, and carbon emissions into a unified performance score calculated with a logistic scoring function. The metric was validated on forty-two configurations of Long Short-Term Memory (LSTM) models trained on historical Bitcoin price data. Each configuration was assessed for forecasting accuracy, training time, energy use, and carbon emissions, with energy use and emissions measured by a carbon-tracking tool. The results show that simpler LSTM models can offer competitive accuracy while substantially reducing training time and emissions. The highest-scoring model balanced all criteria, whereas deeper architectures achieved only marginal accuracy gains at a disproportionate environmental cost. The study concludes that the proposed scoring metric offers a practical and scalable way to select deep learning models under sustainability constraints, supporting more responsible AI deployment in real-world settings.
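The abstract does not give the exact form of the logistic scoring function, its weights, or its reference points, so the following is only a minimal sketch of how such a unified score could be computed, not the paper's formula. It assumes that each criterion (forecast error, training time, energy, emissions) is lower-is-better, that each is mapped to (0, 1) by a logistic curve centred on the median value across all configurations, and that the per-criterion scores are combined as a weighted mean. The names `RunMetrics`, `logistic_lower_is_better`, `criterion_midpoints`, and `unified_score`, the steepness of 5.0, the weights, and all numbers are hypothetical placeholders.

```python
import math
import statistics
from dataclasses import dataclass, asdict


@dataclass
class RunMetrics:
    """Measurements for one trained configuration (all lower-is-better)."""
    rmse: float          # forecasting error (hypothetical choice of accuracy measure)
    train_time_s: float  # wall-clock training time in seconds
    energy_kwh: float    # energy consumed during training
    co2_kg: float        # CO2-equivalent emissions during training


def logistic_lower_is_better(x: float, midpoint: float, steepness: float = 5.0) -> float:
    """Map a lower-is-better value to (0, 1); returns 0.5 when x == midpoint."""
    z = steepness * (x / midpoint - 1.0)  # relative deviation, so units cancel out
    # Numerically stable evaluation of 1 / (1 + e^z) that avoids overflow
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def criterion_midpoints(runs: list[RunMetrics]) -> dict[str, float]:
    """Assumption: the median of each criterion across runs is the logistic midpoint."""
    fields = asdict(runs[0]).keys()
    return {f: statistics.median(asdict(r)[f] for r in runs) for f in fields}


def unified_score(run: RunMetrics, midpoints: dict[str, float],
                  weights: dict[str, float], steepness: float = 5.0) -> float:
    """Weighted mean of the per-criterion logistic scores; result lies in (0, 1)."""
    values = asdict(run)
    total = sum(weights.values())
    return sum(
        w * logistic_lower_is_better(values[name], midpoints[name], steepness)
        for name, w in weights.items()
    ) / total


# Hypothetical placeholder measurements, not results from the study
runs = [
    RunMetrics(rmse=1200.0, train_time_s=300.0, energy_kwh=0.05, co2_kg=0.02),
    RunMetrics(rmse=1150.0, train_time_s=900.0, energy_kwh=0.15, co2_kg=0.06),
    RunMetrics(rmse=1400.0, train_time_s=200.0, energy_kwh=0.03, co2_kg=0.01),
]
weights = {"rmse": 0.4, "train_time_s": 0.2, "energy_kwh": 0.2, "co2_kg": 0.2}
mids = criterion_midpoints(runs)
ranked = sorted(runs, key=lambda r: unified_score(r, mids, weights), reverse=True)
```

With these placeholder values, the cheapest configuration can outrank the most accurate one, because its small accuracy penalty is outweighed by its lower training cost. This mirrors the kind of trade-off the abstract describes. Normalizing each criterion by its median makes the logistic input unit-free, so seconds, kilowatt-hours, and kilograms of CO2 can share a single steepness parameter.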