URL-Based Phishing Detection Using a BERT-LSTM Model

Published in Journal of Information Systems and Informatics (ISI), 2026

Abstract: The rising prevalence of phishing websites poses a substantial cybersecurity threat, since malicious URLs deceive users into revealing sensitive information. This study aims to improve phishing URL detection with a deep learning model that combines Bidirectional Encoder Representations from Transformers (BERT) with Long Short-Term Memory (LSTM). In this framework, BERT is fine-tuned on a phishing URL dataset and used to produce contextual embeddings of URL tokens, while Bayesian Optimization is used to select hyperparameter settings during model training. Experimental results show that the BERT-LSTM model achieves a precision of 0.9299, recall of 0.9795, F1-score of 0.9540, accuracy of 0.9756, and ROC-AUC of 0.9962. The model consistently outperforms embedding-based methods such as Word2Vec, FastText, and GloVe, as well as a classical baseline using Logistic Regression with TF-IDF features. These findings suggest that the contextual embeddings generated by BERT effectively capture structural patterns in URLs, leading to more accurate phishing detection and offering a promising approach for strengthening cybersecurity systems.
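
As a quick sanity check on the figures in the abstract, the F1-score can be recomputed from the reported precision and recall using the standard harmonic-mean definition. The snippet below is only an illustrative sketch: the function and variable names are assumptions introduced here, not taken from the paper, and the inputs are the four-decimal rounded values from the abstract, so a small tolerance is used in the comparison.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract (already rounded to four decimals).
reported_precision = 0.9299
reported_recall = 0.9795
reported_f1 = 0.9540

computed_f1 = f1_score(reported_precision, reported_recall)

# Because the inputs are rounded, compare with a tolerance rather than exactly.
consistent = abs(computed_f1 - reported_f1) < 1e-3
print(f"computed F1 = {computed_f1:.4f}, consistent with reported value: {consistent}")
```

The recomputed value agrees with the reported F1-score to within rounding, which is what one would expect if all three metrics come from the same confusion matrix.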

Recommended citation: H. S. Wicaksana, U. Ependi, and A. Muzakir, “URL-Based Phishing Detection Using a BERT-LSTM Model”, Journal of Information Systems and Informatics (ISI), vol. 8, no. 1, pp. 1344–1367, Apr. 2026, doi: 10.63158/journalisi.v8i1.1543.


Hilman Singgih Wicaksana