Paper accepted in IEEE Transactions on Computational Social Systems

Title: WELFake: Word Embedding over Linguistic Features for Fake News Detection

Authors: Pawan Kumar Verma (Lovely Professional University, India | GLA University, India), Prateek Agrawal (University of Klagenfurt, Austria | Lovely Professional University, India), Ivone Amorim (MOG Technologies | University of Porto, Portugal), Radu Prodan (University of Klagenfurt, Austria)

Abstract: Social media is a popular medium for disseminating real-time news all over the world. Easy and quick information proliferation is one of the reasons for its popularity. A large number of users of different age groups, genders, and societal beliefs engage with social media websites. Despite these favorable aspects, a significant disadvantage comes in the form of fake news, as people usually read and share information without checking its genuineness. It is therefore imperative to research methods for authenticating news. To address this issue, this paper proposes a two-phase benchmark model named WELFake, based on word embedding (WE) over linguistic features, for fake news detection using machine learning classification. The first phase pre-processes the dataset and validates the veracity of news content using linguistic features. The second phase merges the linguistic feature sets with WE and applies voting classification. To validate this approach, the paper also carefully designs a novel WELFake dataset of approximately 72,000 articles, which merges several datasets to produce an unbiased classification output. Experimental results show that the WELFake model classifies news as real or fake with 96.73% accuracy, improving overall accuracy by 1.31% compared to BERT and by 4.25% compared to CNN models. Our frequency-based model, which focuses on analyzing writing patterns, outperforms prediction-based related works implemented with the Word2vec WE method by up to 1.73%.
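The two-phase pipeline described in the abstract can be sketched roughly as follows. This is a minimal, stdlib-only illustration under stated assumptions, not the WELFake implementation. The three stylistic features, the toy vocabulary, the normalized term-frequency vector (standing in for the frequency-based WE), the three nearest-centroid base classifiers, and the helper name `classify_with_voting` are all hypothetical choices made for this example. Only the overall shape (linguistic features merged with a word representation, followed by hard voting) follows the text.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercased word tokens; punctuation is dropped.
    return re.findall(r"[a-z']+", text.lower())


def linguistic_features(text):
    # Hypothetical stylistic features, NOT the paper's actual feature sets.
    words = text.split()
    n = len(words) or 1
    return [
        sum(len(w) for w in words) / n,                           # average word length
        text.count("!") / n,                                      # exclamation marks per word
        sum(1 for w in words if w.isupper() and len(w) > 1) / n,  # all-caps word ratio
    ]


def term_frequencies(text, vocab):
    # Frequency-based word representation (assumption): normalized counts over a fixed vocabulary.
    counts = Counter(tokenize(text))
    total = sum(counts.values()) or 1
    return [counts[w] / total for w in vocab]


def merge_features(text, vocab):
    # Second-phase idea: concatenate linguistic features with the word representation.
    return linguistic_features(text) + term_frequencies(text, vocab)


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 if na == 0 or nb == 0 else 1.0 - dot / (na * nb)


def fit_centroids(vectors, labels):
    # One mean vector per class label (labels sorted for a deterministic order).
    centroids = {}
    for label in sorted(set(labels)):
        members = [v for v, lab in zip(vectors, labels) if lab == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def majority_vote(predictions):
    # Hard voting: the most common label wins; ties go to the first-seen label.
    return Counter(predictions).most_common(1)[0][0]


def classify_with_voting(train_texts, train_labels, text, vocab):
    # Hypothetical ensemble: three nearest-centroid classifiers with different
    # distance metrics stand in for the paper's base classifiers.
    centroids = fit_centroids([merge_features(t, vocab) for t in train_texts], train_labels)
    x = merge_features(text, vocab)
    votes = [min(centroids, key=lambda c: dist(centroids[c], x))
             for dist in (euclidean, manhattan, cosine_distance)]
    return majority_vote(votes)


if __name__ == "__main__":
    vocab = ["shocking", "believe", "secret", "report", "review", "budget"]  # toy vocabulary
    train_texts = [
        "The committee approved the annual budget on Tuesday.",
        "Officials confirmed the report after a review.",
        "SHOCKING!!! You WON'T believe this miracle cure!!!",
        "BREAKING!!! Secret plot EXPOSED, share now!!!",
    ]
    train_labels = ["real", "real", "fake", "fake"]
    print(classify_with_voting(train_texts, train_labels,
                               "UNBELIEVABLE!!! Doctors HATE this trick!!!", vocab))  # fake
    print(classify_with_voting(train_texts, train_labels,
                               "The council published the review on Monday.", vocab))  # real
```

In a real setting with scikit-learn available, the same shape would more naturally use `CountVectorizer` or `TfidfVectorizer` for the frequency-based word representation and `VotingClassifier(voting="hard")` over the chosen base models; the nearest-centroid voters here only keep the sketch dependency-free.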

Acknowledgement: ARTICONF project