Text based personality prediction from multiple social media data sources using pre-trained language model and model averaging

Christian H., Suhartono D., Chowanda A., Zamli K.Z.

Computer Science Department, BINUS Graduate Program, Master of Computer Science, Bina Nusantara University, Jakarta, 11480, Indonesia; Computer Science Department, School of Computer Science, Bina Nusantara University, Jakarta, 11480, Indonesia; Faculty of Computing, College of Computing and Applied Sciences, Universiti Malaysia Pahang, Pahang, 26600, Malaysia


The ever-increasing number of social media users has contributed dramatically to the growth in the volume of online information. Often, the content that these users post on social media can give valuable insights into their personalities (e.g., for predicting job satisfaction, specific preferences, and the success of professional and romantic relationships) without the hassle of taking a formal personality test. Termed personality prediction, the process involves extracting features from digital content and mapping them to a personality model. Owing to its simplicity and proven capability, a well-known personality model, the Big Five personality traits, has often been adopted in the literature as the de facto standard for personality assessment. To date, many algorithms can be used to extract contextualized word embeddings from textual data for personality prediction systems; some of them are based on ensemble models and deep learning. Although useful, existing algorithms such as RNN and LSTM suffer from the following limitations. First, they take a long time to train because they process their inputs sequentially. Second, they lack the ability to capture the true (semantic) meaning of words, so some of the context is lost. To address these limitations, this paper introduces a new prediction approach that uses a multi-model deep learning architecture combined with multiple pre-trained language models, such as BERT, RoBERTa, and XLNet, as the feature extraction method on social media data sources. Finally, the system makes its prediction based on model averaging.
Unlike earlier work, which adopts a single social media data source with open- and closed-vocabulary extraction methods, the proposed work uses multiple social media data sources, namely Facebook and Twitter, and produces a predictive model for each trait using bidirectional context features combined with the extraction method. The results have been encouraging, as the proposed work outperforms similar existing work in the literature. More precisely, our results achieve a maximum accuracy of 86.2% and an F1 score of 0.912 on the Facebook dataset, and 88.5% accuracy and an F1 score of 0.882 on the Twitter dataset. © 2021, The Author(s).
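
As a rough illustration of the model-averaging step mentioned in the abstract, the sketch below averages per-model probabilities for a single trait and thresholds the mean. It is a minimal sketch, not the authors' implementation: the function name `average_models`, the 0.5 threshold, and the probability values are all assumptions made for the example, and the abstract does not state whether probabilities or hard labels are averaged.

```python
from statistics import mean


def average_models(probs_by_model, threshold=0.5):
    """Average per-model probabilities for each user, then threshold the mean.

    probs_by_model maps a model name to a list of probabilities that each
    user scores "high" on one trait. All lists must have the same length.
    The threshold value is an assumption, not taken from the paper.
    """
    n_users = len(next(iter(probs_by_model.values())))
    averaged = [
        mean(model_probs[i] for model_probs in probs_by_model.values())
        for i in range(n_users)
    ]
    labels = [int(p >= threshold) for p in averaged]
    return averaged, labels


# Made-up probabilities for three users; in practice these would come from
# classifiers trained on BERT, RoBERTa, and XLNet features.
example = {
    "bert": [0.9, 0.2, 0.6],
    "roberta": [0.7, 0.4, 0.3],
    "xlnet": [0.8, 0.3, 0.3],
}
averaged, labels = average_models(example)
print(labels)
```

In this toy input the third user is rated "high" by one model only, so the averaged probability falls below the threshold and the combined label is "low"; one such averaged classifier would be built per trait.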

BERT; Deep learning; Language model; Natural language processing; Personality prediction; Social media


Journal of Big Data

Publisher: Springer Science and Business Media Deutschland GmbH

Volume 8, Issue 1, Art No 68

Journal Link: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85106331051&doi=10.1186%2fs40537-021-00459-1&partnerID=40&md5=8e95992e6c4675192a325e7183f4d878

doi: 10.1186/s40537-021-00459-1

ISSN: 2196-1115




Indexed by Scopus
