LScDC Word Clouds and Tables to Visually Present the Most Informative Words in Subject Categories
Posted on 24.04.2020, 13:56 by Neslihan Suzen
Word Clouds to Visually Present the Most Informative Words in Subject Categories
April 2020 by Neslihan Suzen, PhD student at the University of Leicester (firstname.lastname@example.org / email@example.com)
Supervised by Prof Alexander Gorban and Dr Evgeny Mirkes
This publication presents word clouds of the most informative words in Web of Science (WoS) categories [1,2]. The clouds are created with words from the Leicester Scientific Dictionary-Core (LScDC) [3,4]. For each category, we consider the list of words together with their Relative Information Gain (RIG) in that category. Words are sorted by RIG in descending order, and the top 100 words are shown in the word cloud. The larger a word appears in a word cloud, the more informative it is for the category. This study is part of the research on quantifying the meaning of research texts.
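The ranking step can be illustrated with a minimal Python sketch. It assumes that RIG is the information gain about a category provided by a word, normalised by the entropy of the category, IG(c; w) / H(c), with both word and category treated as present/absent in each document. This definition is an assumption here; the exact formulation behind the published values is in the LScDC Word-Category RIG Matrix [5]. The corpus format, the function names and the font-size range are hypothetical and chosen only for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def relative_information_gain(docs, word, category):
    """RIG of `word` for `category`, ASSUMED here to be IG(category; word) / H(category).

    `docs` is a hypothetical corpus format: a list of (words, categories) set pairs.
    Word and category are both binary variables (present/absent per document).
    """
    n = len(docs)
    counts = {(w, c): 0 for w in (True, False) for c in (True, False)}
    for words, cats in docs:
        counts[(word in words, category in cats)] += 1

    p_c = (counts[(True, True)] + counts[(False, True)]) / n
    h_c = entropy([p_c, 1 - p_c])
    if h_c == 0:
        return 0.0  # category in all or no documents: nothing to gain

    # Conditional entropy H(category | word), weighted over word present/absent.
    h_c_given_w = 0.0
    for w in (True, False):
        n_w = counts[(w, True)] + counts[(w, False)]
        if n_w:
            p = counts[(w, True)] / n_w
            h_c_given_w += (n_w / n) * entropy([p, 1 - p])
    return (h_c - h_c_given_w) / h_c

def top_words(rigs, k=100):
    """Sort (word, RIG) pairs by RIG in descending order and keep the first k."""
    return sorted(rigs.items(), key=lambda item: item[1], reverse=True)[:k]

def font_sizes(ranked, min_size=10, max_size=60):
    """Map RIG linearly onto a hypothetical font-size range: larger RIG, larger word."""
    if not ranked:
        return {}
    values = [rig for _, rig in ranked]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # all RIGs equal: every word gets min_size
    return {w: min_size + (rig - lo) / span * (max_size - min_size) for w, rig in ranked}
```

For a category, one would compute `relative_information_gain` for every dictionary word, pass the resulting dictionary to `top_words(rigs, k=100)`, and feed the output of `font_sizes` to a word-cloud renderer. Linear scaling is only one possible choice; a renderer may apply its own mapping from weight to size.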
Word clouds of the top 100 most informative words, and histograms of RIGs for the top 10 most informative words, for each of the 252 categories are available in the archive published along with this description. Tables listing the 100 most informative words and their RIGs for each category are also published.
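A top-10 histogram of this kind can be sketched in plain Python as text bars. This is only an illustration of selecting the ten highest-RIG words and scaling bar lengths to the largest RIG; the function names and bar width are hypothetical, and the published histograms were not necessarily produced this way.

```python
def top_k(rigs, k=10):
    """Return the k (word, RIG) pairs with the highest RIG, in descending order."""
    return sorted(rigs.items(), key=lambda item: item[1], reverse=True)[:k]

def text_histogram(ranked, width=40):
    """Render ranked (word, RIG) pairs as text bars scaled to the largest RIG."""
    if not ranked:
        return []
    top = ranked[0][1] or 1.0  # first pair has the largest RIG; guard against 0
    label_width = max(len(w) for w, _ in ranked)
    return [f"{w:<{label_width}} {'#' * round(rig / top * width)} {rig:.3f}"
            for w, rig in ranked]
```

For example, `print("\n".join(text_histogram(top_k(rigs))))` prints one bar per word, longest first, for any dictionary `rigs` mapping words to RIG values.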
The published archive contains the following files:
1. Word_Clouds.pdf: all word clouds of the top 100 most informative words and the histograms of the top 10 most informative words for each of the 252 WoS categories.
2. Lists_of_Words.pdf: lists of the 100 most informative words for each of the 252 WoS categories.
[1] Web of Science. (15 July). Available: https://apps.webofknowledge.com/
[2] WoS Subject Categories. Available: https://images.webofknowledge.com/WOKRS56B5/help/WOS/hp_subject_category_terms_tasca.html
[3] Suzen, Neslihan (2019): LScDC (Leicester Scientific Dictionary-Core). figshare. Dataset. https://doi.org/10.25392/leicester.data.9896579.v3
[4] Suzen, N., Mirkes, E. M., & Gorban, A. N. (2019). LScDC-new large scientific dictionary. arXiv preprint arXiv:1912.06858.
[5] Suzen, Neslihan (2020): LScDC Word-Category RIG Matrix. figshare. Dataset. https://doi.org/10.25392/leicester.data.12133431.v1