ICNLSP 2019

Proceedings of the First International Workshop on NLP Solutions for Under Resourced Languages (NSURL 2019)

co-located with ICNLSP 2019

11–12 September, 2019
University of Trento
Trento, Italy

©2019 The Association for Computational Linguistics

Order copies of this and other ACL proceedings from:

Association for Computational Linguistics (ACL)
209 N. Eighth Street
Stroudsburg, PA 18360
USA
Tel: +1-570-476-8006
Fax: +1-570-476-0860

ISBN 978-1-952148-53-8

Introduction

Welcome to NSURL 2019, the First International Workshop on NLP Solutions for Under Resourced Languages, co-located with ICNLSP 2019 and held on September 11–12, 2019, at the University of Trento in Italy.

NSURL is an opportunity and a forum for researchers and students to exchange ideas and discuss research and trends in the fields of Natural Language Processing and Speech Processing.

Twenty-six papers were submitted to NSURL 2019, of which 19 were accepted. All accepted papers were presented orally. The workshop has indeed been an interesting forum for tackling NLP problems in low-resourced languages.

Attendees benefited from the two keynotes presented at ICNLSP 2019. The first, entitled "Detecting the fake news before they were even written", was presented by Dr. Preslav Nakov from the Qatar Computing Research Institute (QCRI), Qatar. The second, "One world - seven thousand languages", was presented by Prof. Fausto Giunchiglia from the University of Trento, Italy.

We would like to acknowledge the support provided by the University of Trento and Data-Scientia. We would also like to express our gratitude to the organizing and program committees for their hard work and valuable contributions.

Abed Alhakim Freihat and Mourad Abbas
Trento, September 2019

Organizers:

Chair: Dr. Abed Alhakim Freihat
Co-Chair: Dr. Mourad Abbas

Program Committee:

Mourad Abbas, CRSTDLA, Algeria
Ahmed AbuRa’ed, Universitat P. F. Barcelona, Spain
Abdallah Abushmaes, Mawdoo3 Ltd, Jordan
Abdulmohsen Althubaity, The NCAIBD Research Center-KACST, KSA
Violetta Cavalli-Sforza, Al Akhawayn University, Morocco
Shumile Chabalala, Tshwane University of Technology, South Africa
Abdelrahim Elmadany, University of Jazan, KSA
Heshaam Faili, University of Tehran, Iran
Mohammad Gharib, University of Florence, Italy
Osama Hamed, University of Duisburg-Essen, Germany
Linda van Huyssteen, Tshwane University of Technology, South Africa
Gabriel Iwasokun, Federal University of Technology Akure, Nigeria
Mohamed Lichouri, CRSTDLA, Algeria
Itani P. Mandende, Tshwane University of Technology, South Africa
Charles Mann, Tshwane University of Technology, South Africa
Maredi I. Mphahlele, Tshwane University of Technology, South Africa
Nandu C Nair, University of Trento, Italy
Hussein Natsheh, Mawdoo3 Ltd, Jordan
Gabriel Ogunleye, Federal University, Oye-Ekiti, Nigeria
Sunday Ojo, Tshwane University of Technology, South Africa
O. Olugbara, Durban University of Technology, South Africa
Pius A. Owolawi, Tshwane University of Technology, South Africa
Agnieta B. Pretorius, Tshwane University of Technology, South Africa
Adeyanju Sosomi, University of Lagos, Nigeria
Nasrin Taghizadeh, University of Tehran, Iran
Etienne E. Van Wyk, Tshwane University of Technology, South Africa

Organizing committee:

Gabor Bella, University of Trento
Mattia Fumagalli, University of Trento
Nandu C Nair, University of Trento
Olha Vozna, University of Trento

Invited Speakers:

Prof. Fausto Giunchiglia, University of Trento, Italy.
Dr. Preslav Nakov, Qatar Computing Research Institute (QCRI), Qatar.

Invited Talks

Detecting the "Fake News" before they were even written
Preslav Nakov

Given the recent proliferation of disinformation online, there has also been growing research interest in automatically debunking rumors, false claims, and "fake news".
A number of fact-checking initiatives have been launched so far, both manual and automatic, but the whole enterprise remains in a state of crisis: by the time a claim is finally fact-checked, it could have reached millions of users, and the harm caused could hardly be undone. An arguably more promising direction is to focus on fact-checking entire news outlets, which can be done in advance. Then, we could fact-check the news before they were even written: by checking how trustworthy the outlets that published them are. We will show how we do this in the Tanbih news aggregator (http://www.tanbih.org/), which makes users aware of what they are reading. In particular, we develop media profiles that show the general factuality of reporting, the degree of propagandistic content, hyper-partisanship, leading political ideology, general frame of reporting, stance with respect to various claims and topics, as well as audience reach and audience bias in social media.

One world - seven thousand languages
Fausto Giunchiglia

We present a large-scale multilingual lexical resource, the Universal Knowledge Core (UKC), which is organized like a Wordnet but with a major design difference. In the UKC, the meaning of words is represented not only with synsets, but also with language-independent concepts, each of which clusters together the synsets that codify the same meaning in different languages. In the UKC, it is concepts, rather than synsets as in the Wordnets, that are connected in a semantic network. The use of language-independent concepts allows for the native integrability, analysis, and use of any number of languages, with important applications in, e.g., multilingual language processing, reasoning (as needed, for instance, in data and knowledge integration), and image understanding.

Table of Contents

NSURL-2019 Task 8: Semantic Question Similarity in Arabic
Haitham Seelawi, Ahmad Mustafa, Hesham Al-Bataineh, Wael Farhan and Hussein T Al-Natsheh

NSURL-2019 Task 7: Named Entity Recognition for Farsi
Nasrin Taghizadeh and Hesham Faili

Yorùbá Gender Recognition from Speech using Attention-based BiLSTM
Ibukunola Abosede Modupe, Tshephisho Joseph Sefara and Ojo Sunday

MorphoBERT: a Persian NER System with BERT and Morphological Analysis
Mahdi Mohseni and Amirhossein Tebbifakhr

AtyNegar at NSURL-2019 Task 8: Semantic Question Similarity in Arabic
Atieh Sharifi, Hossein Hassanpoor and Najmeh Zare Maduyieh

Beheshti-NER: Persian named entity recognition Using BERT
Ehsan Taher, Seyed Abbas Hoseini and Mehrnoush Shamsfard

Arabic Dialogue Act Recognition for Textual Chatbot Systems
Alaa Joukhadar, Huda Saghergy, Leen Kweider and Nada Ghneim

Tha3aroon at NSURL-2019 Task 8: Semantic Question Similarity in Arabic
Ali Fadel, Ibraheem Tuffaha and Mahmoud Al-Ayyoub

Motivations, challenges, and perspectives for the development of an Automatic Speech Recognition System for the under-resourced Ngiemboon Language
Patrice Yemmene and Laurent Besacier

NITK-IT_NLP@NSURL2019: Transfer Learning based POS Tagger for Under Resourced Bhojpuri and Magahi Language
Anand Kumar M

The_Illiterati: Part-of-Speech Tagging for Magahi and Bhojpuri without even knowing the alphabet
Thomas Proisl, Peter Uhrig, Andreas Blombach, Natalie Dykes, Philipp Heinrich, Besim Kabashi and Sefora Mammarella

ST NSURL 2019 Shared Task: Semantic Question Similarity in Arabic
Mohamed Lichouri, Mourad Abbas, Besma Benaziz and Abed Alhakim Freihat

Statistical Machine Translation between Myanmar (Burmese) and Dawei (Tavoyan)
Thazin Myint Oo, Ye Kyaw Thu, Khin Mar Soe and Thepchai Supnithi

String Similarity Measures for Myanmar Language (Burmese)
Khaing Hsu Wai, Ye Kyaw Thu, Hnin Aye Thant, Swe Zin Moe and Thepchai Supnithi

An Inferential Phonological Connectionist Approach to the perception of Assimilated-English Connected Speech
Hiba Zaidi

Improving NER Models by exploiting Named Entity Gazetteer as External Knowledge
Atefeh Zafarian and Habibollah Asghari

The Inception Team at NSURL-2019 Task 8: Semantic Question Similarity in Arabic
Hana Al-Theiabat and Aisha Al-Sadi

Hidden Markov-based Part-of-Speech Tagger for Igbo Resource-Scarce African Language
Ihenaetu Olamma, Michael Kingsley and Sunday Ojo

Building Ontology for Yorùbá Language
Theresa Okediya, Ibukun Afolabi, Olamma Iheanetu and Sunday Ojo