ACL 2020

The 5th Workshop on Representation Learning for NLP (RepL4NLP-2020)

Proceedings of the Workshop

July 9, 2020

©2020 The Association for Computational Linguistics

Order copies of this and other ACL proceedings from:
Association for Computational Linguistics (ACL)
209 N. Eighth Street
Stroudsburg, PA 18360
USA
Tel: +1-570-476-8006
Fax: +1-570-476-0860
[email protected]

ISBN 978-1-952148-15-6

Introduction

The 5th Workshop on Representation Learning for NLP (RepL4NLP-2020) will be hosted at ACL 2020. The workshop is being organised by Spandana Gella, Johannes Welbl, Marek Rei, Fabio Petroni, Patrick Lewis, Emma Strubell, Minjoon Seo and Hannaneh Hajishirzi, and advised by Isabelle Augenstein, Kyunghyun Cho, Edward Grefenstette, Karl Moritz Hermann, and Chris Dyer. The workshop is organised by the ACL Special Interest Group on Representation Learning (SIGREP).

The 5th Workshop on Representation Learning for NLP aims to continue the success of the 1st Workshop on Representation Learning for NLP (about 50 submissions and over 250 attendees; the second most attended collocated event at ACL’16 after WMT) and of the 2nd, 3rd and 4th editions of the workshop. The workshop was introduced as a synthesis of several years of independent *CL workshops focusing on vector space models of meaning, compositionality, and the application of deep neural networks and spectral methods to NLP. It provides a forum for discussing recent advances on these topics, as well as future research directions in linguistically motivated vector-based models in NLP.

Organizers:
Spandana Gella, Amazon AI
Johannes Welbl, University College London
Marek Rei, Imperial College London
Fabio Petroni, Facebook AI Research
Patrick Lewis, University College London & FAIR
Emma Strubell, Carnegie Mellon University & FAIR
Minjoon Seo, University of Washington & Naver
Hannaneh Hajishirzi, University of Washington

Senior Advisors:
Kyunghyun Cho, NYU and Facebook AI Research
Edward Grefenstette, Facebook AI Research & University College London
Karl Moritz Hermann, DeepMind
Laura Rimell, DeepMind
Chris Dyer, DeepMind
Isabelle Augenstein, University of Copenhagen

Keynote Speakers:
Kristina Toutanova, Google Research
Ellie Pavlick, Brown University & Google
Mike Lewis, Facebook AI Research
Evelina Fedorenko, Massachusetts Institute of Technology

Program Committee:
Muhammad Abdul-Mageed, Guy Aglionby, Roee Aharoni, Arjun Akula, Julio Amador Díaz López, Mikel Artetxe, Yoav Artzi, Miguel Ballesteros, Gianni Barlacchi, Max Bartolo, Joost Bastings, Federico Bianchi, Rishi Bommasani, Samuel R. Bowman, Andrew Caines, Claire Cardie, Haw-shiuan Chang, Lin Chen, Danlu Chen, Yue Chen, Yu Cheng, Manuel R. Ciosici, William Cohen, Christopher Davis, Eliezer de Souza da Silva, Luciano Del Corro, Zhi-Hong Deng, Leon Derczynski, Shehzaad Dhuliawala, Giuseppe Antonio Di Luna, Kalpit Dixit, Aleksandr Drozd, Kevin Duh, Necati Bora Edizel, Guy Emerson, Eraldo Fernandes, Orhan Firat, Rainer Gemulla, Kevin Gimpel, Hongyu Gong, Ana Valeria González, Batool Haider, He He, Ji He, Jiaji Huang, Sung Ju Hwang, Robin Jia, Mark Johnson, Arzoo Katiyar, Santosh Kesiraju, Douwe Kiela, Ekaterina Kochmar, Julia Kreutzer, Shankar Kumar,
John P. Lalor, Carolin Lawrence, Kenton Lee, Xiang Li, Shaohua Li, Tao Li, Bill Yuchen Lin, Chu-Cheng Lin, Peng Liu, Feifan Liu, Fei Liu, Suresh Manandhar, Luca Massarelli, Sneha Mehta, Todor Mihaylov, Tsvetomila Mihaylova, Swaroop Mishra, Ashutosh Modi, Lili Mou, Maximilian Mozes, Khalil Mrini, Phoebe Mulcaire, Nikita Nangia, Shashi Narayan, Thien Huu Nguyen, Tsuyoshi Okita, Ankur Padia, Ashwin Paranjape, Tom Pelsmaeker, Aleksandra Piktus, Vassilis Plachouras, Edoardo Maria Ponti, Ratish Puduppully, Leonardo Querzoni, Chris Quirk, Vipul Raheja, Muhammad Rahman, Natraj Raman, Surangika Ranathunga, Siva Reddy, Sravana Reddy, Roi Reichart, Devendra Sachan, Marzieh Saeidi, Avneesh Saluja, Hinrich Schütze, Tianze Shi, Vered Shwartz, Kyungwoo Song, Daniil Sorokin, Lucia Specia, Mark Steedman, Karl Stratos, Ming Sun, Jörg Tiedemann, Ivan Titov, Nadi Tomeh, Shubham Toshniwal, Kristina Toutanova, Lifu Tu, Lyle Ungar, Menno van Zaanen, Andrea Vanzo, Shikhar Vashishth, Eva Maria Vecchi, Elena Voita, Yogarshi Vyas, Hai Wang, Bonnie Webber, Dirk Weissenborn, Rodrigo Wilkens, Yuxiang Wu, Yadollah Yaghoobzadeh, Haiqin Yang, Majid Yazdani, Wen-tau Yih, Hong Yu, Wenxuan Zhou, Dong Zhou, Xiangyang Zhou, Imed Zitouni, Diarmuid Ó Séaghdha, Robert Östling

Table of Contents

Zero-Resource Cross-Domain Named Entity Recognition
Zihan Liu, Genta Indra Winata and Pascale Fung .......... 1

Encodings of Source Syntax: Similarities in NMT Representations Across Target Languages
Tyler A. Chang and Anna Rafferty .......... 7

Learning Probabilistic Sentence Representations from Paraphrases
Mingda Chen and Kevin Gimpel .......... 17

Word Embeddings as Tuples of Feature Probabilities
Siddharth Bhat, Alok Debnath, Souvik Banerjee and Manish Shrivastava .......... 24

Compositionality and Capacity in Emergent Languages
Abhinav Gupta, Cinjon Resnick, Jakob Foerster, Andrew Dai and Kyunghyun Cho .......... 34

Learning Geometric Word Meta-Embeddings
Pratik Jawanpuria, Satya Dev N T V, Anoop Kunchukuttan and Bamdev Mishra .......... 39

Improving Bilingual Lexicon Induction with Unsupervised Post-Processing of Monolingual Word Vector Spaces
Ivan Vulić, Anna Korhonen and Goran Glavaš .......... 45

Adversarial Training for Commonsense Inference
Lis Pereira, Xiaodong Liu, Fei Cheng, Masayuki Asahara and Ichiro Kobayashi .......... 55

Evaluating Natural Alpha Embeddings on Intrinsic and Extrinsic Tasks
Riccardo Volpi and Luigi Malagò .......... 61

Exploring the Limits of Simple Learners in Knowledge Distillation for Document Classification with DocBERT
Ashutosh Adhikari, Achyudh Ram, Raphael Tang, William L. Hamilton and Jimmy Lin .......... 72

Joint Training with Semantic Role Labeling for Better Generalization in Natural Language Inference
Cemil Cengiz and Deniz Yuret .......... 78

A Metric Learning Approach to Misogyny Categorization
Juan Manuel Coria, Sahar Ghannay, Sophie Rosset and Hervé Bredin .......... 89
On the Choice of Auxiliary Languages for Improved Sequence Tagging
Lukas Lange, Heike Adel and Jannik Strötgen .......... 95

Adversarial Alignment of Multilingual Models for Extracting Temporal Expressions from Text
Lukas Lange, Anastasiia Iurshina, Heike Adel and Jannik Strötgen .......... 103

Contextual and Non-Contextual Word Embeddings: an in-depth Linguistic Investigation
Alessio Miaschi and Felice Dell’Orletta .......... 110

Are All Languages Created Equal in Multilingual BERT?
Shijie Wu and Mark Dredze .......... 120

Staying True to Your Word: (How) Can Attention Become Explanation?
Martin Tutek and Jan Snajder .......... 131

Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning
Mitchell Gordon, Kevin Duh and Nicholas Andrews .......... 143

On Dimensional Linguistic Properties of the Word Embedding Space
Vikas Raunak, Vaibhav Kumar, Vivek Gupta and Florian Metze .......... 156

A Cross-Task Analysis of Text Span Representations
Shubham Toshniwal, Haoyue Shi, Bowen Shi, Lingyu Gao, Karen Livescu and Kevin Gimpel .......... 166

Enhancing Transformer with Sememe Knowledge
Yuhui Zhang, Chenghao Yang, Zhengping Zhou and Zhiyuan Liu .......... 177

Evaluating Compositionality of Sentence Representation Models
Hanoz Bhathena, Angelica Willis and Nathan Dass .......... 185

Supertagging with CCG primitives
Aditya Bhargava and Gerald Penn .......... 194

What’s in a Name? Are BERT Named Entity Representations just as Good for any other Name?
Sriram Balasubramanian, Naman Jain, Gaurav Jindal, Abhijeet Awasthi and Sunita Sarawagi .......... 205

Workshop Program

Thursday, July 9, 2020

9:30–9:45   Welcome and Opening Remarks

9:45–14:45  Keynote Session
9:45–10:30  Invited talk 1: Kristina Toutanova
10:30–11:00 Coffee Break
11:00–11:45 Invited talk 2: Ellie Pavlick
11:45–12:30 Invited talk 3: Mike Lewis
12:30–14:00 Lunch
14:00–14:45 Invited talk 4: Evelina Fedorenko

14:45–15:00 Outstanding Papers Spotlight Presentations

15:00–16:30 Poster Session

Zero-Resource Cross-Domain Named Entity Recognition
Zihan Liu, Genta Indra Winata and Pascale Fung

Encodings of Source Syntax: Similarities in NMT Representations Across Target Languages
Tyler A. Chang and Anna Rafferty

Learning Probabilistic Sentence Representations from Paraphrases
Mingda Chen and Kevin Gimpel

On the Ability of Self-Attention Networks to Recognize Counter Languages
Satwik Bhattamishra, Kabir Ahuja and Navin Goyal

Word Embeddings as Tuples of Feature Probabilities
Siddharth Bhat, Alok Debnath, Souvik Banerjee and Manish Shrivastava

Compositionality and Capacity in Emergent Languages
Abhinav Gupta, Cinjon Resnick, Jakob Foerster, Andrew Dai and Kyunghyun Cho

Learning Geometric Word Meta-Embeddings
Pratik Jawanpuria, Satya Dev N T V, Anoop Kunchukuttan and Bamdev Mishra

Variational Inference for Learning Representations of Natural Language Edits
Edison Marrese-Taylor, Machel Reid and Yutaka Matsuo

Improving Bilingual Lexicon Induction with Unsupervised Post-Processing of Monolingual Word Vector Spaces
Ivan Vulić, Anna Korhonen and Goran Glavaš

Adversarial Training for Commonsense Inference
Lis Pereira, Xiaodong Liu, Fei Cheng, Masayuki Asahara and Ichiro Kobayashi

Evaluating Natural Alpha Embeddings on Intrinsic and Extrinsic Tasks
Riccardo Volpi and Luigi Malagò

Exploring the Limits of Simple Learners in Knowledge Distillation for Document Classification with DocBERT
Ashutosh Adhikari, Achyudh Ram, Raphael Tang, William L. Hamilton and Jimmy Lin

Joint Training with Semantic Role Labeling for Better Generalization in Natural Language Inference
Cemil Cengiz and Deniz Yuret

A Metric Learning Approach to Misogyny Categorization
Juan Manuel Coria, Sahar Ghannay, Sophie Rosset and Hervé Bredin

On the Choice of Auxiliary Languages for Improved Sequence Tagging
Lukas Lange, Heike Adel and Jannik Strötgen

Adversarial Alignment of Multilingual Models for Extracting Temporal Expressions from Text
Lukas Lange, Anastasiia Iurshina, Heike Adel and Jannik Strötgen

Contextual and Non-Contextual Word Embeddings: an in-depth Linguistic Investigation
Alessio Miaschi and Felice Dell’Orletta

Are All Languages Created Equal in Multilingual BERT?
Shijie Wu and Mark Dredze

Staying True to Your Word: (How) Can Attention Become Explanation?
Martin Tutek and Jan Snajder

Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning
Mitchell Gordon, Kevin Duh and Nicholas Andrews

On Dimensional Linguistic Properties of the Word Embedding Space
Vikas Raunak, Vaibhav Kumar, Vivek Gupta and Florian Metze

A Simple Approach to Learning Unsupervised Multilingual Embeddings
Pratik Jawanpuria, Mayank Meghwanshi and Bamdev Mishra

A Cross-Task Analysis of Text Span Representations
Shubham Toshniwal, Haoyue Shi, Bowen Shi, Lingyu Gao, Karen Livescu and Kevin Gimpel

Enhancing Transformer with Sememe Knowledge
Yuhui Zhang, Chenghao Yang, Zhengping Zhou and Zhiyuan Liu

Evaluating Compositionality of Sentence Representation Models
Hanoz Bhathena, Angelica Willis and Nathan Dass

AI4Bharat-IndicNLP Dataset: Monolingual Corpora and Word Embeddings for Indic Languages
Anoop Kunchukuttan, Divyanshu Kakwani, Satish Golla, Gokul N.C., Avik Bhattacharyya, Mitesh M. Khapra and Pratyush Kumar

Supertagging with CCG primitives
Aditya Bhargava and Gerald Penn

What’s in a Name? Are BERT Named Entity Representations just as Good for any other Name?
Sriram Balasubramanian, Naman Jain, Gaurav Jindal, Abhijeet Awasthi and Sunita Sarawagi

16:30–17:30 Panel Discussion

17:30–17:40 Closing Remarks and Best Paper Announcement

Proceedings of the 5th Workshop on Representation Learning for NLP (RepL4NLP-2020), pages 1–6, July 9, 2020. ©2020 Association for Computational Linguistics

Zero-Resource Cross-Domain Named Entity Recognition

Zihan Liu, Genta Indra Winata, Pascale Fung
Center for Artificial Intelligence Research (CAiRE)
Department of Electronic and Computer Engineering
The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
[email protected]

Abstract

Existing models for cross-domain named entity recognition (NER) rely on large unlabeled corpora or labeled NER training data in target domains. However, collecting such data for low-resource target domains is not only expensive but also time-consuming. Hence, we propose a cross-domain NER model that does not use any external resources. We first introduce multi-task learning (MTL) by adding a new objective function that detects whether tokens are named entities or not. We then introduce a framework called Mixture of Entity Experts (MoEE) to improve the robustness of zero-resource domain adaptation. Finally, experimental results show that our model outperforms strong unsupervised cross-domain sequence labeling models, and its performance is close to that of the state-of-the-art model, which leverages extensive resources.

1 Introduction

Named entity recognition (NER) is a fundamental task in text understanding and information extraction. Recently, supervised learning approaches have shown their effectiveness in detecting named entities (Ma and Hovy, 2016; Chiu and Nichols, 2016; Winata et al., 2019). However, performance drops sharply in low-resource target domains where large training sets are unavailable. To address this data scarcity issue, a straightforward idea is to utilize the NER knowledge learned from high-resource domains and adapt it to low-resource domains, a setting called cross-domain NER.

Due to the large variance in entity names across domains, cross-domain NER has thus far been a challenging task. Most existing methods consider a supervised setting, leveraging labeled NER data for both the source and target domains (Yang et al., 2017; Lin and Lu, 2018). However, labeled data in target domains is not always available. Unsupervised domain adaptation naturally arises as a way to circumvent the need for labeled NER data in target domains. However, the only existing method, proposed by Jia et al. (2019), requires an external unlabeled corpus in both the source and target domains to conduct unsupervised cross-domain NER, and such resources are difficult to obtain, especially for low-resource target domains. Therefore, we consider unsupervised zero-resource cross-domain adaptation for NER, which only utilizes the NER training samples of a single source domain.

To meet the challenge of zero-resource cross-domain adaptation, we first propose to conduct multi-task learning (MTL) by adding an objective function that detects whether tokens are named entities or not. This objective function helps the model learn general representations of named entities and distinguish named entities from non-entity tokens in target-domain sequences. In addition, we observe that in many cases, different entity categories can appear in similar or even identical contexts.
For example, in the sentence “Arafat subsequently cancelled a meeting between Israeli and PLO officials,” the person entity “Arafat” could be replaced with an organization entity within the same context. This illustrates the confusion among different entity categories, which makes zero-resource adaptation much more difficult. Intuitively, when the entity type of a token is hard to predict from the token itself and its context, we want to borrow the opinions (i.e., representations) of different experts. Hence, we propose a Mixture of Entity Experts (MoEE) framework to tackle the confusion between entity categories, in which predictions are based on the token and its context as well as on all entity experts. Experimental results show that our model outperforms current strong unsupervised cross-domain sequence tagging approaches and reaches results comparable to the state-of-the-art unsupervised method that utilizes extensive resources.
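To make the auxiliary objective concrete, below is a minimal sketch of the multi-task setup described above: a shared encoder feeds an entity-type head and a binary entity/non-entity head, and the two cross-entropy losses are summed. This is an illustrative reconstruction rather than the authors' implementation; the BiLSTM encoder, the module names, and the loss weight alpha are assumptions.

```python
import torch
import torch.nn as nn

class MultiTaskTagger(nn.Module):
    """Shared encoder with two heads: entity-type tagging and binary entity detection."""

    def __init__(self, vocab_size, num_tag_labels, emb_dim=100, hidden_dim=200):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        # Hypothetical BiLSTM encoder; the paper's encoder choice is not shown here.
        self.encoder = nn.LSTM(emb_dim, hidden_dim // 2,
                               batch_first=True, bidirectional=True)
        self.tag_head = nn.Linear(hidden_dim, num_tag_labels)  # NER tag logits
        self.entity_head = nn.Linear(hidden_dim, 2)             # entity vs. non-entity logits

    def forward(self, token_ids):
        hidden, _ = self.encoder(self.embedding(token_ids))
        return self.tag_head(hidden), self.entity_head(hidden)

def joint_loss(tag_logits, entity_logits, tag_labels, entity_labels, alpha=1.0):
    """Sum the NER tagging loss and the auxiliary entity-detection loss."""
    ce = nn.CrossEntropyLoss()
    loss_tag = ce(tag_logits.flatten(0, 1), tag_labels.flatten())
    loss_entity = ce(entity_logits.flatten(0, 1), entity_labels.flatten())
    return loss_tag + alpha * loss_entity  # alpha is an assumed weighting factor
```

In this sketch, both heads are trained only on the source-domain NER data, in line with the zero-resource setting; the binary head supplies the domain-general "is this token an entity at all?" signal described above.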
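Along the same lines, here is a hedged sketch of the Mixture of Entity Experts idea: one small expert network per entity category, a gate that weights the experts from the token representation, and a classifier over the token representation concatenated with the mixed expert output. The layer sizes and the exact way the expert mixture is combined with the token features are assumptions for illustration, not the published architecture.

```python
import torch
import torch.nn as nn

class MixtureOfEntityExperts(nn.Module):
    """One expert per entity category; a gate mixes the experts' representations."""

    def __init__(self, hidden_dim, num_entity_types):
        super().__init__()
        # One linear "expert" per entity category (a simplifying assumption).
        self.experts = nn.ModuleList(
            [nn.Linear(hidden_dim, hidden_dim) for _ in range(num_entity_types)]
        )
        self.gate = nn.Linear(hidden_dim, num_entity_types)
        # Classify from the token representation concatenated with the expert mixture.
        self.classifier = nn.Linear(hidden_dim * 2, num_entity_types)

    def forward(self, hidden):
        # hidden: (batch, seq_len, hidden_dim) token representations from the encoder
        expert_out = torch.stack([expert(hidden) for expert in self.experts], dim=2)
        gate_weights = torch.softmax(self.gate(hidden), dim=-1).unsqueeze(-1)
        mixed = (gate_weights * expert_out).sum(dim=2)  # weighted "expert opinions"
        return self.classifier(torch.cat([hidden, mixed], dim=-1))
```

The gate plays the role of the borrowed opinions mentioned above: when a token's type is ambiguous, the softmax weights spread probability mass over several experts instead of committing to a single category.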