Adapt-NLP 2021
The Second Workshop on Domain Adaptation for NLP
Proceedings of the Workshop
April 20, 2021

©2021 The Association for Computational Linguistics

Order copies of this and other ACL proceedings from:
Association for Computational Linguistics (ACL)
209 N. Eighth Street
Stroudsburg, PA 18360
USA
Tel: +1-570-476-8006
Fax: +1-570-476-0860
[email protected]

ISBN 978-1-954085-08-4

Introduction

The growth in computational power and the rise of Deep Neural Networks (DNNs) have revolutionized the field of Natural Language Processing (NLP). The ability to collect massive datasets, combined with the capacity to train large models on powerful GPUs, has yielded NLP-based technology that was beyond imagination only a few years ago. Unfortunately, this technology is still limited to a handful of resource-rich languages and domains. This is because most NLP algorithms rely on the fundamental assumption that the training and test sets are drawn from the same underlying distribution. When the training and test distributions do not match, a phenomenon known as domain shift, such models are likely to suffer performance drops.

Despite the growing availability of heterogeneous data, many NLP domains still lack the amounts of labeled data required to feed data-hungry neural models, and in some domains and languages even unlabeled data is scarce. As a result, domain adaptation, that is, training an algorithm on annotated data from one or more source domains and applying it to other target domains, is a fundamental challenge that must be solved in order to make NLP technology available for most world languages and textual domains.

Domain Adaptation (DA) is hence the focus of this workshop. In particular, the topics of the workshop include, but are not restricted to:

• Novel DA algorithms, for existing as well as new setups.
• Extending DA research to new domains and tasks through both novel datasets and algorithmic approaches.
• Proposing novel zero-shot and few-shot algorithms and discussing their relevance for DA.
• Exploring the similarities and differences between algorithmic approaches to DA, cross-lingual learning, and cross-task learning.
• A conceptual discussion of the definitions of fundamental concepts such as domain and transfer, as well as zero-shot and few-shot learning.
• Introducing and exploring novel or under-explored DA setups, aiming towards realistic and applicable ones (e.g., one-to-many DA, many-to-many DA, and DA when the target domain is unknown while training on the source domain).

Adapt-NLP would not have been possible without the dedication of its program committee. We would like to thank them for their invaluable effort in providing timely and high-quality reviews on short notice. We are also grateful to our invited speakers and panelists for contributing to our program.
The Adapt-NLP workshop organizers,
Eyal Ben-David, Shay Cohen, Ryan McDonald, Barbara Plank, Roi Reichart, Guy Rotman, and Yftah Ziser

Organizing Committee:
- Eyal Ben-David, Technion - Israel Institute of Technology
- Shay Cohen, University of Edinburgh
- Ryan McDonald, ASAPP Research
- Barbara Plank, University of Copenhagen
- Roi Reichart, Technion - Israel Institute of Technology
- Guy Rotman, Technion - Israel Institute of Technology
- Yftah Ziser, Amazon Research

Program Committee:
- Reut Apel, Technion - Israel Institute of Technology
- Isabelle Augenstein, University of Copenhagen
- Steven Bethard, University of Arizona
- Danushka Bollegala, University of Liverpool
- Xia Cui, University of Manchester
- Kevin Duh, Johns Hopkins University
- Jacob Eisenstein, Google AI
- Amir Feder, Technion - Israel Institute of Technology
- Alexander Fraser, University of Munich
- Milica Gašić, Heinrich Heine University Düsseldorf
- Suchin Gururangan, Allen Institute for AI
- Roman Klinger, University of Stuttgart
- Mirella Lapata, University of Edinburgh
- Ana Marasović, University of Washington
- Timothy Miller, Harvard University
- Preslav Nakov, Qatar Computing Research Institute
- Mariana Neves, University of Potsdam
- Farhad Nooralahzadeh, University of Oslo
- Nadav Oved, Technion - Israel Institute of Technology
- Yuval Pinter, Georgia Institute of Technology
- Sebastian Ruder, DeepMind
- Benoît Sagot, ALMAnaCH Research
- Anders Søgaard, University of Copenhagen
- Idan Szpektor, Google Research

Invited Speakers:
- Isabelle Augenstein, University of Copenhagen
- Jacob Eisenstein, Google AI
- Hinrich Schütze, University of Munich (LMU)

Panel:
- Jacob Eisenstein, Google AI
- Mari Ostendorf, University of Washington
- Roi Reichart, Technion - Israel Institute of Technology
- Sebastian Ruder, DeepMind
- Anders Søgaard, University of Copenhagen

Table of Contents

Multidomain Pretrained Language Models for Green NLP
    Antonios Maronikolakis and Hinrich Schütze, p. 1

Pseudo-Label Guided Unsupervised Domain Adaptation of Contextual Embeddings
    Tianyu Chen, Shaohan Huang, Furu Wei and Jianxin Li, p. 9

Conditional Adversarial Networks for Multi-Domain Text Classification
    Yuan Wu, Diana Inkpen and Ahmed El-Roby, p. 16

The impact of domain-specific representations on BERT-based multi-domain spoken language understanding
    Judith Gaspers, Quynh Do, Tobias Röding and Melanie Bradford, p. 28

Bridging the gap between supervised classification and unsupervised topic modelling for social-media assisted crisis management
    Mikael Brunila, Rosie Zhao, Andrei Mircea Romascanu, Sam Lumley and Renee Sieber, p. 33

Challenges in Annotating and Parsing Spoken, Code-switched, Frisian-Dutch Data
    Anouck Braggaar and Rob van der Goot, p. 50

Genres, Parsers, and BERT: The Interaction Between Parsers and BERT Models in Cross-Genre Constituency Parsing in English and Swedish
    Daniel Dakota, p. 59

Cross-Lingual Transfer with MAML on Trees
    Jezabel Garcia, Federica Freddi, Jamie McGowan, Tim Nieradzik, Feng-Ting Liao, Ye Tian, Da-shan Shiu and Alberto Bernacchia, p. 72

Addressing Zero-Resource Domains Using Document-Level Context in Neural Machine Translation
    Dario Stojanovski and Alexander Fraser, p. 80

MultiReQA: A Cross-Domain Evaluation for Retrieval Question Answering Models
    Mandy Guo, Yinfei Yang, Daniel Cer, Qinlan Shen and Noah Constant, p. 94

Domain adaptation in practice: Lessons from a real-world information extraction pipeline
    Timothy Miller, Egoitz Laparra and Steven Bethard, p. 105

BERTologiCoMix: How does Code-Mixing interact with Multilingual BERT?
    Sebastin Santy, Anirudh Srinivasan and Monojit Choudhury, p. 111

Locality Preserving Loss: Neighbors that Live together, Align together
    Ashwinkumar Ganesan, Francis Ferraro and Tim Oates, p. 122

On the Hidden Negative Transfer in Sequential Transfer Learning for Domain Adaptation from News to Tweets
    Sara Meftah, Nasredine Semmar, Youssef Tamaazousti, Hassane Essafi and Fatiha Sadat, p. 140

Trajectory-Based Meta-Learning for Out-Of-Vocabulary Word Embedding Learning
    Gordon Buck and Andreas Vlachos, p. 146

Dependency Parsing Evaluation for Low-resource Spontaneous Speech
    Zoey Liu and Emily Prud’hommeaux, p. 156

An Empirical Study of Compound PCFGs
    Yanpeng Zhao and Ivan Titov, p. 166

User Factor Adaptation for User Embedding via Multitask Learning
    Xiaolei Huang, Michael J. Paul, Franck Dernoncourt, Robin Burke and Mark Dredze, p. 172

On the Effectiveness of Dataset Embeddings in Mono-lingual, Multi-lingual and Zero-shot Conditions
    Rob van der Goot, Ahmet Üstün and Barbara Plank, p. 183

Effective Distant Supervision for Temporal Relation Extraction
    Xinyu Zhao, Shih-Ting Lin and Greg Durrett, p. 195

Zero-Shot Cross-Lingual Dependency Parsing through Contextual Embedding Transformation
    Haoran Xu and Philipp Koehn, p. 204

Gradual Fine-Tuning for Low-Resource Domain Adaptation
    Haoran Xu, Seth Ebner, Mahsa Yarmohammadi, Aaron Steven White, Benjamin Van Durme and Kenton Murray, p. 214

Analyzing the Domain Robustness of Pretrained Language Models, Layer by Layer
    Abhinav Ramesh Kashyap, Laiba Mehnaz, Bhavitvya Malik, Abdul Waheed, Devamanyu Hazarika, Min-Yen Kan and Rajiv Ratn Shah, p. 222

Few-Shot Learning of an Interleaved Text Summarization Model by Pretraining with Synthetic Data
    Sanjeev Kumar Karn, Francine Chen, Yan-Ying Chen, Ulli Waltinger and Hinrich Schütze, p. 245

Semantic Parsing of Brief and Multi-Intent Natural Language Utterances
    Logan Lebanoff, Charles Newton, Victor Hung, Beth Atkinson, John Killilea and Fei Liu, p. 255

Domain Adaptation for NMT via Filtered Iterative Back-Translation
    Surabhi Kumari, Nikhil Jaiswal, Mayur Patidar, Manasi Patwardhan, Shirish Karande, Puneet Agarwal and Lovekesh Vig, p. 263