ACL 2018 Representation Learning for NLP
Proceedings of the Third Workshop

July 20, 2018
Melbourne, Australia

©2018 The Association for Computational Linguistics

Order copies of this and other ACL proceedings from:
Association for Computational Linguistics (ACL)
209 N. Eighth Street
Stroudsburg, PA 18360 USA
Tel: +1-570-476-8006
Fax: +1-570-476-0860
[email protected]

ISBN 978-1-948087-43-8

Introduction

The ACL 2018 Workshop on Representation Learning for NLP (RepL4NLP) takes place on Friday, July 20, 2018 in Melbourne, Australia, immediately following the 56th Annual Meeting of the Association for Computational Linguistics (ACL). The workshop is generously sponsored by Facebook, Salesforce, ASAPP, DeepMind, Microsoft Research, and Naver. RepL4NLP is organised by Isabelle Augenstein, Kris Cao, He He, Felix Hill, Spandana Gella, Jamie Kiros, Hongyuan Mei and Dipendra Misra, and advised by Kyunghyun Cho, Edward Grefenstette, Karl Moritz Hermann and Laura Rimell.

The 3rd Workshop on Representation Learning for NLP aims to continue the success of the 1st Workshop on Representation Learning for NLP, which received about 50 submissions, attracted over 250 attendees, and was the second most attended collocated event at ACL 2016 in Berlin, Germany after WMT, and of the 2nd Workshop on Representation Learning for NLP at ACL 2017 in Vancouver, Canada. The workshop focuses on vector space models of meaning, compositionality, and the application of deep neural networks and spectral methods to NLP. It provides a forum for discussing recent advances on these topics, as well as future research directions in linguistically motivated vector-based models in NLP.

Organizers:
Isabelle Augenstein, University of Copenhagen
Kris Cao, University of Cambridge
He He, Stanford University
Felix Hill, DeepMind
Spandana Gella, University of Edinburgh
Jamie Kiros, University of Toronto
Hongyuan Mei, Johns Hopkins University
Dipendra Misra, Cornell University

Senior Advisors:
Kyunghyun Cho, NYU and FAIR
Edward Grefenstette, DeepMind
Karl Moritz Hermann, DeepMind
Laura Rimell, DeepMind

Program Committee:
Eneko Agirre, University of the Basque Country
Yoav Artzi, Cornell University
Mohit Bansal, UNC Chapel Hill
Meriem Beloucif, HKUST
Jonathan Berant, Tel Aviv University
Johannes Bjerva, University of Copenhagen
Jan Buys, Oxford University
Xilun Chen, Cornell University
Eunsol Choi, University of Washington
Heeyoul Choi, Handong Global University
Junyoung Chung, University of Montreal
Manuel Ciosici, Aarhus University
Stephen Clark, DeepMind
Marco Damonte, University of Edinburgh
Desmond Elliot, University of Edinburgh
Katrin Erk, University of Texas
Orhan Firat, Middle East Technical University
Lucie Flekova, Amazon Research
Kevin Gimpel, TTI-Chicago
Caglar Gulcehre, University of Montreal
Gholamreza Haffari, Monash University
Mohit Iyyer, AI2
Katharina Kann, LMU
Arzoo Katiyar, Cornell University
Miryam de Lhoneux, Uppsala University
Tegan Maharaj, Polytechnique Montreal
Ana Marasovic, Heidelberg University
Yishu Miao, Oxford University
Todor Mihaylov, Heidelberg University
Pasquale Minervini, UCL
Nikita Nangia, NYU
Shashi Narayan, University of Edinburgh
Thien Huu Nguyen, NYU
Robert Östling, Stockholm University
Alexander Panchenko, University of Hamburg
Matthew Peters, AI2
Barbara Plank, University of Groningen
Marek Rei, University of Cambridge
Roi Reichart, Technion
Alan Ritter, Ohio State University
Diarmuid Ó Séaghdha, Apple
Holger Schwenk, Facebook Research
Tianze Shi, Cornell University
Vered Shwartz, Bar-Ilan University
Ashudeep Singh, Cornell University
Richard Socher, Salesforce
Mark Steedman, University of Edinburgh
Karl Stratos, Columbia University
Sam Thomson, CMU
Ivan Titov, University of Edinburgh
Shubham Toshniwal, TTIC
Andreas Vlachos, Sheffield
Pontus Stenetorp, UCL
Anders Søgaard, University of Copenhagen
Jörg Tiedemann, University of Helsinki
Chris Quirk, Microsoft Research
Lyle Ungar, University of Pennsylvania
Eva Maria Vecchi, University of Cambridge
Dirk Weissenborn, German Research Center for AI
Tsung-Hsien Wen, University of Cambridge
Yi Yang, Bloomberg LP
Helen Yannakoudakis, University of Cambridge

Invited Speakers:
Yejin Choi, University of Washington
Trevor Cohn, University of Melbourne
Margaret Mitchell, Google Research
Yoav Goldberg, Bar Ilan University

Table of Contents

Corpus Specificity in LSA and Word2vec: The Role of Out-of-Domain Documents
    Edgar Altszyler, Mariano Sigman and Diego Fernandez Slezak ... 1

Hierarchical Convolutional Attention Networks for Text Classification
    Shang Gao, Arvind Ramanathan and Georgia Tourassi ... 11

Extrofitting: Enriching Word Representation and its Vector Space with Semantic Lexicons
    Hwiyeol Jo and Stanley Jungkyu Choi ... 24

Chat Discrimination for Intelligent Conversational Agents with a Hybrid CNN-LMTGRU Network
    Dennis Singh Moirangthem and Minho Lee ... 30

Text Completion using Context-Integrated Dependency Parsing
    Amr Rekaby Salama, Özge Alacam and Wolfgang Menzel ... 41

Quantum-Inspired Complex Word Embedding
    Qiuchi Li, Sagar Uprety, Benyou Wang and Dawei Song ... 50

Natural Language Inference with Definition Embedding Considering Context On the Fly
    Kosuke Nishida, Kyosuke Nishida, Hisako Asano and Junji Tomita ... 58

Comparison of Representations of Named Entities for Document Classification
    Lidia Pivovarova and Roman Yangarber ... 64

Speeding up Context-based Sentence Representation Learning with Non-autoregressive Convolutional Decoding
    Shuai Tang, Hailin Jin, Chen Fang, Zhaowen Wang and Virginia de Sa ... 69

Connecting Supervised and Unsupervised Sentence Embeddings
    Gil Levi ... 79

A Hybrid Learning Scheme for Chinese Word Embedding
    Wenfan Chen and Weiguo Sheng ... 84

Unsupervised Random Walk Sentence Embeddings: A Strong but Simple Baseline
    Kawin Ethayarajh ... 91

Evaluating Word Embeddings in Multi-label Classification Using Fine-Grained Name Typing
    Yadollah Yaghoobzadeh, Katharina Kann and Hinrich Schütze ... 101
A Dense Vector Representation for Open-Domain Relation Tuples
    Ade Romadhony, Alfan Farizki Wicaksono, Ayu Purwarianti and Dwi Hendratmo Widyantoro ... 107

Exploiting Common Characters in Chinese and Japanese to Learn Cross-Lingual Word Embeddings via Matrix Factorization
    Jilei Wang, Shiying Luo, Weiyan Shi, Tao Dai and Shu-Tao Xia ... 113

WordNet Embeddings
    Chakaveh Saedi, António Branco, João António Rodrigues and João Silva ... 122

Knowledge Graph Embedding with Numeric Attributes of Entities
    Yanrong Wu and Zhichun Wang ... 132

Injecting Lexical Contrast into Word Vectors by Guiding Vector Space Specialisation
    Ivan Vulić ... 137

Characters or Morphemes: How to Represent Words?
    Ahmet Üstün, Murathan Kurfalı and Burcu Can ... 144

Learning Hierarchical Structures On-The-Fly with a Recurrent-Recursive Model for Sequences
    Athul Paul Jacob, Zhouhan Lin, Alessandro Sordoni and Yoshua Bengio ... 154

Limitations of Cross-Lingual Learning from Image Search
    Mareike Hartmann and Anders Søgaard ... 159

Learning Semantic Textual Similarity from Conversations
    Yinfei Yang, Steve Yuan, Daniel Cer, Sheng-Yi Kong, Noah Constant, Petr Pilar, Heming Ge, Yun-hsuan Sung, Brian Strope and Ray Kurzweil ... 164

Multilingual Seq2seq Training with Similarity Loss for Cross-Lingual Document Classification
    Katherine Yu, Haoran Li and Barlas Oguz ... 175

LSTMs Exploit Linguistic Attributes of Data
    Nelson F. Liu, Omer Levy, Roy Schwartz, Chenhao Tan and Noah A. Smith ... 180

Learning Distributional Token Representations from Visual Features
    Samuel Broscheit ... 187

Jointly Embedding Entities and Text with Distant Supervision
    Denis Newman-Griffis, Albert M. Lai and Eric Fosler-Lussier ... 195

A Sequence-to-Sequence Model for Semantic Role Labeling
    Angel Daza and Anette Frank ... 207

Predicting Concreteness and Imageability of Words Within and Across Languages via Word Embeddings
    Nikola Ljubešić, Darja Fišer and Anita Peti-Stantić ... 217
Workshop Program

Friday, July 20, 2018

09:30–09:45 Welcome and Opening Remarks

09:45–14:45 Keynote Session
    09:45–10:30 Invited Talk 1: Yejin Choi
    10:30–11:00 Coffee Break
    11:00–11:45 Invited Talk 2: Trevor Cohn
    11:45–12:30 Invited Talk 3: Margaret Mitchell
    12:30–14:00 Lunch
    14:00–14:45 Invited Talk 4: Yoav Goldberg

14:45–15:00 Outstanding Papers Spotlight Presentations

15:00–16:30 Poster Session (including Coffee Break from 15:30–16:00) + Drinks Reception

Corpus Specificity in LSA and Word2vec: The Role of Out-of-Domain Documents
    Edgar Altszyler, Mariano Sigman and Diego Fernandez Slezak

Hierarchical Convolutional Attention Networks for Text Classification
    Shang Gao, Arvind Ramanathan and Georgia Tourassi

Extrofitting: Enriching Word Representation and its Vector Space with Semantic Lexicons
    Hwiyeol Jo and Stanley Jungkyu Choi

Chat Discrimination for Intelligent Conversational Agents with a Hybrid CNN-LMTGRU Network
    Dennis Singh Moirangthem and Minho Lee

Text Completion using Context-Integrated Dependency Parsing
    Amr Rekaby Salama, Özge Alacam and Wolfgang Menzel

Quantum-Inspired Complex Word Embedding
    Qiuchi Li, Sagar Uprety, Benyou Wang and Dawei Song

Natural Language Inference with Definition Embedding Considering Context On the Fly
    Kosuke Nishida, Kyosuke Nishida, Hisako Asano and Junji Tomita

Comparison of Representations of Named Entities for Document Classification
    Lidia Pivovarova and Roman Yangarber

Speeding up Context-based Sentence Representation Learning with Non-autoregressive Convolutional Decoding
    Shuai Tang, Hailin Jin, Chen Fang, Zhaowen Wang and Virginia de Sa

Connecting Supervised and Unsupervised Sentence Embeddings
    Gil Levi

A Hybrid Learning Scheme for Chinese Word Embedding
    Wenfan Chen and Weiguo Sheng

Unsupervised Random Walk Sentence Embeddings: A Strong but Simple Baseline
    Kawin Ethayarajh

Evaluating Word Embeddings in Multi-label Classification Using Fine-Grained Name Typing
    Yadollah Yaghoobzadeh, Katharina Kann and Hinrich Schütze

A Dense Vector Representation for Open-Domain Relation Tuples
    Ade Romadhony, Alfan Farizki Wicaksono, Ayu Purwarianti and Dwi Hendratmo Widyantoro

Exploiting Common Characters in Chinese and Japanese to Learn Cross-Lingual Word Embeddings via Matrix Factorization
    Jilei Wang, Shiying Luo, Weiyan Shi, Tao Dai and Shu-Tao Xia

WordNet Embeddings
    Chakaveh Saedi, António Branco, João António Rodrigues and João Silva

Knowledge Graph Embedding with Numeric Attributes of Entities
    Yanrong Wu and Zhichun Wang

Injecting Lexical Contrast into Word Vectors by Guiding Vector Space Specialisation
    Ivan Vulić

Characters or Morphemes: How to Represent Words?
    Ahmet Üstün, Murathan Kurfalı and Burcu Can

Learning Hierarchical Structures On-The-Fly with a Recurrent-Recursive Model for Sequences
    Athul Paul Jacob, Zhouhan Lin, Alessandro Sordoni and Yoshua Bengio

Limitations of Cross-Lingual Learning from Image Search
    Mareike Hartmann and Anders Søgaard

Learning Semantic Textual Similarity from Conversations
    Yinfei Yang, Steve Yuan, Daniel Cer, Sheng-Yi Kong, Noah Constant, Petr Pilar, Heming Ge, Yun-hsuan Sung, Brian Strope and Ray Kurzweil

Multilingual Seq2seq Training with Similarity Loss for Cross-Lingual Document Classification
    Katherine Yu, Haoran Li and Barlas Oguz

LSTMs Exploit Linguistic Attributes of Data
    Nelson F. Liu, Omer Levy, Roy Schwartz, Chenhao Tan and Noah A. Smith
Learning Distributional Token Representations from Visual Features
    Samuel Broscheit

Jointly Embedding Entities and Text with Distant Supervision
    Denis Newman-Griffis, Albert M. Lai and Eric Fosler-Lussier

A Sequence-to-Sequence Model for Semantic Role Labeling
    Angel Daza and Anette Frank

Predicting Concreteness and Imageability of Words Within and Across Languages via Word Embeddings
    Nikola Ljubešić, Darja Fišer and Anita Peti-Stantić

16:30–17:30 Panel Discussion

17:30–17:40 Closing Remarks + Best Paper Awards Announcement

Proceedings of the 3rd Workshop on Representation Learning for NLP, pages 1–10, Melbourne, Australia, July 20, 2018. ©2018 Association for Computational Linguistics

Corpus specificity in LSA and word2vec: the role of out-of-domain documents

Edgar Altszyler, UBA, FCEyN, DC; ICC, UBA-CONICET ([email protected])
Mariano Sigman, U. Torcuato Di Tella - CONICET ([email protected])
Diego Fernández Slezak, UBA, FCEyN, DC; ICC, UBA-CONICET ([email protected])

Abstract

Despite the popularity of word embeddings, the precise way by which they acquire semantic relations between words remains unclear. In the present article, we investigate whether the capacity of LSA and word2vec to identify relevant semantic relations increases with corpus size. One intuitive hypothesis is that the capacity to identify relevant associations should increase as the amount of data increases. However, if the corpus grows in topics which are not specific to the domain of interest, the signal-to-noise ratio may weaken. Here we investigate the effect of corpus specificity and size on word embeddings, and to this end we study two ways of progressively eliminating documents: the elimination of random documents vs. the elimination of documents unrelated to a specific task. We show that word2vec can take advantage of all the documents, obtaining its best performance when it is trained with the whole corpus. On the contrary, the specialization (removal of out-of-domain documents) of the training corpus, accompanied by a decrease of dimensionality, can increase LSA word-representation quality while speeding up processing time. From a cognitive-modeling point of view, we point out that LSA's word-knowledge acquisition may not be efficiently exploiting higher-order co-occurrences and global relations, whereas word2vec does.

1 Introduction

The main idea behind corpus-based semantic representation is that words with similar meanings tend to occur in similar contexts (Harris, 1954). This proposition is called the distributional hypothesis and provides a practical framework to understand and compute the semantic relationship between words. Based on the distributional hypothesis, Latent Semantic Analysis (LSA) (Deerwester et al., 1990; Landauer and Dumais, 1997; Hu et al., 2007) and word2vec (Mikolov et al., 2013a,b) are among the most important methods for word-meaning representation. They describe each word in a vector space, where words with similar meanings are located close to each other.

Word embeddings have been applied in a wide variety of areas such as information retrieval (Deerwester et al., 1990), psychiatry (Altszyler et al., 2018; Carrillo et al., 2018), treatment optimization (Corcoran et al., 2018), literature (Diuk et al., 2012) and cognitive sciences (Landauer and Dumais, 1997; Denhière and Lemaire, 2004; Lemaire and Denhière, 2004; Diuk et al., 2012).

LSA takes as input a training corpus formed by a collection of documents.
Then a word-by-document co-occurrence matrix is constructed, which contains the distribution of occurrences of the different words across the documents. A mathematical transformation is then usually applied to reduce the weight of uninformative high-frequency words in the word-document matrix (Dumais, 1991). Finally, a linear dimensionality reduction is implemented by a truncated Singular Value Decomposition (SVD), which projects every word into a subspace with a predefined number of dimensions, k. The success of LSA in capturing the latent meaning of words comes from this low-dimensional mapping. This representation improvement can be explained as a consequence of the elimination of the noisiest dimensions (Turney and Pantel, 2010).

Word2vec consists of two neural network models, Continuous Bag of Words (CBOW) and Skip-gram. To train the models, a sliding window is moved along the corpus. In the CBOW scheme, at each step the neural network is trained to predict the center word (the word in the center of the window) given the context words (the other words in the window), while in the Skip-gram scheme the model is trained to predict the context words given the central word. In the present paper we use Skip-gram, which produced better performance in Mikolov et al. (2013b).

Despite the development of new word representation methods, LSA is still intensively used and has been shown to produce better performance than word2vec when trained on small to medium-sized corpora (Altszyler et al., 2017).
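To make the two training pipelines concrete, the following is a minimal sketch that builds both kinds of representation on a toy document collection. It is illustrative only, not the pipeline used in this paper: the documents are invented, TF-IDF stands in for the frequency-weighting transform (log-entropy weighting is another common choice), the hyperparameters are arbitrary, and current scikit-learn and gensim (4.x parameter names) APIs are assumed to be available.

```python
# Illustrative only: LSA (counts -> weighting -> truncated SVD) and a
# skip-gram word2vec model trained on the same toy corpus. The documents,
# TF-IDF weighting and hyperparameters are assumptions, not the paper's setup.
from gensim.models import Word2Vec
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

docs = [
    "bees make honey in the hive",
    "people buy honey at the shop",
    "the shop sells bread and honey",
]
tokenized = [d.split() for d in docs]

# LSA: document-term counts, reweighted, then projected to k dimensions.
counts = CountVectorizer().fit_transform(docs)        # shape (n_docs, n_terms)
weighted = TfidfTransformer().fit_transform(counts)   # down-weight frequent terms
k = 2                                                 # number of LSA dimensions
svd = TruncatedSVD(n_components=k, random_state=0)
lsa_word_vectors = svd.fit_transform(weighted.T)      # one k-dim vector per term

# word2vec: skip-gram (sg=1) with a sliding context window over the corpus.
w2v = Word2Vec(sentences=tokenized, vector_size=50, window=5,
               sg=1, min_count=1, epochs=50, seed=0)
print(lsa_word_vectors.shape)
print(w2v.wv.most_similar("honey", topn=3))
```

The key knob examined later in the paper appears here as n_components (the LSA dimensionality k); word2vec has no direct analogue of the SVD truncation step.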
1.1 Training Corpus Size and Specificity in Word-embeddings

Over the last years, great effort has been devoted to understanding how to choose the right parameter settings for different tasks (Quesada, 2011; Dumais, 2003; Landauer and Dumais, 1997; Lapesa and Evert, 2014; Bradford, 2008; Nakov et al., 2003; Baroni et al., 2014). However, considerably less attention has been given to how the corpus used as training input may affect performance. Here we ask a simple question about a property of the corpus: is there a monotonic relation between corpus size and performance? More precisely, what happens if the topics of additional documents differ from the topics of the specific task? Previous studies have, surprisingly, shown somewhat contradictory results on this simple question.

On the one hand, in their foundational work, Landauer and Dumais (1997) compared word-knowledge acquisition in LSA with that of children. This acquisition process may be produced by 1) direct learning, enhancing the incorporation of new words by reading texts that explicitly contain them; or 2) indirect learning, enhancing the incorporation of new words by reading texts that do not contain them. To do so, they evaluated LSA semantic representations trained on corpora of different sizes on multiple-choice synonym questions extracted from the TOEFL exam. This test consists of 80 multiple-choice questions, in which the synonym of a given word must be identified among 4 options. To train the LSA, Landauer and Dumais used the TASA corpus (Zeno et al., 1995).

Landauer and Dumais (1997) randomly replaced exam words in the corpus with non-sense words and varied the number of corpus documents by selecting nested sub-samples of the total corpus. They concluded that LSA improves its performance on the exam both when trained with documents containing exam words and without them. However, as could be expected, they observed a greater effect when training with exam words. It is worth mentioning that the replacement of exam words with non-sense words may create incorrect documents, thus making the algorithm acquire word knowledge from documents which should have contained an exam word but do not. In the Results section, we will study this indirect word acquisition in the TOEFL test without using non-sense words.
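Concretely, a multiple-choice synonym test of this kind is typically scored by picking, for each probe word, the candidate whose embedding is closest under cosine similarity. The sketch below is a hedged illustration rather than the evaluation code used in this paper: the `vectors` mapping is random data standing in for trained LSA or word2vec vectors, and the single item (a commonly cited TOEFL example) stands in for the full 80-item test.

```python
# Hedged sketch of scoring a TOEFL-style multiple-choice synonym test with
# word embeddings: choose the candidate most cosine-similar to the probe.
# The random vectors and the single item below are illustrative stand-ins.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def answer_item(probe, candidates, vectors):
    # Pick the candidate whose vector is closest to the probe's vector.
    return max(candidates, key=lambda c: cosine(vectors[probe], vectors[c]))

def synonym_test_accuracy(items, vectors):
    # items: iterable of (probe, candidate_list, correct_candidate) triples.
    hits = sum(answer_item(p, cands, vectors) == gold
               for p, cands, gold in items)
    return hits / len(items)

rng = np.random.default_rng(0)
words = ["enormously", "appropriately", "uniquely", "tremendously", "decidedly"]
vectors = {w: rng.normal(size=50) for w in words}   # stand-in for trained vectors
items = [("enormously",
          ["appropriately", "uniquely", "tremendously", "decidedly"],
          "tremendously")]
print(synonym_test_accuracy(items, vectors))
```

With real embeddings, the accuracy over all 80 items is the quantity that is tracked as the training corpus is grown or pruned.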
Along the same line, Lemaire and Denhière (2006) studied the effect of higher-order co-occurrences on LSA semantic similarity, which goes further than Landauer's study of indirect word acquisition. In their work, Lemaire and Denhière (2006) measured how the similarity between 28 pairs of words (such as bee/honey and buy/shop) changes when a 400-dimension LSA model is trained with a growing number of paragraphs. Furthermore, they identified for this task the marginal contribution of first-, second- and third-order co-occurrences as the number of paragraphs is increased. In this experiment, they found that not only does the first order of co-occurrence contribute to the semantic closeness of the word pairs, but the second and third orders also promote an increase in pair similarity. It is worth noting that Landauer's indirect word acquisition can be understood in terms of paragraphs that contain neither word of a pair but carry a co-occurrence link of third or higher order.

So, the conclusions of the Lemaire and Denhière (2006) and Landauer and Dumais (1997) studies suggest that increasing corpus size results in a gain, even if this increase is in topics unrelated to the semantic directions that are pertinent to the task.

However, a different conclusion seems to result from other sets of studies. Stone et al. (2006) studied the effect of corpus size and specificity in a document similarity rating task. They found that training LSA with smaller subcorpora selected for the specific task domain maintains or even improves LSA performance. This corresponds to the intuition of noise filtering, where removing information from irrelevant dimensions results in improvements in performance.

In addition, Olde et al. (2002) studied the effect of selecting specific subcorpora in an automatic exam evaluation task. They created several subcorpora from a physics corpus, progressively discarding documents unrelated to the specific questions. Their results showed small differences in performance between the LSA model trained with the original corpus and the one trained with the more specific subcorpora.

It is well known that the number of LSA dimensions (k) is a key parameter that must be duly adjusted in order to eliminate the noisiest dimensions (Landauer and Dumais, 1997; Turney and Pantel, 2010). Excessively high k values may not eliminate enough noisy dimensions, while excessively low k values may not provide enough dimensions to generate a proper representation. In this context, we hypothesize that when out-of-domain documents are discarded, the number of dimensions needed to represent the data should be lower, and thus k must be decreased.

Regarding word2vec, Cardellino and Alemany (2017) and Dusserre and Padró (2017) have shown that word2vec trained with a specific corpus can produce better performance in semantic tasks than when it is trained with a bigger and more general corpus. Although these works point out the relevance of domain-specific corpora, they do not study specificity in isolation, as they compare corpora from different sources.

In this article, we set out to investigate the effect of the specificity and size of the training corpus on word embeddings, and how this interacts with the number of dimensions. To measure the quality of the semantic representations we used two different tasks: the TOEFL exam and a categorization test. The corpus evaluation method consists of a comparison between two ways of progressively eliminating documents: the elimination of random documents vs. the elimination of out-of-domain documents (those unrelated to the specific task). In addition, we varied k within a wide range of values. As we show, LSA's dimensionality plays a key role in the LSA representation when this corpus analysis is made. In particular, we observe that both discarding out-of-domain documents and decreasing the number of dimensions produce an increase in the algorithm's performance. In one of the two tasks, discarding out-of-domain documents without decreasing k results in the completely opposite behavior, showing a strong performance reduction. On the other hand, word2vec shows in all cases a performance reduction when out-of-domain documents are discarded, which suggests an exploitation of higher-order word co-occurrences.

Our contribution to understanding the effect of out-of-domain documents on word-embedding knowledge acquisition is valuable from two different perspectives:

• From an operational point of view: we show that LSA's performance can be enhanced when (1) its training corpus is cleaned of out-of-domain documents, and (2) the number of LSA dimensions is reduced. Furthermore, reducing both the corpus size and the number of dimensions tends to speed up processing time. On the other hand, word2vec can take advantage of all the documents, obtaining its best performance when it is trained with the whole corpus.

• From a cognitive-modeling point of view: we point out that LSA's word-knowledge acquisition does not take advantage of indirect learning, while word2vec does. This throws light upon the models' capabilities and limitations in modeling human cognitive tasks, such as human word learning (Landauer and Dumais, 1997; Lemaire and Denhière, 2006; Landauer, 2007), semantic memory (Denhière and Lemaire, 2004; Kintsch and Mangalath, 2011; Landauer, 2007) and word classification (Laham, 1997).

2 Methods

We used the TASA corpus (Zeno et al., 1995) in all experiments. TASA is a commonly used linguistic corpus consisting of more than 37 thousand educational texts from the USA K12 curriculum. We word-tokenized each document, discarding punctuation marks, numbers, and symbols. Then, we transformed each word to lowercase and eliminated stopwords, using the stoplist in the NLTK Python package (Bird et al., 2009). The TASA corpus contains more than 5 million words in its cleaned version.

In each experiment, the training corpus size was changed by discarding documents in two different ways:

• Random documents discarding: The desired number of documents (n) contained in the