JointReps: Jointly Learning Word Representations using a Corpus and a Knowledge Base (KB)

Overview.

JointReps is a joint model for learning distributed word vector representations (word embeddings) from both large text corpora and knowledge bases (KBs). It uses the semantic relational structure between words that exists in KBs, together with word co-occurrence statistics from text corpora, to learn word representations in a vector space. Specifically, JointReps uses the corpus and the KBs to define a single global joint objective function.
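To make the idea of a joint objective concrete, the following is a minimal sketch and not the authors' implementation. It assumes a GloVe-style weighted least-squares term over co-occurrence counts plus a squared-distance penalty that pulls KB-related words together. The function names, the weighting function, the single shared vector set (no bias terms) and the trade-off parameter `lam` are all illustrative assumptions; the exact formulation is given in the papers listed under Publications.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def weight(count, x_max=100.0, alpha=0.75):
    """Assumed GloVe-style weighting that damps very frequent pairs."""
    return (count / x_max) ** alpha if count < x_max else 1.0


def joint_objective(vectors, cooc, kb_pairs, lam=1.0):
    """Corpus term plus lam times KB term (hypothetical simplification).

    vectors:  dict mapping word -> list of floats
    cooc:     dict mapping (word_i, word_j) -> positive co-occurrence count
    kb_pairs: iterable of (word_i, word_j) pairs related in the KB
    """
    # Corpus term: the inner product should predict the log co-occurrence.
    corpus_term = 0.0
    for (wi, wj), count in cooc.items():
        err = dot(vectors[wi], vectors[wj]) - math.log(count)
        corpus_term += 0.5 * weight(count) * err ** 2

    # KB term: words related in the KB should have nearby vectors.
    kb_term = 0.0
    for wi, wj in kb_pairs:
        kb_term += 0.5 * sq_dist(vectors[wi], vectors[wj])

    return corpus_term + lam * kb_term
```

In this sketch, setting `lam` to 0 reduces the objective to the corpus-only term, which is one way to see how the KB acts as an additional constraint during learning.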

Utilizing KBs gives JointReps several advantages:

  • It benefits from the knowledge contained in the KBs during the word representation learning phase
  • Any KB that specifies semantic relations between words, such as WordNet, FrameNet or the Paraphrase Database, can be used with JointReps
  • It offers three different novel mechanisms (SKB, NNE, MNE) for integrating the knowledge from the KBs. Details are reported in the published work below

By incorporating the knowledge in the KBs into the process of learning word vector representations from the corpus (as shown in the published works below), JointReps has been shown to deliver:

  • A significant improvement over corpus-only approaches in the quality of the learnt word embeddings
  • State-of-the-art (SOTA) results among a variety of models that combine the two sources for learning word embeddings
  • Stable performance across a variety of word vector dimensionalities

Publications.

The JointReps model is described in the following papers. Please cite them if you use any of the available resources.

  • Mohammed Alsuhaibani, Danushka Bollegala, Takanori Maehara and Ken-ichi Kawarabayashi: Jointly Learning Word Embeddings using a Corpus and a Knowledge Base [under review]
  • Danushka Bollegala, Mohammed Alsuhaibani, Takanori Maehara, and Ken-ichi Kawarabayashi: Joint Word Representation Learning using a Corpus and a Semantic Lexicon, 30th AAAI Conference on Artificial Intelligence (AAAI), pp. 2690-2696, Arizona, USA. (2016.2) [PDF][BibTex]

Downloads.

The pre-trained word vectors reported in the above publications are available for download.
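Once downloaded, the vectors can be loaded and compared in a few lines. The sketch below assumes a plain-text format with one word per line followed by its space-separated vector components and no header line; this page does not state the actual file format, so check the downloaded files first. The helper names and the file name in the comment are hypothetical.

```python
import math


def parse_vectors(lines):
    """Parse lines of the assumed form 'word v1 v2 ... vd' into a dict."""
    vectors = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


# Tiny in-memory sample standing in for a downloaded file.
# For a real file (hypothetical name), one could write:
#     with open("jointreps_vectors.txt", encoding="utf-8") as f:
#         vectors = parse_vectors(f)
sample_lines = ["cat 1.0 0.0", "dog 1.0 0.0", "car 0.0 2.0"]
vectors = parse_vectors(sample_lines)
similarity = cosine(vectors["cat"], vectors["dog"])
```

Cosine similarity between word vectors is the usual basis for word similarity evaluations, so a helper like `cosine` is a convenient first check that the vectors loaded correctly.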

Code.

The source code is available [here].

Authors.

Contact.

For any enquiries about JointReps, please feel free to contact: m[dot]a[dot]alsuhaibani[at]liverpool[dot]ac[dot]uk