Scientific and Creative Analogies in Pre-trained Language Models

Conference on Empirical Methods in Natural Language Processing (EMNLP)


This paper examines the encoding of analogy in large-scale pre-trained language models, such as BERT and GPT-2. Existing analogy datasets typically focus on a limited set of analogical relations, with a high similarity between the two domains across which the analogy holds. As a more realistic setup, we introduce the Scientific and Creative Analogy dataset (SCAN), a novel analogy dataset containing systematic mappings of multiple attributes and relational structures across dissimilar domains. Using this dataset, we test the analogical reasoning capabilities of several widely used pre-trained language models (LMs). We find that state-of-the-art LMs achieve low performance on these complex analogy tasks, highlighting the challenges still posed by analogy understanding.

target	source	targ_word	src_word	alternatives	analogy_type
atom	solar system	nucleus	sun		science
atom	solar system	electron	planet		science
atom	solar system	charge	mass		science
atom	solar system	attracts	attracts		science
atom	solar system	revolves	revolves		science
atom	solar system	electromagnetism	gravity		science
heat transfer	water flow	transfers	flows		science
heat transfer	water flow	temperature	pressure		science
heat transfer	water flow	burner	water tower		science
heat transfer	water flow	kettle	bucket		science
heat transfer	water flow	heating	filling		science
heat transfer	water flow	cooling	emptying		science
heat transfer	water flow	thermodynamics	hydrodynamics		science
sounds	waves	wall	shore		science
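
The sample rows above can be read as one systematic mapping per (target, source) pair, with each row contributing a single word-level correspondence. The following is a minimal sketch of that grouping, assuming the data is tab-separated with the header shown; the function name `group_mappings` and the inlined `SAMPLE` string are hypothetical and used only for illustration, not part of any released SCAN tooling.

```python
import csv
import io
from collections import defaultdict

# A few rows copied from the sample above (tab-separated; the
# "alternatives" column is empty in these rows).
SAMPLE = (
    "target\tsource\ttarg_word\tsrc_word\talternatives\tanalogy_type\n"
    "atom\tsolar system\tnucleus\tsun\t\tscience\n"
    "atom\tsolar system\telectron\tplanet\t\tscience\n"
    "heat transfer\twater flow\ttemperature\tpressure\t\tscience\n"
)


def group_mappings(tsv_text):
    """Group rows into {(target, source): {src_word: targ_word}}.

    Hypothetical helper: collects every word-level correspondence that
    belongs to the same analogy between a source and a target domain.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    mappings = defaultdict(dict)
    for row in reader:
        key = (row["target"], row["source"])
        mappings[key][row["src_word"]] = row["targ_word"]
    return dict(mappings)


mappings = group_mappings(SAMPLE)
print(mappings[("atom", "solar system")])
# prints {'sun': 'nucleus', 'planet': 'electron'}
```

Keying on the domain pair keeps all correspondences of one analogy together, which is the unit the abstract describes: several attributes and relations mapped jointly across two dissimilar domains.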
