CommAI is a project aimed at developing general-purpose artificial agents that are useful for humans in their daily endeavours. CommAI is also an evaluation framework developed with that goal in mind, in which a learning agent must communicate with a scripted teacher in order to solve a never-ending stream of tasks.
The CommAI framework has a number of distinguishing features (see Baroni et al. 2017 for a more in-depth discussion).
To illustrate the nature of the framework with an example, consider a dialog where the learner is prompted with the task “repeat AB”, the learner correctly produces the output sequence “AB”, and the teacher then says “well done” followed by positive (+1) reward. Let’s suppose that the instruction “repeat” is coded with a single symbol (#), that A and B are also coded as symbols themselves (@ and $), and that both the learner and the teacher produce a “silence” space character while not emitting useful output. Then, this is what the task would look like from the learner’s point of view:
Teacher: #@$  %!-
Reward:  000000001
Learner:    @$
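To make the time-step structure of this exchange explicit, here is a minimal Python sketch that replays the episode above symbol by symbol. It is purely illustrative (it is not CommAI-env code); the streams simply transcribe the toy encoding used in this example:

# Replay of the episode above, one channel symbol per time step.
# Illustrative only; the symbols and timing are the toy encoding from this post.
teacher_stream = "#@$  %!- "                   # ' ' is the "silence" character
learner_stream = "   @$    "                   # a learner that already solved the task
reward_stream = [0, 0, 0, 0, 0, 0, 0, 0, 1]    # +1 arrives only after the correct answer

for t, (heard, said, r) in enumerate(zip(teacher_stream, learner_stream, reward_stream)):
    print(f"t={t}  teacher={heard!r}  learner={said!r}  reward={r}")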
In this case, the learner produced the correct sequence and was consequently rewarded. Obviously, though, except for some potentially useful inductive bias (an innate preference that makes some sequences more likely than others), the learner has no principled way to discover the solution other than exhaustively searching the full space of possible sequences. While this approach is conceivable for short enough sequences, it quickly becomes intractable as solutions grow longer and more complex. Imagine now that the learner is prompted with a new task: “repeat AB two times”. Again, assuming that “two” and “times” are encoded with unique symbols, the learner would be prompted, for example, with a string looking like #@$*& (recall that #@$ already meant “repeat AB”, while the new symbols *& could encode “two” + “times”). Now suppose that, after many attempts, the learner does discover the correct output @$@$. Then, the next time it is prompted with a similar input, say #@$^&, a fast learner could already exploit the hypothesis that the symbol # maps to a concept analogous to “repeat” or “copy”, while the symbol & is a sort of marker indicating that the preceding symbol is a quantity. In this case, the ^ symbol stands for the number 5, and so the learner will at some point hit the correct solution:
Teacher: #@$^&          %!-
Reward:  00000000000000001
Learner:      @$@$@$@$@$
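The hypothesis sketched above, namely that # stands for “repeat” and & marks the preceding symbol as a quantity, can be written down as a short decoding rule. The Python snippet below is only an illustration of the kind of compositional hypothesis a fast learner could converge on; the symbol-to-number table is an assumption of this toy example, not something the learner is given:

# Hypothesised mapping from "number" symbols to counts (an assumption of this example).
quantity = {"*": 2, "^": 5}

def answer(prompt):
    """Decode prompts of the form #<sequence>[<number>&], assuming '#' means
    "repeat" and '&' marks the preceding symbol as a count."""
    body = prompt[1:]          # drop the leading '#' ("repeat")
    times = 1
    if body.endswith("&"):     # '&' flags the symbol before it as a quantity
        times = quantity[body[-2]]
        body = body[:-2]
    return body * times

print(answer("#@$"))    # -> @$
print(answer("#@$*&"))  # -> @$@$
print(answer("#@$^&"))  # -> @$@$@$@$@$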
Note how much the learner can now extrapolate from the observed information. For example, it can associate the symbol * with the quantity 2 and ^ with 5, associations that could also be exploited in other tasks. Moreover, it could potentially learn that the sequence encoding “well done” (%!-) by itself denotes a reward signal, given that it only appears preceding the numerical reward (we have not shown examples where the learner gives the wrong output, but you can imagine that we would not tell the learner “well done” in those situations). This latter association would enable the learner to develop an intrinsic reward mechanism, gaining feedback from the linguistic input even in the absence of extrinsic reward!
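As a toy illustration of how such an intrinsic reward mechanism could be bootstrapped (this sketch is an assumption about one possible learner design, not part of CommAI-env), the learner could track how often each teacher message precedes extrinsic reward and treat reliably predictive messages, such as %!-, as rewarding in themselves:

from collections import defaultdict

times_seen = defaultdict(int)      # how often each teacher message was observed
times_rewarded = defaultdict(int)  # how often it was followed by extrinsic reward

def observe(teacher_message, reward):
    times_seen[teacher_message] += 1
    times_rewarded[teacher_message] += int(reward > 0)

def intrinsic_reward(teacher_message, threshold=0.9):
    """Return +1 for messages that have (almost) always preceded extrinsic reward."""
    if times_seen[teacher_message] == 0:
        return 0
    return int(times_rewarded[teacher_message] / times_seen[teacher_message] >= threshold)

for _ in range(10):
    observe("%!-", 1)               # "well done" keeps preceding +1 reward...
print(intrinsic_reward("%!-"))      # -> 1: the phrase now acts as a reward on its own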
The string repetition task might seem superficially very simple. However, once the instructions are obfuscated by replacing known words with arbitrary symbol sequences (which is inevitably how the learner would see the English language at the beginning of its lifetime), we see that understanding them is actually not such an easy feat. Still, by exploring good hypotheses about the meaning of the symbols, the learner can manage to decode a growing number of them, and this should become easier with time as its knowledge increases. As a final note, we saw that the learning algorithm could pick up linguistic cues indicating that it is doing the right thing even without receiving any reward at all, which could guide it through more complex tasks that require some intermediate form of feedback.
To facilitate research under this framework, we introduced CommAI-env, an environment where the experimenter can create datasets for life-long learning by scripting tasks that interact with the learner through a bidirectional communication channel. The teacher can be programmed in an arbitrary fashion through primitives that provide convenient abstractions to send messages to the learner, read back its responses, and reward it appropriately. For their part, learners can be written in any programming language or ML paradigm and are evaluated on their ability to maximize the average reward. To find out more, please visit our GitHub page: https://github.com/facebookresearch/CommAI-env
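To give a flavour of what such an interaction loop can look like, here is a small self-contained Python toy. It deliberately does not use the actual CommAI-env primitives (please refer to the repository above for the real task-scripting API); all class and method names below are made up for illustration:

import random

class RepeatTask:
    """Toy scripted teacher: sends '#<sequence>', reads back the learner's symbols,
    then gives praise ('%!-') and +1 reward if the answer is correct."""

    def __init__(self, sequence="@$"):
        self.sequence = sequence

    def run(self, learner, max_steps=20):
        prompt = "#" + self.sequence
        answer = ""
        for _ in range(max_steps):
            answer += learner.next_symbol(prompt, answer)
            if answer == self.sequence:               # correct: praise and reward
                learner.receive(message="%!-", reward=1)
                return 1
            if not self.sequence.startswith(answer):  # wrong prefix: episode fails
                learner.receive(message="", reward=0)
                return 0
        return 0

class RandomLearner:
    """Baseline learner that emits symbols uniformly at random."""

    def next_symbol(self, prompt, produced_so_far):
        return random.choice("@$#*^& ")

    def receive(self, message, reward):
        pass  # a real learner would update itself from the message and the reward

episodes = 1000
total = sum(RepeatTask().run(RandomLearner()) for _ in range(episodes))
print(f"{total} rewarded episodes out of {episodes}")

A learner written against the real environment would consume and emit one bit or symbol at a time through the communication channel, but the overall structure (a scripted teacher, a learner, and a stream of rewarded episodes) is the same.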
Tomas Mikolov, Armand Joulin, and Marco Baroni, A Roadmap towards machine intelligence. arXiv:1511.08130
Marco Baroni, Armand Joulin, Allan Jabri, Germán Kruszewski, Angeliki Lazaridou, Klemen Simonic, and Tomas Mikolov, CommAI: Evaluating the first steps towards a useful general AI. arXiv:1701.08954
User group for the CommAI-env platform
Tomas Mikolov
Marco Baroni
Allan Jabri
Armand Joulin
Germán Kruszewski
Moustapha Cissé
Klemen Simonic
Amaç Herdagdelen
The “General AI Challenge” organized by GoodAI is based on CommAI.
We are now accepting applications for the CommAI Visiting Researcher Program.
The Facebook Fellowship Program is designed to encourage and support promising doctoral students. With a CommAI Fellowship, we would like to support a student working on non-incremental approaches to interactive AI.