While neural networks have achieved state-of-the-art results on a wide range of natural language processing (NLP) tasks, their robustness to shifts in the input distribution and their ability to transfer to related tasks remain among the biggest open challenges. Modern NLP systems interact with text from heterogeneous sources with distinct distributions, even though the underlying linguistic regularities may be shared across tasks. This presents several interrelated challenges:
- From an application perspective, these models need to produce robust outputs at test time on diverse inputs, even when those input distributions were never observed during training. For instance, content on the internet is characterized by informal language with a long tail of variation in lexical choice, spelling, style/genre, emerging vocabularies (slang, memes, etc.), and other linguistic phenomena; a simple way to probe robustness to this kind of noise is sketched after this list.
- From a machine learning perspective, we need theoretical and empirical understanding of the intrinsic behavior of neural networks used in NLP, both at training and at inference time. For example: How can we formally verify a model's robustness on a specific task? What training objectives and optimization methods improve robustness to adversarial inputs at prediction time (one classical objective is sketched after this list)? Given that neural models are trained on large amounts of data from heterogeneous sources, how is model quality affected by noise and bias in the training data? And at inference time, what unbiased and robust evaluation protocols can assess whether a model has genuinely improved its linguistic generalization?
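
As a concrete illustration of the noise-robustness and evaluation-protocol questions above, the following sketch measures how often a classifier's prediction survives simple character-level spelling noise. It is a minimal example under stated assumptions, not a proposed benchmark: `predict` is a hypothetical placeholder for any text classifier, and the perturbation model (random swaps, drops, and duplications) is deliberately simplistic.

```python
import random

def typo_perturb(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Apply simple character-level noise (swap, drop, duplicate) to a string."""
    rng = random.Random(seed)
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        if chars[i].isalpha() and rng.random() < rate:
            op = rng.choice(["swap", "drop", "dup"])
            if op == "swap" and i + 1 < len(chars):
                out.extend([chars[i + 1], chars[i]])  # transpose adjacent chars
                i += 2
                continue
            elif op == "drop":
                i += 1  # delete the character
                continue
            else:
                out.extend([chars[i], chars[i]])  # duplicate the character
                i += 1
                continue
        out.append(chars[i])
        i += 1
    return "".join(out)

def consistency(predict, texts, n_perturbations: int = 5) -> float:
    """Fraction of noisy inputs whose predicted label matches the clean input's."""
    agree = total = 0
    for text in texts:
        clean_label = predict(text)
        for k in range(n_perturbations):
            agree += int(predict(typo_perturb(text, seed=k)) == clean_label)
            total += 1
    return agree / max(total, 1)
```

A score near 1.0 means predictions are stable under this noise model; comparing scores across models or training regimes yields a crude but reproducible robustness signal.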
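
On the training-objective side, one well-studied direction is adversarial training in embedding space: a worst-case perturbation of the input embeddings is generated from the loss gradient (FGSM-style), and the model is trained on clean and perturbed examples jointly. The sketch below assumes PyTorch and a hypothetical `model` that maps a batch of token embeddings to logits; the shapes and `epsilon` are illustrative choices, not recommendations.

```python
import torch
import torch.nn.functional as F

def adversarial_loss(model, embeddings, labels, epsilon=0.01):
    """Clean loss plus loss on FGSM-perturbed embeddings.

    Assumes `model(embeddings)` returns logits of shape (batch, n_classes)
    and `embeddings` has shape (batch, seq_len, dim); both are hypothetical
    placeholders for the architecture under study.
    """
    # Treat the embeddings as a leaf so we can take gradients w.r.t. them.
    embeddings = embeddings.detach().requires_grad_(True)
    clean_loss = F.cross_entropy(model(embeddings), labels)

    # The loss gradient w.r.t. the embeddings points in the direction of
    # steepest loss increase; stepping along its sign crafts the perturbation.
    (grad,) = torch.autograd.grad(clean_loss, embeddings, retain_graph=True)
    perturbed = (embeddings + epsilon * grad.sign()).detach()

    # Train on clean and adversarial views of the batch jointly.
    adv_loss = F.cross_entropy(model(perturbed), labels)
    return clean_loss + adv_loss
```

Calling `adversarial_loss(...).backward()` inside an ordinary training loop updates the parameters against both views of each batch; normalizing or bounding the perturbation per example is a common refinement.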
Models that lack robustness can produce low-quality outputs when exposed to natural noise, can be fooled by adversarial inputs, and, in the extreme case, can fail catastrophically. We invite the academic community to propose novel, robust methods that address these challenges.