How to Fill in the Blanks with Language Models

Enabling Language Models to Fill in the Blanks
Chris Donahue, Mina Lee, Percy Liang
ACL 2020
paper · code · demo · blog · talk

Most existing text generation systems (like autocomplete for text messages and emails) can only generate text based on previous context. The ability to take both previous and subsequent text into account could power a new generation of writing assistance tools for planning, composing, and editing text.

This work presents an extremely simple and quick approach for building text generation systems with such capabilities. With this approach, one can simply download an existing pre-trained language model and enable it to fill in any number and length of blanks in a document by fine-tuning it on artificially generated examples. Our experiments show that humans have difficulty identifying sentences generated by this approach as machine-generated.


Fill in the blanks?

Consider the following sentence with blanks:

She ate ____ for ____

To fill in the blanks, one needs to consider both preceding and subsequent text (in this case, “She ate” and “for”). There can be many reasonable ways to fill in the blanks:

She ate leftover pasta for lunch
She ate chocolate ice cream for dessert
She ate toast for breakfast before leaving for school
She ate rather quickly for she was in a hurry that evening

Language models?

Language modeling is a special case of filling in the blanks where only the preceding text is present and there is only one blank at the end.

She ate leftover pasta for ____

Language models are models that perform language modeling. In the past few years, a number of large-scale language models have been introduced (e.g., GPT-3) and shown to achieve human-like performance on some tasks. These models are typically pre-trained on massive amounts of unlabeled data, which requires a huge amount of computation and resources.

Our goal is to take these existing language models and make them perform the more general task of infilling.


When editing or revising, we often write in a non-linear manner.

Writing an email


Thanks for updating the draft! 

The modifications look good with one exception. 
Can you revert the wording of the task definition?

An existing language model might suggest something like “great to me”, because it considers only the preceding text and not the subsequent text.

A better suggestion in this case would be something like “good with one exception”, since the writer is not completely satisfied and is suggesting a further revision.

Writing a novel

We were lost in the dark forest. Suddenly, we saw a flashlight in the distance. A wave of relief washed over us and we ran over to greet the other traveler.

When you don’t have a concrete idea of how to connect two scenes, the system can suggest a way to bridge the fragmented ideas.


In Natural Language Processing (NLP), the task of filling in the blanks is known as text infilling: predicting missing spans of text (blanks) at any position in a document.

In its general form, text infilling considers text with an arbitrary number of blanks, where each blank can stand for one or more missing tokens.


How can we make a language model fill in the blanks?

Our approach is infilling by language modeling (ILM). With this approach, one can simply (1) download an existing pre-trained language model and (2) enable it to fill in any number and length of blanks in text by fine-tuning it on artificially generated examples.

Concretely, let’s see what happens at training and test time!

Training time

  1. Manufacture infilling examples

To produce an infilling example from a given piece of text, first generate the input by randomly replacing some spans of tokens in the text with [blank] tokens.

Data: She ate leftover pasta for lunch.
Input: She ate [blank] for [blank].

Then, generate the target by concatenating the replaced tokens, following each answer with the (answer) token.

Target: leftover pasta (answer) lunch (answer)

Finally, construct the complete infilling example by concatenating the input, a special separator token [sep], and the target.

New data: She ate [blank] for [blank]. [sep] leftover pasta (answer) lunch (answer)
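The construction above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' released code: it uses a toy word-and-punctuation tokenizer, takes the masked spans as given (token-index ranges), writes the special tokens as [blank], (answer), and [sep], and the helper names `tokenize`, `detokenize`, and `make_infilling_example` are hypothetical.

```python
import re

# Special tokens as written in this post (a real setup would register them
# with the model's tokenizer).
BLANK, ANSWER, SEP = "[blank]", "(answer)", "[sep]"


def tokenize(text):
    """Split text into words and punctuation marks (a toy tokenizer)."""
    return re.findall(r"\w+|[^\w\s]", text)


def detokenize(tokens):
    """Join tokens with spaces, then drop the space before punctuation."""
    return re.sub(r" ([.,!?;:])", r"\1", " ".join(tokens))


def make_infilling_example(tokens, spans):
    """Return (input, target, example) for sorted, non-overlapping spans.

    Each span is a (start, end) token-index range with `end` exclusive.
    """
    input_tokens, answers, cursor = [], [], 0
    for start, end in spans:
        input_tokens += tokens[cursor:start]  # keep the unmasked text
        input_tokens.append(BLANK)            # one [blank] per masked span
        answers.append(detokenize(tokens[start:end]))
        cursor = end
    input_tokens += tokens[cursor:]
    source = detokenize(input_tokens)
    # Each answer is followed by its own (answer) token.
    target = " ".join(f"{answer} {ANSWER}" for answer in answers)
    return source, target, f"{source} {SEP} {target}"


tokens = tokenize("She ate leftover pasta for lunch.")
print(make_infilling_example(tokens, [(2, 4), (5, 6)])[2])
```

Here the spans are picked by hand to reproduce the example above; during training they would come from a random mask function instead.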

2. Download your favorite language model

For instance, OpenAI GPT-2

3. Fine-tune the model on infilling examples

Now, you can fine-tune the model on the infilling examples (new data) using standard language model training methodology.

Test time

Once trained, we can use the language model to infill at test time.

As input, the model takes incomplete text with blanks and generates a target.

Input: He drinks [blank] after [blank].
Target: water (answer) running (answer)

You can then construct the complete text deterministically, by replacing the [blank] tokens in the input, in order, with the predicted answers in the target.

Output: He drinks water after running.
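The deterministic reconstruction step can be sketched as follows. This is an illustrative sketch, not the released implementation: it assumes the model's output has already been decoded to a string in which each answer is terminated by the (answer) token, and the function names `parse_answers` and `fill_blanks` are hypothetical.

```python
BLANK, ANSWER = "[blank]", "(answer)"


def parse_answers(target):
    """Split generated target text into answers, one per (answer) token."""
    parts = target.split(ANSWER)
    # Anything after the last (answer) token is an unfinished answer; drop it.
    return [part.strip() for part in parts[:-1]]


def fill_blanks(source, target):
    """Replace each [blank] in `source`, in order, with its predicted answer."""
    answers = parse_answers(target)
    pieces = source.split(BLANK)
    n_blanks = len(pieces) - 1
    if len(answers) < n_blanks:
        raise ValueError(f"expected {n_blanks} answers, got {len(answers)}")
    text = pieces[0]
    # zip stops after n_blanks answers, so any extra answers are ignored.
    for answer, following in zip(answers, pieces[1:]):
        text += answer + following
    return text


print(fill_blanks("He drinks [blank] after [blank].",
                  "water (answer) running (answer)"))
```

Because the answers are spliced back by position, no extra model call is needed to produce the final text; the only failure case to handle is a generation that stops before producing one answer per blank.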


Turing test

The following is a short story consisting of five sentences. One of the sentences is swapped with a sentence generated by our model.

Q. Which of the five sentences was generated by a machine?

[1] Patty was excited about having her friends over. 
[2] She had been working hard preparing the food.
[3] Patty knew her friends wanted pizza.
[4] All of her friends arrived and were seated at the table.
[5] Patty had a great time with her friends.

(The answer is in the table below.)

In our experiments, we sampled a short story from ROCStories (Mostafazadeh et al., 2016), randomly replaced one of its sentences with a [blank] token, and infilled it with a sentence generated by a model. Then, we asked 100 people to identify which sentence in the story was machine-generated.

System | Score | Generated sentence
BERT (Devlin et al., 2019) | 20% | favoritea “, Mary brightly said.
SA (Zhu et al., 2019) | 29% | She wasn’t sure she had to go to the store.
LM | 41% | She went to check the tv.
ILM (ours) | 45% | Patty knew her friends wanted pizza.
Human | 78% | She also had the place looking spotless.

System output for sentence [3] in the above example.

The results show that people failed to identify sentences infilled by our model as machine-generated 45% of the time.

More experiments and analysis can be found in the paper.

Details for Practitioners

Advantages of our framework

  1. Our framework incurs almost no computational overhead compared to language modeling. In contrast, using language models to directly predict complete text from incomplete text would effectively double the sequence length. This is particularly problematic for models like GPT-2, whose self-attention memory usage grows quadratically with sequence length.
  2. Our framework requires minimal changes to the vocabulary of an existing language model. Specifically, you need three additional tokens: [blank], (answer), and [sep].
  3. Our framework offers the ability to attend to the entire context on both sides of a blank with the simplicity of decoding from language models.
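To make the first advantage concrete, here is a back-of-the-envelope sketch that counts sequence lengths for the two formats. The token counts are hypothetical, each special token is assumed to be a single token, and the "full text" alternative is the format described in point 1 (input with blanks, [sep], then the entire completed text).

```python
def ilm_length(n_kept, n_masked, n_blanks):
    """Length of an ILM example: input with blanks, [sep], then the answers."""
    input_len = n_kept + n_blanks     # unmasked tokens + one [blank] per span
    target_len = n_masked + n_blanks  # masked tokens + one (answer) per span
    return input_len + 1 + target_len  # +1 for [sep]


def full_text_length(n_kept, n_masked, n_blanks):
    """Length if the model instead regenerated the complete text after [sep]."""
    input_len = n_kept + n_blanks
    target_len = n_kept + n_masked    # the whole original text again
    return input_len + 1 + target_len


def attention_ratio(len_a, len_b):
    """Ratio of self-attention score-matrix sizes (L x L entries each)."""
    return (len_a / len_b) ** 2


if __name__ == "__main__":
    # Hypothetical 1,000-token document with 150 tokens masked in 20 spans.
    kept, masked, blanks = 850, 150, 20
    ilm = ilm_length(kept, masked, blanks)
    full = full_text_length(kept, masked, blanks)
    print(ilm, full, round(attention_ratio(full, ilm), 2))  # 1041 1871 3.23
```

With roughly 15% of tokens masked, the ILM example is only about 4% longer than the original document, while regenerating the full text nearly doubles it and roughly triples the attention memory.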

Experimental setup

  • Model: GPT-2 small (any left-to-right language model can be used)
  • Training time: one day on a single GPU
  • Early stopping criterion: perplexity on the validation set
  • Mask function: mask out paragraphs, sentences, n-grams, and words with a marginal token mask rate of about 15%
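The mask function decides which spans become blanks. The sketch below is a deliberately simplified stand-in, not the paper's hierarchical mask function: it masks only word n-grams (no paragraphs or sentences), and the function name `random_ngram_spans`, the uniform span-length choice, and the span-start probability formula are our own assumptions, chosen so that the expected fraction of masked tokens comes out near the requested rate.

```python
import random


def random_ngram_spans(n_tokens, mask_rate=0.15, max_n=3, seed=None):
    """Pick sorted, non-overlapping (start, end) spans, `end` exclusive.

    At each position a span starts with probability p, and its length is
    uniform in [1, max_n]. After a span, one token is skipped so that two
    blanks never touch. With mean span length m, each decision consumes
    1 + p*m tokens on average and masks p*m of them, so the expected mask
    rate is p*m / (1 + p*m); solving for p gives the formula below.
    """
    rng = random.Random(seed)
    mean_len = (1 + max_n) / 2
    p_start = mask_rate / (mean_len * (1 - mask_rate))
    spans, i = [], 0
    while i < n_tokens:
        if rng.random() < p_start:
            end = min(i + rng.randint(1, max_n), n_tokens)
            spans.append((i, end))
            i = end + 1  # leave one unmasked token between spans
        else:
            i += 1
    return spans


spans = random_ngram_spans(100_000, seed=0)
print(sum(end - start for start, end in spans) / 100_000)  # close to 0.15
```

The returned spans are token-index ranges marking which tokens to replace with [blank]; the paper's version additionally chooses among paragraph, sentence, n-gram, and word granularities.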

Please refer to our paper for further details.

Try it out!
