How to Fill in the Blanks with Language Models

Enabling Language Models to Fill in the Blanks
Chris Donahue, Mina Lee, Percy Liang
ACL 2020
paper · code · demo · blog · talk

Most existing text generation systems (like autocomplete for text messages and emails) can only generate text based on previous context. The ability to take both previous and subsequent text into account could power a new generation of writing assistance tools for planning, composing, and editing text.

This work presents an extremely simple and quick approach for building text generation systems with such capabilities. With this approach, one can simply download an existing pre-trained language model and enable it to fill in any number of blanks, of any length, in a document by fine-tuning it on artificially generated examples. Our experiments show that humans have difficulty identifying sentences generated by this approach as machine-generated.


Fill in the blanks?

Consider the following sentence with blanks:

She ate ____ for ____

To fill in the blanks, one needs to consider both preceding and subsequent text (in this case, “She ate” and “for”). There can be many reasonable ways to fill in the blanks:

She ate leftover pasta for lunch
She ate chocolate ice cream for dessert
She ate toast for breakfast before leaving for school
She ate rather quickly for she was in a hurry that evening

Language models?

Language modeling is a special case of filling in the blanks where only the preceding text is present and there is only one blank at the end.

She ate leftover pasta for ____

Language models are models that perform language modeling. In recent years, a number of large-scale language models (e.g. GPT-3) have been introduced and shown to achieve human-like performance. These models are typically pre-trained on massive amounts of unlabeled data, requiring huge amounts of computation and resources.

Our goal is to take these existing language models and make them perform the more general task of infilling.


When editing or revising, we often write in a non-linear manner.

Writing an email


Thanks for updating the draft! 

The modifications look good with one exception. 
Can you revert the wording of the task definition?

An existing language model might suggest something like “great to me” because it only considers the preceding text, not the subsequent text.

A better suggestion in this case would be something like “good with one exception,” since the writer is not completely satisfied and is suggesting a further revision.

Writing a novel

We were lost in the dark forest. Suddenly, we saw a flashlight in the distance. A wave of relief washed over us and we ran over to greet the other traveler.

When you don’t have a concrete idea of how to connect two scenes, the system can suggest a way to connect the fragmented ideas.


The task of filling in the blanks is known as text infilling in the field of Natural Language Processing (NLP): predicting blanks (missing spans) of text at any position in a document.

The general definition of text infilling considers text with an arbitrary number of blanks, where each blank can represent one or more missing tokens.


How can we make a language model fill in the blanks?

Our approach is infilling by language modeling. With this approach, one can simply (1) download an existing pre-trained language model and (2) enable it to fill in any number of blanks, of any length, in text by fine-tuning it on artificially generated examples.

Concretely, let’s see what happens at training and test time!

Training time

  1. Manufacture infilling examples

To produce an infilling example from a given piece of data, first generate the input by randomly replacing some tokens in the data with [blank] tokens.

Data: She ate leftover pasta for lunch.
Input: She ate [blank] for [blank].

Then, generate a target by concatenating the replaced tokens, each followed by a special [answer] token.

Target: leftover pasta [answer] lunch [answer]

Finally, construct the complete infilling example by concatenating the input, a special separator token [sep], and the target.

New data: She ate [blank] for [blank]. [sep] leftover pasta [answer] lunch [answer]

  2. Download your favorite language model

For instance, OpenAI GPT-2

  3. Fine-tune the model on infilling examples

Now, you can fine-tune the model on the infilling examples (new data) using standard language model training methodology.
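Step 1 above can be sketched in a few lines of Python. This is a simplified illustration using [blank], [answer], and [sep] as the special tokens; the helper name and the explicit span list are our own choices — the actual mask function samples spans randomly:

```python
def make_infilling_example(tokens, spans):
    """Build one training sequence from a token list and the (start, end)
    spans to blank out (end exclusive, spans sorted and non-overlapping)."""
    input_tokens, target_tokens = [], []
    last = 0
    for start, end in spans:
        input_tokens.extend(tokens[last:start])   # keep text before the span
        input_tokens.append("[blank]")            # replace the span itself
        target_tokens.extend(tokens[start:end])   # answer for this blank
        target_tokens.append("[answer]")
        last = end
    input_tokens.extend(tokens[last:])            # keep the remaining text
    return " ".join(input_tokens + ["[sep]"] + target_tokens)

example = make_infilling_example(
    "She ate leftover pasta for lunch .".split(), spans=[(2, 4), (5, 6)])
# "She ate [blank] for [blank] . [sep] leftover pasta [answer] lunch [answer]"
```

The resulting string is an ordinary token sequence, which is why a standard left-to-right language model can be fine-tuned on it without any architectural changes.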

Test time

Once trained, we can use the language model to infill at test time.

As input, the model takes incomplete text with blanks and generates a target.

Input: He drinks [blank] after [blank].
Target: water [answer] running [answer]

You can then construct the complete text by simply replacing [blank] tokens in the input with predicted answers in the target in a deterministic fashion.

Output: He drinks water after running.
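The deterministic splice step can be sketched as follows (token names as above; the function name is our own):

```python
def splice(input_text, target):
    """Replace each [blank] in the input with the corresponding answer
    from the target, left to right."""
    answers = [a.strip() for a in target.split("[answer]") if a.strip()]
    output = input_text
    for answer in answers:
        output = output.replace("[blank]", answer, 1)  # fill one blank at a time
    return output

splice("He drinks [blank] after [blank].", "water [answer] running [answer]")
# "He drinks water after running."
```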


Turing test

The following is a short story consisting of five sentences. One of the sentences has been replaced with a sentence generated by our model.

Q. Identify which of the five sentences was generated by a machine.

[1] Patty was excited about having her friends over. 
[2] She had been working hard preparing the food.
[3] Patty knew her friends wanted pizza.
[4] All of her friends arrived and were seated at the table.
[5] Patty had a great time with her friends.

(The answer is in the table below.)

In our experiments, we sampled a short story from ROCStories (Mostafazadeh et al., 2016), randomly replaced one of its sentences with a [blank] token, and infilled the blank with a sentence generated by a model. Then, we asked 100 people to identify which of the sentences in the story was machine-generated.

System | Score | Generated sentence
BERT (Devlin et al., 2019) | 20% | favoritea “, Mary brightly said.
SA (Zhu et al., 2019) | 29% | She wasn’t sure she had to go to the store.
LM | 41% | She went to check the tv.
ILM (ours) | 45% | Patty knew her friends wanted pizza.
Human | 78% | She also had the place looking spotless.

System output for sentence [3] in the above example.

The results show that people failed to identify the sentence infilled by our model as machine-generated 45% of the time; for comparison, a human-written replacement sentence goes undetected 78% of the time.

More experiments and analysis can be found in the paper.

Details for Practitioners

Advantages of our framework

  1. Our framework incurs almost no computational overhead compared to language modeling. In contrast, using language models to directly predict complete text from incomplete text will effectively double the sequence length. This is particularly problematic when considering models like GPT-2 whose memory usage grows quadratically with sequence length.
  2. Our framework requires minimal changes to the vocabulary of an existing language model. Specifically, you need only three additional tokens: [blank], [answer], and [sep].
  3. Our framework offers the ability to attend to the entire context on both sides of a blank with the simplicity of decoding from language models.

Experimental setup

  • Model: GPT-2 small (any left-to-right language model can be used)
  • Training time: one day on a single GPU
  • Early stopping criterion: perplexity on the validation set
  • Mask function: mask out paragraphs, sentences, n-grams, and words with a marginal token mask rate of about 15%
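As a rough stand-in for such a mask function, one could sample random spans until about 15% of tokens are covered. This is a simplification of the paper's hierarchical masking over paragraphs, sentences, n-grams, and words; the function and its parameters are our own:

```python
import random

def sample_spans(n_tokens, token_mask_rate=0.15, max_span_len=8):
    """Sample sorted, non-overlapping (start, end) spans covering roughly
    `token_mask_rate` of the tokens."""
    budget = max(1, int(n_tokens * token_mask_rate))
    spans, used = [], set()
    while budget > 0:
        length = random.randint(1, min(max_span_len, budget))
        start = random.randint(0, n_tokens - length)
        span = set(range(start, start + length))
        if used.isdisjoint(span):          # keep spans non-overlapping
            spans.append((start, start + length))
            used |= span
            budget -= length
    return sorted(spans)
```

The sampled spans can then be blanked out to manufacture infilling examples from any unlabeled corpus.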

Please refer to our paper for further details.

Try it out!

Generating Regular Expressions from Examples

It was rather surprising that such a simple approach can go this far, and this case study should be shared in the community. – Reviewer 1

Synthesizing Regular Expressions from Examples for Introductory Automata Assignments
Mina Lee*, Sunbeom So*, Hakjoo Oh
GPCE 2016
Best Paper Award
paper · poster (english) · poster (korean) · code
proposal (for senior project)

AlphaRegex: Synthesizer for Regular Expressions from Examples


Regular expressions?

A regular expression is a way to express a text pattern.


For instance, *.txt represents all text files. The wildcard notation * stands for any sequence of characters; hence, any file with the extension .txt, regardless of its name, is selected by the pattern above. You may use it to find all text files in certain directories.

Due to their expressiveness, regular expressions are widely used for various text manipulation tasks such as text extraction and classification.

Introductory automata assignments?

Regular expressions are taught in many computer science courses such as automata theory.

a, b, c, … → 0, 1

In practice, regular expressions are constructed from more or less all available characters. In automata classes, however, students learning regular expressions for the first time usually deal only with the binary alphabet (i.e. 0 and 1) for simplicity.

Program synthesis?

Program synthesis is the task of automatically generating a program that meets user intent.


This is a sample question from an assignment in an introductory automata course.

Find a regular expression for the following language.
L = {w ∈ {0, 1}* | Strings have exactly one pair of consecutive 0s}


Students may not understand the meaning of the problem, may not know where to start, or may give up after a few attempts, not to mention that they may come up with wrong answers.

Even for instructors, it takes a while to solve the problem, and they may wonder whether their regular expression is an optimal solution.

Scenario 1. When students do not understand the problem, instructors often use examples to clarify the meaning of the problem:

Student: I don’t understand what ‘one pair of consecutive 0s’ in the question means.

Instructor: It means ‘00’ appears exactly once in each string represented by the regular expression. For instance, correct strings would be ‘00’, ‘1001’, and ‘010010’, whereas ‘01’, ‘11’, and ‘000’ would be wrong examples of the strings.

Scenario 2. Students often use examples and counterexamples to check whether their answers are correct.

Problem Statement

Instead of having instructors demonstrate the meaning of problems to each student, or leaving students to wonder whether their answers are optimal, let’s come up with:

A method for synthesizing regular expressions
from a set of simple, basic examples given by users.

00, 1001, 010010 (O)
01, 11, 000 (X)


In a nutshell, while regular expressions are widely used in many text applications, they are still hard for first-timers to construct. Our method automatically finds an optimal regular expression from a set of examples they provide.

Problem Formalization


e → a ∈ Σ | ε | ∅ | e + e | e • e | e*

Regular expression problems

Students are given a description of a regular language L. We assume that the description of the language is given as a pair (P, N) of example strings.

Positive examples: P ⊆ Σ*
Negative examples: N ⊆ Σ*

Positive examples are correct strings that must be included in the language, and negative examples are wrong strings that must be excluded from the language.

A regular expression problem: (P, N)
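Under this formalization, checking whether a candidate regular expression solves a problem (P, N) is straightforward. A minimal sketch in Python, using re.fullmatch for whole-string matching and Python’s | where the grammar above writes + for union (the function name is ours):

```python
import re

def solves(regex, P, N):
    """A regex solves the problem (P, N) iff it accepts every positive
    example and rejects every negative example (whole-string match)."""
    accepts = lambda s: re.fullmatch(regex, s) is not None
    return all(accepts(p) for p in P) and not any(accepts(n) for n in N)

# The problem from the sample question: exactly one pair of consecutive 0s.
P = ["00", "1001", "010010"]
N = ["01", "11", "000"]

solves("(1|01)*00(10|1)*", P, N)  # True: accepts all of P, rejects all of N
solves("0*", P, N)                # False: fails to accept "1001"
```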


Basic search algorithm

Our algorithm checks all regular expressions in the order of their simplicity until it finds a solution.


Starting from the initial (i.e. simplest) regular expression, it iteratively constructs more complex regular expressions. Because this approach results in a huge search space, we need to prune the search tree effectively to find a solution in a reasonable time.
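The basic search can be sketched as a tiny enumerative synthesizer: grow candidate expressions in order of increasing size and return the first one consistent with the examples. This is an illustration only — no pruning, Python’s | in place of the grammar’s + for union, and the names and size bound are our own:

```python
import re
from itertools import product

def synthesize(P, N, max_size=3):
    """Enumerate regular expressions over {0, 1} in order of increasing
    size and return the simplest one accepting all of P and none of N."""
    def consistent(e):
        try:
            return (all(re.fullmatch(e, p) for p in P)
                    and not any(re.fullmatch(e, n) for n in N))
        except re.error:
            return False

    by_size = {1: ["0", "1"]}                # size-1 expressions: the atoms
    for size in range(2, max_size + 1):
        grown = []
        for e in by_size[size - 1]:          # Kleene star costs one node
            grown.append("(%s)*" % e)
        for k in range(1, size - 1):         # binary operators cost one node
            for a, b in product(by_size[k], by_size[size - 1 - k]):
                grown.append("(%s)|(%s)" % (a, b))   # union
                grown.append("(%s)(%s)" % (a, b))    # concatenation
        by_size[size] = grown

    for size in range(1, max_size + 1):      # simplest candidates first
        for e in by_size[size]:
            if consistent(e):
                return e
    return None

synthesize(["0", "00", "000"], ["1", "01"])  # "(0)*"
```

Even this toy version shows why pruning matters: the number of candidates grows explosively with size, so the real algorithm must discard hopeless branches early.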

Pruning the search space

  1. Prune when further search does not produce any results
  2. Prune when the current candidate is subsumed by a simpler candidate

(Please check the paper for more details!)