Program Synthesis

My penchant for program synthesis stems largely from, I believe, its enormous applicability to other fields such as biology, database, data mining, human-computer interaction, machine learning, security, software engineering, and so much more.

Herewith, I leave my super subjective notes and comments on each material to remember and easily refer to them at any point in the future. Hence, this page is constantly updated as I continue exploring the field of programming languages, especially program synthesis!

DISCLAIMER: This post does not claim to be comprehensive or exhaustive :)


What is Program Synthesis?


Program Synthesis, Microsoft Research

Program synthesis is the task of automatically discovering an executable piece of code given user intent expressed using various forms of constraints such as input-output examples, demonstrations, natural language, etc. Program synthesis has direct applications for various classes of users in the technology pyramid: end users, students and teachers, software developers, and algorithm designers.

Program Synthesis, Wikipedia

Program synthesis is a special form of automatic programming that is most often paired with a technique for formal verification. The goal is to construct automatically a program that provably satisfies a given high-level specification. In contrast to other automatic programming techniques, the specifications are usually non-algorithmic statements of an appropriate logical calculus.


General Guide to Program Synthesis


A great starting point to get a sense of what program synthesis is.

Dimensions in Program Synthesis
PPDP 2010, Sumit Gulwani

  • User Intent
    • Logical Specifications
      • + Precise and succinct
      • – Difficult for end-users
    • Input-Output Examples
      • + Simple and less chance of error
        (with interaction, possible to achieve full functional specification)
      • – Hard to decide selection criterion and the number of examples
    • Traces
      • + More informational and sometimes easier than input-output examples
      • + Easier for synthesis
      • – Users need to know how to perform the task
    • Natural Language
      • + Best for end-users
      • – Ambiguous
        (with interaction, possible to resolve ambiguity)
  • Search Space
    • Logic
      • + Precise and succinct
      • Good for program synthesis
    • Grammars
      • + Has many applications
        (e.g. regular expressions, DFAs, NFAs, context-free grammars, etc.)
    • Programs
      • – Large search space
      • Restrict the choices of
        • Operators
        • Control structure
  • Search Technique
    • Brute-force Search
      • + Simplest
      • – Too expensive, requiring optimizations and pruning search space
      • Applications: functional program (a system that searches for desired functional programs by generating type-correct programs and checking them whether they satisfy given specifications)
    • Version Space Algebra
      • + Relatively simple, efficient
      • Lau et.al. extended Mitchell’s version space to version space algebra
        • Basic idea of version space: to maintain a set of all possible answers and iteratively refine the set as more data becomes available
        • Basic idea of version space algebra: to build up a complex version space by composing simpler version spaces instead of defining a single, large version space
      • Applications: repetitive robot programs, shell scripts, text-editing programs, imperative Python programs, etc.
    • Machine Learning Based Techniques
      • Probabilistic Inference
        • Model a program as a graph and use belief propagation to infer states and/or instructions
        • Applications: program verification (inferring abstract states given the program instructions), program synthesis (inferring both concrete states and instructions that satisfy a given set of input-output examples), etc.
      • Genetic Programming
        • + Can be used when conventional techniques fail, able to provide approximate solutions
        • An evolutionary algorithm-based methodology to find computer programs tailored to an user-defined task
        • Applications: mutual exclusion algorithms, fixing bugs in imperative programs, etc.
    • Logical Reasoning Based Techniques
      • Constraint Generation: generating a logical constraint (i.e. synthesis constraint)
        • Invariant-based
        • Path-based
        • Input-based
      • Constraint Solving: reducing the generated constraint to a corresponding SAT/SMT constraint to let off-the-shelf constraint solvers solve the problem
        • Reducing Second-order Unknowns to First-order Unknowns
        • Universal Quantifier Elimination

More detailed introduction to inductive programming.

Inductive Programming Meets the Real World
CACM 2015, Sumit Gulwani, Jose Hernandez-Orallo, Emanuel Kitzelmann, Stephen Muggleton, Ute Schmid, Ben Zorn

Programming by Examples (and its applications in data wrangling)
Marktoberdorf Lecture Notes, Sumit Gulwani


A good summary of PBE-based technologies (in MS software) and industrial point of view.

Program Synthesis in the Industrial World: Inductive, Incremental, Interactive
SYNT 2016, Oleksandr Polozov, Sumit Gulwani, and the rest of the PROSE team

  • Programming by example (PBE)
    • A subfield of program synthesis whose specification on a target program is given as input-output examples (i.e. constraints on the output)
    • It has been existed since 1980s; however, it is only the last decade when mass-market applications appeared
  • Mass-market applications
    • FlashFill (2013): a system for automatic synthesis of string transformation scripts in spreadsheets
    • FlashExtract (2014): a system for automated data extraction from text files
    • FlashMata (2015): a framework for automatic generation of domain-specific inductive synthesizers
  • Challenges in deploying program synthesis for industrial applications
    • Ambiguity resolution
      • Problem: ambiguous input-output examples
        (users easily lose their trust after a few attempts)
      • Solution: ranking
        • Specified manually by a DSL designer
        • Learned from a dataset
    • Incremental synthesis
      • Problem: dealing with new examples
      • Solution: incremental synthesis
        • Given a set of solutions from the past user interactions and the new constraints for the same learning task, filter the set down to only those programs that satisfy the new constraints
        • Consider combining background processing, “fast-track” tactics, and novel data structures for storing and processing DSLs
    • Interaction
      • + Another option for ambiguity resolution
      • + One way of establishing trust in systems and creating predictive experiences
      • – Hard to define and implement problems
    • Performance-minded engineering
      • – Difficult to be responsive due to a huge search space

Some publications of technologies incorporated in MS software

FlashMeta: A Framework for Inductive Program Synthesis
OOPSLA 2015, Alex Polozov, Sumit Gulwani

FlashExtract: A Framework for Data Extraction by Examples
PLDI 2014, Vu Le, Sumit Gulwani

  • Contributions
    • Suggested a general synthesis framework for DSLs to extract relevant data from semi-structures documents using examples
    • Presented an interaction model for end-users to give examples
    • Designed an inductive synthesis algorithm
    • Allowed examples to be in any DSL for data extraction
  • Evaluation
    • Domains: text files, webpages, and spreadsheets
    • Benchmarks: 75 documents
    • Average number of examples: 2.36
    • Average time taken per field: 0.84 seconds

Automating String Processing in Spreadsheets using Input-Output Examples
POPL 2011, Sumit Gulwani


Manipulating recursive functions, data structures, etc.

Synthesizing Transformations on Hierarchically Structured Data
PLDI 2016, Navid Yaghmazadeh, Chris Klinger, Isil Dillig, Swarat Chaudhuri

Synthesizing Data Structure Transformations from Input-Output Examples
PLDI 2015, John Feser, Swarat Chaudhuri, Isil Dillig

  • Summary
    • User intent: input-output examples
    • Search space: functional programs
      (programs in a λ-calculus with algebraic types and recursion)
    • Search technique: brute-force search with optimizations
  • Contributions
    • Focused on recursive data structures to effectively prune out a search space
  • Evaluation
    • Problem set: 40 data structure manipulation tasks
    • Runtime: 0.43 seconds (median)
    • Number of examples: 5 or fewer (for 75% of the benchmarks)
  • Approach
    • System
      • Input: a set of input-output examples
      • Output: a minimal-cost closed program that satisfies the examples
    • Techniques
      • Inductive generalization
      • Deductive reasoning
      • Best-first enumerative search

Program Synthesis
Empowering Other Fields


Biology

Saurabh Srivastava
Founder and CEO of 20n


Database

Alvin Cheung
Assistant Professor in the Department of Computer Science & Engineering at the University of Washington


Data Mining

Using Program Synthesis for Social Recommendations
CIKM 2012, Alvin Cheung, Armando Solar-Lezama, Samuel Madden


Machine Learning

Unsupervised Learning by Program Synthesis
NIPS 2015, Kevin Ellis, Armando Solar-Lezama, Joshua B. Tenenbaum

sk_p: a neural program corrector for MOOCs
CoRR 2016, Yewen Pu, Karthik Narasimhan, Armando Solar-Lezama, Regina Barzilay

Latte: A Language, Compiler, and Runtime for Elegant and Efficient Deep Neural Networks
PLDI 2016, Leonard Truong, Rajkishore Barik, Ehsan Totoni, Hai Liu, Chick Markley, Armando Fox, Tatiana Shpeisman


Software Engineering

Synthesizing Framework Models for Symbolic Execution
ICSE 2016, Jinseong Jeon, Xiaokang Qiu, Jonathan Fetter-Degges, Jeffrey S. Foster, Armando Solar-Lezama

3 Comments

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s