E0 307 Program Synthesis meets Machine Learning

Instructors: Chiranjib Bhattacharyya, Deepak D'Souza, and Sriram Rajamani (MSR, Bangalore).

Teaching Assistants: P. Habeeb and Rekha Pai.

Link to Piazza class: Piazza class (register with code "e0307").

Background

Program synthesis has its roots in formal methods and programming languages. The goal of program synthesis is to automatically generate a program (from a space of possible programs) that satisfies a specification written in logic. The problem dates back to a paper by Church in 1957 [1], and the initial breakthroughs were made by Büchi and Landweber (1969) and Rabin (1972), who showed that the synthesis problem is decidable for specifications written in certain logics. However, the complexity of those algorithms (ranging from EXPTIME to non-elementary) was too high for them to be useful in practice.

Recent formulations have made synthesis more practical. In his PhD thesis, Solar-Lezama formulated synthesis as "sketching" [2], where the user supplies part of the program as a template and the synthesizer fills in the remaining "holes" using search. Another recent formulation, due to Sumit Gulwani, uses input/output examples (rather than logical formulas) as specifications [3], and uses clever search algorithms to generate appropriate programs. Sparked by these two works, there has been a resurgence of work in program synthesis in the past decade. There is an annual SyGuS competition [4] where practical tools compete every year.
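As a toy illustration of the sketching idea (this is not the actual Sketch tool, and all names are hypothetical), the snippet below fixes a program template with a single unknown constant and enumerates candidate values for the "hole" until the completed program agrees with a handful of input/output examples:

```python
# Toy "sketch": f(x) = x * HOLE, where HOLE is an unknown constant.
# The synthesizer enumerates candidate hole values and returns the
# first one consistent with every input/output example.

def synthesize_hole(examples, candidates=range(-10, 11)):
    for hole in candidates:
        if all(x * hole == y for x, y in examples):
            return hole
    return None  # no candidate in the search space works

examples = [(1, 3), (2, 6), (5, 15)]
print(synthesize_hole(examples))  # -> 3
```

Real tools replace this brute-force loop with constraint solving or smarter search, but the shape of the problem is the same: a template plus a specification (here, examples) defines a search over hole values.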

Recently there is an interesting interplay developing between program synthesis and machine learning. Machine learning uses continuous optimization methods to learn models that minimize a specified loss function, whereas program synthesis uses discrete combinatorial search to learn programs that satisfy a specification. While program synthesis produces interpretable programs, which can be formally verified, machine learning deals with noise in the inputs more gracefully. There is a rich body of recent work in combining machine learning and program synthesis to get the benefits of both approaches.

References:

  1. Alonzo Church, Application of recursive arithmetic to the problem of circuit synthesis, Summaries of talks presented at the Summer Institute for Symbolic Logic Cornell University, 1957.
  2. Armando Solar-Lezama, Program Synthesis by Sketching, PhD Thesis, UC Berkeley, 2008.
  3. Sumit Gulwani, Automating String Processing in Spreadsheets using Input-Output Examples. POPL 2011.
  4. Sygus competition, https://sygus.org/
  5. Angluin, D. Learning regular sets from queries and counterexamples. Inf. Comput. 75, 2 (1987), 87-106.
  6. Vaandrager, F., Model Learning, CACM, February 2017.
  7. Ashwin Kalyan, Abhishek Mohta, Alex Polozov, Dhruv Batra, Prateek Jain, Sumit Gulwani. Neural-Guided Deductive Search for Real-Time Program Synthesis from Examples, 6th International Conference on Learning Representations (ICLR), January 2018.
  8. Sumit Gulwani, Prateek Jain, Programming by Examples: PL meets ML, Dependable Software Systems Engineering, Published by IOS Press, 2019.
  9. A. Iyer, M. Jonnalagedda, S. Parthasarathy, A. Radhakrishna, S. Rajamani, Synthesis and Machine Learning for Heterogeneous Extraction, PLDI 2019.
  10. Abhinav Verma, Vijayaraghavan Murali, Rishabh Singh, Pushmeet Kohli, and Swarat Chaudhuri. 2018, Programmatically Interpretable Reinforcement Learning. In ICML 2018.
  11. Alex Graves, Greg Wayne, Ivo Danihelka, Neural Turing Machines, 2014.
  12. Scott Reed, Nando de Freitas, Neural Programmer-Interpreters, ICLR 2016.
  13. Pearlmutter & Siskind, Reverse-mode AD in a functional framework, TOPLAS 2008.
  14. Elliott, The simple essence of automatic differentiation, ICFP 2018.
  15. Alur, Bodik, Juniwal, Martin, Raghothaman, Seshia, Singh, Solar-Lezama, Torlak, Udupa: Syntax-Guided Synthesis, FMCAD 2013.
  16. Susmit Jha, Sumit Gulwani, Sanjit A. Seshia, and Ashish Tiwari: Oracle guided component-based program synthesis, ICSE 2010

Course Outline

This course will have two parts:

  1. In the first part, we will cover the theory and fundamentals of program synthesis, including recent formulations that restrict synthesis using templates and reformulate synthesis as a search problem. We will also cover black-box formulations of synthesis, from the classic Angluin's algorithm [5] to its modern variants [6]. We will teach this part in a structured manner through prepared lectures.
  2. In the second part, we will read and discuss recent papers exploring the combination of machine learning and program synthesis. Specific topics are listed under "Research Papers for Discussion" below.
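One of the core ideas covered in Part 1, counterexample-guided inductive synthesis (CEGIS) [16], can be sketched in a few lines. The toy version below works over a finite candidate space and a finite input domain (real implementations use SAT/SMT solvers for both steps); the spec and parameter values are made up for illustration:

```python
def cegis(candidates, domain, spec, max_iters=100):
    # Counterexamples collected so far; the synthesis step only has to
    # satisfy the spec on these inputs, not on the whole domain.
    cexs = []
    for _ in range(max_iters):
        # Synthesize: pick any candidate consistent with all counterexamples.
        cand = next((c for c in candidates if all(spec(x, c) for x in cexs)), None)
        if cand is None:
            return None  # search space exhausted
        # Verify: look for an input on which the candidate violates the spec.
        cex = next((x for x in domain if not spec(x, cand)), None)
        if cex is None:
            return cand  # verified on the entire (finite) domain
        cexs.append(cex)  # refine the synthesis step and try again
    return None

# Find c such that |x - c| <= 5 for every x in 0..10 (only c = 5 works).
spec = lambda x, c: abs(x - c) <= 5
print(cegis(range(11), range(11), spec))  # -> 5
```

The loop alternates between an inductive synthesizer (consistent with finitely many examples) and a verifier (which either certifies the candidate or produces a fresh counterexample), which is exactly the CEGIS/OGIS structure discussed in the lectures.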

Pre-Requisites

We require students to have good knowledge of programming, and to have taken an introductory course in machine learning (regression, classification, deep learning, etc.). We do not require prior exposure to program synthesis or formal methods; we will supply the necessary background in Part 1. Students will need to show initiative in reading papers and leading discussions in Part 2. Students will also need to do both theory and implementation for the project.

Grading

There will be three assignments in the first part of the course (one for each of the main components). The first part of the course will have a written exam around mid-semester. Students will be required to do a project combining program synthesis and machine learning. Ideas for projects will be inspired by papers discussed in Part 2 above. We will use public benchmarks, such as those from the SyGuS competitions or the data used by papers in Part 2, to evaluate projects. The overall course grade will be determined by performance in the assignments and presentations, the mid-semester exam, and the project, with weightage as below.

Weightage for evaluation:

Books and other material

  1. Sumit Gulwani, Oleksandr Polozov, and Rishabh Singh: Program Synthesis, Now Publishers, 2017.

Assignments

Meeting schedule: 3:30pm, Mondays and Wednesdays, in Room 117, CSA.

Lecture Schedule and Slides

Jan 6, 2020 (Sriram): Introduction and overview.
  Part a: Course introduction, tutorial introduction to synthesis, high-level comparisons with ML, and a detailed, hands-on tutorial using the Sketch tool.
  Part b: Hoare logic: bridging programs and formulas.
  Slides: Slides 1(a), Slides 1(b)

Jan 9, 2020 (Sriram): SAT solvers.
  Part a: Intro to SAT, NNF, CNF, basic DPLL with BCP and ULP.
  Part b: Modern DPLL with conflict clauses and non-chronological backtracking.
  Slides: Slides 2(a), Slides 2(b)

Jan 13, 2020 (Sriram): SMT solvers.
  Part a: First-order logic, theories, models, satisfiability modulo theories, common theories.
  Part b: Decision procedures for EUF, DIA, LIA, LRA.
  Slides: Slides 3(a), Slides 3(b)

Jan 16, 2020 (Sriram): SMT solvers.
  Nelson-Oppen method for combining theories; handling disjunctions using SAT.
  Slides: Slides 4

Jan 20, 2020 (Sriram): Synthesis using constraint solvers.
  Part a: CEGIS (Counter-Example Guided Inductive Synthesis) and OGIS (Oracle Guided Inductive Synthesis).
  Part b: Quantifiers and model finding in SMT solvers.
  Slides: Slides 5(a), Slides 5(b)

Jan 22, 29, Feb 3, 5, 2020 (Deepak): Enumerative and deductive search.
  Slides: Slides

Feb 6, 2020 (Deepak): Model learning based on Angluin's algorithm.
  Slides: Slides

Feb 10, 12, 2020 (Chiranjib): Semantic parsing.

Feb 25, 27, Mar 2, 2020 (Chiranjib): Hidden Markov models, Viterbi decoding, model learning.
  Reading: book chapter by Jurafsky and Martin.
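For the HMM lectures, the Viterbi recursion can be sketched as a short dynamic program. The two-state "Hot/Cold" model below (with made-up transition and emission probabilities, in the spirit of the ice-cream example from the Jurafsky and Martin chapter) is purely illustrative:

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    # V[t][s] = log-probability of the best state path ending in s at time t.
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: V[t - 1][p] + math.log(trans_p[p][s]))
            V[t][s] = V[t - 1][prev] + math.log(trans_p[prev][s]) + math.log(emit_p[s][obs[t]])
            back[t][s] = prev
    # Reconstruct the most likely state sequence by following back-pointers.
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

# Hypothetical parameters: hidden weather states, observed ice-cream counts.
states = ["Hot", "Cold"]
start_p = {"Hot": 0.5, "Cold": 0.5}
trans_p = {"Hot": {"Hot": 0.7, "Cold": 0.3}, "Cold": {"Hot": 0.4, "Cold": 0.6}}
emit_p = {"Hot": {1: 0.1, 2: 0.3, 3: 0.6}, "Cold": {1: 0.6, 2: 0.3, 3: 0.1}}
print(viterbi([3, 1, 3], states, start_p, trans_p, emit_p))  # -> ['Hot', 'Cold', 'Hot']
```

Working in log space avoids numerical underflow on longer observation sequences, which is the standard practice the lectures' model-learning material builds on.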

Research Papers for Discussion

  1. Big Code: Learning programs and program snippets from large amounts of code in repositories.
  2. Some ways of combining synthesis and ML
  3. Constructing programs equivalent to ML models
  4. Semantic Parsing
  5. Model Learning

Exam Schedule: