E0 307 Program Synthesis meets Machine Learning

Instructors: Chiranjib Bhattacharyya, Deepak D'Souza, and Sriram Rajamani (MSR, Bangalore).

Teaching Assistants: P. Habeeb and Rekha Pai.

Link to Piazza class: Piazza class (register with code "e0307").

Background

Program synthesis has its roots in formal methods and programming languages. The goal of program synthesis is to automatically generate a program (from a space of possible programs) that satisfies a specification written in logic. The problem dates back to a paper by Church in 1957 [1], and the initial breakthroughs were made by Büchi and Landweber (1969) and Rabin (1972), who showed that the synthesis problem is decidable for specifications written in certain logics. However, the complexity of those algorithms (ranging from EXPTIME to non-elementary) was too high for them to be useful in practice.

Recent formulations have made synthesis more practical. In his PhD thesis, Solar-Lezama formulated synthesis as "sketching" [2], where the user supplies part of the program as a template and the synthesizer fills in the remaining "holes" using search. Another recent formulation, due to Sumit Gulwani, uses input/output examples (rather than logical formulas) as specifications [3], and uses clever search algorithms to generate appropriate programs. Sparked by these two works, there has been a resurgence of work in program synthesis in the past decade. There is an annual SyGuS competition [4] where practical tools compete every year.
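As a toy illustration of the sketching idea (this is not the actual Sketch tool, and all names are hypothetical), the snippet below fixes a program template with a single unknown constant and enumerates candidate values for the "hole" until the completed program agrees with a handful of input/output examples:

```python
# Toy "sketch": f(x) = x * HOLE, where HOLE is an unknown constant.
# The synthesizer enumerates candidate hole values and returns the
# first one consistent with every input/output example.

def synthesize_hole(examples, candidates=range(-10, 11)):
    for hole in candidates:
        if all(x * hole == y for x, y in examples):
            return hole
    return None  # no candidate in the search space works

examples = [(1, 3), (2, 6), (5, 15)]
print(synthesize_hole(examples))  # -> 3
```

Real tools replace this brute-force loop with constraint solving or smarter search, but the shape of the problem is the same: a template plus a specification (here, examples) defines a search over hole values.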

Recently there is an interesting interplay developing between program synthesis and machine learning. Machine learning uses continuous optimization methods to learn models that minimize a specified loss function, whereas program synthesis uses discrete combinatorial search to learn programs that satisfy a specification. While program synthesis produces interpretable programs, which can be formally verified, machine learning deals with noise in the inputs more gracefully. There is a rich body of recent work in combining machine learning and program synthesis to get the benefits of both approaches.

References:

  1. Alonzo Church, Application of recursive arithmetic to the problem of circuit synthesis, Summaries of talks presented at the Summer Institute for Symbolic Logic Cornell University, 1957.
  2. Armando Solar-Lezama, Program Synthesis by Sketching, PhD Thesis, UC Berkeley, 2008.
  3. Sumit Gulwani, Automating String Processing in Spreadsheets using Input-Output Examples. POPL 2011.
  4. Sygus competition, https://sygus.org/
  5. Angluin, D. Learning regular sets from queries and counterexamples. Inf. Comput. 75, 2 (1987), 87-106.
  6. Vaandrager, F., Model Learning, CACM, February 2017.
  7. Ashwin Kalyan, Abhishek Mohta, Alex Polozov, Dhruv Batra, Prateek Jain, Sumit Gulwani. Neural-Guided Deductive Search for Real-Time Program Synthesis from Examples, 6th International Conference on Learning Representations (ICLR), January 2018.
  8. Sumit Gulwani, Prateek Jain, Programming by Examples: PL meets ML, Dependable Software Systems Engineering, Published by IOS Press, 2019.
  9. A. Iyer, M. Jonnalagedda, S. Parthasarathy, A. Radhakrishna, S. Rajamani, Synthesis and Machine Learning for Heterogeneous Extraction, PLDI 2019.
  10. Abhinav Verma, Vijayaraghavan Murali, Rishabh Singh, Pushmeet Kohli, and Swarat Chaudhuri. 2018, Programmatically Interpretable Reinforcement Learning. In ICML 2018.
  11. Alex Graves, Greg Wayne, Ivo Danihelka, Neural Turing Machines, 2014.
  12. Scott Reed, Nando de Freitas, Neural Programmer-Interpreters, ICLR 2016.
  13. Pearlmutter & Siskind, Reverse-mode AD in a functional framework, TOPLAS 2008.
  14. Elliott, The simple essence of automatic differentiation, ICFP 2018.
  15. Alur, Bodik, Juniwal, Martin, Raghothaman, Seshia, Singh, Solar-Lezama, Torlak, Udupa: Syntax-Guided Synthesis, FMCAD 2013.
  16. Susmit Jha, Sumit Gulwani, Sanjit A. Seshia, and Ashish Tiwari: Oracle guided component-based program synthesis, ICSE 2010

Course Outline

This course will have two parts:

  1. In the first part, we will cover the theory and fundamentals of program synthesis, including recent formulations that restrict synthesis using templates and reformulate synthesis as a search problem. We will also cover black-box formulations of synthesis, from the classic Angluin's algorithm [5] to its modern variants [6]. We will teach this part in a structured manner through prepared lectures.
  2. In the second part, we will read and discuss recent papers exploring the combination of machine learning and program synthesis. Specific topics are listed under "Research Papers for Discussion" below.
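One of the core ideas covered in Part 1, counterexample-guided inductive synthesis (CEGIS) [16], can be sketched in a few lines. The toy version below works over a finite candidate space and a finite input domain (real implementations use SAT/SMT solvers for both steps); the spec and parameter values are made up for illustration:

```python
def cegis(candidates, domain, spec, max_iters=100):
    # Counterexamples collected so far; the synthesis step only has to
    # satisfy the spec on these inputs, not on the whole domain.
    cexs = []
    for _ in range(max_iters):
        # Synthesize: pick any candidate consistent with all counterexamples.
        cand = next((c for c in candidates if all(spec(x, c) for x in cexs)), None)
        if cand is None:
            return None  # search space exhausted
        # Verify: look for an input on which the candidate violates the spec.
        cex = next((x for x in domain if not spec(x, cand)), None)
        if cex is None:
            return cand  # verified on the entire (finite) domain
        cexs.append(cex)  # refine the synthesis step and try again
    return None

# Find c such that |x - c| <= 5 for every x in 0..10 (only c = 5 works).
spec = lambda x, c: abs(x - c) <= 5
print(cegis(range(11), range(11), spec))  # -> 5
```

The loop alternates between an inductive synthesizer (consistent with finitely many examples) and a verifier (which either certifies the candidate or produces a fresh counterexample), which is exactly the CEGIS/OGIS structure discussed in the lectures.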

Pre-Requisites

We require students to have good knowledge of programming, and to have taken an introductory course in machine learning (regression, classification, deep learning, etc.). We do not require prior exposure to program synthesis or formal methods; we will supply the necessary background in Part 1. Students will need to show initiative in reading papers and leading discussions in Part 2. Students will also need to do both theory and implementation for the project.

Grading

There will be three assignments in the first part of the course (one for each of the main components). The first part of the course will have a written exam around mid-semester. Students will be required to do a project combining program synthesis and machine learning. Ideas for projects will be inspired by papers discussed in Part 2 above. We will use public benchmarks, such as those from the SyGuS competitions or the data used by papers in Part 2, to evaluate projects. The overall course grade will be determined by performance in the assignments and presentations, the mid-semester exam, and the project, with weightage as below.

Weightage for evaluation:

Books and other material

  1. Sumit Gulwani, Oleksandr Polozov, and Rishabh Singh: Program Synthesis, Now Publishers, 2017.

Assignments

Meeting schedule: 3:30pm, Mondays and Wednesdays, in Room 117, CSA.

Lecture Schedule and Slides

Jan 6, 2020 (Sriram): Introduction and overview.
  Part a: Course introduction, tutorial introduction to synthesis, high-level comparisons with ML, and a detailed, hands-on tutorial using the Sketch tool.
  Part b: Hoare logic: bridging programs and formulas.
  Slides: Slides 1(a), Slides 1(b)

Jan 9, 2020 (Sriram): SAT solvers.
  Part a: Intro to SAT, NNF, CNF, basic DPLL with BCP and ULP.
  Part b: Modern DPLL with conflict clauses and non-chronological backtracking.
  Slides: Slides 2(a), Slides 2(b)

Jan 13, 2020 (Sriram): SMT solvers.
  Part a: First-order logic, theories, models, satisfiability modulo theories, common theories.
  Part b: Decision procedures for EUF, DIA, LIA, LRA.
  Slides: Slides 3(a), Slides 3(b)

Jan 16, 2020 (Sriram): SMT solvers.
  Nelson-Oppen method for combining theories; handling disjunctions using SAT.
  Slides: Slides 4

Jan 20, 2020 (Sriram): Synthesis using constraint solvers.
  Part a: CEGIS (Counter-Example Guided Inductive Synthesis) and OGIS (Oracle Guided Inductive Synthesis).
  Part b: Quantifiers and model finding in SMT solvers.
  Slides: Slides 5(a), Slides 5(b)

Jan 22, 29, Feb 3, 5, 2020 (Deepak): Enumerative and deductive search.
  Slides: Slides

Feb 6, 2020 (Deepak): Model learning based on Angluin's algorithm.
  Slides: Slides

Feb 10, 12, 2020 (Chiranjib): Semantic parsing.

Feb 25, 27, Mar 2, 2020 (Chiranjib): Hidden Markov models, Viterbi decoding, model learning.
  Reading: book chapter by Jurafsky and Martin.
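For the HMM lectures, the Viterbi recursion can be sketched as a short dynamic program. The two-state "Hot/Cold" model below (with made-up transition and emission probabilities, in the spirit of the ice-cream example from the Jurafsky and Martin chapter) is purely illustrative:

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    # V[t][s] = log-probability of the best state path ending in s at time t.
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: V[t - 1][p] + math.log(trans_p[p][s]))
            V[t][s] = V[t - 1][prev] + math.log(trans_p[prev][s]) + math.log(emit_p[s][obs[t]])
            back[t][s] = prev
    # Reconstruct the most likely state sequence by following back-pointers.
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

# Hypothetical parameters: hidden weather states, observed ice-cream counts.
states = ["Hot", "Cold"]
start_p = {"Hot": 0.5, "Cold": 0.5}
trans_p = {"Hot": {"Hot": 0.7, "Cold": 0.3}, "Cold": {"Hot": 0.4, "Cold": 0.6}}
emit_p = {"Hot": {1: 0.1, 2: 0.3, 3: 0.6}, "Cold": {1: 0.6, 2: 0.3, 3: 0.1}}
print(viterbi([3, 1, 3], states, start_p, trans_p, emit_p))  # -> ['Hot', 'Cold', 'Hot']
```

Working in log space avoids numerical underflow on longer observation sequences, which is the standard practice the lectures' model-learning material builds on.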

Research Papers for Discussion

  1. Big Code: Learning programs and program snippets from large amounts of code in repositories.
  2. Some ways of combining synthesis and ML
  3. Constructing programs equivalent to ML models
  4. Semantic Parsing
  5. Model Learning

Exam Schedule: