- Program synthesis, machine learning and information-retrieval techniques for PL
- Program analysis, abstract interpretation, verification
- Programming Languages, software engineering
Faculty member at the Computer Science Department, Technion, Israel. & CTO at Tabnine.
Research Interests
Contact Information
- yahave@cs.technion.ac.il
- Address: Computer Science Department
Technion, Haifa 3200003, Israel - Office: Taub 734
- Tel: + 972 4 829 4318
Current Projects
- PRIME: Programming with Millions of Examples (OOPSLA'12) (SAS13) (PLDI'14a) (PLDI'14b) (OOPSLA'14) (POPL'16) (VMCAI'16) (PLDI'16) (ICSE'16) (ONWARD'16a) (ONWARD'16b) (PLDI'17) (ASPLOS'18a) (ASPLOS'18b) (PLDI'18) (ACL'18) (ICML'18) (POPL'19) (ICLR'19) (PRIME code) (TRACY code) (BinSim) (Codota)
Past Projects
- Saint: Synthesis using Abstract Interpretation (TACAS'09) (EC2 '09) (SPIN '09) (POPL'10) (PODC'10) (STTT'12)
- Fender: Preserving Correctness under Relaxed Memory Models (FMCAD'10) (PLDI'11) (PLDI'12) (SA-News'12) (SAS'13) (SAS'14) (VMCAI'15) (FMCAD'15)
- Dojo: Ensuring Determinism of Concurrent Systems (SAS '10) (RV'10) (PLDI'12) (FMSD'12) (SAS'13)
- QVM: The quality virtual machine (OOPSLA'08) (PLDI'09) (ISMM'10) (OOPSLA'11)
- Paraglide: Search-Based Synthesis of Concurrent Programs (PLDI'06) (PLDI'07) (PLDI'08) (EC2 '08)
- PET: Parallelism-Enabling Toolset (OOPSLA'11-a) (OOPSLA'11-b)
- SAFE: scalable verification for heap-manipulating programs (ISSTA'06 ) (ISSTA'07 ) (POPL'08) (ISMM'08) (ISSTA'08) (TSE'08) (TOSEM'08) (Mining'11) (Exp'11)
- 3VMC: shape analysis for concurrent programs (POPL'01) (SOFTMC'03) (ESOP'03) (PLDI'04) (IGPL'08) (TOPLAS'10)
Software
- code2seq: Generating Sequences from Structured Representations of Code
- code2vec: Learning Distributed Representations of Code
- Esh: Statistical Similarity of Binaries
- TRACY: tracelet-based search and similarity
- PRIME: programming with millions of examples
- Fender: fence inference and verification under relaxed memory models
- SAFE: typestate checking in presence of aliasing
- 3VMC/TVLA: 3-valued based shape analysis for concurrent programs
Recent / Selected Publications
- ICLR'19
- code2seq: Generating Sequences from Structured Representations of Code
- POPL'19
- code2vec: Learning Distributed Representations of Code
- ICML'18
- Extracting Automata from Recurrent Neural Networks Using Queries and Counterexamples
- ACL'18
- On the Practical Computational Power of Finite Precision RNNs for Language Recognition
- PLDI'18
- A General Path-Based Representation for Predicting Program Properties
- ICSE'18
- Programming Not Only By Example
- VMCAI'18
- Generating Tests by Example
- ASPLOS'18
- Statistical Reconstruction of Class Hierarchies in Binaries
- ASPLOS'18
- FirmUp: Precise Static Detection of Common Vulnerabilities in Firmware
- PLDI'17
- Similarity of Binaries through Re-optimization
- CAV'17
- Synthesis with Abstract Examples
- COLT'17
- Learning Disjunctions of Predicates
- WSDM'17
- Synthesis of Forgiving Data Extractors
- ONWARD'16
- Extracting Code from Programming Tutorial Videos
- ONWARD'16
- Leveraging a Corpus of Natural Language Descriptions for Program Similarity
- KDD'16
- Lossless Separation of Web Pages into Layout Code and Data
- PLDI'16
- Statistical Similarity of Binaries
- ICSE'16
- Cross-Supervised Synthesis of Web-Crawlers
- VMCAI'16
- D3 : Data-Driven Disjunctive Abstraction
- POPL'16
- Estimating Types in Binaries using Predictive Modeling
- OBT'15
- Code Similarity via Natural Language Descriptions [slides] [talk]
- BlackHat'15
- Exploiting Social Navigation (BlackHat Asia)
- PPOPP'15
- Automatic Scalable Atomicity via Semantic Locking
- VMCAI'15
- Effective Abstractions for Verification under Relaxed Memory Models
- OOPSLA14
- Abstract Semantic Differencing via Speculative Correlation [code]
- PLDI14
- Tracelet-Based Code Search in Executables [code]
- PLDI14
- Code Completion with Statistical Language Models
- PPOPP14
- Practical concurrent binary search trees via logical ordering [code]
- PPOPP14
- Automatic semantic locking
- SAS14
- Synthesis of Memory Fences via Refinement Propagation
- SAS13
- Automatic Synthesis of Deterministic Concurrency
- SAS13
- Predicate Abstraction for Relaxed Memory Models
- SAS13
- Abstract Semantic Differencing for Numerical Programs [code]
- SAS13
- Symbolic Automata for Specification Mining
- PLDI13
- Concurrent Libraries with Foresight
- OOPSLA12
- Typestate-Based Semantic Code Search over Partial Programs [code]
- PLDI12
- Dynamic Synthesis for Relaxed Memory Models
- PLDI12
- Scalable and Precise Dynamic Datarace Detection for Structured Parallelism
- OOPSLA11
- Testing Atomicity of Composed Concurrent Operations
- OOPSLA11
- Automatic Fine-Grained Locking using Shape Properties
- OOPSLA11
- Asynchronous Assertions
- PLDI11
- Partial-Coherence Abstractions for Relaxed Memory Models
- POPL10
- Abstraction-Guided Synthesis of Synchronization
Research: Current Projects
PRIME: Programming with Millions of Examples
Programmers make extensive use of frameworks and libraries. To perform standard tasks, such as parsing an XML file or communicating with a database, programmers use standard frameworks rather than writing code from scratch. Unfortunately, a typical framework API can involve hundreds of classes with dozens of methods each, and often requires specific sequences of operations that have to be invoked on specific objects in order to perform a single task. Even experienced programmers might spend hours trying to understand how to use a seemingly simple API. The goal of PRIME is to develop techniques for programming with millions of examples by learning from available public code. PRIME combines static analysis and machine learning techniques to effectively answer semantic code-search queries, code prediction, code completion, and similarity queries. (OOPSLA'12) (SAS'13) (PLDI'14a) (PLDI'14b) (OOPSLA'14) (POPL'16) (PLDI'16) (ICSE'16) (ONWARD'16a)(ONWARD'16b) (PLDI'17) (talk video) (ASPLOS'18a) (ASPLOS'18b) (PLDI'18) (ACL'18) (ICML'18) (POPL'19) (ICLR'19) (PRIME code) (TRACY code) (codota)
Research: Past Projects
Fender: Preserving Correctness under Weak Memory Models
Modern architectures implement relaxed memory models in which memory operations may be reordered and executed nonatomically.
Special instructions called memory fences are provided to the programmer, allowing control of this behavior. To ensure correctness of many algorithms,
in particular of non-blocking ones, a programmer is often required to explicitly insert memory fences into her program. However, she must use as few fences as possible,
or the benefits of running on a relaxed architecture will be lost. Placing memory fences is challenging and extremely error prone, as it requires subtle reasoning about
the underlying memory model. The goal of this project is to automatically infer memory fences in concurrent programs, assisting the programmer with this complex task.
(FMCAD'10) (PLDI'11) (PLDI'12)
(SA-News'12) (SAS'13) (SAS'14)
(DFence Code)
Saint: Synthesis using Abstract Interpretation
Can we generate synchronization code for a program based on a high-level specification of its desired behavior?
In this project, we turn the one dimensional problem of verification under abstraction, in which only the abstraction can be modified (typically via abstraction refinement),
into a two-dimensional problem, in which both the program and the abstraction can be modified until the abstraction is precise enough to verify the program. We are interested
in modifications of a concurrent program that restrict its possible interleavings.
(TACAS'09) (EC2 '09) (SPIN '09) (POPL'10)
Dojo: Ensuring Determinism of Concurrent Systems
One of the main difficulties in parallel programming is the need to reason about possible interleavings of operations in a program. The vast number of interleavings makes this task
difficult even for small programs, and impossible for any sizeable software. To simplify reasoning about parallel programs, it is desirable to reduce the number of interleavings that
a programmer has to consider. One way to achieve that is to require parallel programs to be deterministic. The goal of this project is to investigate static and dynamic program
analysis techniques for checking and enforcing determinism in concurrent systems.
(SAS'10) (RV'10
) (PLDI'12)
QVM: The Quality Virtual Machine
The Quality Virtual Machine (QVM) is a system that uses the technology and infrastructure available in a virtual machine to improve software quality.
QVM provides a framework for easily building dynamic analyses into the VM, continuing the general trend of using additional hardware resources at runtime to help
find and/or eliminate bugs.
(OOPSLA'08) (PLDI'09) (ISMM'10) (OOPSLA'11)
Paraglide: Search-Based Synthesis of Concurrent Programs
Current practices for developing concurrent systems are rather limited. Manual constructions using low-level concurrency constructs (e.g. CAS) is the realm of experts, is extremely
error-prone and leads to significant non-repeatable effort in both the construction and the verification process. Generic higher-level constructs (e.g., transactional memory)
are also limited, and are not clearly easier to use. Analytic techniques (e.g., race detection) only address a fraction of the problems, and can only be applied after the code is
written at which point it may be broken beyond repair. The Paraglide project provides construction mechanisms and practical tools that assist the designer in building correct and
efficient concurrent systems.
(PLDI'06) (PLDI'07) (PLDI'08) (EC2 '08)
SAFE: Scalable Verification for Heap-Manipulating Programs
The goal of SAFE is to provide a scalable and flexible error-detection tool, based on typestate checking with varying degrees of cost and precision, mostly depending on the way in which
aliasing is handled. SAFE can be used for detecting violations of simple correctness properties, and for checking more sophisticated performance properties such as inefficient
use of resources.
(ISSTA'06
)
(ISSTA'07
)
(POPL'08) (ISMM'08) (ISSTA'08)
(TSE'08) (TOSEM'08) (Mining'11)
(Exp'11)
3VMC/TVLA: Shape Analysis for Concurrent Programs
3VMC is a tool for analysis and verification of concurrent systems. 3VMC is geared towards verification of concurrent software, it supports dynamic allocation of objects and threads and
does not put an a priori bound on the number of allocated objects and threads.
(POPL'01) (SOFTMC'03) (ESOP'03) (PLDI'04) (IGPL'08) (TOPLAS'10)
Research: Software
- code2seq: Generating Sequences from Structured Representations of Code
- code2vec: Learning Distributed Representations of Code
- Esh: Statistical Similarity of Binaries
- TRACY: tracelet-based search and similarity
- PRIME: programming with millions of examples
- Fender: fence inference and verification under relaxed memory models
- SAFE: typestate checking in presence of aliasing
- 3VMC/TVLA: 3-valued based shape analysis for concurrent programs
Students
Current Students
-
Gail Weiss (ICML'18) (ACL'18)
-
Yaniv David (PLDI'14) (PLDI'16) (PLDI'17) (TRACY code) (ASPLOS'18)
-
Hila Peleg (SAS'13) (VMCAI'16) (VMCAI'18) (ICSE'18)
Past Students
-
Nimrod Partush (SAS'13) (OOPSLA'14) (PLDI'16)(PLDI'17) (DIZY code)
-
Yuri Meshman (SAS'13) (SAS'14) (VMCAI'15) (FMCAD'15)
-
Shir Yadid (2016)
Extracting Code from Programming Tutorial Videos (ONWARD'16)
previously Exploiting Social Navigation at BlackHat'15
-
Meital Ben-Sinai (2016)
Program Similarity via Natural Language Descriptions (ONWARD'16)
(like2drops) previously Exploiting Social Navigation at BlackHat'15
-
Guy Gueta, with Mooly Sagiv (2014) - currently at Yahoo Research
automatic fine-grained synchronization (OOPSLA'11) , (PLDI'13) , (PPOPP'14 poster) , (ISSTA'14)
-
Karine Even, with Hana Chockler (2013)
Finding Rare Numerical Stability Errors in Concurrent Computations (ISSTA'13) -
Alon Mishne (2012) - currently at Google
programming with millions of examples (OOPSLA'12) -
Michael Kuperstein (2012) , with Martin Vechev - currently at Google
preserving correctness under relaxed memory models (FMCAD'10) (PLDI'11) (SA-News'12) -
Sharon Shoham (2012) , Postdoc - Faculty at Tel-Aviv University
programming with millions of examples (OOPSLA'12) -
Ohad Shacham (2011) , with Mooly Sagiv - currently at Yahoo Research
Verifying Atomicity of Composed Concurrent Operations (PLDI'09) (OOPSLA'11) (ISSTA'14)
Undergraduate Students
Meital Ben-Sinai, Shir Yadid. Exploiting Social Navigation. (2014) [video]
[thehackernews] [popular science] [jerusalem post] [haaretz] [wired.co.uk] [slashgear] [gizmodo AU] [dashburst] [nocamels]
[globes (heb)] [harretz (heb)] [hayadan (heb)] [calcalist (heb)] [geektime.co.il (heb)]Gai Shaked, Tsafrir Rehan,Edo Cohen (2012) .
Buffer overrun analysis, winner 2012 Amdocs Best Project Contest
Summer Interns
-
Nayden Nedev, Sofia University, Bulgaria (2010) - with Martin Vechev
Dynamic Synthesis of Memory Fences (PLDI'12) -
Nedyalko Prisadnikov, Sofia University, Bulgaria (2010) - with Martin Vechev
Dynamic Synthesis of Memory Fences (PLDI'12) -
Raghavan Raman, Rice University (2009) - with Martin Vechev
Checking and Verifying Determinism (SAS'10, RV'10 best paper) -
Ohad Shacham, Tel Aviv University, Israel (2008) - with Martin Vechev
Chameleon: Adaptive Selection of Collections (PLDI'09) , also won 2nd place in PLDI'09 SRC. -
Isil Dillig, Stanford (2007) - with Satish Chandra
The CLOSER: Automating Resource Management in Java (ISMM'08) -
Thomas Dillig, Stanford (2007) - with Satish Chandra
The CLOSER: Automating Resource Management in Java (ISMM'08) -
Greta Yorsh, Tel Aviv University, Israel (2006)
Modular Analysis
Generating Precise and Concise Procedure Summaries (POPL'08) -
Sharon Shoham, Technion - Israel Institute of Technology (2006, visit 2007)
Mining Temporal Specification
(ISSTA'07 best paper) -
Martin Vechev, Cambridge University (2005, 2006)
Synthesis of Concurrent Garbage Collectors (PLDI'06) (PLDI'07) -
Noam Rinetzky, Tel Aviv Univeristy, Israel (2005)
Misc. Shape Analysis Problems (TOPLAS'07)
Courses
Winter 2016-2017
- Advanced Course in Program Analysis and Synthesis (236347)
- Software Project - Android Applications (236504)
Spring 2016
- Theory of Compilation (236360)
- Software Project - Android Applications (236504)
Winter 2015-2016
- Advanced Course in Program Analysis and Synthesis (236347)
- Software Project - Android Applications (236504)
Spring 2015
- Theory of Compilation (236360)
- Software Project - Android Applications (236504)
Winter 2014-2015
- Advanced Course in Program Analysis and Synthesis (236347)
- Software Project - Android Applications (236504)
Spring 2014
Winter 2013-2014
Spring 2013
Winter 2012-2013
Spring 2012
Winter 2011
Spring 2011
Activities
Professional Activities
- PLDI: PLDI'17 (PC) PLDI'16 (PC), PLDI'15 (ERC), PLDI'14 (ERC), PLDI'13 (ERC), PLDI'12 (PC), PLDI SRC 2008
- CAV: CAV'16 (ERC), CAV'15 (PC),CAV'14 (PC), CAV'13 (PC)
- POPL: POPL'17 (ERC) POPL'15 (ERC), POPL'13 (ERC), POPL'11 (PC)
- OOPSLA: OOPSLA'15 (PC), OOSPLA'13 (ERC), OOPSLA 2010 (PC)
- VMCAI: VMCAI'14 (PC), VMCAI'13 (Talk), VMCAI'12 (PC)
- Static Analysis Symposium (SAS): SAS'14 (PC), SAS'12 (PC), SAS'11 (Chair), SAS'09 (PC)
- PSY: PSY'12 (co-organizer), PSY'11 (co-organizer), PSY'10 (Co-organizer), PSY'09 (co-organizer)
- APLAS: APLAS'15 (Talk) APLAS'12 (PC)
Selected Talks
Programming with Millions of Examples
Distinguished Colloquium, December 2014
Abstraction-Guided Synthesis
VMCAI 2013
[slides]Abstract Semantic Differencing for Numerical Programs
VSSE'13
[slides]Synthesis of Synchronization
2011 Summer School on Program Synthesis, August 2011
[slides]Programming with Millions of Examples
2011 Summer School on Program Synthesis, August 2011
[slides]Synthesis of Memory Fences
SVARM'11: Synthesis, Verification, and Analysis of Rich Models , April 2011
[slides]Automatic Verification and Fence Inference for Relaxed Memory Models
The Chemistry of Concurrent and Distributed Programming, Mysore Park Workshop, February 2011
[slides]Abstraction-Guided Synthesis of Synchronization
Invited talk, Haifa Verification Conference (HVC), October 2010
[slides]Practical Synthesis of Concurrent Programs
Invited talk, ARTIST Summer School in Europe 2009, September 2009
[slides]Abstraction-Guided Synthesis of Synchronization
Dagstuhl Seminar on Design and Validation of Concurrent Systems, September 2009
Dagstuhl Seminar on Software Synthesis, December 2009
[slides]Inferring Synchronization Under Limited Observability
Technion, Israel, June 2009
[slides]Phalanx: Parallel Checking of Expressive Heap Assertions
State of the Art in Testing and Analysis Day, NCSU, March 2009
[slides]Shape Analysis for Concurrent Programs
Workshop on the Verification of Concurrent Algorithms, MSR Cambridge, UK, May 2008
[slides]QVM: An Efficient Runtime for Detecting Defects in Deployed Systems
Dagstuhl Seminar on Scalable Program Analysis, Feb 2008
[slides]SAFE: scalable verification for heap-manipulating programs (tool presentation)
Dagstuhl Seminar on Scalable Program Analysis, Feb 2008
[slides]Static Specification Mining Using Automata-Based Abstractions
Technion, Israel, Feb 2008
[slides]Deriving Linearizable Fine-Grained Concurrent Objects
Technion, Israel, Feb 2008
[slides]