COLT 2017

Learning Disjunctions of Predicates

Nader H. Bshouty, Dana Drachsler-Cohen, Martin Vechev, Eran Yahav

TL;DR — How do you learn a Boolean formula that separates positive from negative examples when the formula is a disjunction of predicates? This paper provides efficient algorithms with provable guarantees for learning disjunctions from labeled examples and membership queries.

The Problem

Many program properties can be expressed as disjunctions of simple predicates — for example, x > 0 OR y < 5. Given a set of positive examples (points satisfying the formula) and negative examples (points that do not), the goal is to learn the target disjunction. The challenge is that the space of possible disjunctions is exponentially large: with n candidate predicates, there are 2^n possible disjunctions.

Learning these from examples alone can require exponentially many samples. The key question is: can we do better if the learner is allowed to actively query an oracle?

The Key Idea

The paper studies learning in models where the learner can ask membership queries ("is this specific point positive or negative?") and equivalence queries ("is my current hypothesis correct? if not, give me a counterexample"). By combining these two types of queries, the learner can efficiently navigate the exponential search space.

The algorithm works by building up the disjunction predicate-by-predicate:

  1. Start with an empty hypothesis (no predicates).
  2. Use membership queries to probe specific points and identify which predicates are relevant.
  3. Propose a hypothesis disjunction based on the collected evidence.
  4. If the hypothesis is incorrect, receive a counterexample from the equivalence oracle.
  5. Analyze the counterexample to discover a new predicate that must be added to the disjunction.
  6. Repeat until the hypothesis matches the target concept.

This incremental approach avoids enumerating all 2^n possible disjunctions and instead discovers the relevant predicates one at a time.
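The loop above can be sketched in a few lines. This is an illustrative toy, not the paper's algorithm: all names are hypothetical, the equivalence oracle is simulated by scanning a finite domain, and membership queries are used to admit only predicates that cover no negative point (which guarantees every counterexample is positive).

```python
# Sketch of the incremental loop, with the equivalence oracle simulated by
# scanning a finite domain. All names here are illustrative, not the paper's.

def learn_disjunction(candidates, domain, member):
    """candidates: list of predicates (point -> bool);
    member(x): membership oracle for the hidden target disjunction."""
    hypothesis = set()  # indices of predicates in the current disjunction

    def hyp(x):
        return any(candidates[i](x) for i in hypothesis)

    def equivalence_query():
        # Simulated oracle: return a counterexample, or None if exact.
        return next((x for x in domain if member(x) != hyp(x)), None)

    while (cex := equivalence_query()) is not None:
        # The hypothesis never covers a negative point (see filter below),
        # so cex must be positive but uncovered. Add each candidate that
        # covers cex and -- checked via membership queries -- covers only
        # positive points; at least one target predicate covering cex
        # always passes this filter, so every round makes progress.
        for i, p in enumerate(candidates):
            if i not in hypothesis and p(cex):
                if all(member(y) for y in domain if p(y)):
                    hypothesis.add(i)
    return hypothesis

# Toy run: candidates over integers, hidden target = (x > 0) OR (x < -3).
cands = [lambda x: x > 0, lambda x: x < -3, lambda x: x % 2 == 0]
learned = learn_disjunction(cands, range(-5, 6), lambda x: x > 0 or x < -3)
assert learned == {0, 1}
```

Because the filter only admits predicates covering no negative point, the hypothesis never overshoots the target, and each positive counterexample forces at least one genuinely missing predicate into the disjunction.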

Interactive Demo

Predicate Learning Visualizer

The target concept is a disjunction of simple predicates over a 2D plane. The learner uses membership queries and proposes hypotheses to discover the target. Green points are positive (satisfy the target); red points are negative.

[Interactive visualizer: a 2D plane with labeled examples; shaded regions show the learned hypothesis. The target concept stays hidden until solved, and counters track queries issued, hypotheses proposed, accuracy, and status.]

How It Works

Query-Based Learning

The learning model in this paper extends the classical PAC (Probably Approximately Correct) framework by allowing the learner to interact with the environment. Rather than passively receiving random examples, the learner can strategically choose which points to query. This active learning approach is critical: for disjunctions, passive learning requires exponentially many examples, while query-based learning can succeed with polynomially many queries.

Predicate Enumeration

The algorithm operates over a known set of candidate predicates P = {p_1, ..., p_n}. The target concept is a disjunction p_{i_1} OR p_{i_2} OR ... OR p_{i_k} for some unknown subset of size k. The learner does not know which predicates appear in the target, but it can evaluate any predicate on any point. This structure allows the learner to systematically test candidate predicates by querying strategically chosen points.
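As a concrete (hypothetical) instantiation matching the 2D setting of the demo, the candidate set can be a family of axis-aligned threshold predicates, each of which the learner can evaluate on any point; `make_candidates` and the name strings below are illustrative, not from the paper.

```python
# Hypothetical candidate family: axis-aligned threshold predicates over
# 2D points, e.g. "x > 0" or "y < 5". Each is a plain callable, so the
# learner can evaluate any candidate on any queried point.

def make_candidates(thresholds):
    cands, names = [], []
    for t in thresholds:
        cands.append(lambda p, t=t: p[0] > t)  # bind t per-lambda
        names.append(f"x > {t}")
        cands.append(lambda p, t=t: p[1] < t)
        names.append(f"y < {t}")
    return cands, names

cands, names = make_candidates([0, 5])
# Running example target: x > 0 OR y < 5. Evaluate candidates on a point:
point = (3, 7)
assert cands[names.index("x > 0")](point)      # x = 3 > 0
assert not cands[names.index("y < 5")](point)  # y = 7, not < 5
```

The `t=t` default argument pins each threshold at definition time, a standard Python idiom for building predicate families in a loop.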

Disjunction Construction

When the learner receives a positive counterexample (a point that should be positive but the current hypothesis labels as negative), it knows the target must include at least one predicate that covers this point. By testing each candidate predicate on the counterexample, the learner identifies a new predicate to add to its hypothesis. After at most k such rounds (where k is the number of predicates in the target), the hypothesis converges to the target concept.

Key insight: Each counterexample reveals information about at least one missing predicate. Since the target contains at most k predicates, the algorithm terminates after at most k equivalence queries, with a polynomial number of membership queries per round.
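A back-of-envelope check of this bound (the counts below are a rough sketch under the assumption that each round evaluates every candidate on every domain point, not the paper's exact accounting):

```python
# Rough query accounting for the counterexample-driven loop, assuming each
# round issues one equivalence query plus at most n * |D| membership-style
# evaluations. Not the paper's exact bounds -- an order-of-magnitude sketch.
n, k, domain_size = 100, 5, 1000

equivalence_queries = k                  # at most one new predicate per round
membership_per_round = n * domain_size   # test each candidate on each point
total_active = equivalence_queries * (1 + membership_per_round)

brute_force_hypotheses = 2 ** n          # passive enumeration of disjunctions
assert total_active < brute_force_hypotheses
print(total_active)
```

Even this crude count is polynomial in n, k, and the domain size, while passive enumeration faces 2^n candidate disjunctions.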

Results

The paper establishes provably efficient algorithms for learning disjunctions of predicates under several learning models.

The algorithms achieve exact learning (not just approximate) with query complexity polynomial in the relevant parameters, demonstrating that active queries fundamentally reduce the sample complexity compared to passive learning for this concept class.

Citation

@inproceedings{bshouty2017learning,
  title     = {Learning Disjunctions of Predicates},
  author    = {Bshouty, Nader H. and Drachsler-Cohen, Dana and Vechev, Martin and Yahav, Eran},
  booktitle = {Conference on Learning Theory (COLT)},
  year      = {2017}
}