COLT 2017

Learning Disjunctions of Predicates

Nader H. Bshouty, Dana Drachsler-Cohen, Martin Vechev, Eran Yahav

TL;DR — How do you learn a Boolean formula that separates positive from negative examples when the formula is a disjunction of predicates? This paper provides efficient algorithms with provable guarantees for learning disjunctions from labeled examples and membership queries.

The Problem

Many program properties can be expressed as disjunctions of simple predicates — for example, x > 0 OR y < 5. Given a set of positive examples (points satisfying the formula) and negative examples (points that do not), the goal is to learn the target disjunction. The challenge is that the space of possible disjunctions is exponentially large: with n candidate predicates, there are 2^n possible disjunctions.

Learning these from examples alone can require exponentially many samples. The key question is: can we do better if the learner is allowed to actively query an oracle?

The Key Idea

The paper studies learning in models where the learner can ask membership queries ("is this specific point positive or negative?") and equivalence queries ("is my current hypothesis correct? if not, give me a counterexample"). By combining these two types of queries, the learner can efficiently navigate the exponential search space.

The algorithm works by building up the disjunction predicate-by-predicate:

  1. Start with an empty hypothesis (no predicates).
  2. Use membership queries to probe specific points and identify which predicates are relevant.
  3. Propose a hypothesis disjunction based on the collected evidence.
  4. If the hypothesis is incorrect, receive a counterexample from the equivalence oracle.
  5. Analyze the counterexample to discover a new predicate that must be added to the disjunction.
  6. Repeat until the hypothesis matches the target concept.

This incremental approach avoids enumerating all 2^n possible disjunctions and instead discovers the relevant predicates one at a time.
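The loop above can be sketched in a few lines. This is an illustrative toy, not the paper's algorithm: all names are hypothetical, the equivalence oracle is simulated by scanning a finite domain, and membership queries are used to admit only predicates that cover no negative point (which guarantees every counterexample is positive).

```python
# Sketch of the incremental loop, with the equivalence oracle simulated by
# scanning a finite domain. All names here are illustrative, not the paper's.

def learn_disjunction(candidates, domain, member):
    """candidates: list of predicates (point -> bool);
    member(x): membership oracle for the hidden target disjunction."""
    hypothesis = set()  # indices of predicates in the current disjunction

    def hyp(x):
        return any(candidates[i](x) for i in hypothesis)

    def equivalence_query():
        # Simulated oracle: return a counterexample, or None if exact.
        return next((x for x in domain if member(x) != hyp(x)), None)

    while (cex := equivalence_query()) is not None:
        # The hypothesis never covers a negative point (see filter below),
        # so cex must be positive but uncovered. Add each candidate that
        # covers cex and -- checked via membership queries -- covers only
        # positive points; at least one target predicate covering cex
        # always passes this filter, so every round makes progress.
        for i, p in enumerate(candidates):
            if i not in hypothesis and p(cex):
                if all(member(y) for y in domain if p(y)):
                    hypothesis.add(i)
    return hypothesis

# Toy run: candidates over integers, hidden target = (x > 0) OR (x < -3).
cands = [lambda x: x > 0, lambda x: x < -3, lambda x: x % 2 == 0]
learned = learn_disjunction(cands, range(-5, 6), lambda x: x > 0 or x < -3)
assert learned == {0, 1}
```

Because the filter only admits predicates covering no negative point, the hypothesis never overshoots the target, and each positive counterexample forces at least one genuinely missing predicate into the disjunction.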

Interactive Demo

Predicate Learning Visualizer

The target concept is a disjunction of simple predicates over a 2D plane. The learner uses membership queries and proposes hypotheses to discover the target. Green points are positive (satisfy the target); red points are negative.

[Interactive visualizer: a 2D plane with labeled examples; shaded regions show the learned hypothesis. The target concept stays hidden until solved, and counters track queries issued, hypotheses proposed, accuracy, and status.]

How It Works

Query-Based Learning

The learning model in this paper extends the classical PAC (Probably Approximately Correct) framework by allowing the learner to interact with the environment. Rather than passively receiving random examples, the learner can strategically choose which points to query. This active learning approach is critical: for disjunctions, passive learning requires exponentially many examples, while query-based learning can succeed with polynomially many queries.

Predicate Enumeration

The algorithm operates over a known set of candidate predicates P = {p_1, ..., p_n}. The target concept is a disjunction p_{i_1} OR p_{i_2} OR ... OR p_{i_k} for some unknown subset of size k. The learner does not know which predicates appear in the target, but it can evaluate any predicate on any point. This structure allows the learner to systematically test candidate predicates by querying strategically chosen points.
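As a concrete (hypothetical) instantiation matching the 2D setting of the demo, the candidate set can be a family of axis-aligned threshold predicates, each of which the learner can evaluate on any point; `make_candidates` and the name strings below are illustrative, not from the paper.

```python
# Hypothetical candidate family: axis-aligned threshold predicates over
# 2D points, e.g. "x > 0" or "y < 5". Each is a plain callable, so the
# learner can evaluate any candidate on any queried point.

def make_candidates(thresholds):
    cands, names = [], []
    for t in thresholds:
        cands.append(lambda p, t=t: p[0] > t)  # bind t per-lambda
        names.append(f"x > {t}")
        cands.append(lambda p, t=t: p[1] < t)
        names.append(f"y < {t}")
    return cands, names

cands, names = make_candidates([0, 5])
# Running example target: x > 0 OR y < 5. Evaluate candidates on a point:
point = (3, 7)
assert cands[names.index("x > 0")](point)      # x = 3 > 0
assert not cands[names.index("y < 5")](point)  # y = 7, not < 5
```

The `t=t` default argument pins each threshold at definition time, a standard Python idiom for building predicate families in a loop.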

Disjunction Construction

When the learner receives a positive counterexample (a point that should be positive but the current hypothesis labels as negative), it knows the target must include at least one predicate that covers this point. By testing each candidate predicate on the counterexample, the learner identifies a new predicate to add to its hypothesis. After at most k such rounds (where k is the number of predicates in the target), the hypothesis converges to the target concept.

Key insight: Each counterexample reveals information about at least one missing predicate. Since the target contains at most k predicates, the algorithm terminates after at most k equivalence queries, with a polynomial number of membership queries per round.
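A back-of-envelope check of this bound (the counts below are a rough sketch under the assumption that each round evaluates every candidate on every domain point, not the paper's exact accounting):

```python
# Rough query accounting for the counterexample-driven loop, assuming each
# round issues one equivalence query plus at most n * |D| membership-style
# evaluations. Not the paper's exact bounds -- an order-of-magnitude sketch.
n, k, domain_size = 100, 5, 1000

equivalence_queries = k                  # at most one new predicate per round
membership_per_round = n * domain_size   # test each candidate on each point
total_active = equivalence_queries * (1 + membership_per_round)

brute_force_hypotheses = 2 ** n          # passive enumeration of disjunctions
assert total_active < brute_force_hypotheses
print(total_active)
```

Even this crude count is polynomial in n, k, and the domain size, while passive enumeration faces 2^n candidate disjunctions.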

Results

The paper establishes provably efficient algorithms for learning disjunctions of predicates under several learning models.

The algorithms achieve exact learning (not just approximate) with query complexity polynomial in the relevant parameters, demonstrating that active queries fundamentally reduce the sample complexity compared to passive learning for this concept class.

Citation

@inproceedings{bshouty2017learning,
  title     = {Learning Disjunctions of Predicates},
  author    = {Bshouty, Nader H. and Drachsler-Cohen, Dana and Vechev, Martin and Yahav, Eran},
  booktitle = {Conference on Learning Theory (COLT)},
  year      = {2017}
}