Projects

Erez Petrank

Contents:

Wait-Freedom Made Practical,
Theoretical Foundations for Memory Management,
Advances in Concurrent Data Structures (an Ongoing Project),
Memory Management with Progress Guarantees (an Ongoing Project),
Memory Management Verification,
Real-Time Concurrent Garbage Collection,
Memory Management with Sliding-Views,
Modern Reference-Counting Techniques,
Advanced Mostly Concurrent Collection,
Parallel Compaction and Copying,
Local Heaps,
A Production JVM with the DLG Collector,
Concurrent Zero-Knowledge,
Quantifying Knowledge Complexity,
Studies in Multiparty Computation,
Non-Interactive Zero-Knowledge,
Identity Escrow,
On the Hardness of Approximations,
Data Structures Construction and Analysis.

 

 

Wait-Freedom Made Practical

With the proliferation of multicore systems, the design of concurrent algorithms and concurrent data structures to support them has become critical. Wait-free data structures provide the very basic and natural “non-starvation” progress guarantee, assuring that each thread always makes progress when given enough CPU cycles. However, prior to this project, wait-free algorithms were considered difficult to design and too costly to be used in practice. Only the weaker lock-free guarantee, which allows almost all threads to starve, was achievable in practice. In this project, we have completely revolutionized the state of the art by making wait-freedom available in practice. First, we developed efficient wait-free algorithms for various concurrent data structures. Second, we provided a methodology for creating efficient wait-free data structures, and finally, we provided a simulation that takes a lock-free data structure given in a normalized form and automatically makes it wait-free. This latter transformation enables efficient wait-freedom for the many data structures whose simpler lock-free versions are known today.
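
To make the two progress guarantees concrete, here is a minimal Java sketch (illustrative only; it is not the construction developed in this project) contrasting a lock-free counter, where a particular thread may lose the CAS race indefinitely, with a wait-free variant in which every thread completes in a bounded number of its own steps.

    import java.util.concurrent.atomic.AtomicLong;

    // Illustrative only: contrasts the two progress guarantees; it is not
    // the wait-free construction developed in this project.
    final class Counters {
        private final AtomicLong value = new AtomicLong();

        // Lock-free: some thread always succeeds, but a particular thread
        // may lose the CAS race again and again, so it can starve.
        long incrementLockFree() {
            while (true) {
                long cur = value.get();
                if (value.compareAndSet(cur, cur + 1)) {
                    return cur + 1;
                }
            }
        }

        // Wait-free (on platforms where addAndGet compiles to an atomic
        // fetch-and-add): every thread finishes in a bounded number of its
        // own steps, regardless of what the other threads do.
        long incrementWaitFree() {
            return value.addAndGet(1);
        }
    }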

Theoretical Foundations for Memory Management

Memory management has been studied in depth over the last few decades. The vast majority of the relevant literature focuses on building more efficient and more reliable systems, but a robust theory of these constructions is scarce. In this project, we develop theoretical foundations for memory management. In the first paper we investigated the complexity of finding the optimal placement of objects (or code) in memory, in the sense that the placement reduces cache misses to the minimum. Reducing cache misses is crucial to program efficiency, and so a solid theoretical foundation is desirable. We have shown that this problem is one of the toughest among the interesting algorithmic problems in computer science. In particular, suppose one is given a sequence of memory accesses and has to place the data in memory so as to minimize the number of cache misses for the given sequence. We show that if P≠NP, then this problem is not solvable in polynomial time. Furthermore, it is not possible to efficiently approximate the optimal solution, even up to a very liberal approximation ratio.

A second area in memory management whose theoretical aspects we addressed is fragmentation. Allocation and de-allocation of objects during program execution may cause fragmentation and foil the program’s ability to allocate objects. Compaction can eliminate fragmentation, but is too costly to be run frequently. Instead, partial compaction is often used to defragment parts of the heap and avoid space blow up. Partial compaction reduces some of the existing fragmentation at an acceptable cost. In the second and third papers we study the effectiveness of partial compaction and provide the first rigorous lower and upper bounds on its effectiveness in reducing fragmentation at a low cost.

 

 

Advances in Concurrent Data Structures (an Ongoing Project)

In this ongoing project we attempt to advance the state of the art in practical concurrent data structures by addressing major open questions in the area. We mention two of its studies here.

The first study conducted under this agenda was the development of the first lock-free balanced tree. Lock-free (a.k.a. non-blocking) data structures provide a progress guarantee, ensuring that at least one of the program threads will make progress. Lock-free balanced trees were not available in the literature for many years. The first lock-free tree (with no rebalancing mechanism) appeared in the literature in 2010. In 2012 we developed the first lock-free balanced tree, specifically, a B+tree. The B+tree data structure has important practical applications and is used in various storage-system products. This was the first design of a lock-free, dynamic, balanced tree that employs standard compare-and-swap operations.

In the second study we addressed iterators. An iterator that traverses the data structure items is a highly desirable interface in practice that often exists for sequential data structures but is missing from (almost all) concurrent data-structure implementations. In this paper we introduced a technique for adding a linearizable wait-free iterator to a wait-free or a lock-free data structure that implements a set. We used this technique to implement an iterator for the wait-free and lock-free linked-lists and for the lock-free skip-list.

 

Memory Management with Progress Guarantees (an Ongoing Project)

Lock-free (a.k.a. non-blocking) data structures provide a progress guarantee, ensuring that at least one of the program threads will make progress. Various lock-free algorithms exist in the literature. Many of them lack treatment of memory management, assuming objects are not reused or assuming the existence of a garbage collector that reclaims objects for reuse. However, the literature does not provide a garbage collector that preserves the lock-freedom property of a running program. Ad-hoc manual memory management techniques such as hazard pointers or pass-the-buck can be used, but they are difficult to use properly and they reduce performance notably. Therefore, handling memory management for dynamic non-blocking data structures remains an important open question.
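
To illustrate why such manual schemes are delicate and costly, here is a hedged, Java-flavored sketch of the hazard-pointer protocol (class and method names are illustrative; real deployments target languages without a garbage collector and add per-thread reclamation thresholds):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.atomic.AtomicReference;
    import java.util.concurrent.atomic.AtomicReferenceArray;

    // Simplified hazard-pointer sketch: one hazard slot per thread.
    final class HazardPointers<T> {
        private final AtomicReferenceArray<T> hazard;  // slot i = node protected by thread i
        private final List<List<T>> retired;           // per-thread retired-node lists

        HazardPointers(int nThreads) {
            hazard = new AtomicReferenceArray<>(nThreads);
            retired = new ArrayList<>();
            for (int i = 0; i < nThreads; i++) retired.add(new ArrayList<>());
        }

        // Protect a node before dereferencing it: publish it, then re-read the
        // source reference to make sure the node was not unlinked in between.
        T protect(int tid, AtomicReference<T> src) {
            T node;
            do {
                node = src.get();
                hazard.set(tid, node);
            } while (node != src.get());   // retry if the reference changed under us
            return node;
        }

        void clear(int tid) { hazard.set(tid, null); }

        // Retire a node; it may be reused only after no hazard slot points to it.
        void retire(int tid, T node) {
            List<T> mine = retired.get(tid);
            mine.add(node);
            mine.removeIf(n -> !isHazardous(n));   // the "scan" step; removal models reuse
        }

        private boolean isHazardous(T node) {
            for (int i = 0; i < hazard.length(); i++) {
                if (hazard.get(i) == node) return true;
            }
            return false;
        }
    }

The validation loop in protect and the scan in retire run on the data structure's hot paths, which is where the overhead mentioned above comes from.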

In the first paper we deal with formalizing the problem. We first formalize the notions of progress guarantees using linear temporal logic (LTL). We then study the interaction between programs with progress guarantees and the underlying system (e.g., compilers, runtimes, operating systems, and hardware platforms). We propose a means to argue that an underlying system supports lock-freedom. A composition theorem asserts that lock-free algorithms running on lock-free supporting systems retain bounded lock-freedom for the composed execution.
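
For intuition, a standard textbook-style LTL rendering of the two guarantees (not necessarily the paper's exact formalization) could read as follows, where pending_i and complete_i are assumed predicates stating that thread i has a pending operation or completes an operation:

    \[
    \textit{lock-freedom}:\quad
    \mathbf{G}\Bigl(\bigl(\textstyle\bigvee_i \mathit{pending}_i\bigr)
      \;\rightarrow\; \mathbf{F}\bigl(\textstyle\bigvee_i \mathit{complete}_i\bigr)\Bigr)
    \qquad
    \textit{wait-freedom}:\quad
    \bigwedge_i \mathbf{G}\bigl(\mathit{pending}_i \rightarrow \mathbf{F}\,\mathit{complete}_i\bigr)
    \]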

In the second paper we present a novel manual memory management technique for lock-free data structures, called Drop the Anchor. This method significantly reduces the overhead associated with the memory management while reclaiming objects even in the presence of thread failures. We demonstrate this memory management scheme on the common linked-list data structure. Using extensive evaluation, we show that Drop the Anchor significantly outperforms hazard pointers, the widely used technique for non-blocking memory management.

The difficulty of building a garbage collector that supports lock-freedom, along with partial existing solutions, was presented in an invited talk at ISMM/MSPC 2012.

 

 

Memory Management Verification

Memory managers, especially ones employing garbage collectors, are notoriously hard to verify, due to their low-level interaction with the underlying system and the general difficulty of reasoning about reachability in graphs. Several papers have presented verified collectors, but either the proofs were hand-written or the collectors were drastically simplified and could not be used with practical applications. In this project we provided the first mechanically verified memory managers that could be used with real-world C# benchmarks. We presented two mechanically verified memory managers consisting of x86 assembly language instructions and macro-instructions, annotated with preconditions, postconditions, invariants, and assertions. We used the Boogie verification condition generator and the Z3 automated theorem prover to verify this assembly language code mechanically. We also provided measurements comparing the performance of the verified collectors with that of the standard Bartok collectors on off-the-shelf C# benchmarks, demonstrating the competitive performance of the verified memory managers.

 

 

Real-Time Concurrent Memory Management

Historically, real-time systems were written in low-level code, limiting real-time applications to small, simple programs. Recently, various implementations of real-time systems supporting high-level languages were presented. We take this technology a step further by providing concurrent real-time garbage collectors. Such collectors allow very high responsiveness, as the garbage collection is executed on separate cores, allowing the program to respond very quickly to events, with no interruption. The main challenge is in de-fragmenting the heap, i.e., moving objects while the application is accessing them concurrently. The first two papers present and study three possible alternatives: Stopless, Clover, and Chicken, all implemented in the Microsoft Bartok compiler for C#, showing pauses at the microsecond level, i.e., two orders of magnitude shorter than state-of-the-art real-time collectors. All three collectors are lock-free, guaranteeing application progress (and in particular, not using any locks).

The third paper presents a compiler method to reduce the overhead of the barriers, thus reducing the garbage collection overhead and improving application throughput. This method is also of interest for regular (non-real-time) collectors. The fourth paper deals with stack scanning, which typically creates the most noticeable pause with on-the-fly collectors. The paper presents a method to break the stack scanning work into small pieces that can be executed incrementally and concurrently, further shortening the pause times that the application must take. Stack scanning is lock-free, making it a perfect fit for the real-time collectors presented herein.
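
To give a sense of what "barrier overhead" refers to here, the sketch below shows a generic forwarding-pointer ("Brooks-style") barrier in Java form; it is not the specific barrier of Stopless, Clover, or Chicken, only an illustration of the extra indirection added to every heap access, which the compiler method aims to reduce.

    // Illustrative only: a generic forwarding-pointer barrier, not the
    // barriers used by Stopless, Clover, or Chicken.
    final class Node {
        volatile Node forward = this;   // points to the current copy of this object
        volatile Object payload;
    }

    final class Barriers {
        // Every read goes through the forwarding pointer ...
        static Object read(Node n) {
            return n.forward.payload;
        }

        // ... and so does every write, so accesses always reach the
        // up-to-date copy even while the collector relocates objects.
        static void write(Node n, Object value) {
            n.forward.payload = value;
        }
    }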

Memory Management with Sliding-Views

Many concurrent garbage collectors attempt to capture the view of the heap at one specific point in time, in spite of the heap being modified concurrently while the collector executes. This virtual snapshot view of the heap makes the collector easier to design and prove. We propose a different method, in which the collector works with a sliding view of the heap rather than a snapshot. The sliding view is a view of the heap that is read within a short interval rather than at a single point in time. Since the heap changes during this interval, the view is not a snapshot and may be thought of as "inconsistent". Nevertheless, we show that such a view can be used to reclaim unreachable objects safely. Sliding-views collectors do not need to stop all the threads at a specific point in time to read their local variables. Thus, they allow extremely short pause times. We demonstrate this method in two papers, one implementing it with a reference-counting collector and the other with a mark-sweep collector.

Reference-counting has traditionally been considered unsuitable for multi-processor systems. According to conventional wisdom, the update of reference slots and reference-counts required atomic or synchronized operations. In the first paper, we demonstrated that this is not the case by presenting a novel reference-counting algorithm suitable for a multi-processor system that does not require any synchronized operation in its write barrier (not even a compare-and-swap type of synchronization). A second novelty of this algorithm is that it allows the elimination of a large fraction of the reference-count updates, thus, drastically reducing the traditional reference-counting overhead. The paper included a full proof of the algorithm showing that it is safe (does not reclaim live objects) and live (eventually reclaims all unreachable objects).   These techniques yield a non-obtrusive and efficient collector with pause times of around 1ms for standard benchmarks running on an SMP.
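
The flavor of such a write barrier is sketched below (a simplified, hypothetical rendering; the published algorithm's handling of snooping, handshakes, and buffer management is omitted). The first time an object is modified in a collection cycle, the mutator logs the object together with its current pointer slots in a thread-local buffer and only then performs the store, using no atomic instructions; the collector later derives reference-count decrements from the logged old values and increments from the slots' current values, coalescing all intermediate updates.

    import java.util.ArrayList;
    import java.util.List;

    // Simplified sketch of an update-coalescing write barrier; names are
    // hypothetical and collector-side handshakes are omitted.
    final class Obj {
        volatile boolean dirty;   // already logged in this collection cycle?
        Obj[] slots;              // the object's reference fields
    }

    final class LogEntry {
        final Obj obj;
        final Obj[] oldValues;    // the slots as they were before the first update
        LogEntry(Obj obj, Obj[] oldValues) { this.obj = obj; this.oldValues = oldValues; }
    }

    final class MutatorThread {
        // Thread-local buffer, drained by the collector at the next collection.
        final List<LogEntry> updateBuffer = new ArrayList<>();

        void writeBarrier(Obj obj, int slot, Obj newValue) {
            if (!obj.dirty) {                                   // first update this cycle
                updateBuffer.add(new LogEntry(obj, obj.slots.clone()));
                obj.dirty = true;                               // plain write, no CAS
            }
            obj.slots[slot] = newValue;                         // the actual store
        }
    }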

A second paper exploited the above techniques to obtain a novel on-the-fly mark-and-sweep garbage collector with similarly short pause times. In addition to being an effective on-the-fly mark-and-sweep collector standing on its own, this collector can also be used as a backup collector (collecting cyclic data structures) for the sliding-views reference-counting collector. The two algorithms fit together perfectly, sharing the same allocator, a similar data structure, and a similar JVM interface.

 

Modern Reference-Counting Techniques

The study and use of reference counting on a multiprocessor has not been as extensive and thorough as the study of concurrent and parallel tracing collectors. The reason was that reference counting suffered from three problems: inadequacy for use with multiprocessors, high overhead on reference-count updates, and inability to reclaim unreachable cyclic structures. The first two issues were dealt with by the sliding-views reference-counting collector, which constitutes the first paper in this project. We first proposed the update-coalescing method, which eliminated most of the count updates, and then we presented a write barrier that eliminated the need for synchronization over pointer updates on multiprocessors. Finally, we presented an on-the-fly version of the algorithm built on sliding views.

Going beyond the above important improvements to the basic reference-counting algorithm, we turned to further improving reference-counting collectors. The first goal was to use generations with reference counting. We started by proposing the use of reference counting in a generational collector such that the old generation is collected via reference counting and the young generation is collected via tracing.  This configuration yields a nice improvement over non-generational reference counting.  In a second work we proposed a non-conventional use of generations, denoted age-oriented collection, in which, unlike generational collectors, the entire heap is always collected, but, like generational collectors, each generation is treated differently during the collection.  This achieves a further substantial improvement over the standard generational collector. 

Our second goal was to improve cycle collection for reference counting. In a third work we proposed a new algorithm for concurrent cycle collection. We improved on the efficiency of previous collectors and provided a simpler cycle collector adequate for running with the sliding-views reference-counting collector. Measurements show that cycle collection is advantageous when used with generations, most effectively with the age-oriented collector. This paper, together with the age-oriented paper, presents our current understanding of the best way to run reference counting: it should be applied to old objects only, and the generational hypothesis should be exploited as in the age-oriented collector.

Finally, we investigated the use of prefetching to further improve the efficiency of the reference-counting collector. We proposed potential prefetching opportunities for a collector that follows the update-coalescing methodology and reported an implementation of a collector that employs such prefetching. The proposed prefetch instructions were inserted into the Jikes reference-counting collector, obtaining an average reduction of 8.7% in memory management overhead.

Advanced Mostly Concurrent Collection

In this project we designed and implemented a collector meeting SMP demands; it was incorporated into the IBM production JVM. The collector builds on the mostly concurrent garbage collector proposed by Boehm et al., adding several novel algorithmic ideas that improve its efficiency and substantially shorten its pause times. We also made it parallel and incremental, and we employed concurrent low-priority background GC threads to take advantage of processor idle time. Compared to the mature, well-optimized, parallel, stop-the-world mark-sweep IBM production collector, our collector prototype reduces the maximum pause time from 161ms to 46ms while losing only 11.5% throughput when running the SPECjbb2000 benchmark on a 600 MB heap on an 8-way PowerPC with 1.1 GHz processors.

Parallel Compaction and Copying for an SMP

Typically, mark-sweep and reference-counting collectors are susceptible to fragmentation. Because the garbage collector does not move objects during the collection, the heap becomes more and more fragmented.  Fragmentation increases the complexity of allocation and may prevent allocation of large objects.  To ameliorate this problem, compaction algorithms are used to group the live objects together. 

The first paper proposes the Compressor, which is the most efficient compaction algorithm known today. It requires only a single heap pass plus a single pass on a small (mark-bits) table. The Compressor has a version that can run on several parallel compaction threads and a concurrent version that can run concurrently with the program threads. This work extends the one described in the second (earlier) paper, which proposes a parallel heap compaction algorithm appropriate for multiprocessors. That algorithm demonstrates high scalability when running in parallel but is also extremely efficient when running single-threaded on a uniprocessor. A copying collector compacts the heap during the collection. In the third paper we studied a parallel copying collector, and in the last one we made some observations on a concurrent copying collector.
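
As a rough illustration of the kind of computation involved (a simplified sketch with fixed-size heap words, not the Compressor's actual data structures), the snippet below derives forwarding addresses from a mark-bit vector using one pass over a small per-block offset table and one pass over the bits:

    // Simplified sketch: derive forwarding addresses from a mark-bit vector.
    // The heap is modeled as equal-sized words; a real collector works with
    // variable-sized objects and keeps only the small per-block offset table.
    final class Forwarding {
        static int[] computeForwarding(boolean[] markBits, int blockSize) {
            int nBlocks = (markBits.length + blockSize - 1) / blockSize;

            // Pass over the mark-bit table: live words preceding each block.
            int[] blockOffset = new int[nBlocks];
            int live = 0;
            for (int b = 0; b < nBlocks; b++) {
                blockOffset[b] = live;
                int end = Math.min((b + 1) * blockSize, markBits.length);
                for (int i = b * blockSize; i < end; i++) {
                    if (markBits[i]) live++;
                }
            }

            // Heap pass: a live word's new address is its block's offset plus
            // the number of live words preceding it inside the block.
            int[] forward = new int[markBits.length];
            for (int b = 0; b < nBlocks; b++) {
                int addr = blockOffset[b];
                int end = Math.min((b + 1) * blockSize, markBits.length);
                for (int i = b * blockSize; i < end; i++) {
                    forward[i] = markBits[i] ? addr++ : -1;   // -1 marks a dead word
                }
            }
            return forward;
        }
    }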

Local Heaps

In this project we presented a memory management scheme for Java based on thread-local heaps. Assuming most objects are created and used by a single thread, it is desirable to free the memory manager from redundant synchronization for thread-local objects. Therefore, in our scheme each thread receives a partition of the heap in which it allocates its objects and in which it performs local garbage collection without synchronization with other threads. We monitor objects dynamically to determine which are local and which are global. Furthermore, we suggested using profiling to identify allocation sites that almost exclusively allocate global objects, and allocating objects at these sites directly in a global area. This scheme substantially reduces overall garbage collection time and eliminates most long pauses.


 

A Production JVM with the DLG Collector

In this project we designed and implemented an on-the-fly garbage collector based on the algorithm of Doligez, Leroy, and Gonthier, denoted DLG. An on-the-fly collector is a collector that runs concurrently with the program threads and never needs to stop the program threads simultaneously. Such collectors achieve very low pause times on a multiprocessor. In the first paper we reported extending and adapting DLG for Java (e.g., adding support for weak references) and for modern multiprocessors without sequential consistency, and adding performance improvements (e.g., keeping track of the objects remaining to be traced). The performance advantage of our collector over a traditional stop-the-world collector increases as the number of threads increases; moreover, it provides uniformly low response times. In the second paper we reported the incorporation of generations into the said collector. The incorporation is non-trivial, since an on-the-fly collector avoids explicit synchronization with the program threads. To the best of our knowledge, this is the first such incorporation reported in the literature. Comparing our on-the-fly collector with and without generations, it turns out that the generational collector performs better for most applications.

 

Concurrent Zero-Knowledge

Zero-knowledge proofs are a fascinating concept in the foundations of cryptography. They allow conveying the proof of a statement without yielding any information to the verifier except for the validity of the proof. This seeming contradiction of convincing without yielding too much information made zero-knowledge an important construct in modern cryptographic protocols.

A fundamental question about zero-knowledge protocols is whether their security is preserved when many instances of the protocol are executed in an asynchronous manner, denoted "concurrent zero-knowledge". We have investigated this question in a series of papers, for the standard "black-box" formulation. A first lower bound of five rounds on the communication complexity of a zero-knowledge interaction was shown, separating concurrent zero-knowledge from other protocol compositions. Later, a stronger lower bound was shown, namely that a concurrent zero-knowledge interaction essentially requires a logarithmic number of rounds. This turned out to be an (almost) optimal lower bound, as a matching upper bound was later discovered in a series of complementary works.

In an attempt to build concurrent zero-knowledge protocols for NP, we have drastically improved a previously known protocol, reducing its communication complexity from linear to poly-logarithmic, by building a more accurate simulator for the original protocol and presenting a novel simulation schedule. This simulator was later refined by Prabhakaran et al. to reduce the complexity to an almost logarithmic one, thus essentially matching the previously obtained lower bounds.

Quantifying Knowledge Complexity

A fundamental measure of the security of a protocol is the amount of information that leaks during an interaction. The goal of this project was to rigorously quantify and then study the amount of knowledge that a party obtains during an interaction. Gaining "knowledge" in this context was interpreted in the most basic sense of "being able to compute more". We started by proposing definitions for the knowledge complexity of an interaction. Using the fundamental simulation concept, several rigorous definitions were proposed and their relations were thoroughly studied. Next, it was formally proven that a rich hierarchy of protocols exists with increasing knowledge complexity.

Next, we studied the complexity of languages that have an interactive proof with low knowledge complexity. A series of papers with gradually improving mathematical techniques finally led to showing that if a language has an interactive proof with logarithmic knowledge complexity, then it is in both the AM and coAM complexity classes.

  • Oded Goldreich and Erez Petrank. Quantifying Knowledge Complexity. Computational Complexity, Vol. 8, pp. 50-98, 1999. A preliminary version appeared in the 32nd IEEE Conference on the Foundations of Computer Science, October 1991, pp. 59-68.

  • Mihir Bellare and Erez Petrank. Making Zero-Knowledge Provers Efficient. 24th ACM Symp. on Theory of Computation, May 1992, pp. 711-722.

  • Oded Goldreich, Rafail Ostrovsky, and Erez Petrank. Computational Complexity and Knowledge Complexity. SIAM Journal on Computing, Vol. 27, No. 4, pp. 1116-1141, August 1998. A preliminary version appeared in the Proceedings of the 26th ACM Symp. on Theory of Computation, May 1994, pp. 534-543.

  • Erez Petrank and Gabor Tardos. On the Knowledge Complexity of NP. Combinatorica, Vol. 22, No. 1, pp. 83-121, 2002. An extended abstract appeared in the 37th IEEE Conference on the Foundations of Computer Science, October 1996, pp. 494-503.

Studies in Multiparty Computation

A general formulation of most cryptographic tasks is the formulation of multiparty computation. In this setting, a set of parties wish to jointly compute a function of their inputs, while preserving security in the case that some subset of them are corrupted. In this project we study several topics in secure multiparty computation.

The first study was about the properties that can be obtained for general multiparty computation. The typical security properties considered in the literature are privacy, correctness, independence of inputs, guaranteed output delivery, and fairness. Prior to this work, protocols in the literature either assumed an honest majority and provided full security, including output delivery and fairness (but security could be completely compromised when the majority is not honest), or they tolerated a corrupted majority but did not guarantee output delivery, even with only one corrupted party. In this first work, we studied the possibility of obtaining general protocols for multiparty computation that simultaneously guarantee security (allowing abort) in the case that an arbitrary number of parties are corrupted and full security (including guaranteed output delivery) in the case that only a minority of the parties are corrupted.

In a second work we studied the manner in which computational assumptions are used for the secure computation of non-trivial functionalities in the setting of no honest majority. For this case, we asked whether secure protocols can use the underlying primitives (e.g., a one-way trapdoor permutation) in a black-box way, or whether they must refer to the code that computes the primitive. All known constructions used the underlying primitive in a non-black-box way (requiring zero-knowledge proofs of statements that relate to the primitive). In this paper, we present protocols that use only black-box access to a family of trapdoor permutations or to a homomorphic public-key encryption scheme. The result is a protocol whose communication complexity is independent of the computational complexity of the underlying primitive and whose computational complexity grows only linearly with that of the underlying primitive. This is the first protocol to exhibit these properties.

Finally, in a third work, we studied oblivious transfer. Oblivious transfer is a powerful cryptographic primitive that may be used to implement a wide variety of other cryptographic protocols, including secret key exchange, contract signing, and secure function evaluation. It is, however, more costly in resources than standard, simpler cryptographic primitives such as one-way functions. In this work we consider the problem of extending oblivious transfers: given a small number of oblivious transfers from some source, can one implement a large number of oblivious transfers? We give efficient protocols for extending oblivious transfers in a black-box manner in the random oracle model. We also put forward a new cryptographic primitive that can be used to instantiate the random oracle in our constructions. Our methods suggest particularly fast heuristics for oblivious transfer that may be useful in a wide range of applications.
 

Non-Interactive Zero-Knowledge

An important setting of zero-knowledge is that of non-interactive zero-knowledge in the shared random string model, introduced by Blum, Feldman, and Micali. Non-interactive zero-knowledge (NIZK) protocols are useful for several higher-level constructions, most notably public-key cryptosystems secure against chosen-ciphertext attacks. In this work, we presented an efficient NIZK proof for any language in NP, based on general complexity assumptions. Our protocol is still the most efficient general NIZK protocol known today. In particular, the asymptotic complexity of this protocol matches the complexity of protocols that were built on specific algebraic assumptions, and the number of committed bits matches, up to a constant, the number of bits committed to by the most efficient known interactive zero-knowledge proof systems for NP.

Identity Escrow

We introduce the concept of escrowed identity, an application of key-escrow ideas to the problem of authentication. In escrowed identity, one party A does not give his identity to another party B, but rather gives him information that would allow an authorized third party E to determine A's identity. However, B receives a guarantee that E can indeed determine A's identity. We consider a number of possible features of escrowed identity schemes, and describe a variety of implementations that achieve various subsets of these features. In particular, we observe that group signature schemes can be used to escrow identities, achieving most (though not all) of the desired features.

The most interesting feature we consider is separability: the escrow agency is not involved in the day-to-day operation of the identification system, but is only called in when anonymity must be revoked. In the extreme case, there exist identity escrow schemes in which an arbitrary party (possessing a public key) can be designated an escrow agent without any knowledge or participation on their part until they are asked to revoke someone's anonymity.

On the Hardness of Approximations

In this project, we investigated the hardness of approximation problems. Hardness is typically shown by demonstrating a hard gap for an optimization problem. We refined the complexity analysis of approximation hardness by relating it to a new parameter called gap location. In particular, it was noted that not only the size of the hard gap matters, but also its location. Many of the results obtained so far for approximations yield the strongest analysis also with respect to this refined parameter, but some known results (e.g., MAX-k-COLORABILITY, MAX 3-DIMENSIONAL MATCHING, and MAX NOT-ALL-EQUAL 3SAT) fall short of doing so. A second contribution of our work was in filling the gap in these cases by presenting new hardness reductions. Next, we presented definitions of new approximation versions of some NP-complete optimization problems. The problems we treated were VERTEX COVER (for which we defined a different optimization problem from the one treated by Papadimitriou and Yannakakis), k-EDGE COLORING, SET SPLITTING, and restricted versions of FEEDBACK VERTEX SET and FEEDBACK ARC SET. We defined all these problems and provided hardness results with the strongest possible gap location.

  • Erez Petrank. The Hardness of Approximations: Gap Location. Computational Complexity, Vol. 4, pp. 133-157, 1994. A preliminary version of this paper appeared in the Second IEEE Israel Symp. on Theory of Computation and Systems, June 1993, pp. 275-284.


     

Data Structures Construction and Analysis

In this project, we studied ways to rigorously analyze the advantages and disadvantages (or impossibilities) of data structure constructions. In the first study, we considered the use of linear (or affine) transformations between two vector spaces over a finite field F for constructing universal hash functions. A universal hash function is a family of functions such that choosing one of them at random provides some nice hashing properties. We asked how well linear functions behave in this respect and examined the expected size of the maximal bucket as a measure of a "good hashing" property. We discovered that linear functions over large fields behave quite badly, whereas linear functions over small fields behave quite well, not far from optimal. This study involves some nice linear algebra; in particular, in the course of our proof we develop a tool that may be of independent interest. Suppose we have a subset S of a vector space D over Z_2, and consider a random linear mapping of D to a smaller vector space R. If the cardinality of S is larger than c_ε|R|log|R|, then with probability 1 - ε, the image of S covers all elements of the range.
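
In symbols, restating the covering property above (with c_ε a constant depending only on ε and T a uniformly random linear map from D onto R):

    \[
    |S| \;\ge\; c_{\varepsilon}\,|R|\log|R|
    \quad\Longrightarrow\quad
    \Pr_{T}\bigl[\,T(S) = R\,\bigr] \;\ge\; 1 - \varepsilon .
    \]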

In a second study, we examined the ability of data structures to hide information that is not expected to be revealed by the structure. History-independent data structures are data structures that possess a strong security property: even if an intruder manages to obtain a copy of the data structure, the memory layout of the structure yields no additional information on the history of operations applied to the structure beyond the information obtainable from the content itself. A stronger security property is that even if an intruder breaks into the system several times without being noticed and records the data structure layouts, he can still obtain no additional information. An open question was whether these two notions are equally hard to obtain. In this paper we provide a separation between the two requirements for comparison-based algorithms. We show very strong lower bounds for obtaining the stronger notion of history independence for a large class of data structures, including, for example, the heap and the queue abstract data structures. We also provide complementary upper bounds showing that the heap abstract data structure may be made weakly history independent in the comparison-based model without incurring any additional (asymptotic) cost on any of its operations. (A similar result is easy for the queue.) Thus, we obtain the first separation between the two notions of history independence. The gap we obtain is exponential: some operations may be executed in logarithmic time (or even in constant time) under the weaker definition, but require linear time under the stronger definition.
 

  • Noga Alon, Martin Dietzfelbinger, Peter B. Miltersen, Erez Petrank, and Gabor Tardos. Linear Hashing. Journal of the ACM, Vol. 46, No. 5, September 1999. A preliminary version appeared in the 29th ACM Symp. on Theory of Computation, May 1997.
     
  • Niv Buchbinder and Erez Petrank. Lower and Upper Bounds on Obtaining History Independence. Information and Computation, Vol. 204, No. 2, pp. 291-337, February 2006. An extended abstract appeared in the Proceedings of Advances in Cryptology - Crypto 2003, California, August 2003.