[plt-scheme] efficient set implementation

=?GB2312?B?SmVucyBBeGVsIFM/Z2FhcmQ=?= jensaxel at soegaard.net
Mon Sep 4 15:19:56 EDT 2006


Chongkai Zhu skrev:

> Is there any existing efficient set implementation (like RedBlackSet) in Scheme, PLT in particular?

The Galore set implementation is reasonably efficient.
However! If all your sets are small the simple minded list based
implementation will probably be faster.

----------------------------------------------------------------------
TIME COMPLEXITIES
----------------------------------------------------------------------

The following table shows the worst case time complexity. The O( ) has
been omitted due to space considerations.

                  RB-Set    RB-Bag  Leftist-Heaps  List-Set  List-Bag
find-min            1         1            1           1         1
delete-min          1         1            1           1         1
get, member?      log n      log n         n           n         n
insert            log n      log n       log n         n         n
union              n+m        n+m       log(n+m)      n+m       n+m
elements            n          n           n           1         n
delete            log n      log n       log n         n         n
delete-all        log n      log n       log n         n         n
size                n          n           n           n         n

bag,  list->set  n log(n)                           n log(n)
heap, list->bag              n log(n)                          n log(n)
set,  list->heap                           n

The operations: empty?, empty, singleton, and select are all O(1).


Note that the complexity of union is n+m (just as the AVL trees).
Any insights on improving this is welcome.

Attached is two simple benchmarks. I think they measure the same,
but you better check your self.


Btw - there is also an implementation of integer sets in MzLib:

<http://download.plt-scheme.org/doc/352/html/mzlib/mzlib-Z-H-21.html#node_chap_21>

-- 
Jens Axel S?gaard
-------------- next part --------------
;;; BENCHMARK SETS

; AVL

(require (planet "sets.scm" ("oesterholt" "datastructs.plt" 1 0)))

;;; DATA CONSTRUCTION (to prepare for the benchmark)

; Create a list of overlapping intervals
; (0 2 ... 2N-1) (N ... 3N-1) ... ((K-1)N ... (K+1)N-1)
(define N 300)
(define K 300)

(require (lib "42.ss" "srfi"))

(define intervals (list-ec (: k K)
                           (list-ec (: n  (* k N) (* (+ k 2) N))
                                    n)))

; The same intervals, but now as sets.

(define (list->set xs)
  (let loop ([s (set = <)]
             [xs xs])
    (if (null? xs)
        s
        (loop (set+ s (car xs)) (cdr xs)))))

(define sets 
  (map (lambda (xs) (list->set xs))
       intervals))


;;; LIST -> SET

(define lists-with-duplicates
  (map (lambda (l) (append l l))
       intervals))

(time
 (begin
   ; conversion from list to set
   (map (lambda (xs) (list->set xs))
        lists-with-duplicates)
   'list->set))


;;; LIST OF SETS -> UNION 

(time
 (begin
   ; union of a list of sets
   (foldl set-union (set = <) sets)
   'union))

;;; LIST OF SETS -> INTERSECTION

; intersection between sets with overlap

(time
 (begin
   (let loop ([sets sets])
     (if (or (null? sets) (null? (cdr sets)))
         'intersection
         (begin
           (set-intersection (car sets) (cadr sets))
           (loop (cdr sets)))))))


; N=300, K=300

; cpu time: 1141 real time: 1141 gc time: 0
; list->set

; cpu time: 114781 real time: 115094 gc time: 13824
; union

; cpu time: 672 real time: 672 gc time: 157
; intersection
-------------- next part --------------
;;; BENCHMARK SETS

; GALORE 2

(require (planet "set.ss" ("soegaard" "galore.plt" 2 0)))

(require (lib "42.ss" "srfi")
         (lib "67.ss" "srfi"))

;;; DATA CONSTRUCTION (to prepare for the benchmark)

; Create a list of overlapping intervals
; (0 2 ... 2N-1) (N ... 3N-1) ... ((K-1)N ... (K+1)N-1)
(define N 300)
(define K 300)

(define intervals (list-ec (: k K)
                           (list-ec (: n  (* k N) (* (+ k 2) N))
                                    n)))

; The same intervals, but now as sets.

(define sets 
  (map (lambda (xs) (list->set integer-compare xs))
       intervals))


;;; LIST -> SET

(define lists-with-duplicates
  (map (lambda (l) (append l l))
       intervals))

(time
 (begin
   ; conversion from list to set
   (map (lambda (xs) (list->set integer-compare xs))
        lists-with-duplicates)
   'list->set))


;;; LIST OF SETS -> UNION 

(time
 (begin
   ; union of a list of sets
   (foldl union (empty integer-compare) sets)
   'union))

;;; LIST OF SETS -> INTERSECTION

; intersection between sets with overlap

(time
 (begin
   (let loop ([sets sets])
     (if (or (null? sets) (null? (cdr sets)))
         'intersection
         (begin
           (intersection (car sets) (cadr sets))
           (loop (cdr sets)))))))

; N=300, K=300

; cpu time: 1375 real time: 1390 gc time: 203
; list->set

; cpu time: 25141 real time: 25250 gc time: 10734
; union

; cpu time: 375 real time: 375 gc time: 234
; intersection


More information about the plt-scheme mailing list