ZCOMP: Fast Compression of Hypergraphs into ZDDs

Takahisa Toda

The University of Electro-Communications
Graduate School of Informatics and Engineering
Associate Professor

Last update: 2019-03-01


Table of Contents

SPECIFICATION
BASIC NOTION
DOWNLOAD
FILE FORMAT FOR SET FAMILIES
FILE FORMAT FOR ZDDS
HOW TO COMPILE
USAGE
Remarks
STANDARD OUTPUT
NOTICE
REFERENCES

SPECIFICATION

The program zcomp constructs a ZDD from a data file that represents a set family and outputs the ZDD as a data file. Conversely, the program unzcomp receives a data file that represents a ZDD and outputs a decompressed set family.

BASIC NOTION

A hypergraph is a pair H=(V,E) of a set V and a family E of subsets of V, where the sets in E are called hyperedges. A Zero-suppressed BDD (ZDD) is a well-accepted data structure for hypergraphs(, where we identify hypergraphs with set families).

CU Decision Diagram Package Release 2.5.0 is used in our program to manipulate ZDDs. For detailed information, see README in the directory cudd-2.5.0 and also on-line document.

DOWNLOAD

FILE FORMAT FOR SET FAMILIES

The requirements for an input file of zcomp and an output file of unzcomp are:

  • Entries must be sorted in increasing order and separated by at least one blank character.
  • No duplicated entries appear in the same row.
  • Each entry must be a nonzero positive number.
  • Empty line is allowed, which corresponds to empty set.
  • The lenght of a row must be less than the value indicated by the macro MAX_ROWLEN.

For example:


2 4 7 
7 8 
9 
9 10 

Given the data above, our program compresses it into a ZDD and outputs the following data.


2: (~10?1:1)
3: (~9?0:2)
4: (~8?0:1)
5: (~7?3:4)
6: (~7?0:1)
7: (~4?0:6)
8: (~2?5:7)

FILE FORMAT FOR ZDDS

A sequential list of branch instructions I_{m}, I_{m-1},...,I_{1}, where

  • I_{k} has the form n_{k}: (~v_{k}?l_{k}:h_{k}), which represents an internal ZDD node.
  • n_{k}: the unique index (hexadecimal number) assigned to a ZDD node,
  • v_{k}: the label (decimal number) of a ZDD node,
  • l_{k}: the unique index (hexadecimal number) assigned to the LO-child,
  • h_{k}: the unique index (hexadecimal number) assigned to the HI-child.
  • The node indices of the top and the bottom nodes must be 1 and 0, respectively.

This is called Knuth format in our code because it is introduced in his book. See pp.206-207 in "The Art of Computer Programming, Volume 4a (2011)".

HOW TO COMPILE

Compile ZDD library (i.e. CUDD) first and then compile our code in the top directory.


$ tar zxvf zcomp-1.2.0.tar.gz
$ cd zcomp-1.2.0/cudd-2.5.0
$ make
$ cd ..
$ make

    

The above method is basically sufficient in Linux. For MacOSX and Cygwin on Windows, the compilation of CUDD may fail. In this case, it may be good to modify Makefile in the directory cudd-2.5.0 so that for MacOSX, all lines concerning XCFLAGS are commented out, and for Cygwin on Windows, the line of XCFLAG in the "Windows 95/98/NT/XP/Vista/7 with Cygwin" section is only uncommented (and other lines are commented). If compilation fails in spite of Linux, select approapriate XCFLAG according to your environment such as architechture and gcc version.

USAGE


Usage: ./zcomp [option] input-file [output-file]
  Compression Method
  -z  zcomp: bottom-up with sorting (default)
  -n  naive

  Read Order, available only for naive method // this means the order for input to be read
  -o  original order (default)
  -r  random order
  -f  forward order (lexicographic order)
  -b  backward order

  Output File Format
  -k  Knuth file format (default)
  -d  dot file format

  Verification
  -c  comparison with native method


$./unzcomp input-file [output-file]

    

If output-filename is not given, zcomp and unzcomp do not produce any datafile.

ZDDs can be visualized: execute zcomp with -d option and convert the output dot file into image file using Graphviz. See Graphviz - Graph Visualization Software.


$ cat data/sample-rand.dat
7 8 
2 4 7 
9 10 
9 

$ ./zcomp -d data/sample-rand.dat sample.dot
$ dot -Tpng sample.dot -o sample.png

    

The created DOT file and image file in PNG format.

Remarks

zcomp and unzcomp do not maintain reference counts of ZDD nodes! Hence, an error will occur if either the number of nodes exceeds the limit size or the memory space available in a computer is exhausted.

zcomp and unzcomp access ZDD library through bdd_interface.h, in which all functions for ZDD manipulation are defined. Perhaps, you can easily change to other ZDD libraries, if you like, by modifying bdd_interface.h and Makefile in the top directory.

In general, compresssion of data just aims at reducing data sizes. However, as for ZDD, compression is not an end in itself and can be considered as a first step to later processing. An important thing is that zcomp makes it easier to efficiently manipulate set families because ZDD offers vairous basic operations for set families and these operations do without explictly decomposing ZDDs, thereby time complexity only depends on ZDD sizes. This means that whatever size an input set family is, one can efficiently apply set-theoretical operations over ZDDs, as long as intermediate ZDD sizes are in a permissible range during computation. In order to implement this approach, i.e., application of ZDD operations after compression, it is sufficient to insert such operations in the main function of zcomp.c.

STANDARD OUTPUT

If the macros MISC_LOG, TIME_LOG, SIZE_LOG are defined in Makefile, the following information will be shown after execution. The meaning of each line is described as a comment following //.


$ ./zcomp data/sample-rand.dat 
date  Fri May 23 21:09:42 2014        //  executed time
program ZCOMP-1.2.0                   //  program name and version
package CU Decision Diagram Package Release 2.5.0 //  used package
input data/sample-rand.dat            //  input file name
method  zcomp: bottom-up with sorting //  compression method
format  knuth                         //  output file format

generator_type  default               //  selected random number generator
first_value 1787880323                //  the first generated random number
max_value 2147483647                  //  the maximum value of random numbers
seed  1400846982                      //  seed used to produce a random number

maxval  10                            //  the maximum value of items
#row  4                               //  the number of rows
#entry  8                             //  the number of entries with repetition

GC  disabled                          //  garbage collection was disabled
time(COMP)  0.00  sec                 //  compression time
|zdd| 9                               //  the number of nodes in a ZDD
#sets 4                               //  the number of sets represented by a ZDD
uncmp_size  8                         //  the uncompresssed size of a ZDD

    

NOTICE

A huge number of sets may be generated in unzcomp and disk space may be exhausted. To avoid disk overflow, take measure such as using ulimit command.

REFERENCES