Benchmarks
From RCSWiki
We plan to use several applications to benchmark the RTR-JVM project.
Contents |
THE VORONOI DIAGRAM
The first one is an algorithm that calculates a (discrete) approximation to the Voronoi Diagram. The Voronoi Diagram is one of the fundamental algorithms in Computational Geometry.
An excellent introduction to the topic of Voronoi diagrams can be found in the ACM Computing Surveys of September 1991: "Voronoi Diagrams - A survey of a fundamental geometric data stucture" by Franz Aurenhammer (Volume 23 Issue 3, pages 345-405). It is available on the ACM digital library at:
http://doi.acm.org/10.1145/116873.116880
Given a portion of the plane and a set of n seed points on that plane,
one wants to divide the plane into n tiles, one for each seed point.
The set of points in each tile will have the property that they are closer
to their corresponding seed than to any other seed.
In traditional Computational Geometry problems, the input to the algorithm is the set of real x and y coordinates of the seeds. The program then produces the coordinates of the segments that make up the Voronoi diagram. See for instance the qhull package available at
http://www.qhull.org
In the late 80s and early 90s, truly massive parallel computers were available. Thinking Machines and MasPar built machines that could scale up to 64K and 16K processors respectively. The individual processors were simple and small. It was possible to connect this machines to computer screens. This encouraged interest in "pixel-per-processor" algorithms, algorithms designed to divide the work among all available processors and then display the result to the screen. A good set of examples of this kind of algorithms can be found in
"Parallel Solutions to Geometric Problems on the Scan Model of Computation" by Gay Blelloch and James Little A.I. Memo 952 from MIT's AI lab from February 1988
A conference paper was published by those authors in ICPP in 1988 (pp. 218-222).
The Voronoi problem can be solved, approximately, in this pixel-per-processor fashion. Keep in mind that every processor-pixel has an individual x,y (integer) coordinate that is accessible. Assume that the seeds coordinates are integer and they are in the range of the available values for the pixels.
- 1. Assign a color to every seed
- 2. Broadcast the locations of the seeds to all processor-pixels.
- 3. Calculate the distance from every processor-pixel to all seeds.
- 4. Select the closest seed and color the pixel with the corresponding color.
This approach is described in "Parallel Algorithms to find the Voronoi Diagram and the Order-k Voronoi Diagram" by Trefftz and Szakas which appeared in the proceedings of the Workshop on Massively Parallel Processing at IPDPS 2003.
The code to implement this algorithm in Java is straightforward. An implementation can be found at:
http://www.cis.gvsu.edu/~trefftzc/rtrjvm/Simple.java
This program uses a second very simple class called Point.
Several programming languages were developed to program the Massively Parallel
Machines mentioned previoulsy (CMs and MasPars). C* was developed by Thinking Machines
(the manufacturer of the CMs) and MPL was offered by MasPar.
These programming languages were relatively easy to use.
The machines were controlled from a regular workstation that would broadcast data
and instructions to all the processing elements (PEs) in the parallel machine.
The programming was based on having two kinds of variables:
- "Scalar" variables: Of which there was a single instance and which resided in the memory of the controlling workstation
- "Vector" variables: These variables were replicated in all the PEs
These programming languages are called Data Parallel Languages.
Thomas Braunl, currently at the University of Western Australia, implemented a parallel programming language and simulator called Parallaxis III. Parallaxis III could be used as a programming language on the MasPar machines but it can work as a standalone machine on Unix workstations. The Parallaxis system is described in an article in:
"Parallaxis III: Architecutre-Independent Data Parallel Processing" By Thomas Braeunl, in IEEE Transactions on Software Engineering Volume 26 Number 3, Pages 227-243, 2000.
The software is available from:
http://robotics.ee.uwa.edu.au/parallaxis/
A more recent programming language that shares some of the semantic and syntactic flavor of Data Parallel Languages is Unified Parallel C (UPC).
http://upc.lbl.gov/
One may ask, what is the relationship between FPGAs and Massively Parallel Machines? There are researchers who think that it is reasonable to use some of the design ideas of Masssively Parallel Machines to configure FPGAs and to use Data Parallel Programming Languages to program the resulting systems. Thus, one would exploit the parallelism that an FPGA system offers by having inside a single FPGA multiple small functional units that can work in parallel in solving a given problem. See, for example, the papers published in the 2004 Workshop on Massively Parallel Processing:
An 88-Way Multiprocessor within an FPGA with Customizable Instructions by Raymond Hoare, Shenchih Tung and Katrina Werger
and
Implementing Cellular Automata in FPGA Logic by Mathias Halbach and Rolf Hoffmann
and in the 2005 edition of the same Workshop:
FPGA Implementation of the Massively Parallel GCA Model by W. Heenes, R. Hoffmann and S. Kanthak
A version of the Voronoi program written in Parallaxis can be found at
http://www.cis.gvsu.edu/~trefftzc/rtrjvm/voronoi.pm
This is the program that was used to produce the graphics for the article published in the 2003 Workshop on Massively Parallel Processing.
String Matching
A couple of recent papers suggest that implementing String Matching algorithms
in FPGAs is a reasonable endeavour:
- "Massivley Parallel Data Mining Using Reconfigurable Hardware: Approximate String Matching" by Zhang et al. in Proceedings of the 2004 Workshop on Massively Parallel Processing and
- "Time and area efficient pattern matching on FPGAs" by Z. Baker and V. Prasana in Proceedings of the 2004 ACM/SIGDA 12th. International Symposium on Field Porgrammable Gate Arrays This paper is available on line (if you have access to the ACM Digital Library) at: http://doi.acm.org/10.1145/968280.968312
The first paper describes an implementation of a string matching algorithm in the context of a Data Mining system. The second one is concerned with an Intrustion Detection System that examines traffic on a network to detect suspicious packets.
There are several good books on string matching algorithms:
- "Flexible Pattern Matching in Strings - Practical on-line search algorithms for texts and biological sequences" by Gonzalo Navarro and Mathieu Raffinot published by Cambridge University Press in 2002
- "Algorithms on Strings, Trees and Sequences - Computer Science and Computational Biology" by Dan Gusfield published by Cambridge University Press in 1999.
- "Computing Patterns in Strings" by Bill Smyth published by Addison Wesley in 2003.
and several seminal papers in the area can be found in the ACM Digital Library:
- "A new approach to text searching" by Ricardo Baeza-Yates and Gasto Gonnet, appeared in CACM in October of 1992. The reference in the ACM Digital Library is: http://doi.acm.org/10.1145/135239.135243
- "A Fast String Searching Algorithm" by Robert S. Boyer and J.S. Moore appeared in CACM in October 1977. The reference in the ACM Digital Library is: http://doi.acm.org/10.1145/359842.359859
- "Fast text searching: allowing errors" by Sun Wu and Udi Manber appeared in CACM in October of 1992. The reference in the ACM Digital Library is: http://doi.acm.org/10.1145/135239.135244
- "Practical Fast Searching in Strings" by Nigel Horspool published in Software - Practice and Experience, vol 10, 1980. Available on line from: http://www.cs.uvic.ca/~nigelh/Publications/stringsearch.pdf
One of the seminal papers is not available on line. It is: "Fast pattern matching in strings" by Knuth, Morris and Pratt appeared in 1977. in the SIAM Journal on Computing 6, 1, 323-350. A good presentation of the algorithm can be found in:
http://www.ics.uci.edu/~eppstein/161/960227.html
A very complete and relevant survey can be found in "A guided tour to approximate string matching" by Gonzalo Navarro. Published in ACM Computing Surveys in March 2001. The reference in the ACM Digital Library is:
http://doi.acm.org/10.1145/375360.375365
This part will be expanded later. It will include a description of the algorithms and links to the implementation in Java or Parallaxis.
Let n be the length of the text S being searched for a pattern. Let m be the length of the pattern P.
The intuitive algorithm
The intuitive algorithm does not use any preprocessing based on the pattern. The complexity, in the worst case, is O(mn)
The following pseudocode is taken from Sara Baase's "Computer Algorithms" page 174. Here the starting position in the strings is 1.
i: current guess where the Pattern P begins in the text S j: index for S k: index for P
i=0; while (i < n) {
i = i+1;
j = i;
k = 1;
while (s[j] = p[k]) {
if (k == m) {
return success; } else { j = j+1; k = k+1; }
}
} return Failure;
Several other algorithms with better complexity have been devised. These faster algorithms are based on preprocessing the pattern and creating a table that will avoid re-examining parts of the text that do not need to be re-examined.
Knuth Morris Pratt
Boyer Moore
Horspool
Shift-Or (Baeza-Yates and Gannott)
Implementation on parallaxis available at
http://www.cis.gvsu.edu/~trefftzc/rtrjvm/shiftOr.pm
a similar algorithm is the Shift-And algorithm. The parallaxis implementation is available at:
http://www.cis.gvsu.edu/~trefftzc/rtrjvm/shiftAnd.pm
BNDM
Implementation on parallaxis available at
http://www.cis.gvsu.edu/~trefftzc/rtrjvm/BNDM.pm
We will implement Shift-Or, Horspool and BNDM in Java and compare their performances.
