Benchmarks

From RCSWiki

Jump to: navigation, search

We plan to use several applications to benchmark the RTR-JVM project.


Contents

THE VORONOI DIAGRAM


The first one is an algorithm that calculates a (discrete) approximation to the Voronoi Diagram. The Voronoi Diagram is one of the fundamental algorithms in Computational Geometry.

An excellent introduction to the topic of Voronoi diagrams can be found in the ACM Computing Surveys of September 1991: "Voronoi Diagrams - A survey of a fundamental geometric data stucture" by Franz Aurenhammer (Volume 23 Issue 3, pages 345-405). It is available on the ACM digital library at:

http://doi.acm.org/10.1145/116873.116880


Given a portion of the plane and a set of n seed points on that plane, one wants to divide the plane into n tiles, one for each seed point. The set of points in each tile will have the property that they are closer to their corresponding seed than to any other seed.

In traditional Computational Geometry problems, the input to the algorithm is the set of real x and y coordinates of the seeds. The program then produces the coordinates of the segments that make up the Voronoi diagram. See for instance the qhull package available at

http://www.qhull.org

In the late 80s and early 90s, truly massive parallel computers were available. Thinking Machines and MasPar built machines that could scale up to 64K and 16K processors respectively. The individual processors were simple and small. It was possible to connect this machines to computer screens. This encouraged interest in "pixel-per-processor" algorithms, algorithms designed to divide the work among all available processors and then display the result to the screen. A good set of examples of this kind of algorithms can be found in

"Parallel Solutions to Geometric Problems on the Scan Model of
Computation" by Gay Blelloch and James Little
A.I. Memo 952 from MIT's AI lab from February 1988

A conference paper was published by those authors in ICPP in 1988 (pp. 218-222).

The Voronoi problem can be solved, approximately, in this pixel-per-processor fashion. Keep in mind that every processor-pixel has an individual x,y (integer) coordinate that is accessible. Assume that the seeds coordinates are integer and they are in the range of the available values for the pixels.

  • 1. Assign a color to every seed
  • 2. Broadcast the locations of the seeds to all processor-pixels.
  • 3. Calculate the distance from every processor-pixel to all seeds.
  • 4. Select the closest seed and color the pixel with the corresponding color.

This approach is described in "Parallel Algorithms to find the Voronoi Diagram and the Order-k Voronoi Diagram" by Trefftz and Szakas which appeared in the proceedings of the Workshop on Massively Parallel Processing at IPDPS 2003.

The code to implement this algorithm in Java is straightforward. An implementation can be found at:

http://www.cis.gvsu.edu/~trefftzc/rtrjvm/Simple.java

This program uses a second very simple class called Point.


Several programming languages were developed to program the Massively Parallel Machines mentioned previoulsy (CMs and MasPars). C* was developed by Thinking Machines (the manufacturer of the CMs) and MPL was offered by MasPar. These programming languages were relatively easy to use. The machines were controlled from a regular workstation that would broadcast data and instructions to all the processing elements (PEs) in the parallel machine. The programming was based on having two kinds of variables:

  • "Scalar" variables: Of which there was a single instance and which resided in the memory of the controlling workstation
  • "Vector" variables: These variables were replicated in all the PEs

These programming languages are called Data Parallel Languages.

Thomas Braunl, currently at the University of Western Australia, implemented a parallel programming language and simulator called Parallaxis III. Parallaxis III could be used as a programming language on the MasPar machines but it can work as a standalone machine on Unix workstations. The Parallaxis system is described in an article in:

"Parallaxis III: Architecutre-Independent Data Parallel Processing"
By Thomas Braeunl, in IEEE Transactions on Software Engineering
Volume 26 Number 3, Pages 227-243, 2000.

The software is available from:

http://robotics.ee.uwa.edu.au/parallaxis/

A more recent programming language that shares some of the semantic and syntactic flavor of Data Parallel Languages is Unified Parallel C (UPC).

http://upc.lbl.gov/

One may ask, what is the relationship between FPGAs and Massively Parallel Machines? There are researchers who think that it is reasonable to use some of the design ideas of Masssively Parallel Machines to configure FPGAs and to use Data Parallel Programming Languages to program the resulting systems. Thus, one would exploit the parallelism that an FPGA system offers by having inside a single FPGA multiple small functional units that can work in parallel in solving a given problem. See, for example, the papers published in the 2004 Workshop on Massively Parallel Processing:

An 88-Way Multiprocessor within an FPGA with Customizable Instructions
by Raymond Hoare, Shenchih Tung and Katrina Werger

and

Implementing Cellular Automata in FPGA Logic
by Mathias Halbach and Rolf Hoffmann

and in the 2005 edition of the same Workshop:

FPGA Implementation of the Massively Parallel GCA Model
by W. Heenes, R. Hoffmann  and S. Kanthak


A version of the Voronoi program written in Parallaxis can be found at

http://www.cis.gvsu.edu/~trefftzc/rtrjvm/voronoi.pm

This is the program that was used to produce the graphics for the article published in the 2003 Workshop on Massively Parallel Processing.

String Matching


A couple of recent papers suggest that implementing String Matching algorithms in FPGAs is a reasonable endeavour:

  • "Massivley Parallel Data Mining Using Reconfigurable Hardware: Approximate String Matching" by Zhang et al. in Proceedings of the 2004 Workshop on Massively Parallel Processing and
  • "Time and area efficient pattern matching on FPGAs" by Z. Baker and V. Prasana in Proceedings of the 2004 ACM/SIGDA 12th. International Symposium on Field Porgrammable Gate Arrays This paper is available on line (if you have access to the ACM Digital Library) at: http://doi.acm.org/10.1145/968280.968312

The first paper describes an implementation of a string matching algorithm in the context of a Data Mining system. The second one is concerned with an Intrustion Detection System that examines traffic on a network to detect suspicious packets.

There are several good books on string matching algorithms:

  • "Flexible Pattern Matching in Strings - Practical on-line search algorithms for texts and biological sequences" by Gonzalo Navarro and Mathieu Raffinot published by Cambridge University Press in 2002
  • "Algorithms on Strings, Trees and Sequences - Computer Science and Computational Biology" by Dan Gusfield published by Cambridge University Press in 1999.
  • "Computing Patterns in Strings" by Bill Smyth published by Addison Wesley in 2003.

and several seminal papers in the area can be found in the ACM Digital Library:

One of the seminal papers is not available on line. It is: "Fast pattern matching in strings" by Knuth, Morris and Pratt appeared in 1977. in the SIAM Journal on Computing 6, 1, 323-350. A good presentation of the algorithm can be found in:

http://www.ics.uci.edu/~eppstein/161/960227.html 

A very complete and relevant survey can be found in "A guided tour to approximate string matching" by Gonzalo Navarro. Published in ACM Computing Surveys in March 2001. The reference in the ACM Digital Library is:

http://doi.acm.org/10.1145/375360.375365

This part will be expanded later. It will include a description of the algorithms and links to the implementation in Java or Parallaxis.

Let n be the length of the text S being searched for a pattern. Let m be the length of the pattern P.

The intuitive algorithm

The intuitive algorithm does not use any preprocessing based on the pattern. The complexity, in the worst case, is O(mn)

The following pseudocode is taken from Sara Baase's "Computer Algorithms" page 174. Here the starting position in the strings is 1.

i: current guess where the Pattern P begins in the text S j: index for S k: index for P

i=0; while (i < n) {

  i = i+1;
  j = i;
  k = 1;
  while (s[j] = p[k]) {
        if (k == m) {

return success; } else { j = j+1; k = k+1; }

   }

} return Failure;

Several other algorithms with better complexity have been devised. These faster algorithms are based on preprocessing the pattern and creating a table that will avoid re-examining parts of the text that do not need to be re-examined.



Knuth Morris Pratt

Boyer Moore

Horspool

Shift-Or (Baeza-Yates and Gannott)

Implementation on parallaxis available at

http://www.cis.gvsu.edu/~trefftzc/rtrjvm/shiftOr.pm

a similar algorithm is the Shift-And algorithm. The parallaxis implementation is available at:

http://www.cis.gvsu.edu/~trefftzc/rtrjvm/shiftAnd.pm

BNDM

Implementation on parallaxis available at

http://www.cis.gvsu.edu/~trefftzc/rtrjvm/BNDM.pm

We will implement Shift-Or, Horspool and BNDM in Java and compare their performances.

Personal tools