This paper presents a new scalable parallelization scheme to generate the 3D Delaunay triangulation of a given set of points. Our first contribution is an efficient serial implementation of the incremental Delaunay insertion algorithm. A simple …
I had a look at the above paper that just appear.
They claim to be 3 times faster than CGAL (and TetGen andGeoGram) for
sequential 3D Delaunay and to have a good multithreading scheme.
- memory alignement.
- a table of vertices (containing coordinates + une extra field for marks)
- a table of tetrahedra’s vertices (containing 4 indices of vertices per tetra)
- a table of tetrahedra’s neighbor (containing 4 indices of tetra per tetra)
- a table of tetrahedra’s sphere coefficient (containing 4 minors of inspire test per tetra)
They manage to have all these records aligned in a way that the information needed for an object is always on a single page.
- radix sort.
the spatial sort for Brio technique is done by computing an Hilbert index first
and then using radix sort on these Hilbert indices which is faster than an actual sort.
It means that a grid resolution is fixed at the beginning and points are sorted according the the cell of the grid containing them.
This is not robust to very non uniform distribution.
- in-sphere predicate
They use Shewchuk predicates
in the sequential version they store the minors of the determinant (the in-sphere is called about 1.8 times for each sphere)
in the multi-threaded version they do not to save memory
Robustness issues are solved using static filter of Shewchuk + SoS [Edelsbunner Mucke]
- neighbor access
They store the reciprocal index to avoid our function that compute it.
if t1 is neighbor i2 of t2 and t2 is neighbor i1 of t1, then the information 4*t2+i2 is stored in t2->neighbor(i1)
then t2 and i2 are retrieved by shifting and masking
- adjacency computations
When building the adjacencies between the new tetra after an insertion
Let abc be a triangle of the boundary of the hole to be triangulated and q the new point
- The adjacency between qabc and the tetra outside the hole is easy to build
- The adjacency between qabc and qbcd is build in CGAL by turing around edge bc is the old triangulation (I think, I did not check the current code)
- In the proposed code
-they build all tetra inside the hole
- they build a table of size m^2 of tetra incident to the m^2 possible edges if m is the number of vertices on the hole boundary
- they recover the neighbouring relations from that table
- they use m=32 (average is 15) if occasionally the hole is bigger they use some slower method
- multi threading
They group a set of points of one brio level in batches according to the hilbert sort to distinguish zones in the domain
- zones are processed in different threads
- points close to the zones boundary may failed to be inserted and are delayed to a next allocation in zones