New paper on Delaunay

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view

New paper on Delaunay

Olivier Devillers-2
Hello everybody

[PDF] One machine, one minute, three billion tetrahedra

C Marot, J Pellerin, JF Remacle - arXiv preprint arXiv:1805.08831, 2018
This paper presents a new scalable parallelization scheme to generate the 3D 
Delaunay triangulation of a given set of points. Our first contribution is an efficient 
serial implementation of the incremental Delaunay insertion algorithm. A simple …

I had a look at the above paper that just appear.

They claim to be 3 times faster than CGAL (and TetGen andGeoGram) for
sequential 3D Delaunay and to have a good multithreading scheme.

- memory alignement.
They store 
      - a table of vertices (containing coordinates + une extra field for marks)
      - a table of tetrahedra’s vertices (containing 4 indices of vertices per tetra)
      - a table of tetrahedra’s neighbor (containing 4 indices of tetra per tetra)
      - a table of tetrahedra’s sphere coefficient (containing 4 minors of inspire test per tetra)
  They manage to have all these records aligned in a way that the information needed for an object is always on a single page.

- radix sort. 
  the spatial sort for Brio technique is done by computing an Hilbert index first
  and then using radix sort on these Hilbert indices which is faster than an actual sort.
  It means that a grid resolution is fixed at the beginning and points are sorted according the the cell of the grid containing them.
  This is not robust to very non uniform distribution.

- in-sphere predicate
  They use Shewchuk predicates 
    in the sequential version they store the minors of the determinant (the in-sphere is called about 1.8 times for each sphere)
    in the multi-threaded version they do not to save memory
  Robustness issues are solved using static filter of Shewchuk + SoS [Edelsbunner Mucke]

- neighbor access
  They store the reciprocal index to avoid our function that compute it.
  if t1 is neighbor i2 of t2 and t2 is neighbor i1 of t1, then the information 4*t2+i2 is stored in t2->neighbor(i1)
  then t2 and i2 are retrieved  by shifting and masking

- adjacency computations
  When building the adjacencies between the new tetra after an insertion 
  Let abc be a triangle of the boundary of the hole to be triangulated and q the new point
  - The adjacency between qabc and    the tetra outside the hole is easy to build
  - The adjacency between qabc and qbcd  is build in CGAL by turing around edge bc is the old triangulation (I think, I did not check the current code)
  - In the proposed code 
          -they build all tetra inside the hole 
          - they build a table of size m^2 of tetra incident to the m^2 possible edges if m is the number of vertices on the hole boundary
          - they recover the neighbouring relations from that table
         - they use m=32 (average is 15) if occasionally the hole is bigger they use some slower method

- multi threading
  They group a set of points of one brio level in batches according to the hilbert sort to distinguish zones in the domain
  - zones are processed in different threads
  - points close to the zones boundary may failed to be inserted and are delayed to a next allocation in zones