Trip Report: Unstructured Grid Workshop
October 16 and 17, Boulder, CO
Materials
Take Away Messages
Action Items
My Position
The set of operations on unstructured grids we wish to support should drive the representation, rather than the other way around.
Structured grids are easy: the data model (Cartesian products of coordinate variables) immediately implies a representation (multidimensional arrays), an API (reading and writing subslabs), and an efficient implementation of the API (iterate over dimensions of the array).
Unstructured grids are more difficult: An appropriate characterization of the data model itself is not universally accepted; a general (and hopefully non-controversial) characterization is a collection of cells linked by an incidence relationship. There are many possible representations for this model:
- a multidimensional array, potentially with missing values, as found in current netCDF representations of unstructured grids
- an array of lists of nodes
- a list of polygons (or polyhedra), perhaps as OGC features.
- The quad-edge structure, and its variants
- cell tuples
- Generalized combinatorial maps
Each of these representations support different operations. The correct choice of representation should be informed by the operations one wishes to support. For example, a Mx3 array representing triangles can efficiently support extraction of the 100th to the 150th triangles. However, since the order in computational coordinates does not correspond to any physical ordering, it is not llikely that this kind of query is very useful.
I propose that the ability to derive a self-consistent subgrid using a geometric bounding box is the "killer app" for any unstructured grid representation, and is the closest analog to subslab operations on structured grids. Other important operations include aggregation over time and depth, vector calculus, and regridding. Any representation should efficiently support subsetting, and should not confound future efforts on aggregation, vector calculus and regridding.
To implement subsetting efficiently, an index is required. At the workshop, there were some very elegant ad hoc solutions presented (links to come). More traditional data structures such as quad-trees, k-d-trees, and R-trees are also relevant. We should not bind ourselves to flat array-based representations just because they are simple.
The bigger picture is this: The file-oriented model of scientific data exchange does not scale. As datasets grow, downloading a file for local analysis becomes infeasible. We will eventually be forced to move the computation to the data, rather than move the data to the computation. This tendency is well-documented in High-Energy Physics, Biology, and Astronomy; Oceanography is on the brink (witness 200 million surface nodes in QUODDY grids). I believe the appropriate first step is to allow server-side subsetting of data.
[1] Pascal Lienhardt: N-Dimensional Generalized Combinatorial Maps and Cellular Quasi-Manifolds. Int. J. Comput. Geometry Appl. 4(3): 275-324 (1994)
- Server-side subsetting received strong support as the "killer requirement."
- Balaji's GridSpec has momentum within the community. Tiling schemes (mosaics) are the focus; I didn't see much evidence that the ideas have been applied to our hybrid, 3D grids. No implementation.
- There are existing, popular standards that overlap significantly with the goals of this workshop: CGNS and XMDF
