Quarry: "Assumption-Free" Data Management

Demonstration servers:

Medication Nomenclature (password required)

(817 Signatures, 744526 Resources, 10401670 Descriptors)

CMOP Forecasts and Hindcasts

(currently 24 Signatures, 66404 Resources, 522280 Descriptors; in a previous experiment, 86 Signatures, ~1M Resources, 7.5M Descriptors)

DIESEL integration experiment

(7 Signatures, 69 Resources, 875 Descriptors)

Through a collaboration with Portland State University researchers, we are addressing the problem of "bootstrapping" a data management application: How does one proceed from heterogeneous, unfamiliar data sources to useful knowledge in hours rather than weeks or months?

[Figure: Quarry screenshot]

Conventional data management solutions (e.g., an RDBMS) are characterized by rigid, top-down designs that require significant up-front investment: schema design, formal requirements gathering, and feature triage. The concept of a dataspace encourages a different approach: begin by immediately providing simpler, baseline services over any and all data sources, rather than restricting the design to advanced services implemented over only convenient and familiar data sources.

[Figure: Quarry performance over a competitive RDF management system]

The Quarry system addresses the bootstrapping problem of dataspaces: given a set of data sources that one knows nothing about, what is the shortest path to gaining useful knowledge? Quarry helps manage those data over which very few assumptions hold: there is no schema available, the data need not be relational, there are no obvious constraints or patterns to exploit, and there may be millions of items with no obvious way to "start small."

To use Quarry, one first decomposes the data sources into an "assumption-free" data model: a set of (resource, property, value) triples. These triples are then processed and indexed automatically to provide efficient query and browse services through a simple API. The Quarry platform also provides an interactive web application for profiling the data: testing assumptions, assessing quality and "cleanliness", and, more generally, improving one's understanding of it.
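The decomposition into triples can be illustrated with a minimal Python sketch. This is not Quarry's API or implementation: the function names (`to_triples`, `index_by_property`, `group_by_property_set`) and the sample records are hypothetical, chosen only to show how schemaless records might be flattened into (resource, property, value) triples, indexed for lookup, and given a first rough profile.

```python
from collections import defaultdict

def to_triples(resource_id, record):
    """Flatten one record (a dict) into (resource, property, value) triples.
    A list-valued property yields one triple per element."""
    triples = []
    for prop, value in record.items():
        values = value if isinstance(value, list) else [value]
        for v in values:
            triples.append((resource_id, prop, v))
    return triples

def index_by_property(triples):
    """Build property -> value -> set of resources, for simple lookups."""
    index = defaultdict(lambda: defaultdict(set))
    for resource, prop, value in triples:
        index[prop][value].add(resource)
    return index

def group_by_property_set(triples):
    """Group resources that carry exactly the same set of properties.
    This is one simple profiling step, assumed here for illustration."""
    props = defaultdict(set)
    for resource, prop, _ in triples:
        props[resource].add(prop)
    groups = defaultdict(set)
    for resource, prop_set in props.items():
        groups[frozenset(prop_set)].add(resource)
    return groups

# Hypothetical records from two unrelated sources; no schema is declared.
records = {
    "obs1": {"variable": "salinity", "year": 2008},
    "obs2": {"variable": "temperature", "year": 2008},
    "drug1": {"name": "aspirin", "form": ["tablet", "capsule"]},
}

triples = []
for rid, rec in records.items():
    triples.extend(to_triples(rid, rec))

index = index_by_property(triples)
groups = group_by_property_set(triples)

print(len(triples), "triples")
print("resources with year=2008:", sorted(index["year"][2008]))
print(len(groups), "distinct property sets")
```

Because nothing in the sketch depends on a declared schema, records from unrelated sources can be loaded side by side, and the grouping step gives a quick view of which resources look structurally alike.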

Publications

Quarrying Dataspaces: Schemaless Profiling of Unfamiliar Information Sources, Bill Howe, David Maier, Nicolas Rayner, James Rucker, Workshop on Information Integration Methods, Architectures, and Systems (IIMAS 2008)
Smoothing the ROI Curve for Scientific Data Management Applications, Bill Howe, David Maier, Laura Bright, Third Biennial Conference on Innovative Data Systems Research (CIDR 2007)

Emergent Semantics: Towards Self-Organizing Scientific Metadata, Bill Howe, Kuldeep Tanna, Paul Turner, David Maier, International Conference on Semantics for a Networked World (SFNW 2004), co-located with SIGMOD 2004

Attachment: howe_maier_rayner_rucker_quarry.pdf (353.48 KB)
