

Database structure

The EPD is set up as a relational database consisting of a large number of tables, to allow the data to be queried in complex ways and to minimise the storage space required. The aim of the database structure support group is to find and maintain a table structure and database management system that is adequate and up to date for the widest possible use of the EPD. The group maintains and fosters links to other palaeo-environmental databases such as ALPADABA, Neotoma, the APD, PANGAEA and BugsCEP.


The Publishing Network for Geoscientific & Environmental Data (PANGAEA) has offered to support the EPD.


We are keeping track of developments in pollen databases and have taken a closer look at the Neotoma database.

Review of the existing EPD table structure

Walter Finsinger, Simon Brewer, Thomas Giesecke and Basil Davis met in Aix between May 29 and 31. We reviewed the EPD table structure, discussed working protocols with Michelle Leydet, and helped John Keltner to correct mistakes in the database. We are grateful to John, Michelle and Valérie Andrieu for their advice, discussions and support. (Simon Brewer, Basil Davis, Walter Finsinger, Thomas Giesecke)

Also present at the meeting: John Keltner, Michelle Leydet

This report attempts to summarise the major discussions from the work meeting and suggests a few guidelines or protocols. None of the items are set in stone; they should serve as a basis for discussion.

We first reviewed the Paradox table structure (Fig. 1) in which the EPD is currently held, and identified fields that have not been used, items that should be combined, and items that should be added.

Fig. 1: Table structure and relationships of the most important EPD Paradox tables (here imported into Access).

Table ‘Coredriv’: This table was seldom used and could be deleted; any data present could be stored in a free-text ‘notes’ field in an appropriate table.

Table ‘Section’: The information here could be better combined with another table.

Tables ‘Sitedesc’ and ‘Siteloc’ could be combined.

Table ‘Entity’: The fields IsCore, IsSect and IsSSamp could be combined into a single field or merged with Descriptor.

Table ‘Entity’: It would be good to add an identifier for a single publication that should be cited when using the dataset. The database should hold references to the many publications describing the dataset, but when citing many records it is often only possible to cite one publication per record. Ideally, the person who submits data to the DB should indicate which publication should be cited whenever the dataset is used. However, a single publication is in some cases not sufficient: (1) sometimes a diagram is published in two parts in different publications (e.g., the Late Glacial in one and the Holocene in another); (2) sometimes the C14 dates, or part of them, are published in a later publication, separate from the original publication with the pollen diagram.
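As an illustration only, such a 'publication to cite' could be implemented as a foreign key on the entity table, next to a many-to-many link table holding all publications describing the dataset. The sketch below uses hypothetical table and column names (Publ, Entity, PublEnt, CitePublID), not the actual EPD field names:

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Publ (
    PublID   INTEGER PRIMARY KEY,
    Citation TEXT NOT NULL                        -- full reference, ideally with DOI
);
CREATE TABLE Entity (
    EntityID   INTEGER PRIMARY KEY,
    SiteName   TEXT,
    CitePublID INTEGER REFERENCES Publ (PublID)   -- the single publication to cite
);
CREATE TABLE PublEnt (                            -- all publications describing the dataset
    EntityID INTEGER REFERENCES Entity (EntityID),
    PublID   INTEGER REFERENCES Publ (PublID),
    PRIMARY KEY (EntityID, PublID)
);
""")

The cases where one publication is not sufficient could be handled by leaving CitePublID empty and listing the relevant references through the link table.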

Table ‘Entity’: The variables ‘IceThickCM’ and ‘C14DepthAdj’ could be deleted; any data present could be stored in a free-text ‘notes’ field in an appropriate table.

Table ‘Descr’: This table should be reviewed in detail. At the moment it does not contain a variable describing whether or not a lake has an in- or outflow. We initially proposed to add:

  • Higherdescr – Descriptor – Description
  • LNAT – LNIN – lake without inflow
  • LNAT – LNOU – lake without outflow
  • LNAT – LNIO – lake without in- and outflow
  • (LNAT = natural lake)

However, we realise that these would introduce duplication, as e.g. a lake of fluvial or glacial origin may or may not have an inflow. It therefore seems necessary to review the list of choices in the ‘Descr’ table.
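One way to avoid such duplication, sketched here purely as a discussion aid with hypothetical names, would be to keep the existing depositional descriptor (e.g. LNAT) and record inflow and outflow as separate yes/no attributes, as already foreseen in the submission checklist further below:

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
CREATE TABLE SiteHydrology (
    SiteID     INTEGER PRIMARY KEY,
    Descriptor TEXT,                                   -- e.g. 'LNAT' from the Descr look-up
    HasInflow  TEXT CHECK (HasInflow  IN ('Y', 'N')),  -- permanent superficial inflow
    HasOutflow TEXT CHECK (HasOutflow IN ('Y', 'N'))   -- permanent superficial outflow
)
""")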

Table ‘Litholgy’: The sediment description is currently free text and therefore cannot be queried. It would be helpful to add a general variable for which a choice has to be made from a fixed list (e.g. gyttja or peat). In this way, sites that developed from lake to mire could be identified.
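For example, with a controlled sediment-type field alongside the free-text description, lake-to-mire sites could be found by asking for entities that have both gyttja and peat layers. A minimal sketch with illustrative table, column and category names:

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
CREATE TABLE Lithology (
    EntityID     INTEGER,
    DepthTopCM   REAL,
    DepthBotCM   REAL,
    SedimentType TEXT CHECK (SedimentType IN ('gyttja', 'peat', 'clay', 'silt', 'other')),
    Descript     TEXT                        -- free-text description, kept as before
)
""")
con.executemany(
    "INSERT INTO Lithology VALUES (?, ?, ?, ?, ?)",
    [(1, 100.0, 450.0, 'gyttja', 'fine detritus gyttja'),
     (1, 0.0, 100.0, 'peat', 'Sphagnum peat')],
)
# entities with both gyttja and peat layers, i.e. candidate lake-to-mire sites
lake_to_mire = con.execute("""
    SELECT EntityID FROM Lithology WHERE SedimentType = 'gyttja'
    INTERSECT
    SELECT EntityID FROM Lithology WHERE SedimentType = 'peat'
""").fetchall()
print(lake_to_mire)                          # -> [(1,)]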

Tables containing dating information, such as ‘Pb210’, ‘tl’, etc., could be combined into a single table containing a variable that identifies the type of age determination.
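A sketch of such a combined table, with illustrative column names; the point is simply that one method field replaces the separate per-method tables:

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
CREATE TABLE AgeDetermination (
    SampleID     INTEGER PRIMARY KEY,
    EntityID     INTEGER,
    DepthCM      REAL,
    Method       TEXT CHECK (Method IN ('C14', 'Pb210', 'TL', 'tephra', 'varve')),
    Age          REAL,                       -- reported age
    ErrorYounger REAL,                       -- uncertainty towards younger ages
    ErrorOlder   REAL                        -- uncertainty towards older ages
)
""")
con.execute("INSERT INTO AgeDetermination VALUES (1, 1, 250.0, 'C14', 5430.0, 60.0, 60.0)")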

Table ‘Sitedesc’: We felt it important to have a variable that describes the general vegetation around the site in the way that the variable IGCP-type does. However, the suitability of the latter variable should be reviewed. This variable could possibly also be generated through a GIS query.

Secondly, we reviewed the scorpion table structure designed by John Keltner. We acknowledged that most of the changes we identified as needed in the Paradox table structure are already realised in the scorpion table structure. Additionally, in scorpion all look-up tables were combined into a glossary table. Furthermore, the scorpion table structure is better able to hold proxies other than pollen and LOI that may have been analysed from the same core, or from a different core at the same site.
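For reference, a glossary of this kind can be as simple as a single (category, code, meaning) table replacing the many small look-up tables; the names below are illustrative, not the actual scorpion field names:

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
CREATE TABLE Glossary (
    Category TEXT,                -- which former look-up table the code belongs to
    Code     TEXT,                -- e.g. 'LNAT'
    Meaning  TEXT,                -- e.g. 'natural lake'
    PRIMARY KEY (Category, Code)
)
""")
con.execute("INSERT INTO Glossary VALUES ('Descriptor', 'LNAT', 'natural lake')")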

In Europe we are in a situation where two macrofossil databases are being built up that will eventually be made publicly available. We therefore recommend that the EPD adopt a table structure that can comfortably accommodate several proxies, and that the European Pollen Database be combined with the macrofossil databases in the same tables.

Although the scorpion table structure is a desirable step forward, no import tools are currently available for adding new datasets to it. Currently, Eric Grimm and co-workers are preparing a new database for multi-proxy datasets. The new version of Tilia, which was presented as a β-version during the EPD meeting, will serve as an import portal for this new database, and import tools will be developed. For these reasons, we recommend that the EPD continue working with the old Paradox tables until a complete new solution is available.

Protocol to help the data manager’s coordination with the support groups

Data entry:

Metadata

a) Minimum, b) Desired, c) Maximum possible

When receiving a new dataset:

Each dataset receives a tracking number, which is posted under the appropriate category (e.g. received, work in progress, ready to upload) on the webpage or wiki. This number will also be used for internal tracking, e.g. when the dataset goes out to the support groups.

Check whether the minimum (a) and desired (b) metadata are available. If a and b are TRUE, go to data entry. If a is FALSE, ask the submitter for more data. If b is FALSE, look in the publication and/or ask for more data; if there is no reply, or the reply contains limited but at least the minimum metadata, go to data entry. For any other problem, contact the regional work group.
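The decision logic above can be summarised as follows; this is only a sketch of the protocol, with illustrative function and status names:

def triage(minimum_ok: bool, desired_ok: bool, reply_received: bool = True) -> str:
    """Decide the next step for a newly received dataset (a = minimum, b = desired)."""
    if minimum_ok and desired_ok:
        return "enter data"
    if not minimum_ok:
        return "ask the submitter for more data"
    # minimum metadata present but desired metadata missing
    if not reply_received:
        return "enter data"                       # proceed with the minimum metadata
    return "look in publication and/or ask for more data"

print(triage(True, True))                         # -> enter data
print(triage(True, False, reply_received=False))  # -> enter data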

Before entering the data:

A) Check whether age-depth models (uncal. and cal.) are present. If A is FALSE, create a simple age-depth model and/or contact the age-depth group.
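A simple first-pass age-depth model can be a linear interpolation between the dated depths; the sketch below uses made-up control points and is not the age-depth group's actual procedure:

# linear interpolation between dated levels; depths and ages are illustrative
dated_depth_cm = [0.0, 120.0, 350.0]       # depths with age control
dated_age_bp   = [-50.0, 2100.0, 9800.0]   # corresponding ages (cal. yr BP)

def interp_age(depth_cm: float) -> float:
    """Estimate the age of a sample depth by linear interpolation."""
    pairs = list(zip(dated_depth_cm, dated_age_bp))
    for (d0, a0), (d1, a1) in zip(pairs, pairs[1:]):
        if d0 <= depth_cm <= d1:
            return a0 + (a1 - a0) * (depth_cm - d0) / (d1 - d0)
    raise ValueError("depth outside the dated interval")

print(round(interp_age(200.0)))            # -> 4778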

Add author of age-depth model to database.

Carry out taxon harmonisation. If a problem occurs, ask the Taxonomy group.

Accuracy-control check: after age-depth models are obtained and the taxonomy is harmonised, produce a percentage diagram and send it to the author for approval. (This gives the author, or the submitter, the possibility to compare the submitted dataset with the dataset that is going to be entered into the database.)
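The percentage calculation behind such a diagram simply expresses each taxon count as a share of the sample's pollen sum; a minimal sketch with made-up counts (the exact definition of the pollen sum, e.g. whether aquatics are excluded, is left to the analyst):

# counts for one sample; taxa and values are illustrative
counts = {"Pinus": 120, "Betula": 45, "Quercus": 30, "Poaceae": 5}

pollen_sum = sum(counts.values())
percentages = {taxon: 100.0 * n / pollen_sum for taxon, n in counts.items()}

for taxon, pct in percentages.items():
    print(f"{taxon}: {pct:.1f} %")         # e.g. Pinus: 60.0 %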

If the submitter/author approves, the data can be imported into the original database.

If new sites have been added to the original database within a month, update the database at Medias. A note is added to the ‘new sites’ webpage or the wiki. Also make a new downloadable version available (if possible, also one in Access).

Correcting errors:

Through his work with the EPD and GPD, John Keltner has located and corrected many mistakes in the metadata, for which we are very grateful. However, many mistakes or omissions in important metadata remain, and efforts should be made to correct them or to add the missing metadata.

Every error correction needs to be documented.

Ask Taxonomy group to fix existing errors.

Metadata mistakes:

1) If a publication is available, check it and correct the mistake.

If 1) is not possible, contact the Mapping and accuracy group or, where appropriate, the Regional contact. If the Regional group is contacted, cc the Mapping group.

Mistakes in taxon counts or LOI:

These errors are more serious and, if possible, the data contributor or author should be contacted. Where this is not possible, contact the Mapping and accuracy group and/or the Regional contact.

Contacting working groups

In the first instance, only the contact person is contacted; after one week, the whole group is contacted. If there is no reply after two weeks, post the question to the wiki.

Data submission checklist

  • Mandatory data = no dataset will be entered if these are not present
  • Desired data = the metadata that should be submitted; the data manager and/or work groups will try to gather these data, if necessary by inquiring with the author
  • Additional metadata = data that are useful to have in the database, but no attempt will be made to complete them if missing from the original submission

Mandatory data for submission (a minimal completeness check is sketched after this list):

  1. Contact person name:
  2. Contact person e-mail:
  3. Contact person address:
  4. Site Name:
  5. Site description/Depositional context (choose from list):
  6. Country:
  7. Latitude (degrees):
  8. Longitude (degrees):
  9. Elevation (m a.s.l.):
  10. Pollen count data
  11. Depth of samples

Desired data for submission:

  1. Age determination of the sediment (e.g. radiocarbon, tephra, varve age)
  2. Water depth: (0 if not a lake)
  3. Basin size (ha): (lake: water surface; mire: unforested area)
  4. Permanent superficial inflow (stream) (y/n):
  5. Permanent superficial outflow (stream) (y/n):
  6. Coring/Sampling device:
  7. Year core/sample collected:
  8. Publication the author would want users to cite when using the data (full reference with DOI):
  9. Sample volume (cm3) and/or Sample weight (g): a value for each sample
  10. Sample thickness (cm): a value for each sample
  11. Surrounding vegetation: (vegetation-type from list(2) e.g. IGCP-type)
  12. Surrounding vegetation (free text):
  13. Regional vegetation (free text):
  14. Lithological information: (gross categories, e.g. gyttja or peat, from list(2) - table)
  15. Digital picture of the site
  16. Age estimates of all samples as cal. yrs BP (possibly with Age uncertainty low and high)(1)
  17. Age estimates of all samples as uncal. 14C BP (possibly with Age uncertainty low and high)(1)
  18. Age basis for chronology(1)

(1) Note: if missing, these fields will be added by the age-depth-model working group. (2) Note: the lists of choices are not yet in place.

Additional metadata:

  • Hydrological catchment size (ha):
  • Latitude of coring location (degrees):
  • Longitude of coring location (degrees):
  • Additional publications (relating to the site) (Full reference with doi number):
  • Loss-on-ignition (% dry weight):
  • Description of the surrounding (regional and local) vegetation