

The Publishing Network for Geoscientific & Environmental Data (PANGAEA) offers to support the EPD

The repeated offer by M. Diepenbroek and H. Grobe includes:

  • Long-term archiving of all EPD data in PANGAEA, with updates of new data sets or new versions of existing data sets
  • All data sets in PANGAEA get persistent identifiers (DOI) and can be published and made citable
  • Extension of the EPD data inventory with relevant data held in PANGAEA (e.g. German pollen data); updates can be downloaded at any time. PANGAEA could also supply a specific feed for this purpose, announcing new pollen data sets in PANGAEA

Long-term archiving is provided free of charge and includes the technical operation of the system, the provision of a DOI for each data set, the availability of all data through portals, search engines and library catalogs, and the archiving of new data sets.

Keeping track of developments of pollen databases

In a new development, the North American and Global pollen databases are being reshaped to hold a wider range of data types spanning the last 5.3 million years. The new database is called Neotoma, and its initial development is funded by a grant from the U.S. National Science Foundation Geoinformatics program. As these developments are of interest to the EPD, Simon Brewer attended a recent meeting of the principal investigators of this project; the following short report is drawn from his notes of the meeting.

The Neotoma system

The Neotoma database introduces a number of important changes to the way a pollen database is used and functions. The revised system of tables is designed mainly to allow different data types to coexist, which also has the advantage of streamlining a number of tables from the original database. The second major change in Neotoma is the move away from local copies of the database, managed by individual data managers, to a single centralised database. This will be hosted on a server and can be accessed and interrogated via the internet. However, it is intended that individual components of the database will still be managed by local data stewards, working remotely.

Data stewards

While the database will be maintained on the central server, the different data types and regions will each be managed by a data steward. This person will be responsible for uploading new data and for maintaining and correcting the existing content. The role of the data steward is therefore not very different from that of a current data manager, except that data are sent to a remote server rather than entered into a local copy of the database.

Website

The Neotoma website will be the main access portal for the majority of users of Neotoma. The website is still under development, and should be available by the IPC meeting in Bonn. At present, users may select sites by choosing the type of data, the geographical region and/or the time window of interest. The choice is made using a shopping basket approach, which allows the selection of one or many sites. The goal is to select a standard set of the most common queries for users of the data. While the Neotoma website will be the main portal for access to this data, it has also been agreed that external applications may have access to the database. This means that existing websites and applications that use a pollen database can be adapted to use Neotoma.
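As an illustration of this kind of query (and not the actual Neotoma implementation), the Python sketch below filters a list of site records by data type, bounding box and time window, and collects the selection in a simple basket. All field names and values are invented for the example.

  # Hypothetical site records: (name, data type, lat, lon, oldest age, youngest age in cal yr BP)
  sites = [
      ("Site A", "pollen", 47.2, 6.5, 15000, 0),
      ("Site B", "pollen", 61.0, 25.3, 9000, 2000),
      ("Site C", "vertebrate", 44.1, 5.0, 30000, 11000),
  ]

  def select_sites(records, data_type, bbox, age_window):
      """Shopping-basket style selection by data type, (south, west, north, east) box
      and (oldest, youngest) age window."""
      south, west, north, east = bbox
      oldest, youngest = age_window
      basket = []
      for name, dtype, lat, lon, old, young in records:
          if dtype != data_type:
              continue
          if not (south <= lat <= north and west <= lon <= east):
              continue
          if young > oldest or old < youngest:   # no overlap with the requested time window
              continue
          basket.append(name)
      return basket

  print(select_sites(sites, "pollen", (40, -10, 70, 30), (12000, 0)))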

Standalone version

A standalone version of the database will be made available for download for power users, i.e. those who need to query the database in more complicated ways. This version will most probably be an SQL Server database that may be queried using MS Access.
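Besides MS Access, a standalone SQL Server copy could also be queried from a scripting language. The following Python sketch uses pyodbc; the server, database, table and column names are placeholders only, since the final schema is not fixed here.

  import pyodbc

  # Placeholder connection to a local SQL Server copy of the database.
  conn = pyodbc.connect(
      "DRIVER={SQL Server};SERVER=localhost;DATABASE=Neotoma;Trusted_Connection=yes"
  )
  cur = conn.cursor()

  # Hypothetical table and column names -- adjust to the actual schema.
  cur.execute("SELECT SiteName, LatitudeDD, LongitudeDD FROM Sites WHERE LatitudeDD > ?", 45.0)
  for site_name, lat, lon in cur.fetchall():
      print(site_name, lat, lon)
  conn.close()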

Calibrated age-depth models

The new database has a number of modifications to the tables containing information about dating and chronologies, allowing all types of dates to be stored in the same table. The database also accepts age-depth models in radiocarbon, calibrated and varve years. A sequence may have more than one default chronology, but only one default per type of age control. A table is present which stores relative ages (Relative Chronology) and which may be used in age-depth models. This will contain ages attributed to a variety of controls, including archaeological and geological time scales. Queries of Neotoma via the website will, by default, return calibrated ages. In order to use information from sites that only have radiocarbon chronologies, a conversion table will be used. This is not intended to replace the establishment of an age-depth model based on calibrated dates, but to allow quick exploration of the existing data.
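A minimal sketch of how such a conversion table might be applied is given below: paired radiocarbon and calibrated ages are simply interpolated. The numbers are invented and do not come from any real calibration curve; as stated above, this is only for quick exploration, not a replacement for a proper calibrated age-depth model.

  import numpy as np

  # Hypothetical conversion table: radiocarbon age (14C yr BP) -> calibrated age (cal yr BP).
  c14_ages = np.array([1000.0, 2000.0, 5000.0, 10000.0])
  cal_ages = np.array([930.0, 1950.0, 5730.0, 11500.0])

  def quick_calibrate(c14_age):
      """Linearly interpolate a calibrated age from the conversion table."""
      return float(np.interp(c14_age, c14_ages, cal_ages))

  print(quick_calibrate(3000.0))   # rough calibrated estimate for a 3000 14C yr BP date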

Report from the Aix meeting of 4

Walter Finsinger, Simon Brewer, Thomas Giesecke and Basil Davis met in Aix between May 29 and 31. We reviewed the EPD table structure, discussed working protocols with Michelle Leydet and helped John Keltner to correct mistakes in the database. We are grateful to John, Michelle and Valérie Andrieu for their advice and support in the discussions. (Simon Brewer, Basil Davis, Walter Finsinger, Thomas Giesecke)

Also present at the meeting: John Keltner, Michelle Leydet

This report attempts to summarize the major discussions from the working meeting and suggests a few guidelines or protocols. None of the items are set in stone; they should serve as a basis for discussion.

Revision of the database structure

We first reviewed the Paradox table structure (Fig. 1) in which the EPD is currently held and identified fields that have not been used, items that should be combined, and items that should be added.

Fig. 1: Table structure and relationships of the most important EPD Paradox tables (here imported into Access).

Table ‘Coredriv’: Was seldom used and could be deleted – the data present could be stored in a free-text ‘notes’ field in an appropriate table.

Table ‘Section’: The information here could be better combined with another table.

Tables ‘Sitedesc’ and ‘Siteloc’ could be combined.

Table ‘Entity’: IsCore, IsSect and IsSSamp could be combined into a single field or combined with Descriptor.

Table ‘Entity’: It would be good to add an identifier for the single publication that should be cited when using the dataset. The database should hold references to the many publications that describe the dataset, but when citing many records it is often only possible to cite one publication per record. Ideally, the person who submits data to the database should indicate the publication that should be cited whenever the dataset is used.

Table ‘Entity’: The variables ‘IceThickCM’ and ‘C14DepthAdj’ may be deleted; the data present could be stored in a free-text ‘notes’ field in an appropriate table.
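As a sketch of what a revised ‘Entity’ table could look like, the Python/SQLite snippet below folds the three flags into one entity-type field and adds a reference to the publication to cite. The column names are suggestions for illustration only, not existing EPD fields.

  import sqlite3

  con = sqlite3.connect(":memory:")
  con.execute("""
      CREATE TABLE entity (
          entity_id     INTEGER PRIMARY KEY,
          site_id       INTEGER,
          entity_type   TEXT CHECK (entity_type IN ('core', 'section', 'surface sample')),
          descriptor    TEXT,
          cite_publ_id  INTEGER,   -- suggested: the one publication to cite for this dataset
          notes         TEXT       -- free text, e.g. former IceThickCM / C14DepthAdj values
      )
  """)
  con.execute("INSERT INTO entity VALUES (1, 10, 'core', 'LNAT', 42, NULL)")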

Table ‘Descr’: This table should be reviewed in detail. At the moment it does not contain a variable describing whether or not a lake has an in- or outflow. We initially proposed to add:

  • Higherdescr – Descriptor – Description
  • LNAT – LNIN – lake without inflow
  • LNAT – LNOU – lake without outflow
  • LNAT – LNIO – lake without in- and outflow
  • (LNAT = natural lake)

However, we realize that this would introduce duplication, as e.g. a lake of fluvial or glacial origin may or may not have an inflow. It therefore seems necessary to review the list of choices in the ‘Descr’ table.

Table ‘Litholgy’: The sediment description is currently free text and therefore cannot be queried. It would be helpful to add a general variable where a choice has to be made between e.g. gyttja and peat. In this way, sites that went from lake to mire could be identified.
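A Python sketch of how such a controlled-vocabulary lithology variable would make the lake-to-mire transition queryable (entity names, depths and categories are invented):

  # Hypothetical lithology units: (entity, top depth cm, bottom depth cm, general category)
  lithology = [
      ("Site A core 1", 0, 150, "peat"),
      ("Site A core 1", 150, 600, "gyttja"),
      ("Site B core 1", 0, 400, "gyttja"),
  ]

  def lake_to_mire_entities(units):
      """Return entities with gyttja overlain by peat, i.e. former lakes that became mires."""
      by_entity = {}
      for entity, top, bottom, category in units:
          by_entity.setdefault(entity, []).append((top, category))
      found = set()
      for entity, layers in by_entity.items():
          layers.sort()                                   # shallowest unit first
          categories = [c for _, c in layers]
          if "peat" in categories and "gyttja" in categories:
              if categories.index("peat") < categories.index("gyttja"):
                  found.add(entity)
      return found

  print(lake_to_mire_entities(lithology))                 # {'Site A core 1'}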

Tables containing dating information, such as ‘Pb210’, ‘tl’, etc., may be combined into a single table containing a variable that identifies the type of age determination.
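The kind of merge this implies is sketched below: per-method records are combined into one table of age determinations, with the dating method stored as an extra variable. The field layout is hypothetical.

  # Hypothetical per-method records: (entity, depth cm, age, error)
  pb210 = [("Site A core 1", 5, -30, 2)]
  tl    = [("Site B core 1", 420, 15800, 1200)]
  c14   = [("Site A core 1", 310, 4560, 60)]

  # One combined table, with the method as an additional variable.
  geochron = (
      [rec + ("Pb210",) for rec in pb210]
      + [rec + ("TL",) for rec in tl]
      + [rec + ("14C",) for rec in c14]
  )
  for entity, depth, age, error, method in geochron:
      print(entity, depth, age, error, method)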

Table ‘Sitedesc’: We felt it important to have a variable that describes the general vegetation around the site, in the way the IGCP-type variable does. However, the suitability of the latter variable should be reviewed. This variable could possibly also be generated through a GIS query.

Secondly, we reviewed the Scorpion table structure designed by John Keltner. We acknowledged that most of the changes we identified as needed in the Paradox table structure have already been realized in the Scorpion table structure. Additionally, in Scorpion all look-up tables are combined into a single glossary table. Furthermore, the Scorpion table structure is better able to hold proxies other than pollen and LOI that may have been analysed from the same core, or from a different core at the same site.

In Europe we are in a situation where two macrofossil databases are being built up that will eventually be made publicly available. We therefore recommend that the EPD adopt a table structure that can comfortably accommodate several proxies and that the European Pollen Database be combined with the macrofossil databases in the same tables.

Although the Scorpion table structure is a desirable step into the future, no import tools are currently available for adding new datasets to a Scorpion database. Currently, Eric Grimm and co-workers are preparing a new database for multi-proxy datasets. The new version of Tilia, which was presented as a β-version during the EPD meeting, will serve as an import portal for this new database, and import tools will be developed. For these reasons, we recommend that the EPD continue working with the old Paradox tables until a complete new solution is available.

Protocol to help the data manager’s coordination with the support groups

Data entry:

Metadata

a) Minimum b) Desired c) Maximum possible

When receiving a new dataset:

Each dataset receives a tracking number which is posted under appropriate categories (e.g. received, work in progress, ready to upload) on the webpage or wiki. This number will also be used for internal tracking e.g. when the dataset goes out to support groups.

Check whether the minimum and desired metadata are available. If a and b = TRUE, go to enter data. If a = FALSE, ask the submitter for more data. If b = FALSE, look in the publication and/or ask for more data; if there is no reply, or the reply provides limited but at least the minimum metadata, go to enter data. For any other problem, contact the regional work group.
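The decision steps above can be written out as a small Python routine; the return values are simply labels for the actions described in the text, with the two checks standing for the minimum (a) and desired (b) metadata.

  def metadata_action(has_minimum, has_desired, reply=None):
      """Next action for a submitted dataset, following the steps above.
      reply: None = submitter/publication not yet consulted; otherwise the outcome of asking."""
      if has_minimum and has_desired:
          return "enter data"
      if not has_minimum:
          return "ask submitter for more data"
      if reply is None:
          return "look in publication and/or ask for more data"
      # no reply, or a reply with limited but at least the minimum metadata
      return "enter data"

  print(metadata_action(True, False))          # -> look in publication and/or ask for more data
  print(metadata_action(True, False, False))   # -> enter data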

Before entering the data:

A) Check whether age-depth models (uncalibrated and calibrated) are present. If A = FALSE, create a simple age-depth model and/or contact the age-depth group.
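A simple age-depth model of this kind can be a straight linear interpolation between the dated depths; the control points in the sketch below are invented.

  import numpy as np

  # Hypothetical age controls: depth (cm) and age (cal yr BP), ordered by depth.
  control_depths = np.array([0.0, 120.0, 340.0, 560.0])
  control_ages = np.array([-55.0, 1500.0, 6200.0, 10900.0])

  def sample_ages(sample_depths):
      """Linear interpolation between age controls -- a simple age-depth model."""
      return np.interp(sample_depths, control_depths, control_ages)

  print(sample_ages([60.0, 400.0]))   # interpolated ages for two sample depths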

Add author of age-depth model to database.

Carry out taxon harmonisation. If problems occur, ask the Taxonomy group.
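Taxon harmonisation amounts to mapping the names used by the submitter onto the accepted names in the database’s taxon list. A minimal sketch, with a made-up lookup table:

  # Hypothetical mapping from submitted names to accepted taxon names.
  accepted_names = {
      "Betula sp.": "Betula",
      "Pinus sylvestris-type": "Pinus sylvestris",
      "Gramineae": "Poaceae",
  }

  def harmonise(submitted_taxa):
      """Return (harmonised names, unresolved names to pass to the Taxonomy group)."""
      harmonised, unresolved = [], []
      for name in submitted_taxa:
          if name in accepted_names:
              harmonised.append(accepted_names[name])
          else:
              unresolved.append(name)
      return harmonised, unresolved

  print(harmonise(["Betula sp.", "Gramineae", "Unknown cf. Carpinus"]))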

Accuracy-control check: after age-depth models are obtained and the taxonomy is harmonized, produce a percentage diagram and send it to the author for approval. (This will give the author or the submitter the possibility to compare the submitted dataset with the dataset that is going to be entered into the database.)
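For the percentage diagram, percentages are the count of each taxon divided by the sample’s pollen sum. The sketch below uses the total count of the listed taxa as the sum, which is an assumption; the appropriate pollen sum may be defined differently.

  def pollen_percentages(counts):
      """Convert raw counts for one sample (taxon -> count) into percentages of the pollen sum."""
      pollen_sum = sum(counts.values())
      if pollen_sum == 0:
          return {taxon: 0.0 for taxon in counts}
      return {taxon: 100.0 * n / pollen_sum for taxon, n in counts.items()}

  sample = {"Betula": 120, "Pinus sylvestris": 260, "Poaceae": 20}
  print(pollen_percentages(sample))   # e.g. Betula = 30.0 %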

If the submitter/author approves, the data can be imported into the original database!

If new sites have been added to the original database within a month, update the database at Medias. A note is added to the ‘new sites’ webpage or the wiki. Also make a new downloadable version available (if possible, also one in Access).

Correcting errors:

Through his work with the EPD and GPD, John Keltner has located and corrected many mistakes in the metadata, for which we are very grateful. However, many mistakes or omissions in important metadata remain, and efforts should be made to correct them or to add further metadata.

Every error correction needs to be documented.
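One lightweight way to document corrections is to append each one to a log file kept alongside the database; the sketch below does this with a CSV file, and the file name and field layout are only suggestions.

  import csv
  from datetime import date

  def log_correction(logfile, table, record_id, field, old_value, new_value, who, source):
      """Append one documented correction to a CSV log."""
      with open(logfile, "a", newline="") as f:
          csv.writer(f).writerow(
              [date.today().isoformat(), table, record_id, field, old_value, new_value, who, source]
          )

  log_correction("epd_corrections.csv", "SITELOC", 123, "LATDD",
                 "47.25", "47.52", "data manager", "checked against original publication")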

Ask Taxonomy group to fix existing errors.

Metadata mistakes:

1) If available, check the publication and correct.

If 1) = FALSE, contact the Mapping and accuracy group where appropriate, or the Regional contact. If the Regional group is contacted, cc the Mapping group.

Mistakes in taxon counts or LOI:

These errors are more serious, and if possible the data contributor or author should be contacted. In cases where this is not possible, contact the Mapping and accuracy group and/or the Regional contact.

Contacting working groups

In the first instance only the contact person is contacted; after one week the whole group is contacted. If there is no reply after two weeks, post the question to the wiki.

Data submission checklist

  • Mandatory data = no dataset will be entered if these are not present
  • Desired data = the metadata that should be submitted; the data manager and/or work groups will try to gather these data, if necessary by inquiring with the author
  • Additional metadata = data that are useful to have in the database, but no attempt will be made to complete them if they are missing from the original submission

Mandatory data for submission:

  1. Contact person name:
  2. Contact person e-mail:
  3. Contact person address:
  4. Site Name:
  5. Site description/Depositional context (choose from list):
  6. Country:
  7. Latitude (degrees):
  8. Longitude (degrees):
  9. Elevation (m a.s.l.):
  10. Pollen count data
  11. Depth of samples

Desired data for submission:

  1. Age determination of the sediment (e.g. radiocarbon, tephra, varve age)
  2. Water depth: (is 0 if no lake)
  3. Basin size (ha): (lake: water surface; mire: unforested area)
  4. Permanent superficial inflow (stream) (y/n):
  5. Permanent superficial outflow (stream) (y/n):
  6. Coring/Sampling device:
  7. Year core/sample collected:
  8. Publication author would want users to cite when using the data (Full reference with doi number):
  9. Sample volume (cm3) and/or Sample weight (g): a value for each sample
  10. Sample thickness (cm): a value for each sample
  11. Surrounding vegetation: (vegetation-type from list(2) e.g. IGCP-type)
  12. Surrounding vegetation (free text):
  13. Regional vegetation (free text):
  14. Lithological information: (gross categories, e.g. gyttja, peat, from list(2) - table)
  15. Digital picture of the site
  16. Age estimates of all samples as cal. yrs BP (possibly with Age uncertainty low and high)(1)
  17. Age estimates of all samples as uncal. 14C BP (possibly with Age uncertainty low and high)(1)
  18. Age basis for chronology(1)

(1) Note: If missing, these fields will be added by the age-depth-model working group.
(2) Note: The lists of choices are not yet in place.

Additional metadata:

  • Hydrological catchment size (ha):
  • Latitude of coring location (degrees):
  • Longitude of coring location (degrees):
  • Additional publications (relating to the site) (Full reference with doi number):
  • Loss-on-ignition (% dry weight):
  • Description of the surrounding (regional and local) vegetation