Selection and Appraisal
Important questions arise as soon as data are created in the course of social science research.
- Which datasets are worthy of long-term retention? Who decides? What criteria are used?
- How can repositories work effectively with data producers to get all of the information they need for archiving?
Sound data curation practices begin with good answers to these questions. ICPSR is committed to establishing and implementing best practices in these areas.
Community Standards and Practice
Collection Development Policies
Selection of digital content for preservation is an important issue for repositories. Libraries generally publish collection development policies to inform their users about their goals in shaping their collections. More recently, the social science data archives have begun to make their policies available as well. Transparency on this issue is important because it helps depositors know where to submit their data and it helps archives avoid duplication of effort.
Some examples are provided for review:
- ICPSR Collection Development Policy
- Council of European Social Science Data Archives (CESSDA) Collection Development Policy
- Northwestern Library Social Science Data Services Collection Development Policy

The Producer-Archive Interface
Sound data curation depends on establishing a clear and effective relationship between the data producer and the archive where the data are deposited, preserved, and disseminated. Data curators have found it useful to conceptualize this relationship as involving four main phases, as spelled out in the OAIS Producer-Archive Interface model.
Preliminary Phase → Formal Definition Phase → Transfer Phase ↔ Validation Phase
The preliminary phase establishes the foundation of the producer-archive relationship. This foundation is strongest when the preliminary phase begins before the data are collected. The aims in this phase are to define the information to be archived and to establish a preliminary definition of the data that will eventually be transmitted. If appropriate, this work can lead to a written preliminary agreement.
The formal definition phase specifies the data to be delivered and addresses the schedule and contractual and legal issues. In this phase, producer and archive draw up a Submission Information Package (SIP), which is a comprehensive description of data and metadata from the project.
The transfer phase consists of the actual deposit of the data in the archive. It can be described as the execution of the plans made in the preliminary and formal definition phases.
In the validation phase, producer and archive deal with any anomalies in the data and ensure that any problems encountered in the transfer phase have been resolved. This phase is complete when the producer and archive verify that the plans detailed in the Submission Information Package have been executed.
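The four phases and the moves between them can be sketched as a small state machine. This is only an illustration: the names `Phase`, `ALLOWED_MOVES`, and `advance` are hypothetical and do not come from the OAIS model, and the two-way link between transfer and validation is read here as an assumption that a problem found during validation can lead to another transfer.

```python
from enum import Enum


class Phase(Enum):
    """The four phases named in the text (identifiers are illustrative)."""
    PRELIMINARY = "preliminary"
    FORMAL_DEFINITION = "formal definition"
    TRANSFER = "transfer"
    VALIDATION = "validation"


# Allowed moves between phases, following the arrows in the phase diagram.
# Assumption: the two-way arrow means validation may send a deposit back
# to the transfer phase when anomalies have to be corrected.
ALLOWED_MOVES = {
    Phase.PRELIMINARY: {Phase.FORMAL_DEFINITION},
    Phase.FORMAL_DEFINITION: {Phase.TRANSFER},
    Phase.TRANSFER: {Phase.VALIDATION},
    Phase.VALIDATION: {Phase.TRANSFER},
}


def advance(current: Phase, target: Phase) -> Phase:
    """Return the target phase if the move is allowed, else raise ValueError."""
    if target not in ALLOWED_MOVES[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


# Example: a deposit that needs one re-transfer before validation succeeds.
phase = Phase.PRELIMINARY
for step in (Phase.FORMAL_DEFINITION, Phase.TRANSFER,
             Phase.VALIDATION, Phase.TRANSFER, Phase.VALIDATION):
    phase = advance(phase, step)
```

Skipping a phase, for example moving straight from the preliminary phase to transfer, raises an error in this sketch, which mirrors the idea that transfer executes plans made in the two earlier phases.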
For more detail about this process, see the NASA document Producer-Archive Methodology Abstract Standard.
ICPSR's Approach to Selection and Appraisal
ICPSR seeks data that are important to researchers, students, policymakers, journalists, and others with a professional interest in the social sciences. Data are deemed valuable for retention when they meet at least one of these general criteria:
- They have substantive current value for research and instruction.
- They have enduring value.
- They are unique in some way.
- They are useful for the development of emerging research and statistical techniques.
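The "at least one of these criteria" rule can be pictured as a simple checklist. The sketch below is a hypothetical illustration, not a description of any ICPSR system: the `Appraisal` class and its field names are invented here, and the true/false values stand in for judgments that a human reviewer would make.

```python
from dataclasses import dataclass


@dataclass
class Appraisal:
    """Hypothetical record of a reviewer's judgment on the four general criteria."""
    current_research_or_teaching_value: bool = False
    enduring_value: bool = False
    unique: bool = False
    supports_emerging_techniques: bool = False

    def criteria_met(self) -> list[str]:
        """Names of the criteria the reviewer marked as satisfied."""
        return [name for name, met in vars(self).items() if met]

    def worth_retaining(self) -> bool:
        """True when at least one general criterion is satisfied."""
        return bool(self.criteria_met())


# Example: a dataset judged to be unique but not otherwise notable.
appraisal = Appraisal(unique=True)
print(appraisal.worth_retaining(), appraisal.criteria_met())
```

Keeping the list of satisfied criteria, rather than only a yes/no answer, leaves a record of why a dataset was considered valuable.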
Priorities for Acquisition
ICPSR is especially interested in data in five areas:
Diversity Data. Data that foster understanding of the experiences of racial and ethnic minorities and other marginalized peoples living in the United States.
Complex Data. Data arising from longitudinal research, survey research, and non-standard types: biological data, administrative records, video data, spatial data, remotely sensed data, and relational databases.
Mixed Method Data. Data that can support both qualitative and quantitative analyses; data resulting from concurrent (both at the same time), sequential (one following the other), or conversion (one method to the other) mixed method study designs.
Interdisciplinary Data. Data from interdisciplinary studies, and data resulting from studies using the research methods of multiple disciplines.
International Data. Data originating outside the United States and data that support crossnational, comparative research. We are especially interested in data from countries and regions of the world that do not have a national structure for archiving, disseminating, and preserving research data.
Datasets that meet these selection criteria are further reviewed by ICPSR staff. Datasets are accorded a high priority for inclusion in the archive when:
- The data are not available anywhere else, or are not likely to be available elsewhere in the future.
- The data are in the public domain.
- Copyright is clear.
- Copyright owners agree to ICPSR's dissemination policies.
- The dataset adheres to standards for privacy and confidentiality.
- The technical documentation is complete.
- The data are in a format that facilitates ease of use.
ICPSR is also especially interested in acquiring data that have never been archived and are thus in danger of being lost. Examples of at-risk data include opinion polls, voting records, large-scale surveys on family growth and income, and many other social science studies.
ICPSR is the lead organization in the Data Preservation Alliance for the Social Sciences (Data-PASS), a partnership to identify and archive at-risk data.
ICPSR Data Sources
ICPSR receives and archives data from many sources.
Depositors. Much data comes to ICPSR from the researchers who conducted the studies. They are seeking a data archive that can make their data available to others and preserve it for future scholars.
Funding agency mandates. Many grants require that studies they fund be deposited in a public archive. Most of the data in the Thematic Collections are deposited with ICPSR under terms of the contracts and grants that fund the archives that support these collections.
Replication datasets. Social science practices increasingly require that investigators deposit datasets that include all data and information necessary to permit another researcher to replicate a corresponding published article, book, or dissertation. The ICPSR Publication-Related Archive houses many of these replication datasets.
Expert recommendations. Many studies are suggested by ICPSR Council members and Official Representatives from member institutions, senior faculty, and ICPSR staff. To identify data of interest, staff members attend scientific sessions at professional meetings and regularly monitor federal grant databases at NIH and NSF, grants made by private foundations, listservs and newsletters, and scholarly publications.
Series collections. The most recent updates of a large number of serial data collections are automatically added to the archive.
New data combinations. Harmonization of data files can result in new merged files for research. ICPSR is performing such harmonization for the Integrated Fertility Survey Series. In addition, ICPSR provides access to the merged Collaborative Psychiatric Epidemiology Surveys.
Digitization and data entry. Many of the historical collections at ICPSR, including election returns, historical census data, and congressional roll calls, were digitized by ICPSR staff to create quantitative data files.
