Glossary

This glossary contains terms specific to DDI and metadata. For a broader dictionary of terms related to research data management, see the CASRAI glossary for Research Data Domain terms. 


A

API

An API (or Application Programming Interface) is a language and message format used by an application program to communicate with the operating system or some other control program such as a database management system (DBMS) or communications protocol. APIs are implemented by writing function calls in the program, which provide the linkage to the required subroutine for execution.

Archive

Archives consist of records that have been selected for permanent or long-term preservation on grounds of their enduring cultural, historical, or evidentiary value. As a verb, archive means to store records for the long term.

C

Catalogue

Catalogues contain sets of metadata entities, identifiers, and descriptions of associated items included in a registry. Registries can be thought of as smart catalogues with enhanced functionality which allow for the classification of objects.

Codebook

A document that provides information on the structure, contents, and layout of a data file.

Community

The term community is used to identify any grouping of personal or organizational entities, at different levels of formal organization, that are considering or undertaking implementation of DDI. Examples: a national statistical service, a data producer, an archive, a consortium of data archives.

Controlled vocabulary (CV)

Broadly speaking, a CV can range from a short list of clearly defined, mutually exclusive, and exhaustive terms, which are the only choices for usage in a specific context (e.g., populating certain DDI elements or attributes), through a classification, to something as complex as a thesaurus with thousands of terms and term relationships. A CV has also been described as "A set of subject terms, and rules for their use in assigning terms to materials for indexing and retrieval" (http://www.cs.cornell.edu/wya/diglib/MS1999/Glossary.html). In a CV, a term consists of one or more words used to represent a concept (examples: “fear”, “females”, “child care”). Terms are selected from natural language for inclusion in a controlled vocabulary.

Crosswalk

A structured model of how one list of items maps into a related list of items.
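
As an illustration, a crosswalk can be sketched as a simple lookup table from one list of element names to another. The element pairs below are assumptions chosen for the example and should not be read as an official DDI crosswalk; the function name apply_crosswalk is likewise hypothetical.

```python
# Illustrative crosswalk: source element name -> target element name.
# The pairs are assumptions made for this example, not an official mapping.
CROSSWALK = {
    "titl": "title",
    "AuthEnty": "creator",
    "abstract": "description",
}


def apply_crosswalk(record, crosswalk):
    """Rename the keys of `record` according to `crosswalk`.

    Keys without a mapping are returned separately so that nothing is
    dropped silently.
    """
    mapped, unmapped = {}, {}
    for key, value in record.items():
        if key in crosswalk:
            mapped[crosswalk[key]] = value
        else:
            unmapped[key] = value
    return mapped, unmapped


source = {"titl": "Example Survey", "AuthEnty": "Example Institute", "notes": "n/a"}
mapped, unmapped = apply_crosswalk(source, CROSSWALK)
```

Returning the unmapped items alongside the mapped ones makes gaps in the crosswalk visible instead of hiding them.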

Curation

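From the Latin curare, “to care for.” In this context, curation refers to the active, ongoing care and management of data and metadata across the data life cycle.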

D

Data Documentation Initiative

The Data Documentation Initiative (DDI) is an effort to create an international standard for describing data from the social, behavioral, and economic sciences. Expressed in XML, the DDI metadata specification now supports the entire research data life cycle. DDI metadata accompanies and enables data conceptualization, collection, processing, distribution, discovery, analysis, repurposing, and archiving.

Data life cycle

The stages of development of the data component of the research process, from study conceptualization to data analysis and archiving, feeding back to earlier stages.

DDI application

A software application that reads and/or writes DDI XML.

DDI Core

A simple version of DDI Lifecycle based on DDI Lite (see below) with basic information describing a dataset.

DDI instance

A DDI Instance is the top-level wrapper for any DDI document. It may contain a set of top-level elements, which generally correspond to the modular breakdown within DDI. Every DDI Instance will use this wrapper, regardless of its content.

DDI Lite

A simple version of DDI Codebook with basic information describing a dataset.

DDI profile

A mechanism to describe an organization’s selected subset of elements and attributes.

DDI scheme

Schemes are maintainable lists of metadata elements that may be published separately and reused by a number of studies. Schemes are the basis for resources such as question banks, concept banks, and variable banks. The construction of schemes takes into consideration their potential reuse by others. DDI Lifecycle uses the scheme approach.

Discovery

Strategies and processes used by the end user to locate and access products (metadata, data, and other related information) of the data life cycle.

Dissemination

Data distribution with the aim of access by the end user to the products (metadata, data, and other related information) of the data life cycle.

DNS

The Domain Name System (DNS) translates Internet domain and host names to IP addresses. It translates domain names meaningful to humans into the numerical (binary) identifiers associated with networking equipment for the purpose of locating and addressing these devices worldwide. A commonly used analogy is that the Domain Name System serves as the “phone book” for the Internet by translating human-friendly computer hostnames into IP addresses. For example, www.example.com might translate to 208.77.188.166.

DTD

Document Type Definition is one of several SGML and XML schema languages, and is also the term used to describe a document or portion thereof that is authored in the DTD language. A DTD is a formal expression of the structural constraints for a class of XML documents, written in its own declaration syntax rather than in XML. The DTD language constructs include element and attribute-list declarations. The first versions of DDI Codebook (those released before DDI Codebook 2.5) used a DTD rather than XML Schema.

Dublin Core

An element set that functions as core metadata for simple and generic resource descriptions.

E

End user

Person performing work in the data life cycle for whom DDI metadata is required. The end user will likely not even be aware of the DDI metadata in the application he or she is using. End users span the data life cycle. Examples include research councils/funding bodies, researchers, data producers, archivists, librarians, data analysts, registry managers, research analysts/authors.

External publication of DDI schemes

This refers to the publication of DDI schemes as resource packages for use by the broader community.

F

Federated search

Federated search is the simultaneous search of multiple online databases or Web resources and is an emerging feature of automated, Web-based library and information retrieval systems. It is also often referred to as a portal or a federated search engine.

G

Genericode

Genericode defines a standard format for defining code lists (also known as enumerations or controlled vocabularies). Genericode aims to provide the following:

  • A standard model and XML representation for the contents of a code list
  • A standard model and XML representation for data associated with items in a code list
  • A standard model and XML representation for how new code lists are derived from existing code lists

The DDI Controlled Vocabularies are available in Genericode format.

GNU-LGPL

The GNU Lesser General Public License (formerly the GNU Library General Public License) is a free software license published by the Free Software Foundation.

Governance

The term governance is used here to refer to the procedures associated with the decision-making, control, and administration of DDI metadata sets.

I

Identifiable (in the context of DDI)

“Identifiables” are those elements in DDI that carry only the basic level of identification: a URN, ID, and Name.

Inclusion inline vs. by reference

Material is considered included inline when the content is explicitly included. Inclusion by reference means that the material is referenced by one document but published elsewhere.

Ingest

In OAIS terminology, the OAIS entity that contains the services and functions that accept Submission Information Packages from Producers, prepare Archival Information Packages for storage, and ensure that Archival Information Packages and their supporting Descriptive Information become established within the OAIS. Used in its verb form, ingest refers to the process of taking information into a repository.

Internal publication of DDI schemes

This refers to publication of DDI schemes as resource packages within a specified project, working group, or organization. Note that a specific project may involve more than one organization, e.g., the Eurobarometer project.

Internationalization

Internationalization is the process of planning and implementing products and services so that they can easily be adapted to specific local languages and cultures, a process called localization.

Interoperability

This refers to the ability of systems and organizations to work together (inter-operate). Two types are distinguished: syntactic interoperability and semantic interoperability.

IP

An Internet Protocol (IP) address is a numerical identification (logical address) that is assigned to devices participating in a computer network utilizing the Internet Protocol for communication between its nodes. Although IP addresses are stored as binary numbers, they are usually displayed in human-readable notations, such as 208.77.188.166 (for IPv4).
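
The relationship between the binary form and the human-readable notation can be sketched with Python's standard ipaddress module. The address is the one used in the definition above; the variable names are illustrative.

```python
import ipaddress

# Parse the dotted-decimal notation shown in the definition.
addr = ipaddress.IPv4Address("208.77.188.166")

# An IPv4 address is a single 32-bit number.
as_int = int(addr)
as_binary = format(as_int, "032b")

# Converting the number back yields the human-readable notation again.
dotted = str(ipaddress.IPv4Address(as_int))
```

Each of the four decimal groups corresponds to eight bits of the 32-bit value.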

J

Java

Java is a programming language expressly designed for use in the distributed environment of the Internet. It was designed to have the "look and feel" of the C++ language, but it is simpler to use than C++ and enforces an object-oriented programming model.

L

Linked Open Data

Linked Open Data (LOD) refers to using the Web to connect related data that previously were not linked in order to discover new knowledge.

Logical record

A reference to a data record that is independent of its physical location. It may be physically stored in two or more locations.

M

Machine-actionable

This term refers to information that is structured in a consistent way so that machines, or computers, can be programmed against the structure. DDI provides machine-actionable metadata.

Maintainable (in the context of DDI)

“Maintainables” are complex objects that can be maintained outside of a DDI Instance (published as separate entities). Their identification strings ensure that they are globally unique.

Maintenance agencies

These organizations own the metadata objects they maintain, and only they are allowed to make changes to those objects.

Major version

The definition of a major version varies according to what is being published. However, major versions are expressed by the digits to the left of the decimal point.

Metadata

Metadata are often defined as “data about data.” The main purpose of metadata is to facilitate the discovery of relevant information, often referred to as resource discovery. Metadata also help organize electronic resources, provide digital identification, and help support archiving and preservation of the resource. Metadata assist in resource discovery by “allowing resources to be found by relevant criteria, identifying resources, bringing similar resources together, distinguishing dissimilar resources, and giving location information.”

METS

Metadata Encoding and Transmission Standard

Migration

Migration in the DDI context refers to moving from a DTD to XML Schema in terms of document structure; and from DDI 2 to DDI 3 as well as from DDI 3 back to DDI 2 in terms of porting content.

Minor version

The definition and level of detail of a minor version varies according to what is being published. The minor version information is always located to the right of the first decimal and can be further subdivided at the discretion of the maintaining agency.
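
A minimal sketch of how an application might separate the major and minor parts of a version number, assuming the version is written as a dot-separated string. The function name split_version is hypothetical.

```python
def split_version(version):
    """Split a dot-separated version string at the first decimal point.

    The part to the left is the major version; everything to the right is
    the minor version, which may itself be subdivided.
    """
    major, _, minor = version.partition(".")
    return major, minor


simple = split_version("3.2")         # major "3", minor "2"
subdivided = split_version("3.2.1")   # major "3", minor "2.1"
```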

N

NCubes

NCubes describe the logical structure of an n-dimensional array, in which each coordinate intersects with every other dimension at a single point. The NCube has been designed for use in the markup of aggregate data.
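
The logical structure can be sketched as a table keyed by one value from each dimension. The dimensions and counts below are invented for illustration, and the code shows only the general idea of an n-dimensional array, not the DDI markup itself.

```python
from itertools import product

# Two illustrative dimensions; an NCube may have any number of them.
dimensions = {
    "sex": ["female", "male"],
    "age_group": ["0-17", "18-64", "65+"],
}

# Invented aggregate counts, one per combination of dimension values.
counts = [120, 340, 95, 130, 310, 80]

# Every combination of one value per dimension identifies exactly one cell.
coordinates = list(product(*dimensions.values()))
cube = dict(zip(coordinates, counts))


def cell(cube, dimensions, **coords):
    """Look up the single cell addressed by one value per dimension."""
    key = tuple(coords[name] for name in dimensions)
    return cube[key]


value = cell(cube, dimensions, sex="male", age_group="18-64")
```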

O

Open Archival Information System (OAIS)

A reference model of the space community that governs general archival activities and policies -- http://public.ccsds.org/publications/archive/650x0b1.pdf. Includes:

  • SIP: Submission Information Package
  • AIP: Archival Information Package
  • DIP: Dissemination Information Package

P

Physical record

The physical instantiation of a logical data record, e.g., as part of a data file.

Pre-coordinated/Post-coordinated controlled vocabularies (CVs)

In pre-coordinated CV systems, multiple concepts are brought together in one term. An illustrative example is the Library of Congress Subject Headings (LCSH), which yield entries such as: "Insurance, Unemployment --Switzerland --Statistics." This method allows for disambiguation of the relationship of the concepts in the term that might not be possible in post-coordinated systems, such as whether a term is a qualifier of another. In post-coordinated or faceted systems, concepts are kept broad and separate and selected and joined in the process of searching with Boolean operators. A representation of the above LCSH in this system could be "Insurance AND Statistics AND Switzerland AND Unemployment" – note that entry order in the query has no relevance here. An example of such a system is the American Psychological Association's Thesaurus of Psychological Index Terms.
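
The post-coordinated search described above can be sketched as a set intersection. The index contents and document numbers are invented for illustration; the point is that the order of terms in the query does not affect the result.

```python
# Hypothetical index: each broad term points to the documents tagged with it.
index = {
    "Insurance": {1, 2, 3},
    "Statistics": {2, 3, 5},
    "Switzerland": {3, 4},
    "Unemployment": {3, 6},
}


def search_and(index, terms):
    """Return the documents tagged with every term (Boolean AND)."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


query = ["Insurance", "Statistics", "Switzerland", "Unemployment"]
hits = search_and(index, query)
hits_reversed = search_and(index, list(reversed(query)))
```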

PREMIS

Preservation Metadata Implementation Strategies
PREMIS Working Group

Published metadata

Published metadata is considered available for use outside of the community that created the original document. This broader audience may be internal to a project or organization or external. Metadata that is published must be wrapped in a DDI instance, versioned, and available for reuse or reference from outside of the instance. Packaging as a DDI instance does not necessarily mean packaging for publication. Metadata may be packaged for reasons other than publication during its internal development process. In these cases versioning is not required.

R

RDF

The Resource Description Framework (RDF) is a general framework for how to describe any Internet resource. DDI has developed RDF vocabularies for specific purposes.

Register

A collection of elements and attributes that contain information on a particular subject that its authors wish to share with others. Registers require support of well-defined registration processes, and include provisions for dealing with provenance and auditing, versioning, and security enforcement. Registers are the basic components of registries.

Registry

A virtual, centralized, and structured database or portal that allows one to list, search in a structured way, identify, and retrieve metadata and possibly data that are distributed around a network. Registries are places where various types of resources are indexed and made visible and available for use throughout a community; they do not include clustered servers or depend on harvesting approaches to access their contents. Some survey organizations register measures (question wording and response options, for example) in order to standardize the way they elicit information from respondents. The implication is that there is one correct way for an organization to measure, say, income. Examples of DDI registries could be question banks, concept banks, social science data survey catalogs, and variable banks.

Repository

A data repository is a central place where data are stored and maintained.

Resource package

A resource package is a means of packaging any maintainable set of DDI metadata for referencing as part of a study unit or group. A resource package structures materials for publication that are intended to be reused by multiple studies, projects, or communities of users. A resource package uses the group module with an alternative top-level element called Resource Package that is used to describe maintainable modules or schemes that may be used by multiple study units outside of a group structure.

U

Unicode

Unicode is a computing industry standard allowing computers to consistently represent and manipulate text expressed in most of the world's writing systems.

URL

A URL (Uniform Resource Locator, previously Universal Resource Locator) is the unique address for a file that is accessible on the Internet. A common way to get to a Web site is to enter the URL of its home page file in a Web browser's address line. However, any file within that Web site can also be specified with a URL.
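
As a small sketch, the parts of a URL can be separated with Python's standard urllib.parse module. The path /docs/glossary.html is a made-up example of a file within a Web site.

```python
from urllib.parse import urlsplit

# Hypothetical URL pointing at one file within a Web site.
parts = urlsplit("http://www.example.com/docs/glossary.html")

scheme = parts.scheme    # how the file is accessed
host = parts.hostname    # the Web site that serves the file
path = parts.path        # the file within that Web site
```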

URN

A URN (Uniform Resource Name) is an Internet resource with a name that, unlike a URL, has persistent significance -- that is, the owner of the URN can expect that someone else (or a program) will always be able to find the resource.

V

Versionable (in the context of DDI)

“Versionables” comprise a subset of DDI “identifiable” elements. These are elements for which changes in content are important to note and thus additional attributes related to versioning are enabled.

Versioning

The process of providing a unique identifier for an element or entity that changes over time. Versioned elements retain their original ID but their version number is incremented to reflect a difference in content. This allows a reference to persist through the ID while allowing for either the specified version or the most current version of the element to be obtained. What is versioned, maintained, and referenced in DDI Lifecycle is the metadata itself, rather than the XML which expresses that metadata. While this might seem like a minor distinction, it has major implications for how applications are developed.
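
The idea that a reference persists through the ID while either a specified version or the most current version is retrieved can be sketched as follows. The store, the IDs, the question texts, and the use of whole-number versions are assumptions made for the example; this is not how any particular DDI application is implemented.

```python
# Hypothetical store: (ID, version) -> content. The ID never changes;
# only the version number is incremented when the content changes.
store = {
    ("q1", 1): "How old are you?",
    ("q1", 2): "What is your age in years?",
    ("q2", 1): "What is your occupation?",
}


def resolve(store, item_id, version=None):
    """Return the requested version of an item, or the latest if none is given."""
    versions = sorted(v for (i, v) in store if i == item_id)
    if not versions:
        raise KeyError(item_id)
    chosen = versions[-1] if version is None else version
    return store[(item_id, chosen)]


latest = resolve(store, "q1")               # most current version
original = resolve(store, "q1", version=1)  # a specific earlier version
```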

W

Weighting

The use of sampling procedures may make it necessary to apply weights to produce accurate statistical results. Describe here the criteria for using weights in analysis of a collection. If a weighting formula or coefficient was developed, provide this formula, define its elements, and indicate how the formula is applied to data.
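
To illustrate how weights change a result, the sketch below compares an unweighted mean with a weighted mean. The values and weights are invented, and the weighted mean is only one simple example of how a weight may be applied to data.

```python
def weighted_mean(values, weights):
    """Return the sum of value * weight divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


values = [10, 20, 30]   # invented responses
weights = [1, 1, 2]     # the third respondent represents twice as many people

unweighted = sum(values) / len(values)
weighted = weighted_mean(values, weights)
```

Here the weighted result is pulled toward the third value because that case carries the largest weight.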

X

XML editing software, or XML editors

Applications that facilitate the creation of XML documents by providing prompts regarding the appropriate use of tags based on the XML schema which can be pre-loaded into the software. XML editors also validate XML documents and assist in producing valid documents by pointing to existing errors and usually indicating how the errors might be corrected. Examples of commercial XML editors are XMLSpy, oXygen, XMetaL. Free editors are also available. For a more complete discussion, see http://www.ahds.ac.uk/creating/information-papers/xml-editors/#section2

XML Schema

The XML Schema Definition Language is an XML language for describing and constraining the content of XML documents. XML Schema is a W3C Recommendation.