Controlled Vocabularies

What is a controlled vocabulary?

The DDI Controlled Vocabularies Group (CVG) has created a set of controlled vocabularies that can be used with DDI as well as for other purposes and applications. Select DDI Alliance vocabularies are already in use at organizations such as the Finnish Social Science Data Archive (FSD), GESIS - Leibniz Institute for the Social Sciences, the Inter-university Consortium for Political and Social Research (ICPSR), Mathematica Policy Research, the UK Data Archive (UKDA), and Bielefeld University, Germany. Nesstar Publisher (http://www.nesstar.com/) now incorporates the controlled vocabularies for Analysis Unit and Time Method.

A paper on "Controlled Vocabularies for DDI 3: Enhancing Machine-Actionability" provides additional background on this effort.

The vocabularies are published in an XML format called Genericode, an OASIS specification that provides a tabular model for code lists. The vocabularies are also available in HTML and XLS (Excel) form.
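As a rough illustration of the tabular model, the sketch below reads the rows of a small Genericode-style document with Python's standard library. The sample XML is hand-written and simplified for this example, and the column identifiers "code" and "term" are assumptions; the published DDI-CV files carry more metadata and may name their columns differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified sample. Not an excerpt from a published DDI-CV file.
SAMPLE = """<gc:CodeList xmlns:gc="http://docs.oasis-open.org/codelist/ns/genericode/1.0/">
  <Identification>
    <ShortName>ExampleList</ShortName>
    <Version>1.0</Version>
  </Identification>
  <ColumnSet>
    <Column Id="code" Use="required"><ShortName>Code</ShortName></Column>
    <Column Id="term" Use="optional"><ShortName>Term</ShortName></Column>
  </ColumnSet>
  <SimpleCodeList>
    <Row>
      <Value ColumnRef="code"><SimpleValue>Parent</SimpleValue></Value>
      <Value ColumnRef="term"><SimpleValue>Parent term</SimpleValue></Value>
    </Row>
    <Row>
      <Value ColumnRef="code"><SimpleValue>Parent.Child</SimpleValue></Value>
      <Value ColumnRef="term"><SimpleValue>Child term</SimpleValue></Value>
    </Row>
  </SimpleCodeList>
</gc:CodeList>"""


def local_name(tag):
    """Drop any '{namespace}' prefix that ElementTree puts on a tag."""
    return tag.rsplit("}", 1)[-1]


def read_rows(xml_text):
    """Return one dict per Row element, keyed by each Value's ColumnRef."""
    rows = []
    for element in ET.fromstring(xml_text).iter():
        if local_name(element.tag) != "Row":
            continue
        row = {}
        for value in element:
            if local_name(value.tag) != "Value":
                continue
            simple = next(
                (child for child in value if local_name(child.tag) == "SimpleValue"),
                None,
            )
            row[value.get("ColumnRef")] = None if simple is None else simple.text
        rows.append(row)
    return rows


for row in read_rows(SAMPLE):
    print(row["code"], "->", row["term"])
```

Matching on local element names keeps the sketch independent of whether child elements are namespace-qualified in a given file.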

Usage information for each controlled vocabulary is available in the vocabulary documentation. Usage instructions specific to DDI Lifecycle, as well as recommendations for citing the CVs outside DDI, are also available, along with examples. The published DDI-CVs work with both versions of the DDI specification (DDI Codebook and DDI Lifecycle).

Please note that the production, publication, and maintenance of translations are currently outside CVG's scope, but CVG would appreciate being informed if other agencies are interested in, or planning to undertake, such tasks. If translations are produced by other organizations, CVG can make them available for consultation, but will not assume responsibility for their content or maintenance. Currently, the following translations are available for reference:

Name | Title | Languages | File Type
AnalysisUnit | Analysis Unit | DE, FI | xls
TimeMethod | Time Method | DE, FI | xls

 

The CVG functions as the management team for the vocabularies. Comments, as well as suggestions for amendments or additions, are welcome from all users. To provide feedback, or submit proposals for changes, please contact the CVG.

The DDI CV versioning policy described below was approved by the DDI Alliance in November 2012 and has been published and implemented since February 2013. This protocol supersedes the previous policy, which was based on a three-digit version numbering system. Users who referenced these vocabularies before February 1, 2013 need to retroactively change any reference to V. 1.0.0 into V. 1.0. From that point on, new versions can be used and referenced normally.

The controlled vocabularies versioning policy is based on an intellectual, or logical, assessment of the nature of change, which distinguishes between substantive and non-substantive changes in the CVs, as described further below. To reflect this distinction, the version numbering system is based on a two-level structure (examples: 1.0, 1.1, 1.2, 2.0, etc.). A change in the integral part of the decimal number indicates a substantive change in the controlled vocabulary. A change in the fractional part indicates a non-substantive change. All version levels (i.e. the full decimal number, even when the fractional part is zero) are always stated.
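The two-level numbering can be read mechanically. The following is a minimal sketch, not part of the DDI policy itself; the function names are made up for illustration. It parses a version such as "1.2" and reports what kind of change separates two versions of the same vocabulary.

```python
def parse_version(text):
    """Split a two-level version such as '1.2' into (integral, fractional).

    A legacy three-digit string such as '1.0.0' raises ValueError here,
    because it does not unpack into exactly two parts.
    """
    integral, fractional = text.split(".")
    return int(integral), int(fractional)


def change_between(old, new):
    """Classify the difference between two versions of one vocabulary."""
    old_parts = parse_version(old)
    new_parts = parse_version(new)
    if new_parts == old_parts:
        return "none"
    if new_parts[0] != old_parts[0]:
        # The integral part moved: at least one substantive change occurred.
        return "substantive"
    # Only the fractional part moved.
    return "non-substantive"


print(change_between("1.0", "1.1"))
print(change_between("1.2", "2.0"))
```

Comparing integers rather than strings avoids misordering versions such as 1.9 and 1.10.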

Versioning of the CVs is done at the level of each published controlled vocabulary, and not at the item level.

An item in a CV list consists of the following parts:

Code: The specific content that is entered into the DDI specification to identify the item. In hierarchical lists, all of the levels are always mentioned in each code, and are separated by a period.
Term: The display label associated with the code. This may be available in multiple languages.
Definition: The definition of the code. This may be available in multiple languages.
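One way to hold such an item in memory is sketched below. The class name, the field layout, and the use of language tags such as "en" as dictionary keys are assumptions made for this example; the item values are placeholders, not entries from a published vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CVItem:
    code: str          # the value entered into DDI, e.g. "Parent.Child"
    terms: dict        # language tag -> display label
    definitions: dict  # language tag -> definition text

    def levels(self):
        """Hierarchy levels of the code, which are separated by periods."""
        return self.code.split(".")

    def parent_code(self):
        """Code of the enclosing level, or None for a top-level code."""
        head, separator, _ = self.code.rpartition(".")
        return head if separator else None


item = CVItem(
    code="Parent.Child",
    terms={"en": "Child term"},
    definitions={"en": "Placeholder definition."},
)
print(item.levels(), item.parent_code())
```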

 

Changes in version will be made according to the following rules:

Substantive changes: any change in (list) content or (code) meaning

  • Addition of new code(s) (change in list content)
  • Deletion of existing code(s) (change in list content)
  • Widening the definition of a code (change in meaning)
  • Narrowing the definition of a code (change in meaning)
  • Change in the “value” or “name” of a code, including a change in spelling. Since the codes are the “official” or “legal” entries (“terms” and “definitions” are documentation for the codes), a change in name amounts to a change in code, i.e. a change in list content, and is therefore a substantive change.
  • Merging codes (amounts to deleting codes and adding new one(s))
  • Splitting codes (amounts to deleting codes and adding new ones)

Non-substantive changes: Changes in wording, spelling, etc. (i.e. “form”) that do not involve changes in content or meaning:

  • Rephrasing a definition to make it clearer, or adding examples without changing the meaning of the code
  • Rephrasing a “term” (the natural language “label” for the code) for clarity without changing the meaning of the code
  • Correcting spelling errors in a “term” or “definition”.
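Changes in list content can be detected by comparing the code sets of two versions. The sketch below is only an aid under that narrow assumption: the function name is hypothetical, and changes in meaning (widening, narrowing, redefinition) still require the intellectual assessment described above and cannot be found this way.

```python
def compare_code_lists(old_codes, new_codes):
    """Report codes added to or removed from a vocabulary.

    Any addition or removal is a change in list content and therefore
    substantive. A renamed code shows up as one removal plus one addition.
    """
    old_set, new_set = set(old_codes), set(new_codes)
    added = sorted(new_set - old_set)
    removed = sorted(old_set - new_set)
    return {
        "added": added,
        "removed": removed,
        "substantive": bool(added or removed),
    }


# Placeholder codes, not taken from a published vocabulary.
print(compare_code_lists(["A", "B"], ["A", "C"]))
```

An empty result does not prove the change is non-substantive; it only shows that the set of codes is the same.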

In addition to a change in the version number, each new version of a CV will contain documentation about how the new CV compares with the previous version. In the Genericode XML, the changes will be documented using the following notations:

UNCHANGED: X -- Code X and its definition have remained unchanged.

RENAMED: X-Y -- The definition for code X has remained the same but the code itself has been changed (renamed) to Y.

REDEFINED: X -- The definition for code X has been changed to reflect a change in meaning for code X.

DEFINITION REPHRASED: X -- The definition for code X has been rephrased for clarity or edited for accuracy without a change in meaning for code X.

TERM REPHRASED: X -- The term describing code X has been changed or edited for clarity or accuracy without a change in meaning for code X.

WIDENED: X -- The definition of code X has been changed to expand the meaning of the code.

NARROWED: X -- The definition for code X has been changed to reflect a narrowing in the meaning of the code.

MERGED: X, Y, (n)-Y -- The old code X has been removed and all the data classified with it are included in code Y.

SPLIT: X-X, Y,(n) -- The meaning of code X has been narrowed and code Y has been added to cover for the remainder of the meaning previously held by code X.

REMOVED: X -- Code X has been deleted.

ADDED: Y -- A new code Y has been added to the CV.
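Read together with the versioning rules, the notations above suggest which part of the version number changes. The sketch below encodes one reading of that relationship. The grouping of notations into substantive and non-substantive sets is inferred from the rules rather than stated as a table in the policy, resetting the fractional part to zero after a substantive change is an assumption based on the 1.2 to 2.0 example, and the function names are hypothetical.

```python
# Notations treated as substantive (change in list content or code meaning).
SUBSTANTIVE = {
    "RENAMED", "REDEFINED", "WIDENED", "NARROWED",
    "MERGED", "SPLIT", "REMOVED", "ADDED",
}
# Notations treated as non-substantive (change in form only).
NON_SUBSTANTIVE = {"DEFINITION REPHRASED", "TERM REPHRASED"}


def notation_of(entry):
    """Extract the notation keyword from an entry such as 'ADDED: Y'."""
    return entry.split(":", 1)[0].strip()


def next_version(current, entries):
    """Suggest the next two-level version number for a list of change entries."""
    integral, fractional = (int(part) for part in current.split("."))
    kinds = {notation_of(entry) for entry in entries}
    unknown = kinds - SUBSTANTIVE - NON_SUBSTANTIVE - {"UNCHANGED"}
    if unknown:
        raise ValueError(f"unrecognized notation(s): {sorted(unknown)}")
    if kinds & SUBSTANTIVE:
        return f"{integral + 1}.0"
    if kinds & NON_SUBSTANTIVE:
        return f"{integral}.{fractional + 1}"
    return current


print(next_version("1.1", ["UNCHANGED: A", "TERM REPHRASED: B"]))
print(next_version("1.2", ["ADDED: C", "DEFINITION REPHRASED: A"]))
```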

Note: DDI-CVG has also produced a set of guidelines to support controlled vocabulary users in retrofitting their collections following the publication of new CV versions. These are intended only as recommendations and are not enforced as part of the versioning policy.

Download
Download the complete package of files for all DDI Controlled Vocabularies: DDI-CV_2016-02-10.zip

The table below lists the CVs currently available and provides download links for each format.

File types:
  • html: rendering as web page
  • xml: Genericode (version 1.0, DDI-CV profile 1.0)
  • xls: Excel (version 2003)

Name | Title | Description | File Type V. 1.0 | File Type V. 1.1 | File Type V. 1.2 | File Type V. 2.0
AggregationMethod | Aggregation Method | Identifies the type of aggregation used to combine related categories, usually within a common branch of a hierarchy, to provide information at a broader level than the level at which detailed observations are taken. (From: The OECD Glossary of Statistical Terms) | html, xml, xls | - | - | -
AnalysisUnit | Analysis Unit | Describes the entity being analyzed in the study or in the variable. | html, xml, xls | - | - | -
CharacterSet | Character Set | Standard set of characters upon which many character encodings are based (Wikipedia). | html, xml, xls | - | - | -
CommonalityType | Commonality Type | Describes the degree of similarity between two items or schemes (collections of items). | html, xml, xls | - | - | -
DataType | Data Type | Identifies the type of data, which has a bearing on the acceptable data values, the operations that can be performed with the data, and the ways in which the data are stored. The present list is based on the W3C data types, and includes the terms relevant for documenting research data. | html, xml, xls | - | - | -
DateType | Date Type | Specifies the type of date. The present list is based on ISO 8601 usage. | html, xml, xls | - | - | -
LanguageProficiency | Language Proficiency | Describes the level of proficiency of an individual in a natural language. | html, xml, xls | - | - | -
LifecycleEventType | Lifecycle Event Type | Specifies the event happening over the data life cycle that is considered significant enough to document. | html, xml, xls | - | - | -
ModeOfCollection | Mode of Collection | The procedure, technique, or mode of inquiry used to attain the data. | html, xml, xls | html, xml, xls | - | html, xml, xls
NumericType | Numeric Type | Specifies the type of numeric data. | html, xml, xls | - | - | -
ResponseUnit | Response Unit | Indicates the entity that provided the information carried by the variable. | html, xml, xls | - | - | -
SoftwarePackage | Software Package | Indicates the statistical software package used in the production/processing/dissemination of the data. Data collection software is not covered in this list. | html, xml, xls | - | - | -
SummaryStatisticType | Summary Statistic Type | Specifies the type of summary statistic. Summary statistics are a single number representation of the characteristics of a set of values. | html, xml, xls | - | - | html, xml, xls
TimeMethod | Time Method | Describes the time dimension of the data collection. | html, xml, xls | html, xml, xls | html, xml, xls | -
TimeZone | Time Zone | Time zone specification as an offset from UTC (Coordinated Universal Time) in terms of hours and minutes. | html, xml, xls | - | - | -
TypeOfAddress | Type of Address | Identifies the type of address entered as contact information for an individual or an organization. | html, xml, xls | - | - | -
TypeOfConceptGroup | Type of Concept Group | Specifies the rationale for creating a concept group. | html, xml, xls | - | - | -
TypeOfNote | Type of Note | Includes a typology of notes. | html, xml, xls | - | - | -
TypeOfTelephone | Type of Telephone | Identifies the type of telephone entered as contact information for an individual or an organization. | html, xml, xls | - | - | -