Frequently Asked Questions (FAQ)

  1. What is the DDI?

    The DDI is a specification for describing social science data in XML. Essentially, the DDI is a way of formatting the documentation for a social science data file, such that it is much more useful than a simple MS Word or text file. The tagged structure enables computer processing of the information.

    The DDI is currently a specification, not a standard. While the DDI has not yet become a formal ISO standard, that is an important goal of the initiative.

  2. What is XML?

    XML stands for Extensible Markup Language. You can find a lot more information on XML in general on the DDI site, but to answer the question simply, XML is a way of tagging text for meaning, instead of appearance.

    If you wanted to emphasize the name of a variable in a codebook, you'd probably make it bold or type it in all capitals, but you would also use that same emphasis for the study title, or for important notes on using the data. As a result, a computer cannot easily pick out the variable names, even though being able to do so would make your codebook much more useful. In XML, you surround your information with tags that identify what each piece of information is:

    <study-title>Euro-barometer</study-title>
    
    <summary>This study was conducted in <time-period>April 1994</time-period>
    and covered....</summary>
    

    With a good search engine, it would thus be possible to search for studies done in April 1994, and this study would be returned as a result.
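
    As a rough illustration of that kind of search, the sketch below parses the fragment above with Python's standard library and returns the titles of studies tagged with a given time period. The <study> wrapper element and the function name studies_in_period are assumptions added for the example (a parser needs a single root element); the tag names are the illustrative ones used above, not actual DDI tags.

```python
import xml.etree.ElementTree as ET

# Hypothetical <study> wrapper: the fragment in the text has no single root
# element, which an XML parser requires. Tag names are illustrative only.
SAMPLE = """
<study>
  <study-title>Euro-barometer</study-title>
  <summary>This study was conducted in <time-period>April 1994</time-period>
  and covered....</summary>
</study>
"""

def studies_in_period(xml_texts, period):
    """Return the titles of studies whose markup tags the given time period."""
    titles = []
    for text in xml_texts:
        root = ET.fromstring(text.strip())
        # Collect the text of every <time-period> element, at any depth.
        periods = [el.text for el in root.iter("time-period")]
        if period in periods:
            titles.append(root.findtext("study-title"))
    return titles

print(studies_in_period([SAMPLE], "April 1994"))  # prints ['Euro-barometer']
```

    Because the date is tagged rather than merely formatted, the search matches the <time-period> element directly instead of guessing from the surrounding prose.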

  3. What is a DTD? What is an XML Schema?

    Tagging itself isn't useful unless certain rules are established. For example, if one investigator uses the tag <title> to denote a study title, while another uses <study-title> or <studyTitle>, then immediately the XML becomes less useful because there's no easy way for a computer application to determine where the title can be located.

    To prevent this situation, we make use of a Document Type Definition (DTD) or an XML Schema. Simply put, these documents spell out which tags are available, the order in which they should appear in a document, and whether those tags are required or optional, and repeatable or not. The DDI project is about creating a standard format that users can employ to mark up their codebooks in a meaningful and consistent fashion.

    To clarify the distinction between a DTD and a Schema, a DTD is written in a special syntax, whereas Schemas are written in XML itself.
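
    To make the kind of rules described above more concrete, here is a minimal sketch in Python that checks a document against a small hand-written rule table. It is not a DTD or an XML Schema, and it is not how a real validator works internally; the RULES table, the tag names, and the check_children function are all hypothetical and exist only to show what "available, ordered, required, repeatable" can mean in practice.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table standing in for a DTD/Schema:
# (tag, required, repeatable), listed in the order the tags must appear.
# These are not real DDI rules.
RULES = [
    ("study-title", True, False),
    ("summary", False, False),
    ("variable", True, True),
]

def check_children(root):
    """Return a list of rule violations for the direct children of root."""
    problems = []
    tags = [child.tag for child in root]
    allowed = [tag for tag, _, _ in RULES]
    # Availability: every tag used must be one the rules know about.
    for tag in tags:
        if tag not in allowed:
            problems.append(f"unknown tag <{tag}>")
    # Required / repeatable: count how often each known tag occurs.
    for tag, required, repeatable in RULES:
        count = tags.count(tag)
        if required and count == 0:
            problems.append(f"missing required <{tag}>")
        if not repeatable and count > 1:
            problems.append(f"<{tag}> may appear only once")
    # Order: known tags must follow the order given in RULES.
    positions = [allowed.index(tag) for tag in tags if tag in allowed]
    if positions != sorted(positions):
        problems.append("tags are out of order")
    return problems

good = ET.fromstring(
    "<study><study-title>Euro-barometer</study-title>"
    "<variable>V1</variable><variable>V2</variable></study>"
)
bad = ET.fromstring("<study><title>Euro-barometer</title></study>")

print(check_children(good))  # prints []
print(check_children(bad))
```

    The second document uses <title> instead of <study-title>, so the check reports an unknown tag and the missing required tags, which is the inconsistency a shared DTD or Schema is meant to catch.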

  4. What do you mean by 'validation'?

    After producing an XML document, you should always validate the document. Basically, this involves running a small program that checks the XML against its corresponding DTD/Schema to make sure that the document was built correctly. The validator will generally tell you if you forgot a tag, misspelled a tag, or made any other kind of syntax error.
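
    As a small, partial illustration, the sketch below uses the XML parser in Python's standard library to catch a misspelled closing tag. Note the limitation: this parser only checks that a document is well-formed; it does not check the document against a DTD or Schema, which requires a validating parser. The function name well_formedness_error is an assumption made for the example.

```python
import xml.etree.ElementTree as ET

def well_formedness_error(xml_text):
    """Return None if the text parses as XML, else the parser's error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None

# Correctly closed tag: no error is reported.
print(well_formedness_error("<study-title>Euro-barometer</study-title>"))

# Misspelled closing tag: the parser reports where the problem is.
print(well_formedness_error("<study-title>Euro-barometer</study-titel>"))
```

    A full validator goes further than this, also reporting tags that are well-formed but not permitted, missing, or out of order according to the DTD/Schema.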

  5. What is XSL? XSLT?

    The "problem" associated with XML is that it's really a non-visual format. An XML document on its own is not easily read. To the average user, it looks like a bunch of gibberish text. Remember that XML is all about content, not display.

    XSL, which stands for Extensible Stylesheet Language, is the complement to XML that controls display; XSLT (XSL Transformations) is the part of XSL that transforms XML documents into other formats. In the header of your XML document you specify the appropriate DTD/Schema, and you may also specify a stylesheet. The stylesheet controls display: which elements appear, in what order, and basic character/paragraph formatting such as font face, size, and color.

    One important thing to understand is that XSLT is all about transformation. XML itself doesn't have a visual display; the stylesheets transform your XML into a format that does have a visual display, such as a Web page, PDF file, or RTF file.
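
    The idea of transformation can be sketched without XSLT itself. The Python example below is not XSLT and does not use a stylesheet; it only imitates the same step, turning content-only XML into a displayable HTML fragment. The <study> wrapper, the tag names, and the function study_to_html are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

def study_to_html(xml_text):
    """Turn a small, hypothetical <study> document into an HTML fragment."""
    root = ET.fromstring(xml_text)
    title = root.findtext("study-title", default="")
    summary_el = root.find("summary")
    # itertext() flattens nested tags such as <time-period> into plain text.
    summary = "".join(summary_el.itertext()) if summary_el is not None else ""
    summary = " ".join(summary.split())  # collapse line breaks and extra spaces
    return f"<h1>{escape(title)}</h1>\n<p>{escape(summary)}</p>"

SOURCE = (
    "<study><study-title>Euro-barometer</study-title>"
    "<summary>This study was conducted in "
    "<time-period>April 1994</time-period>.</summary></study>"
)

print(study_to_html(SOURCE))
```

    The source XML says nothing about headings or paragraphs; those display decisions live entirely in the transformation, which is the role a stylesheet plays.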

    It's also important to note that XSLT isn't a built-in capability of most Web browsers. Unless your computer's operating system is quite up-to-date, you may not be able to view XML files, except as raw XML (i.e., you won't get the attractive display specified by the XSLT document). At ICPSR, we rely upon a server-side operation using Cocoon to transform our XML.

  6. What are the benefits of using the DDI?

    Benefits of the DDI approach are discussed in the About the Specification section of this Web site.

  7. How can I get started using the DDI?

    If you're interested in using the DDI, you should look at the section of this Web site on Getting Started with DDI.

  8. What is the DDI Alliance? How can I participate?

    Information on the organization itself can be found in the About the DDI Alliance section of this Web site.

  9. I am starting a new project. How do I know whether I should use DDI 1/2 or DDI 3 as the basis for my markup?

    DDI Version 3.0 has a number of new features that were introduced to answer specific user needs:

    - It makes it possible, and convenient, to describe groups of studies that are related along one or several dimensions (time, geography, etc.). It also provides for documenting comparable items (concepts, questions, variables) among members of a group. This feature is particularly useful in marking up time-series, or multi-national studies, as well as documenting harmonized data.

    - It provides more complete documentation for complex data files (i.e. hierarchical data, with related records).

    - It also includes a new section designed for marking up questionnaires, with instrument characteristics, question text, conditions, and question flow (skip patterns). Questions can be linked to variables, but they are no longer children of a variable, as they were in Versions 1/2.

    - It enables documenting aggregate data both in a (comma) delimited format and a spreadsheet-type format, where locations are expressed as column/row.

    - It also provides for marking up and transporting the actual data - either aggregate, or microdata - in an XML format.

    - It offers significantly enhanced documentation for geographic coverage, with a description of all levels of geography that includes a mandatory specification of the highest and lowest levels, links to the geographic variables, etc.

    - It provides better documentation for translations, including coverage of multi-lingual studies.

    If you need to use any of the above-mentioned features in your DDI project(s), then the obvious answer would be to start using Version 3.0. If, on the other hand, you are only working with simple, microdata, survey-type studies, or if you are only producing study descriptions (catalog records), Versions 1/2 may be used, particularly if they appear to answer all your current and foreseeable needs.

  10. I have used DDI 2.0 for markup at my organization. Do I need to migrate to 3.0?

    See the answer above. If you do decide to migrate to 3.0, you may want to make use of the DDI 2 to 3 Converter Tool created by the Open Data Foundation.