
Voorburg Compromise Explanations

Definitions

nCube: An n-dimensional array in which each coordinate or cell intersects at a single point with every dimension of the cube. A cell's logical position within an nCube is uniquely defined by its coordinates.

Grid: A physical data set that can be defined by a range of columns and rows (for example, a spreadsheet). Physical locations are given by column/row coordinates such as A3 or AC46.
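As an illustration of the column/row coordinates mentioned above, the following minimal sketch converts a reference such as AC46 into numeric column and row positions. It assumes the common spreadsheet convention in which column letters count A=1 through Z=26, then AA=27 and so on; the function name parse_cell_ref is hypothetical and is not part of the DTD.

```python
import re

def parse_cell_ref(ref):
    """Split a spreadsheet-style reference such as 'AC46' into (column, row) numbers."""
    match = re.fullmatch(r"([A-Z]+)([1-9][0-9]*)", ref)
    if match is None:
        raise ValueError(f"not a column/row reference: {ref!r}")
    letters, digits = match.groups()
    column = 0
    for letter in letters:
        # Assumed convention: letters act like base-26 digits with A=1 ... Z=26.
        column = column * 26 + (ord(letter) - ord("A") + 1)
    return column, int(digits)

print(parse_cell_ref("A3"))    # (1, 3)
print(parse_cell_ref("AC46"))  # (29, 46)
```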

Underlying Ideas Behind Changes to 3.0 and 4.0

Separation of logical structure description from physical structure descriptions: In v.1.0 the logical structure of the data (meaning of coded entries, questions from which variables were derived, intellectual groupings of variables) is described in 4.0. Physical description is split between 3.0 (physical description of the overall data file) and 4.0 (location information for microdata variables). The difficulty we had with aggregate data is that, on a purely logical level, the description of a microdata VAR is synonymous with the description of a dimension of an nCube. However, a VAR in a microdata file has a unique location (for example SEX StartPos = 3 EndPos = 3). With aggregate data, a VAR now represents a range of values within one or more nCubes, and its location can no longer be uniquely defined.
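To make the idea of a unique microdata location concrete, here is a minimal sketch that reads the SEX field from a fixed-format record using StartPos = 3 and EndPos = 3. It assumes positions are counted from 1 and that both ends are included; the record contents and the function name read_field are invented for illustration.

```python
def read_field(record, start_pos, end_pos):
    """Return the characters from start_pos to end_pos.

    Assumption: positions count from 1 and both ends are included.
    """
    return record[start_pos - 1:end_pos]

# Hypothetical fixed-format record; only position 3 (SEX) matters here.
record = "0712045"
print(read_field(record, 3, 3))  # 1
```

No comparable single slice exists for an nCube dimension, because the values belonging to one dimension are spread across many cells.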

Logical structure: The logical structure of an nCube can be described independently of a physical storage structure. The nCube may, in fact, not exist as a physically stored set of data. A physical structure such as a two-dimensional tabular layout of data may contain multiple logical nCubes. These are often 'hinged' along a common dimension such as geography, age or year (among others). The description provides the logical relationship between the cells of the nCube. This includes the coordinates of the cell within the nCube, the relationship of the cell to other cells along the same dimension, and how the cell contents were derived.

Physical structure: Physical structures were defined for STORED data, not for how data is displayed. This difference is important when working with spreadsheet or database systems, where the display can be altered without affecting the underlying data structure. Use of these physical storage structures is not limited to aggregate data.

Display: The intent of the DTD is to provide the user with the information needed to create the display of their choice, NOT to provide a reproducible image of a given display. The DTD provides the titles, headers, stubs, and cell contents plus any additional metadata required to interpret that information.

Linking grids: Some grids are stored on multiple 'sheets' due to their size and often repeat titles, column headers and/or row stubs on each sheet. The grids on each sheet can be linked across (ribbons) or down (strips) by chopping off a number of rows or columns to avoid mid-grid repetition of headers and stubs when creating a mastergrid.
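The linking described above can be sketched as follows. This is only an illustration under stated assumptions: each sheet is a list of rows, sheets are passed in the order they should be appended, and the part to chop is simplified to a count of leading columns or rows. The function names and the sample values are invented.

```python
def make_ribbon(sheets, col_chops):
    """Link sheets across: drop leading columns from each sheet, then join side by side."""
    trimmed = [[row[n:] for row in sheet] for sheet, n in zip(sheets, col_chops)]
    # zip(*trimmed) pairs up the matching rows from every sheet.
    return [[cell for part in rows for cell in part] for rows in zip(*trimmed)]

def make_strip(sheets, row_chops):
    """Link sheets down: drop leading rows from each sheet, then stack them."""
    result = []
    for sheet, n in zip(sheets, row_chops):
        result.extend(sheet[n:])
    return result

# Invented sample sheets; the right sheet repeats the row stubs.
left = [["", "Male", "Female"], ["Oslo", 10, 12], ["Bergen", 7, 8]]
right = [["", "Total"], ["Oslo", 22], ["Bergen", 15]]
ribbon = make_ribbon([left, right], [0, 1])

# A further sheet that repeats the column headers in its first row.
lower = [["", "Male", "Female", "Total"], ["Tromso", 3, 4, 7]]
mastergrid = make_strip([ribbon, lower], [0, 1])

for row in mastergrid:
    print(row)
```

Building all ribbons first and then stacking them as a strip keeps the repeated stubs and headers out of the middle of the combined grid.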

Describing the logical dataset: The DTD should be seen as a means of describing a logical data set that may be represented by one or more copies or editions. Aggregate data sets may exist as logical structures without having a physical aggregate data file associated with them. In that case the physical file or display is created for an individual query from the microdata. One logical description can have multiple physical files of one or more structural types associated with it. This allows for the creation of a single XML instance referencing multiple data locations. For example, a logical VAR could be stored in an ASCII fixed format file, a dBase file and an Oracle file. All three physical locations would be provided, and the data could be obtained from any of the three sources from the search of a single logical definition. A location map is used to link the logical definition of the data item to one or many physical locations.
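One way to picture the location map is as a lookup from a logical data item to a list of physical locations. The sketch below is an illustration only: the class, its field names, the file IDs and the detail strings are hypothetical and do not reproduce the DTD's element structure.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PhysicalLocation:
    file_ref: str    # hypothetical ID of a described physical file
    structure: str   # kind of storage, as free text
    detail: str      # how to find the item inside that file

# One logical VAR, three physical copies, following the example in the text.
location_map = {
    "SEX": [
        PhysicalLocation("F1", "ASCII fixed format", "StartPos=3 EndPos=3"),
        PhysicalLocation("F2", "dBase", "column SEX"),
        PhysicalLocation("F3", "Oracle", "column SEX in a survey table"),
    ],
}

def locations_for(var_id):
    """Return every known physical location for a logical data item."""
    return location_map.get(var_id, [])

for loc in locations_for("SEX"):
    print(loc.file_ref, loc.structure, loc.detail)
```

A search that matches the single logical definition can then pick whichever of the listed sources is available.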

Changes to Sections 3.0 and 4.0 (no invalidating changes were made) [elements borrowed from other sections with no change in meaning are not noted]

Section 3.0

Section name Description of change

3.1 fileTxt

make optional and repeatable to allow for multiple physical files to be described

3.1.3 fileStrc

add attribute "type" { rect | hier | rel | grid } to indicate type of file structure and "link" { Y | N } to indicate if this file is linked to another file through a linking variable

3.1.3.2 grdGrp

similar to record group except it is a group of grids

3.1.3.2.2 blSht

basic layer sheet; functional equivalent of the recDimnsn, with attributes rangeBeg { upper left hand corner of grid range } and rangeEnd { lower right hand corner of grid range }

3.1.3.2.2.1 cellQnty

number of cells in the sheet

3.1.3.2.2.2 colQnty

number of columns in sheet

3.1.3.2.2.3 rowQnty

number of rows in sheet

3.1.4.6 grdPrCas

number of grids per case (e.g. if a state is a case it may have a number of grids associated with it: population, housing units, income, etc.); similar to recPrCas

3.1.4.7 grdNumTot

total number of grids in the file; similar to recNumTot

3.2 dmnsFileset

dimensions of the file set with attribute fileRefs { IDRefs to 3.1.3 fileStrc }; allows for multiple files which cover different aspects of the logical data structure and which are normally linked through a key variable(s)

3.2.1 fileQnty

number of files in set

3.2.2 dataItmQnty

number of data items [data items can be microdata variables or nCube cells]

3.2.3 recGrpQnty

number of record groups in set

3.2.4 recLink

defines how records are linked; adds attribute recGrpRefs { IDRefs to specific recGrp IDs }

3.2.4.1 rlkProc

record linking process; repeat for each set of links

3.2.4.1.2 fromRec

uses IDRefs to identify keyvar(s) used to create unique key in the 'from' record

3.2.4.1.3 toRec

uses IDRefs to identify keyvar(s) used to create unique key in the 'to' record

3.2.5 grdGrpQnty

number of grid groups in the file set

3.2.6 grdNumTot

total number of grids in set

3.2.7 grdLink

explains how to link grid sheets to create a larger conjoined grid; uses attributes blshtRefs and grdLinkRefs to identify the IDs of the parts, and ribbon { Y | N } and strip { Y | N } to indicate whether they should be linked horizontally or vertically (to do both, create all ribbons and then put them together as a strip)

3.2.7.1 glkProc

used to define what should be chopped off to create single grid

3.2.7.1.2 colChop

columns to chop off for making ribbons; one element per sheet; attributes order { order of appending } and chopRng { columns to chop off }

3.2.7.1.3 rowChop

rows to chop off for making strips; one element per sheet; attributes order { order of appending } and chopRng { rows to chop off }

3.3 locMap

used to map logical definitions for data items to physical storage description; refer to a specific logical data item using either attribute varRef OR nCubeRef {both are IDRef}

3.3.1.1 CubeCoord

used to identify the logical coordinate location within a cube; use one for each dimension of the cube, giving coordNo and either coordVal or coordValRef { an IDRef to the var where the coordVal is obtained }

3.3.1.2 physLoc

gives fileRef, recRef, blsRef, startPos, endPos, width, col, row, query, colCnv, rowCnv { last two used when col or row information is derived from the value of a specific coordinate number } [dependent upon file type] [note: query should allow for indicating query type such as SQL]; this field can be repeated for each physical instance of a data set with the same logical data structure

Section 4.0

Section name Description of change

4.1 varGrp

add attribute nCube { IDRefs } to allow varGrp to be used for identifying conceptually related nCubes; this should not be used to define nCubes with common dimensions because this can be done through the var description. This allows for the linking of contextually related items, such as an nCube on Norwegian Fishermen with an nCube on Norwegian Fishing Vessels

4.2 var

[var is now also used to describe nCube dimensions; its use is consistent with the mtxDmns descriptions in Wendy's unnested model.]

add attributes:
aggrMeth { e.g. "sum", "average", "count" }
measUnit { e.g. "km" "NGL" }
scale { unit of scale e.g. "x 1000" }
origin { what is the origin point }
nature { nominal, ordinal, interval, ratio }
additivity { stock, flow, non-additive }

4.2.1 location

add attribute locMap { IDRef to the locMap (3.3), indicating that the location information is there, not here. } [note: var definitions that are used to describe dimensions should not use the location field. The location information link will be in the nCube hierarchy. Ideally all location information will move to 3.3 so that multiple locations for the same data item can be listed within a single XML instance. This is simply a reroute for programmers in order to retain validity with version 1.0.]

4.2.18 catgry

add attributes other { Y | N } and total { Y | N }; total indicates whether the sum of all other categories results in the contents of this category

4.2.18.5 catgry

allows recursive nesting of categories

4.3 nCube

formerly known as Cube or varMtx, it has been reborn as nCube under the VC ;-)

new attributes:
dmnsQnty { number of dimensions in the nCube }
cellQnty { number of cells in the nCube }

4.3.1 location

matches 4.2.1 to avoid problem of renaming. Within the nCube ONLY the attribute of locMap should be used.

4.3.11 purpose

allows for PCDATA entry or URI link that explains the purpose for creating this specific aggregation. For example, a crosstabulation of specific age cohorts by specific percentages of poverty by race designed to meet the reporting requirements of a federal funding program.

4.3.12 timeDmns

special type of dmns [see 4.3.13 for details]

4.3.13 dmns

listed n times with the nCube, one for each dimension of the cube (AGE x SEX x RACE = 3 dimensions). Attributes: varRef { IDRef to var used to define dimension } and rank { indicates order of dimension providing cell coordinates. } For example:

<dmns varRef="AGE" rank="1"/>
<dmns varRef="SEX" rank="2"/>
<dmns varRef="RACE" rank="3"/>

a cell with the coordinates of 3 2 3 would indicate the 3rd category in AGE, the 2nd category in SEX and the 3rd category in RACE.

4.3.13.1 cohort

allows for selection of specific categories within the dimension for inclusion, using the attributes catRef { IDRef } and value { defines a value to use in cell coordinate notation if needed }. Use of the cohort element implies that only a subset of categories will be used.

4.3.13.1.1 range

allows for the selection of a range of categories within the selected dimension. Nested in cohort as a specific type of category subset. [see 4.2.10.1 for range element description]

4.3.14 measure

attributes of measurement pertaining to the nCube as a whole [see var for attribute definitions]
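The rank attribute of dmns (4.3.13) and the dmnsQnty and cellQnty attributes of nCube (4.3) can be illustrated with a short sketch. The category labels below are invented, the function names are hypothetical, and two assumptions are made: coordinates are 1-based positions in each dimension's category list, and the cube holds one cell for every combination of categories, so the cell count is the product of the category counts.

```python
from math import prod

# Invented category lists; in a real instance these would come from the
# var descriptions that each dmns points to through varRef.
categories = {
    "AGE": ["0-17", "18-39", "40-64", "65+"],
    "SEX": ["Male", "Female"],
    "RACE": ["Group 1", "Group 2", "Group 3"],
}

# (varRef, rank) pairs as in the dmns example, listed out of order on purpose.
dimensions = [("SEX", 2), ("RACE", 3), ("AGE", 1)]

def ordered_dimensions(dims):
    """Return the varRef names sorted by rank."""
    return [var_ref for var_ref, rank in sorted(dims, key=lambda d: d[1])]

def decode_cell(coords, dims, cats):
    """Translate cell coordinates into one category label per dimension."""
    names = ordered_dimensions(dims)
    if len(coords) != len(names):
        raise ValueError("one coordinate is needed per dimension")
    # Assumption: each coordinate is a 1-based category position.
    return {name: cats[name][pos - 1] for name, pos in zip(names, coords)}

def cell_count(dims, cats):
    """Assumption: one cell exists for every combination of categories."""
    return prod(len(cats[var_ref]) for var_ref, _ in dims)

print(decode_cell((3, 2, 3), dimensions, categories))
print(len(dimensions), cell_count(dimensions, categories))  # 3 24
```

With these sample lists, the coordinates 3 2 3 resolve to the 3rd AGE category, the 2nd SEX category and the 3rd RACE category, matching the reading given in 4.3.13.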
