nCube: An n-dimensional array in which each cell lies at the intersection of exactly one point (category) on every dimension of the cube. A cell's logical position within an nCube is therefore uniquely defined by its coordinates.
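
As a minimal illustration (plain Python, not part of the DTD), the sketch below enumerates the cells of a small nCube as coordinate tuples, one 1-based category position per dimension. The dimensions AGE, SEX and RACE and their categories are hypothetical; the point is only that every cell gets a distinct tuple.

```python
from itertools import product

# Hypothetical dimensions of a small nCube; category lists are illustrative only.
dimensions = {
    "AGE": ["0-17", "18-64", "65+"],
    "SEX": ["Male", "Female"],
    "RACE": ["A", "B", "C"],
}

def cell_coordinates(dims):
    """Enumerate every cell as a tuple of 1-based category positions,
    one position per dimension, in the order the dimensions are listed."""
    ranges = [range(1, len(cats) + 1) for cats in dims.values()]
    return list(product(*ranges))

cells = cell_coordinates(dimensions)
print(len(cells))            # 18 cells (3 x 2 x 3)
print(cells[0], cells[-1])   # (1, 1, 1) (3, 2, 3)
# Each coordinate tuple occurs exactly once, so it identifies its cell uniquely.
print(len(set(cells)) == len(cells))  # True
```
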
Grid: A grid is a physical data set that can be defined by a range of columns and rows (for example, a spreadsheet). Physical locations are given as column/row coordinates such as A3 or AC46.
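
Column/row coordinates like those above can be turned into numeric positions. The sketch below assumes the common spreadsheet convention in which column letters form a base-26 sequence (A = 1, ..., Z = 26, AA = 27); the function name `parse_cell_ref` is hypothetical.

```python
import re

def parse_cell_ref(ref):
    """Split a reference such as 'AC46' into 1-based (column, row) numbers,
    reading the letters as a base-26 number with A=1 ... Z=26 (assumed convention)."""
    match = re.fullmatch(r"([A-Za-z]+)([0-9]+)", ref)
    if match is None:
        raise ValueError(f"not a column/row reference: {ref!r}")
    letters, digits = match.groups()
    col = 0
    for ch in letters.upper():
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return col, int(digits)

print(parse_cell_ref("A3"))    # (1, 3)
print(parse_cell_ref("AC46"))  # (29, 46)
```
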
Separation of logical structure description from physical structure description: In version 1.0, the logical structure of the data (meaning of coded entries, questions from which variables were derived, intellectual groupings of variables) is described in Section 4.0. Physical description is split between Section 3.0 (physical description of the overall data file) and Section 4.0 (location information for microdata variables). The difficulty with aggregate data is that, on a purely logical level, the description of a microdata VAR is equivalent to the description of a dimension of an nCube. However, a VAR in a microdata file has a unique location (for example, SEX StartPos = 3 EndPos = 3). With aggregate data, a VAR instead represents a range of values within one or more nCubes, so its location can no longer be uniquely defined.
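
To make the contrast concrete, the hypothetical sketch below reads SEX from a fixed position (StartPos = 3, EndPos = 3, assumed 1-based and inclusive) in each microdata record, then lists the cells that SEX category 2 occupies in a 3 x 2 AGE x SEX nCube. The record layout and cube sizes are invented for illustration.

```python
# Hypothetical fixed-format microdata records; SEX sits in column 3 only.
records = ["01100423", "02200517"]

def read_field(record, start_pos, end_pos):
    """Return the characters from 1-based, inclusive StartPos to EndPos."""
    return record[start_pos - 1:end_pos]

sex_values = [read_field(r, 3, 3) for r in records]
print(sex_values)  # ['1', '2']

# In a 3 x 2 AGE x SEX nCube, SEX category 2 is not one position but a
# range of cells: every cell whose second coordinate is 2.
age_cats, sex_cats = 3, 2
cells_with_sex_2 = [
    (a, s)
    for a in range(1, age_cats + 1)
    for s in range(1, sex_cats + 1)
    if s == 2
]
print(cells_with_sex_2)  # [(1, 2), (2, 2), (3, 2)]
```
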
Logical structure: The logical structure of an nCube can be described independently of its physical storage structure; indeed, the nCube may not exist as a physically stored data set at all. A physical structure such as a two-dimensional tabular layout of data may contain multiple logical nCubes, which are often 'hinged' along a common dimension such as geography, age or year. The description provides the logical relationships between the cells of the nCube: the coordinates of each cell within the nCube, the cell's relationship to other cells along the same dimension, and how the cell contents were derived.
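
As a hypothetical example of two logical nCubes sharing one physical layout, the sketch below splits a table whose rows are geographies into a population-by-sex cube and a housing-units-by-tenure cube, both hinged on GEO. The column names, values and column-to-cube mapping are assumptions, not part of the DTD.

```python
# A hypothetical physical table: one row per geography, with columns that
# belong to two different logical nCubes.
header = ["GEO", "POP_MALE", "POP_FEMALE", "HU_OWNER", "HU_RENTER"]
rows = [
    ["Region 1", 510, 530, 300, 120],
    ["Region 2", 220, 240, 150, 60],
]

# Assumed mapping of physical columns to logical nCubes.
cube_columns = {
    "population_by_sex": ["POP_MALE", "POP_FEMALE"],
    "housing_units_by_tenure": ["HU_OWNER", "HU_RENTER"],
}

def split_into_cubes(header, rows, cube_columns, hinge="GEO"):
    """Return {cube_name: {(hinge_value, column_name): cell_value}}."""
    hinge_idx = header.index(hinge)
    cubes = {}
    for name, cols in cube_columns.items():
        idxs = [header.index(c) for c in cols]
        cubes[name] = {
            (row[hinge_idx], header[i]): row[i] for row in rows for i in idxs
        }
    return cubes

cubes = split_into_cubes(header, rows, cube_columns)
print(cubes["housing_units_by_tenure"][("Region 2", "HU_RENTER")])  # 60
```
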
Physical structure: Physical structures describe how data are STORED, not how they are displayed. This difference is important when working with spreadsheet or database systems, where the display can be altered without affecting the underlying data structure. Use of these physical storage structures is not limited to aggregate data.
Display: The intent of the DTD is to give users the information needed to create the display of their choice, NOT to provide a reproducible image of a given display. The DTD provides the titles, headers, stubs and cell contents, plus any additional metadata required to interpret that information.
Linking grids: Because of their size, some grids are stored on multiple 'sheets', often with the titles, column headers and/or row stubs repeated on each sheet. When a mastergrid is created, the grids on the sheets can be linked across (ribbons) or down (strips) by chopping off the repeated columns or rows, so that headers and stubs do not reappear in the middle of the grid.
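
The sketch below illustrates ribbon and strip linking on sheets held as lists of rows. It assumes, for simplicity, that the repeated stubs are the leading columns and the repeated headers are the leading rows of each sheet after the first; the function names and sample sheets are hypothetical.

```python
def link_ribbon(sheets, chop_cols):
    """Join sheets left to right. For every sheet after the first, drop the
    first `chop_cols` columns (the repeated row stubs) before appending."""
    master = [list(row) for row in sheets[0]]
    for sheet in sheets[1:]:
        if len(sheet) != len(master):
            raise ValueError("sheets in a ribbon must have the same number of rows")
        for master_row, row in zip(master, sheet):
            master_row.extend(row[chop_cols:])
    return master

def link_strip(sheets, chop_rows):
    """Join sheets top to bottom. For every sheet after the first, drop the
    first `chop_rows` rows (the repeated column headers) before appending."""
    master = [list(row) for row in sheets[0]]
    for sheet in sheets[1:]:
        master.extend(list(row) for row in sheet[chop_rows:])
    return master

# Hypothetical sheets that repeat the "Region" stub column and header row.
sheet_a = [["Region", "1990"], ["North", 10], ["South", 20]]
sheet_b = [["Region", "2000"], ["North", 12], ["South", 25]]
sheet_c = [["Region", "1990", "2000"], ["East", 7, 9]]

ribbon = link_ribbon([sheet_a, sheet_b], chop_cols=1)
strip = link_strip([ribbon, sheet_c], chop_rows=1)
print(strip)
# [['Region', '1990', '2000'], ['North', 10, 12], ['South', 20, 25], ['East', 7, 9]]
```

Building the ribbon first and then linking it into a strip mirrors the order described for grdLink (3.2.7) when grids must be joined in both directions.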
Describing the logical dataset: The DTD should be seen as a means of describing a logical data set that may be represented by one or more copies or editions. Aggregate data sets may exist as logical structures without any physical aggregate data file associated with them; in that case, the physical file or display is created from the microdata for an individual query. One logical description can have multiple physical files, of one or more structural types, associated with it. This allows a single XML instance to reference multiple data locations. For example, a logical VAR could be stored in an ASCII fixed-format file, a dBase file and an Oracle database. All three physical locations would be provided, and the data could be obtained from any of the three sources after a search of a single logical definition. A location map is used to link the logical definition of a data item to one or many physical locations.
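
A location map can be pictured as one logical identifier pointing to a list of physical locations. The minimal sketch below assumes that shape; the class names, the file references F1 to F3 and the location details are hypothetical and do not reproduce the actual attributes of the locMap element.

```python
from dataclasses import dataclass, field

@dataclass
class PhysicalLocation:
    """One physical copy of a data item (hypothetical fields)."""
    file_ref: str   # identifier of the physical file or database
    file_type: str  # e.g. "ascii-fixed", "dbase", "oracle"
    detail: dict    # type-specific location information

@dataclass
class LocationMap:
    """Links one logical data item to any number of physical locations."""
    var_ref: str
    locations: list = field(default_factory=list)

    def add(self, loc):
        self.locations.append(loc)

    def sources(self):
        return [loc.file_ref for loc in self.locations]

def pick_location(loc_map, available_files):
    """Return the first listed location whose file is currently reachable, else None."""
    for loc in loc_map.locations:
        if loc.file_ref in available_files:
            return loc
    return None

age_map = LocationMap("AGE")
age_map.add(PhysicalLocation("F1", "ascii-fixed", {"startPos": 4, "endPos": 5}))
age_map.add(PhysicalLocation("F2", "dbase", {"field": "AGE"}))
age_map.add(PhysicalLocation("F3", "oracle", {"query": "SELECT age FROM persons"}))

print(age_map.sources())                               # ['F1', 'F2', 'F3']
print(pick_location(age_map, {"F2", "F3"}).file_type)  # dbase
```
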
Changes to Sections 3.0 and 4.0 (no invalidating changes were made) [elements borrowed from other sections with no change in meaning are not noted]
| Section name | Description of change |
|---|---|
3.1 fileTxt | make optional and repeatable to allow multiple physical files to be described |
3.1.3 fileStrc | add attribute "type" { rect \| hier \| rel \| grid } to indicate the type of file structure, and "link" { Y \| N } to indicate whether this file is linked to another file through a linking variable |
3.1.3.2 grdGrp | similar to record group except it is a group of grids |
3.1.3.2.2 blSht | basic layer sheet; functional equivalent of recDimnsn, with attributes rangeBeg { upper-left corner of the grid range } and rangeEnd { lower-right corner of the grid range } |
3.1.3.2.2.1 cellQnty | number of cells in the sheet |
3.1.3.2.2.2 colQnty | number of columns in sheet |
3.1.3.2.2.3 rowQnty | number of rows in sheet |
3.1.4.6 grdPrCas | number of grids per case (e.g., if a state is a case, it may have several grids associated with it: population, housing units, income, etc.); similar to recPrCas |
3.1.4.7 grdNumTot | total number of grids in the file; similar to recNumTot |
3.2 dmnsFileset | dimensions of the file set, with attribute fileRefs { IDRefs to 3.1.3 fileStrc }; allows for multiple files that cover different aspects of the logical data structure and are normally linked through one or more key variables |
3.2.1 fileQnty | number of files in set |
3.2.2 dataItmQnty | number of data items [data items can be microdata variables or nCube cells] |
3.2.3 recGrpQnty | number of record groups in set |
3.2.4 recLink | defines how records are linked; adds attribute recGrpRefs { IDRefs to specific recGrp IDs } |
3.2.4.1 rlkProc | record linking process; repeat for each set of links |
3.2.4.1.2 fromRec | uses IDRefs to identify keyvar(s) used to create unique key in the 'from' record |
3.2.4.1.3 toRec | uses IDRefs to identify keyvar(s) used to create unique key in the 'to' record |
3.2.5 grdGrpQnty | number of grid groups in the file set |
3.2.6 grdNumTot | total number of grids in set |
3.2.7 grdLink | explains how to link grid sheets to create a larger conjoined grid; uses attributes blshtRefs and grdLinkRefs to identify the IDs of the parts, and ribbon { Y \| N } and strip { Y \| N } to indicate whether they should be linked horizontally or vertically (to do both, create all ribbons first and then join them as a strip) |
3.2.7.1 glkProc | used to define what should be chopped off to create single grid |
3.2.7.1.2 colChop | columns to chop off when making ribbons, one element per sheet; attributes order {order of appending} and chopRng {columns to chop off} |
3.2.7.1.3 rowChop | rows to chop off when making strips, one element per sheet; attributes order {order of appending} and chopRng {rows to chop off} |
3.3 locMap | used to map logical definitions of data items to physical storage descriptions; refers to a specific logical data item using either attribute varRef OR nCubeRef {both are IDRefs} |
3.3.1.1 CubeCoord | used to identify the logical coordinate location within a cube; use one for each dimension of the cube, giving coordNo and either coordVal or coordValRef {an IDRef to the var from which the coordVal is obtained} |
3.3.1.2 physLoc | gives fileRef, recRef, blsRef, startPos, endPos, width, col, row, query, colCnv and rowCnv { the last two are used when col or row information is derived from the value of a specific coordinate number } [dependent upon file type] [note: query should allow for indicating the query type, such as SQL]; this element can be repeated for each physical instance of a data set with the same logical data structure |
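
The record linking described by recLink (3.2.4) can be sketched as a join on composite keys. In the hypothetical example below, the keyvars named in fromRec and toRec are modelled as lists of field names, which need not have the same names in both record groups; the data and function name are illustrative only.

```python
# Hypothetical record groups: household ('from') and person ('to') records.
households = [
    {"STATE": "01", "HHID": "001", "TENURE": "own"},
    {"STATE": "01", "HHID": "002", "TENURE": "rent"},
]
persons = [
    {"ST": "01", "HH": "001", "PNUM": 1, "AGE": 34},
    {"ST": "01", "HH": "001", "PNUM": 2, "AGE": 36},
    {"ST": "01", "HH": "002", "PNUM": 1, "AGE": 51},
]

def link_records(from_recs, to_recs, from_keys, to_keys):
    """Pair each 'to' record with the 'from' record sharing the same composite key.
    from_keys / to_keys stand in for the keyvar IDRefs of fromRec / toRec."""
    index = {tuple(r[k] for k in from_keys): r for r in from_recs}
    linked = []
    for rec in to_recs:
        parent = index.get(tuple(rec[k] for k in to_keys))
        if parent is not None:
            linked.append((parent, rec))
    return linked

pairs = link_records(households, persons, ["STATE", "HHID"], ["ST", "HH"])
print([(h["TENURE"], p["AGE"]) for h, p in pairs])
# [('own', 34), ('own', 36), ('rent', 51)]
```
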

| Section name | Description of change |
|---|---|
4.1 varGrp | add attribute nCube { IDRefs } to allow varGrp to be used for identifying conceptually related nCubes; this should not be used to define nCubes with common dimensions because this can be done through the var description. Allows for the linking of contextually related items like an nCube on Norwegian Fishermen with an nCube on Norwegian Fishing Vessels |
4.2 var | [var is now also used to describe nCube dimensions; this use is consistent with the mtxDmns descriptions in Wendy's unnested model.] add attributes: |
4.2.1 location | add attribute locMap { IDRef to the locMap (3.3), indicating that the location information is found there rather than here } [note: var definitions that are used to describe dimensions should not use the location field; the link to their location information is in the nCube hierarchy. Ideally, all location information will move to 3.3 so that multiple locations for the same data item can be listed within a single XML instance. This is simply a reroute for programmers in order to retain validity with version 1.0.] |
4.2.18 catgry | add attributes other { Y \| N } and total { Y \| N }; total indicates whether the contents of this category are the sum of all other categories |
4.2.18.5 catgry | allows recursive nesting of categories |
4.3 nCube | formerly known as Cube or varMtx, it has been reborn as nCube under the VC ;-) New attributes: |
4.3.1 location | matches 4.2.1 to avoid renaming problems. Within the nCube, ONLY the locMap attribute should be used. |
4.3.11 purpose | allows for PCDATA entry or URI link that explains the purpose for creating this specific aggregation. For example, a crosstabulation of specific age cohorts by specific percentages of poverty by race designed to meet the reporting requirements of a federal funding program. |
4.3.12 timeDmns | special type of dmns [see 4.3.13 for details] |
4.3.13 dmns | listed n times within the nCube, once for each dimension of the cube (AGE x SEX x RACE = 3 dimensions). Attributes: varRef { IDRef to the var used to define the dimension } and rank { indicates the order of the dimension when giving cell coordinates }. For example, given `<dmns varRef="AGE" rank="1"/>`, `<dmns varRef="SEX" rank="2"/>` and `<dmns varRef="RACE" rank="3"/>`, a cell with the coordinates 3 2 3 would indicate the 3rd category in AGE, the 2nd category in SEX and the 3rd category in RACE. |
4.3.13.1 cohort | allows specific categories within the dimension to be selected for inclusion, using catRef { IDRef } and value { a value to use in cell coordinate notation, if needed }. Use of the cohort element implies that only a subset of categories will be used. |
4.3.13.1.1 range | allows a range of categories within the selected dimension to be selected. Nested in cohort as a specific type of category subset. [see 4.2.10.1 for range element description] |
4.3.14 measure | attributes of measurement pertaining to the nCube as a whole [see var for attribute definitions] |
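
The rank attribute of dmns (4.3.13) fixes which dimension each position in a cell's coordinates refers to. The sketch below decodes the coordinates 3 2 3 under that rule; the category labels are hypothetical, and coordinates are assumed to be 1-based positions in each var's full category list (a cohort subset would need its own mapping).

```python
# Hypothetical var definitions: category labels in codebook order.
variables = {
    "AGE": ["Under 18", "18-44", "45-64", "65+"],
    "SEX": ["Male", "Female"],
    "RACE": ["White", "Black", "Other"],
}

# dmns elements as (varRef, rank) pairs, listed out of rank order on purpose.
dmns = [("SEX", 2), ("RACE", 3), ("AGE", 1)]

def decode_cell(coords, dmns, variables):
    """Map cell coordinates to category labels: the i-th coordinate belongs to
    the dimension with rank i and is a 1-based position in that var's categories."""
    ordered = [var for var, _ in sorted(dmns, key=lambda d: d[1])]
    if len(coords) != len(ordered):
        raise ValueError("expected exactly one coordinate per dimension")
    return {var: variables[var][c - 1] for var, c in zip(ordered, coords)}

print(decode_cell((3, 2, 3), dmns, variables))
# {'AGE': '45-64', 'SEX': 'Female', 'RACE': 'Other'}
```
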