Go to the previous, next section.
The following type ID construction algorithm is used for automatic generation of type IDs for all types and exceptions in ILU that have no explicitly specified type ID. The type ID is formed by (1) constructing a string containing all the salient information about the type or exception, in a standardized form, (2) taking the NIST Secure Hash Signature of that string, and (3) rendering the SHS in a string form as a 27-digit base-64 number.
There are some types that can be described in multiple ways, using more or less generalized constructions. For example, the primitive integer types could also be described as fixed-point constructions. Every type that can be described multiple ways is always considered to be the one that uses the least general constructs. In particular, the primitive integer types are indeed considered primitive -- not fixed-point constructions.
This string consists of 8-bit bytes. Many of those bytes are described here using characters, which are taken to be in the US-ASCII character code.
<info string> ::= <self ref> <defn>* <self ref> ::= <typeref> | <exnref>
The salient information string consists of a reference to the initial item
(the type or exception being characterized),
followed by the definitions of the interfaces, execeptions,
and non-primitive non-explicitly-IDed types referenced directly or indirectly by the initial item.
Each interface, exception, and non-primitive non-explicitly-IDed type that
is referenced in the type string is defined exactly once in the type
string, and these are the only definitions that appear in the type
string. In particular, the string includes no definition of any primitive type,
nor of any type whose ID is explicitly specified in the source language
(explicit IDs can optionally be given in ISL by appending a TYPEID
clause to a type definition;
in OMG IDL there is a defined ID for every named type).
The definitions appear in the order of first appearance of
the corresponding reference. In other words, the definitions appear in breadth-first order. In particular, if the initial item has a definition
(note that primitive types do not), then it is the first definition.
Every definition of an item is preceded by at least one reference to that item.
This forbids adding definitions of types in irrelevant cycles.
We could compress this a bit by making each definition immediately follow, and thus not duplicate, its first reference. But this gains little, and type fingerprints are not computed at runtime anyway. And it has this cost: it tempts an implementor to use a recursive strategy for generating the type string, and a recursive implementation will crash with a stack overflow when presented with hostile interface source code. The structure given is amenable to a simple worklist-based generation strategy.
A type reference is a reference to either a primitive type or a constructed one, or an explicit type UID (if the type UID has been explicitly specified).
<typeref> ::= <primitive> | <typeciteref> | <typeidref> <typeciteref> ::= "(ref" <sp> <type cite> ")" <typeidref> ::= "(id" <sp> <UID string> ")" <UID string> ::= <string>
The primitive types are:
<primitive> ::= "byte" | "shortcardinal" | "cardinal" | "longcardinal" | "shortinteger" | "integer" | "longinteger" | "shortreal" | "real" | "longreal" | "shortcharacter" | "character" | "longcharacter" | "boolean" | "pickle" | "void"
A reference to a constructed type cites two things: (1) the name of the interface in which the type is defined, and (2) the name given that type. These are ultimately <id>s, as defined in ISL; ISL identifiers may not contain any characters (e.g., quotes, parentheses) that are treated specially in the syntax of salient information strings. In a salient information string, <id>s are represented in US-ASCII (with no leading length nor trailing null).
<type cite> ::= <ifc ref> <sp> <typename> <ifc ref> ::= <interfacename> <interfacename> ::= <id> <typename> ::= <id> <sp> ::= a single space character (i.e., code 32(decimal))
Exception references similarly have two forms, depending on whether an ID has been explicitly specified:
<exnref> ::= "(exn" <sp> ( <exnciteref> | <exnidref> ) ")" <exnidref> ::= "(id" <sp> <UID string> ")" <exnciteref> ::= "(ref" <sp> <ifc ref> <sp> <exnname> ")" <exn cite> ::= <ifc ref> <sp> <exnname> <exnname> ::= <id>
There are three kinds of things that need to be defined: types, interfaces, and exceptions.
<defn> ::= <type def> | <ifc def> | <exn def>
A definition of a type gives its compound name, brand, and what the type is defined to be.
<type def> ::= "(type" <sp> <type cite> <sp> <brand> <sp> <type desc> ")"
A type can be defined either to be a primitive or a construction.
<type desc> ::= <primitive> | <construction>
Constructed types have different forms, specific to their types.
<construction> ::= <fixedpoint> | <stringt> | <array> | <record> | <optional> | <sequence> | <union> | <enumeration> | <object>
For alias types that have an explicitly assigned type ID, we simply generate a <typeidref> when it's referenced. For alias types that have the same type ID as their base type, we just use the base type in its place when constructing type info strings.
This should be mostly obvious. An integer is given as its shortest decimal representation; positive integers have no sign character.
<array> ::= "(array" <sp> <typeref> (<sp> <dim>)+ ")" <dim> ::= "(fixed" <sp> <non-negative integer> ")" <integer> ::= "0" | ( ["-"] <non-zero-digit> <digit>* ) <digit> ::= <non-zero-digit> | "0" <non-zero-digit> = "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
The field names are included on the grounds that they give clues to the semantics of the type, so different field names make different types.
<record> ::= "(record" (<sp> <rfield>)+ ")" <rfield> ::= "(field" <sp> <fieldname> <sp> <typeref> ")" <fieldname> ::= <id>
<optional> ::= "(optional" <sp> <typeref> ")"
The fixed-point constructor gives the minimum and maximum numerators, and the fixed denominator value. Note that <integer> may be arbitrarily long.
<fixedpoint> ::= "(fixedpoint" <sp> <min-num> <sp> <max-num> <sp> <denom> ")" <min-num> ::= <integer> <max-num> ::= <integer> <denom> ::= <positive integer> | "1/" <positive integer>
<sequence> ::= "(sequence" <sp> <typeref> <sp> "(variable" <sp> <limit> "))" <limit> ::= <integer>
There is always some finite limit, 2^32-1 when not explicitly given in ISL.
<union> ::= "(union" <sp> <typeref> (<sp> <uarm>)+ ")" <uarm> ::= "(arm" <sp> <typeref> <sp> [ "(name" <arm name> ")" <sp> ] "(" ["default"] ")" (<sp> <discvalue>)* ")" <arm name> ::= <id> <discvalue> ::= "(val" <sp> ( <string> | <integer> | ( "TRUE" | "FALSE" ) ) ")"
The <typeref> at the top level of a UNION constructor is the type of the tag, which must be an integer type, a character type, an enumerated type, or boolean. The <discvalue> is an integer when the tag is a numeric type, a string when the tag is an enumerated or character type, and a boolean literal when the tag is boolean.
<enumeration> ::= "(enumeration" (<sp> <efield>)+ ")" <efield> ::= "(element" <sp> <id> <sp> <integer> ")"
Enumeration info includes the numeric codes assigned when not explicitly given.
<object> ::= "(object" [ <sp> "(singleton" <sp> <singletoninfo> ")" ] [ <sp> "optional" ] [ <sp> "collectible" ] ( <sp> "(supertype" <sp> <typeref> ")" )* ( <sp> <method> )* <method> ::= "(method" <sp> <methodname> [ <sp> "asynchronous" ] [ <sp> "functional" ] <sp> "(returns" <sp> <typeref> (<sp> <exnref>)* ")" ( <sp> <marg> )* ")" <marg> ::= "(parameter" <sp> <name> <sp> ( "in" | "out" | "inout" ) <sp> <typeref> [ <sp> "sibling" ] ")"
The form of object type constructions follows the features of the ISL fairly closely. A method with no explicit return result is considered here to return "void". The methods listed are only those introduced at the object type at hand -- the inherited methods are not listed.
<singletoninfo> ::= <string>
The <singletoninfo> is the string given in the ISL (quoted, as all <string>s are).
The definition of an interface simply gives its brand. We *could* make it also include a reference to everything defined in that interface, but I don't see any particular value in doing so.
<ifc def> ::= "(interface" <sp> <interfacename> <sp> <interfacebrand> ")" <interfacebrand> ::= <string>
The definition of an exception gives its compound name, its brand, and the associated datatype. As exceptions are not yet branded in the rest of ILU, the brand here is always the empty string. If the exception has no associated datatype, it is given as "void" here.
<exn def> ::= "(exception" <sp> <exn cite> <sp> <exnbrand> <sp> <typeref> ")" <exnbrand> ::= <string>
<brand> ::= <string>
The string brand for a type, exception, or interface. If not explicitly given in the ISL, the empty string.
A <string> is encoded using US-ASCII and surrounded with double-quotes (US-ASCII code 34). If the double-quote character appears in the string, it is escaped with a leading backslash character. If a backslash character appears in the string, it is escaped with an additional leading backslash character. If a non-printing character (with a US-ASCII code less than 32(decimal) or greater than 126(decimal)) appears in the string, it appears as a 3-digit octal number containing the US-ASCII code for that character, preceded with a backslash character.
<string> ::= <dquote> <string-char>* <dquote> <dquote> ::= the US-ASCII double-quote character (i.e., code 34(decimal)) <string-char> ::= <simple-char> | <quoted-char> <simple-char> ::= any printing US-ASCII character (i.e., codes 32(decimal) to 126(decimal)), except double-quote (code 34(decimal)) or backslash (code 92(decimal)) <quoted-char> ::= <quoted-backslash> | <quoted-double-quote> | <quoted-byte> <quoted-backslash> ::= <backslash> <backslash> <backslash> ::= the US-ASCII backslash character (i.e., code 92(decimal)) <quoted-double-quote> ::= <backslash> <dquote> <quoted-byte> ::= <backslash> <octal-digit> <octal-digit> <octal-digit> <octal-digit> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7"
Consider the following interface:
INTERFACE Foo BRAND "13.2.116.14 October 26, 1998"; EXCEPTION E; TYPE C1 = FIXEDPOINT MIN-NUMERATOR 0 MAX-NUMERATOR 59 DENOMINATOR 1; TYPE I1 = FIXEDPOINT MIN-NUMERATOR -2147483648 MAX-NUMERATOR 2147483647 DENOMINATOR 1 ; TYPE O-1 = OBJECT METHODS m1(o:O-2, c:C1):I1 END; TYPE O-2 = OBJECT SUPERTYPES O-1 END METHODS m2() RAISES E END END BRAND "13.2.116.14 October 26, 1998" ; TYPE O-1X = O-1; TYPE O-1Y = O-1X TYPEID "xyz:bad-idea";
The salient information strings for types O-1 and O-1X are the same. That string is:
(ref Foo O-1) (interface Foo "13.2.116.14 October 26, 1998") (type Foo O-1 "" (object (method m1 (returns integer) (parameter o in (ref Foo O-2) ) (parameter c in (ref Foo C1))))) (type Foo O-2 "13.2.116.14 October 26, 1998" (object (supertype (ref Foo O-1)) ( method m2 (returns void (exn (ref Foo E)))))) (type Foo C1 "" (fixedpoint 0 59 1)) (exception Foo E "" void)
Line breaks have been introduced into the display above for readability; the actual salient information string contains none of those linebreaks nor any whitespace of any kind at their positions.
The type ID (not salient information string) for O-1Y is "xyz:bad-idea".
See FIPS PUB 180-1: Secure Hash Standard.
Abstract:
"This standard specifies a Secure Hash Algorithm (SHA-1) which can be used to generate a condensed
representation of a message called a message digest. The SHA-1 is required for use with the Digital
Signature Algorithm (DSA) as specified in the Digital Signature Standard (DSS) and whenever a secure hash
algorithm is required for Federal applications. The SHA-1 is used by both the transmitter and intended
receiver of a message in computing and verifying a digital signature."
The SHS is converted to a string of length 32 bytes, each containing an ASCII character code. First, two 0 bits are appended to the 160-bit SHS. This 162-bit string is encoded as 27 6-bit digits, most significant digit first. The encoding uses characters 'a' - 'z' to represent the values 0-25, the characters 'A' - 'Z' to represent the characters 26-51, the characters '0' - '9' to represent the values 52-61, the character '-' to represent the value 62, and the character '+' to represent the value 63. Finally the 5 bytes "ilut:" are prepended, giving the total of 32 bytes.
Go to the previous, next section.