
Symbol s:  _ _ _ Symbol

Symbols are a new data type and are created by the verb s:. Symbols provide a mechanism for searching, sorting, and comparisons more efficient than alternative mechanisms such as boxed strings. Existing structural, selection, and relational verbs are extended to work on symbols. Arithmetic verbs do not work on symbols.

The monad s: produces an array of symbols. Several types of arguments are acceptable:
  • string with the leading character as the separator
  • literal array where each row, excluding trailing blanks, is the name of a symbol
  • array of boxed strings
s:^:_1, the inverse of s:, is 5&s: .
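
As a minimal sketch of the three argument forms listed above, the following builds the same list of symbols three ways and then converts it back with 5&s:. The names a, b, and c are illustrative only.

```j
NB. three ways to build the same list of symbols
a =: s: ' red green blue'        NB. string; the leading blank is the separator
b =: s: > 'red';'green';'blue'   NB. 3 by 5 literal array; trailing blanks are dropped
c =: s: 'red';'green';'blue'     NB. array of boxed strings

(a -: b) , (a -: c)              NB. both comparisons are expected to be 1
5 s: a                           NB. the inverse: back to boxed strings
```

Each form adds the names to the global table of symbols, so running the sketch changes the value of 0 s: 0.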

The dyad s: takes a scalar integer left argument and computes a variety of functions:

Left  Right                 Function
 0    0                     the cardinality of the set of symbols
 0    1                     the string length (the number of characters used in the string table)
 0    2                     the table of symbols; the columns are:
                               0  index in the string table
                               1  length in bytes
                               2  hash value
                               3  color
                               4  parent
                               5  left
                               6  right
                               7  order #
                               8  predecessor
                               9  successor
                              10  bit flags
                            The details of this data may change from one version of J to the next.
 0    3                     the string table
 0    4                     the hash table. _1 indicates free entries; non-negative values are indices into the table of symbols.
 0    5                     the binary tree root
 0    6                     the binary tree fill factor
 0    7                     the binary tree gap
 0    10                    get the global symbols data, equivalent to 0 s:&.>i.8. The details of this data may change from one version of J to the next.
 0    11                    perform an integrity check on the global symbols data
 0    12                    the number of queries required for each symbol
 1    array of symbols      a string of the symbol names each prefaced by a leading '`'
_1    string                the symbols list for a string containing symbol names each prefaced by the leading character
 2    array of symbols      a string of the symbol names each suffixed by a trailing zero character
_2    string                the symbols list for a string containing symbol names each suffixed by the trailing character
 3    array of symbols      a literal array of the symbol names padded with zero characters
_3    literal array         the symbols array for the literal array wherein each row, excluding trailing zero characters, is the name of a symbol
 4    array of symbols      a literal array of the symbol names padded with blanks
_4    literal array         the symbols array for the literal array wherein each row, excluding trailing blanks, is the name of a symbol
 5    array of symbols      an array of boxed strings of the symbol names
_5    boxed strings         the symbols array for the boxed array wherein each box is a string of a symbol name
 6    array of symbols      an integer array of the symbol indices (indices into the table of symbols)
_6    indices               the symbols for the indices
 7    array of symbols      an integer array of the order numbers for the symbols
10    global symbols data   set the global symbols data (as previously returned by 0 s: 10) after performing an integrity check on it. Incorrect global symbols data may cause misinterpretation of symbol arrays, or data corruption, or a system crash, or the end of civilization as we know it.

The inverse of k&s: is (-k)&s:, for non-zero integer k between _6 and 6.
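
The inverse relationship can be checked directly. The sketch below assumes a small helper named roundtrip (hypothetical, not part of J) that converts symbols with k s: and back with (-k) s:, then tests whether the original array is recovered.

```j
NB. roundtrip is a hypothetical helper: x is k, y is an array of symbols
roundtrip =: 4 : 'y -: (-x) s: x s: y'

t =: s: ' alpha beta gamma'
1 s: t                         NB. the names in one string, each prefaced by `
1 2 3 4 5 6 roundtrip"0 _ t    NB. one result per k; each is expected to be 1
```

The helper is applied at rank 0 on the left because the left argument of s: must be a scalar integer.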
 

The remainder of this text is divided into the following sections: Display, Annotated Examples, Space and Time, and Persistence.

Display

The display of a symbol is the character ` (96{a.) prefaced to the symbol name; the display of a symbol array is similar to the display of numeric arrays, except that columns are aligned on the left. See Annotated Examples below.

Annotated Examples
   ] t=: s: ' zero one two three four five'
`zero `one `two `three `four `five

   $ t                              a list of 6 symbols
6
   3 5 $ t                          a matrix of symbols
`zero `one  `two  `three `four 
`five `zero `one  `two   `three
`four `five `zero `one   `two  

   1 3 5 3 1 { t
`one `three `five `three `one
   |. t
`five `four `three `two `one `zero
   _2 |. t
`four `five `zero `one `two `three
   1 0 2 0 4 0 # t
`zero `two `two `four `four `four `four

   <"0 t                            symbols can be boxed
+-----+----+----+------+-----+-----+
|`zero|`one|`two|`three|`four|`five|
+-----+----+----+------+-----+-----+
   (2|i.#t) </. t
+----------------+-----------------+
|`zero `two `four|`one `three `five|
+----------------+-----------------+

   <:/~ t                           relations work on symbols
1 0 0 0 0 0
1 1 1 1 0 0
1 0 1 0 0 0
1 0 1 1 0 0
1 1 1 1 1 0
1 1 1 1 1 1

   t + t                            arithmetic functions don't work on symbols
|domain error
|   t    +t

   /: t                             symbols can be graded/sorted
5 4 1 3 2 0

   5 s: t                           convert symbols to boxed strings
+----+---+---+-----+----+----+
|zero|one|two|three|four|five|
+----+---+---+-----+----+----+
   (/: t) -: /: 5 s: t
1

   /:~ t
`five `four `one `three `two `zero

   <:/~ /:~ t
1 1 1 1 1 1
0 1 1 1 1 1
0 0 1 1 1 1
0 0 0 1 1 1
0 0 0 0 1 1
0 0 0 0 0 1

   t i.  s: ' three one four one five nine'
3 1 4 1 5 6
   t e.~ s: ' three one four one five nine'
1 1 1 1 1 0

   10{. t                           the fill for symbols is the 0-length symbol
`zero `one `two `three `four `five ` ` ` `

   _10{.t
` ` ` ` `zero `one `two `three `four `five

   0 s: 0                           cardinality (current # of unique symbols)
8

   a=:   ;:'A AAPL AMAT AMD AMZN ATT BA CRA CSCO DELL F GE GM HWP IBM INTC'
   a=: a,;:'JDSU LLY LU MOT MSFT NOK NT PFE PG QCOM RMBS T XRX YHOO'
   b=: ;:'NY SF LDN TOK HK FF TOR'
   c=: ;:'Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec'
   d=: <;._1 ' 00 01 02 03 04 05 06 07 08 09'
   e=: ;:'open high low close'

   t=: }.@;&.>{' ',&.>&.>a;b;c;d;<e
   $t
30 7 12 10 4
   */ $t
100800
   2 4 ($,) t
+----------------+----------------+---------------+-----------------+
|A NY Jan 00 open|A NY Jan 00 high|A NY Jan 00 low|A NY Jan 00 close|
+----------------+----------------+---------------+-----------------+
|A NY Jan 01 open|A NY Jan 01 high|A NY Jan 01 low|A NY Jan 01 close|
+----------------+----------------+---------------+-----------------+
   y=: s: t                         create a whole lot of symbols
   $y
30 7 12 10 4
   2 4 ($,) y
`A NY Jan 00 open `A NY Jan 00 high `A NY Jan 00 low `A NY Jan 00 close
`A NY Jan 01 open `A NY Jan 01 high `A NY Jan 01 low `A NY Jan 01 close

   0 s: 11                          system integrity check
1
   0 s: 0                           cardinality
100808

   (+/ % #) 0 s: 12                 mean # of queries per symbol
1.31213

   h=: 100808 {. 2 {"1 ] 0 s: 2     hash values

   (+/ ~: h) % #h                   fraction of distinct hash values
0.999821
   (+/ ~: h |~ #0 s: 4) % #h        fraction with respect to hash table
0.831005

Space and Time

In the current implementation, a symbol y requires 4 bytes for an index, 8 or more bytes in the hash table, 44 bytes in the table of symbols, and len y bytes (times 2 if Unicode) in the string table, where len=: #&>@(5&s:) , the length of the symbol name. (A symbol requires a single 4-byte entry in the hash table, but for efficient hashing the system maintains at least 2*n entries for n symbols.) Multiple occurrences of a symbol require just multiple indices; entries in the hash table, the table of symbols, and the string table are not duplicated.
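
The figures above can be turned into a rough lower bound on the space used by an array of symbols. The verb approxbytes below is a hypothetical helper, not part of J; it assumes non-Unicode names and counts only the minimum 8 bytes of hash table per distinct symbol.

```j
len =: #&>@(5&s:)                 NB. length of each symbol name, as defined in the text

NB. hypothetical helper: 4 bytes per occurrence, plus
NB. (8 hash + 44 table + len) bytes per distinct symbol
approxbytes =: 3 : '(4 * # , y) + +/ 52 + len ~. , y'

t =: s: ' zero one two three four five'
approxbytes 3 5 $ t               NB. 15 occurrences of 6 distinct symbols
```

For this argument the estimate is (4*15) + (6*52) + 23 = 395 bytes, where 23 is the total length of the six names. The 15 occurrences share the six entries in the hash table, the table of symbols, and the string table.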

Computations on symbols generally require linear time. Specifically:
  query (new)    O((len y) * ^. 0 s: 0)
  query (old)    O(len y)
  /: y           O(*/$y)
  i { y          O((*/$i) * */}.$y)
  x < y etc.     O(x >.&(*/@$) y)
  x i. y         O(x +&(*/@$) y)
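
One way to observe these costs is to time the same operation on symbols and on the equivalent boxed strings with the foreign 6!:2, which returns the seconds taken to execute a sentence. This is only a sketch: the names names, y, and b are illustrative, they must be global because 6!:2 executes a string, and the timings depend on the machine and the J version.

```j
names =: <@":"0 i. 1000       NB. boxed strings '0', '1', ... '999'
y =: s: names                 NB. the same names as symbols
b =: 5 s: y                   NB. converted back to boxed strings

6!:2 '/: y'                   NB. seconds to grade the symbols
6!:2 '/: b'                   NB. seconds to grade the boxed strings
```

Creating y adds up to 1000 entries to the global table of symbols, and they remain there for the rest of the session.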

Persistence

The interpretation of symbols depends on the global symbols data 0 s: 10. For this interpretation to persist across J sessions, the global symbols data must be restored at the beginning of a session. Thus:

((3!:1) 0 s: 10) 1!:2 <'symb.dat'     to store the global symbols data
10 s: (3!:2) 1!:1 <'symb.dat'         to restore the global symbols data

See the cautionary statements under 10 s: x.



