Add Chen psort algorithm.
------
Consider adding options for:
	 circled and parenthesized characters				DONE (but add to docs)
	 stylistic equivalents (various presentation forms)		DONE (but add to docs)

	 think about whether I should provide a "skip" option that would take
	       arguments specifying what to skip in terms of POSIX-like classes, e.g.:
	       --ignore Alnum	    ignore characters not in [:alnum:]
	       --ignore blank	    ignore [:blank:] characters

	       categories to provide:

	       POSIX
	       	       alpha
	       	       digit
	       	       xdigit
	       	       upper
	       	       lower
		       punct
	       	       graph
	       	       print
	       	       space
	       	       blank
	       Extensions
		       word (alnum + underscore)

	       Equivalence classes of Unicode general character properties

Probably use libucd to do this.
Probably should make this a compile-time option as the Unicode database info will
add considerably to the size of msort.
Also allow reference to Unicode blocks, so that one can specify, for example,
that non-Thai characters be ignored.
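A rough sketch of how the proposed --ignore option could map class names onto the standard wctype(3) machinery (should_skip and the capitalized-means-complement convention are only illustrations, not settled names):

```c
#include <assert.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical sketch of the proposed --ignore option: map a class
 * name such as "blank" onto a wctype(3) class and test whether a
 * character should be skipped.  The complement flag corresponds to
 * the capitalized form ("Alnum" = ignore what is NOT in the class). */
int should_skip(wchar_t c, const char *class_name, int complement)
{
    wctype_t t = wctype(class_name);       /* e.g. "alnum", "blank" */
    if (t == (wctype_t)0)
        return 0;                          /* unknown class: skip nothing */
    int in_class = iswctype((wint_t)c, t) != 0;
    return complement ? !in_class : in_class;
}
```

The same interface would extend naturally to the "word" extension and to Unicode property classes once libucd is in the picture.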
------
Extract the core sorting functions into a library, which will be
used by the application-level shell. This will allow other
programs to make use of sophisticated sorting. It will also
probably clean up the code.

What should the API look like? Arguments should be:

(a) a list of records, where a record is of type wchar_t *
    query: how to combine this with ability to read from stream?
    maybe have the calling program supply a generator which produces on each
    call the next input record. Then, if the input data is internal this would
    be trivial, whereas if the data has to be read from a file, it would do the
    file i/o and parsing.
(b) the number of keys on which to sort
(c) the field separator 
(d) an array of structs containing the sort parameters   - approximately current keyinfo array
    key selection type
    field tag | field number | character range
    optionalP
    option handling
    key type
    pointer to sort order spec
    exclusion list
    substitution list
    date format
    reverseP
    invertP 

It would be the job of the calling program to parse the input into records
if necessary (it wouldn't always be since it might be constructing them itself)
and to read in, if necessary, the sort order, exclusions, etc. Again, some or all
of these might be hard-coded into the program or computed rather than read from files.

Revised notes:
API (a)
    list of keyspecs
    list of records already parsed and transformed - just uses specialized comparison routine

API (b)
    list of keyspecs
    list of unparsed records

    uses key extraction and transformation code as well as specialized comparison routine
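The keyspec struct and entry points might look roughly like this (every name here is provisional, and the sort-order, exclusion, and substitution members are elided):

```c
#include <assert.h>
#include <wchar.h>

/* Provisional API sketch; all names are illustrative, and the
 * sort-order, exclusion, and substitution pointers are elided. */
typedef struct msort_keyspec {
    int key_type;            /* lexicographic, numeric, date, ... */
    int field_number;        /* or field tag / character range */
    int optionalP;           /* key may be absent */
    int reverseP;            /* reverse this key's order */
    int invertP;             /* invert the exclusion list */
    const wchar_t *date_fmt; /* date format, when key_type is date */
} msort_keyspec;

/* For stream input, the caller could supply a generator returning
 * the next record on each call, NULL at end of input. */
typedef wchar_t *(*msort_generator)(void *state);

/* Stand-in for the comparison the library would derive from a
 * keyspec: plain wcscmp with reverseP flipping the sign. */
int keyspec_compare(const msort_keyspec *k,
                    const wchar_t *a, const wchar_t *b)
{
    int r = wcscmp(a, b);
    return k->reverseP ? -r : r;
}

/* Small helper exercising the above. */
int demo_compare(const wchar_t *a, const wchar_t *b, int reverseP)
{
    msort_keyspec k = {0};
    k.reverseP = reverseP;
    return keyspec_compare(&k, a, b);
}
```

Variant (a) would call only keyspec_compare on already-transformed records; variant (b) would run the key extraction and transformation machinery first.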
    
----
To make hybrid comparison more efficient, we need to get it out of
Compare and preprocess it. I think that the way to do it is
to scan for integer strings as we process keys. When we find
an integer string, replace it with a special code (some unused
Unicode value) that will serve as a marker, compute the
value of the string, and attach it to a linked list of
such values. Then in Compare we proceed as with an ordinary
lexicographic comparison until we encounter the
special character, for which we then defer to the stored
integer values.
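A sketch of the preprocessing pass (MARKER is an arbitrary Unicode noncharacter chosen for illustration, and mark_integers is hypothetical, not existing code):

```c
#include <assert.h>
#include <wchar.h>
#include <wctype.h>

/* Sketch of the preprocessing pass.  MARKER is an arbitrary Unicode
 * noncharacter chosen for illustration.  Each run of digits in key
 * is collapsed, in place, to MARKER, and its numeric value is
 * appended to vals[] in order of occurrence. */
#define MARKER 0xFDD0  /* a Unicode noncharacter, never valid in text */

size_t mark_integers(wchar_t *key, long *vals)
{
    size_t n = 0;
    wchar_t *out = key;
    for (wchar_t *p = key; *p; ) {
        if (iswdigit((wint_t)*p)) {
            vals[n++] = wcstol(p, &p, 10);  /* consume the whole run */
            *out++ = (wchar_t)MARKER;
        } else {
            *out++ = *p++;
        }
    }
    *out = L'\0';
    return n;
}

/* Self-check helper: marks a copy of s, leaving results in globals. */
wchar_t demo_buf[64];
long demo_vals[8];
size_t demo_count(const wchar_t *s)
{
    wcsncpy(demo_buf, s, 63);
    demo_buf[63] = L'\0';
    return mark_integers(demo_buf, demo_vals);
}
```

Compare would then walk the marked keys lexicographically and, on hitting MARKER in both, compare the next pair of stored values numerically.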

I'm not sure this is necessary. It seems to be pretty fast
as is. 
---
Since Unicode actually uses only 21 bits, the high bits of the rank
table could be used for other purposes, e.g. exclusions.
This would save a good bit of memory at the expense of a little masking.
Think about it.
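For instance, with 32-bit entries the low 21 bits could hold the rank and the top bit an exclusion flag (the names and bit assignments below are only illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative packing: the low 21 bits hold the rank, the top bit
 * an exclusion flag; names and bit assignments are not settled. */
#define RANK_MASK    0x001FFFFFu   /* low 21 bits: the rank proper */
#define EXCLUDE_FLAG 0x80000000u   /* high bit: character is excluded */

uint32_t pack_rank(uint32_t rank, int excluded)
{
    return (rank & RANK_MASK) | (excluded ? EXCLUDE_FLAG : 0u);
}

uint32_t rank_of(uint32_t entry)   { return entry & RANK_MASK; }
int      excludedP(uint32_t entry) { return (entry & EXCLUDE_FLAG) != 0; }
```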
---
Consider adding the ability to split the input into records on a sequence of
two or more occurrences of any specified character, as opposed to just LF.

Consider adding the ability to treat the input as raw bytes,
thereby allowing binary data to be sorted and restoring the
ability to deal with extended ASCII encodings, such as ISO-8859.
This would be pretty straightforward but would probably require
a lot of duplication of code. Note that null bytes in binary
data can be accommodated by mapping the null byte to a codepoint
above 0xFF and mapping it back on output, effectively treating it as a
multigraph.
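The NUL remapping could be as simple as this (NUL_PROXY is an assumed internal codepoint just above the byte range, not anything msort currently defines):

```c
#include <assert.h>
#include <wchar.h>

/* Sketch of the NUL remapping: NUL_PROXY is an assumed internal
 * codepoint just above the byte range, so wide string functions
 * still work on keys containing what was a zero byte. */
#define NUL_PROXY 0x100

wchar_t byte_in(unsigned char b)   /* raw byte -> internal codepoint */
{
    return b ? (wchar_t)b : (wchar_t)NUL_PROXY;
}

unsigned char byte_out(wchar_t c)  /* internal codepoint -> raw byte */
{
    return c == (wchar_t)NUL_PROXY ? 0 : (unsigned char)c;
}
```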

Perhaps add to manual a discussion of the things that can be done with
very large numbers of long multigraphs, which can be used to index
records of other types.

Consider creating a secure version, to minimize exposure of plain text
to surveillance. It would read and write encrypted text and
store text internally, to the extent possible, in encrypted form.
In the extreme case (perhaps a separate option since it is likely very
time-consuming), keys, once extracted, would be encrypted, to be
decrypted momentarily by the comparison subroutine.

Maybe I should use hash tables to store the ranks. That will require much less memory
(except in the unlikely case in which the input spans a large subset of the code space),
at the cost of making the program run more slowly. If I do this, probably should do it
only for the larger codespaces. It may be worth retaining array lookup for single
byte encodings. The hash size could perhaps be chosen heuristically based on the
Unicode range of the input. We'd just have to have a table of the sizes of the various
ranges and/or a hash size computed from this. We might also want to make the hash
size a command-line option so that it can be tuned for large inputs.

If I use hash tables rather than arrays, it isn't clear how to handle default
ranks. Hash tables work fine if we require everything to be listed in the sort
order file and can do something like assigning the lowest rank to anything not
listed, but if we want Unicode order to be the default, how do we arrange that?

Ah, I think I see. When we read a character that isn't in the rank table,
we assign it a rank of (UnicodeValue + SizeOfRankTable). This will guarantee
machine collating order, and the offset will ensure that all such characters
sort after anything in the sort order file.
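In code, the default-rank rule amounts to something like the following (a linear scan stands in for the hash probe; the names and the two-entry demo table are purely illustrative):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of the default-rank rule: characters absent from the rank
 * table get (codepoint + table size), so everything listed in the
 * sort order file sorts first and the rest fall into codepoint
 * order.  A linear scan stands in for the hash probe. */
typedef struct { uint32_t cp; uint32_t rank; } rank_entry;

uint32_t rank_lookup(const rank_entry *tab, size_t n, uint32_t cp)
{
    for (size_t i = 0; i < n; i++)        /* hash probe in real code */
        if (tab[i].cp == cp)
            return tab[i].rank;
    return cp + (uint32_t)n;              /* default: after all listed */
}

/* Two Thai letters ranked explicitly; everything else defaults. */
const rank_entry demo_tab[] = { { 0x0E01, 0 }, { 0x0E02, 1 } };
uint32_t demo_rank(uint32_t cp) { return rank_lookup(demo_tab, 2, cp); }
```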

Using hash lookup may be a bit of a time hit, so another approach would be to
use the ranks themselves as the key, thus avoiding indirection. We'd still need
the hash tables, but just while processing the input. During the actual
sort, we'd be able to use the keys directly.

I did this once a long time ago and ran into problems. I don't recall whether
they were intrinsic or not. Try to remember. The one problem I can see with storing
the ranks themselves is that we couldn't have a rank zero if we want to use
string functions on the keys. I think that would be okay, but it might require some
adjustment.

