                         P O R T I N G . T X T
                         == How to port MCPP ==

                Kiyoshi Matsui      kmatsui@t3.rim.or.jp

V.2.0   1998/08     First released.                             kmatsui
V.2.1   1998/09     Updated according to C99 1998/08 draft.     kmatsui
V.2.2   1998/11     Updated according to C++98 Standard.        kmatsui
V.2.3 pre-release 1     2002/08     Updated according to C99 Standard.
                    Ported to Linux/ GNU C, CYGWIN and LCC-WIN32.
                    GNU C-compatible features augmented.        kmatsui
V.2.3 pre-release 2     2002/12     Ported to GNU C V.3.2.
                    Revised some wording.                       kmatsui
V.2.3 release       2003/02     Finally released.               kmatsui
V.2.3 patch 1       2003/03     Slightly modified.              kmatsui
V.2.4 prerelease    2003/11     Added porting to Visual C++.
            Created configure script.                           kmatsui
V.2.4 release       2004/02     Extended multi-byte character handling.
            Added porting to Plan 9 / pcc.
                                                                kmatsui
V.2.4.1     2004/03     Revised recursive macro expansion, and added -c
                option.
                                                                kmatsui


                                Contents

1   Overview
1.1     High portability
1.2     Standard mode with highest conformance and other modes

2   History

3   How to port MCPP to different compiler systems : Overview
3.1     Already supported compiler systems
3.1.1       Common configurations
3.1.2       FreeBSD / GNU C V.2.*
3.1.3       Linux / GNU C V.2.*
3.1.4       FreeBSD, Linux / GNU C V.3.*
3.1.5       CygWIN V.1.* / GNU C V.2.*
3.1.6       DJGPP V.1.*
3.1.7       LCC-Win32 V.3.*
3.1.8       Visual C++ .net
3.1.9       Borland C V.4.*, V.5.*
3.1.10      LSI C-86 V.3.3
3.1.10.1        Compile MCPP for LSI C-86 by Borland C
3.1.11      Plan 9 ed.4 / pcc
3.2     Compiler systems to which DECUS cpp had been ported
3.3     noconfig.H, configed.H, system.H
3.4     system.c, mbchar.c
3.5     lib.c
3.6     Standard headers
3.7     Makefile and recompile using MCPP
3.7.1      Compile by Plan 9 / pcc
3.8     Compiler systems which can compile MCPP
3.9     Host compiler system and target compiler system
3.10    Unsupported compiler systems
3.11    Memory model of MS-DOS

4   How to port MCPP to different compiler systems : Details
4.1     Configuration of noconfig.H, configed.H, system.H
4.1.1       PART 1 Configuration of Target system
4.1.1.1         Predefined macros
4.1.1.2         Include directories and others
4.1.1.3         Output specifications of line number information and
                        others
4.1.1.4         Configuration corresponding to the compiler-system's
                        language specification
4.1.1.5         Multi-byte character
4.1.1.6         Target and host system common configurations
4.1.2       PART 2 Configuration of Host system
4.1.3       PART 3 Configuration of the MCPP behavior specification
4.1.3.1         Selection of various new and old mode
4.1.3.2         Specifying the details of the behavioral mode
4.1.3.3         Special Configuration
4.1.3.4         Configuration of translation limits
4.2     system.c
4.3     mbchar.c
4.4     lib.c
4.extra malloc()

5   Bug reporting and porting report
5.1     Is this a bug?
5.2     Check for malloc() related bugs
5.3     Bug report
5.4     Porting report
5.5     Information about Configure for other compiler systems besides
                GNU C
5.6     I will try to port if you send me the data.
5.7     Please report the test of other compiler systems by the
                Validation Suite.
5.8     The feed back for improvement

6   Long way to MCPP
6.1     Three days to plan and six years to develop
6.2     V.2.3
6.3     Selected to "Exploratory Software Project"


                              1   Overview

MCPP is a C preprocessor written by kmatsui (Kiyoshi Matsui).  MCPP
stands for 'Matsui CPP'.  It has been developed with a focus on
conformance to the C Standard.  The project started from the code of
"DECUS CPP" by Martin Minow, which was first modified and finally
rewritten entirely.  MCPP is supplied as source code.  To use it with a
given compiler system, the source must first be modified to suit that
compiler system and then compiled to generate the MCPP executable.

This document explains how to port the source to different compiler
systems.  Please refer to the separate manual called "manual.txt" for
the operating instructions of the generated executable.

These sources and related documents are all provided as free software.

MCPP has the following features. (This section 1.1 - 1.2 is a duplicate
of the one in manual.txt).

  * The MCPP executable is to be used by replacing the resident
    preprocessor of each compiler system.  Hence, the name of the
    executable program becomes the name of the compiler system's
    preprocessor.  It is cpp in most of the systems.


1.1     High portability

MCPP is portable.  It supports various operating systems, including
GNU/Linux and DOS/Windows.  Its source code is also portable.  It can be
compiled by compilers which support Standard C or C++ (ANSI/ISO C or C++)
as well as ancient ones which only support K&R 1st.

The library functions used are only the classic ones.  The C source is
also attached for some of the library functions.  This avoids annoying
problems such as being unable to compile the Standard C preprocessor
because no Standard C compiler system is available, or because some
parts of the compiler system do not conform to Standard C.

To port to each compiler system, in many cases one only needs to change
some macro definitions in the header files and compile.  Even in the
worst case, only several dozen lines need to be added to a source file
called system.c.

Because MCPP uses memory efficiently, it can run even on a 16-bit
system with a small address space (with considerable restrictions,
however).

To process multi-byte characters (Kanji), it supports Japanese EUC-JP
and shift-JIS, Chinese GB-2312, Taiwanese Big-5 and Korean KSC-5601 (KSX
1001).  Systems of 32 bits or more can use ISO-2022-JP and UTF-8 as
well.  For shift-JIS or Big-5, MCPP can compensate for a compiler proper
that does not recognize them.


1.2     Standard mode with highest conformance and other modes

The behavior mode of the preprocessor is chosen when MCPP itself is
compiled, by modifying macros in the header file system.H.

Of course there is a Standard C mode, but other modes such as K&R 1st
mode or "Reiser" model cpp mode can also be built.  Furthermore, there
is what I call "post-Standard mode".  Standard C mode also has an
execution time option which makes it behave as a C++ preprocessor.

Unlike many existing preprocessors, Standard C mode is intended to
cover the Standards completely.  It conforms to ISO/IEC 9899:1990, its
Corrigendum 1:1994 and its Amendment 1:1995.  It is also conformant to
C99 (ISO/IEC 9899:1999).  It has been created with the aim of being the
reference model of the Standard C preprocessor.

Even if the compiler proper does not conform to Standard C, MCPP
implements everything that can be handled by a preprocessor.  For
example, for a compiler proper which does not concatenate adjacent
string literals, MCPP can be built to concatenate them itself.
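
As a reminder of the rule involved, Standard C joins adjacent string
literals in translation phase 6.  The following is only a minimal
sketch of that rule; the function names greeting and
literals_are_joined are hypothetical and do not appear in MCPP.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns a string written as two adjacent
 * string literals.  Standard C joins them in translation phase 6. */
static const char *greeting(void)
{
    return "Hello, " "world";
}

/* Returns 1 when the two literals were joined into one string. */
static int literals_are_joined(void)
{
    const char *s = greeting();

    assert(s != NULL);
    return strcmp(s, "Hello, world") == 0;
}
```

With a compiler proper that lacks this feature, the two literals in
greeting() would have to be joined before the compiler proper sees
them, which is the kind of work a preprocessor can take over.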

It also has some useful enhanced features.  There is the #pragma
__debug_cpp which can trace the macro expansion mechanism, and trace the
#if expression evaluation mechanism.  Also, header files can be pre-
preprocessed.

The various modes above are selected at MCPP compile time, not at
execution time.  This is to reduce the size of MCPP and to avoid
complicated options at execution time.

However, there are also some useful execution time options, for
example to specify the warning level or the include directories.

Even if there are mistakes in the source, MCPP responds with accurate
and plain diagnostic messages, without running out of control or
displaying misleading error messages.  It also displays warnings for
portability problems.  Detailed documents are also attached.

If I have to name a weak point of MCPP, it is that it is a little slow.
Compared to GNU C/cpp, it takes two to three times as long, but it is
about as fast as Borland C 5.5/cpp.  As it becomes a little quicker
when the header pre-preprocessing facility is used, it is not
especially slow.  I consider this level of processing speed unavoidable
for a preprocessor that is accurate, has portable source and runs in
little memory.

In addition, with MCPP I have also released the "Validation Suite for
Standard C Preprocessing", which tests the Standard C conformance of a
preprocessor, together with its explanatory notes and a document,
cpp-test.txt.  This document contains the scorebook of various
preprocessors tested by the suite.  It shows how many problems exist in
the existing so-called "Standard conformant" preprocessors.

ISO/IEC 9899:1990 (JIS X 3010-1993) was formerly the Standard of the C
language; ISO/IEC 9899:1999 was adopted in 1999.  In this document, the
former is referred to as C90 and the latter as C99.  Because C90 was
derived from ANSI X3.159-1989, it is also generally called ANSI C or
C89.  ISO/IEC 9899:1990 together with its Amendment 1 (1995) is
sometimes referred to as C95.
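
These short names can be related to the value of the macro
__STDC_VERSION__, which Amendment 1 introduced as 199409L and C99
defines as 199901L.  The following is only an illustrative sketch; the
function name c_standard_name is hypothetical, and 0L is used here to
mean "macro not defined", as in the STDC_VERSION settings later in
this document.

```c
#include <assert.h>

/* Maps a value of __STDC_VERSION__ to the short name used in this
 * document.  0L stands for "not defined" (C90 without Amendment 1).
 * Standards later than C99 are lumped together with C99 here. */
static const char *c_standard_name(long stdc_version)
{
    assert(stdc_version >= 0L);
    if (stdc_version >= 199901L)
        return "C99";
    if (stdc_version >= 199409L)
        return "C95";
    return "C90";
}
```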


                              2   History

2.1     DECUS cpp was created by Martin Minow, and released in usenet/
net.sources in May 1984.  Apparently, DECUS is an acronym for
"DEC User's Society" which is a user group of DEC (Digital Equipment
Corporation).  DECUS cpp is the C preprocessor written for DEC's C
language compiler systems of those days, such as PDP-11 / RT11, PDP-11 /
RSX, VAX / VMS, VAX / ULTRIX.  As it had been written well for
portability, it was quite easy to port to other systems.  Even the
original version had already been ported to some other UNIX systems
besides DEC's.

2.2     I used the distribution No.243, of the C Users' Group, to be the
base for this version.  According to the revision history of this source,
the original author's final modification was June 1985.  I do not know
if the author has upgraded it since then.

2.3     After that, up to December 1988, several people ported it to
some of the compiler systems on MS-DOS.  This is the version which is
included in the CUG disc.

2.4     There are also sources in ftp.ora.com/pub/examples/nutshell/
imake/DECUS-cpp.tar.gz.  The time stamp of this archive is Feb 1993, but
the actual contents are older than CUG's and date from Jan 1985.
According to Martin Minow's README included there, this program is
stated to be "public domain".  (This README also seems to date from 1984
or 1985.)

2.5     The one ported to Microware C of OS-9/6x09, by Gigo & others,
had been registered in NIFTY-SERVE / FOS9 / lib 2.

2.6     MCPP V.2 is based on these, but I re-wrote it entirely.  I
improved the portability further, changed the way the source files are
partitioned, made the preprocessor comply completely with Standard C,
added lots of macros, and drastically added, separated, rewrote and
renamed functions and variables.  The size of the source is three times
that of the original version.  All the documents and the Validation
Suite were newly written by me.
I release these as free software.  I do not have any relationship with
DECUS.
The original version does not have a version number, so I refer to it
as "DECUS cpp" or "old version" to differentiate it from MCPP.

2.7     For the algorithm of macro expansion for Standard C, I also
referred to the source of CPP V.5.3 (Aug 1989, CUG #319), a PDS on
MS-DOS by E. Ream.  Additionally, I took some hints from the behavior of
GNU C/cpp and J. Roskind's JPCPP document.

2.8     MCPP V.2.0 was released with Validation Suite V.1.0 on NIFTY
SERVE / FC / LIB2 in August 1998, and also re-distributed on Vector's
web site.

2.9     MCPP V.2.1 was a revision of V.2.0 according to the C99 1998/08
draft.  In September 1998, it was uploaded with Validation Suite V.1.1
to NIFTY SERVE / FC / LIB2 and at the same time to Vector's web site.

2.10    MCPP V.2.2 was an update of V.2.1 according to the C++ Standard
(ISO/IEC 14882:1998), which was adopted in July 1998.  Some bug fixes
were also added.  It was uploaded with the Validation Suite to NIFTY
SERVE / FC / LIB2 and at the same time to Vector's web site in November
1998.

2.11    MCPP V.2.3 was an update of V.2.2 according to C99.  Porting to
Linux / GNU C 2, GNU C 3, etc. was added, and the compatibility with
GNU C/cpp was augmented.  Some execution time options were added and
some were changed.  Some bug fixes and supplements/updates of the
documents were also made.  In V.2.3, English versions of the documents
were also created.  To the Validation Suite attached to MCPP, an
edition which allows automatic testing as a part of the GNU C testsuite
was added.

2.12    In the middle of development of V.2.3, MCPP with Validation
Suite V.1.3 was selected for the 2002 "Exploratory Software Project" of
the Information-Technology Promotion Agency, Japan (IPA) by the project
manager, Yutaka Niibe.  During the period of July 2002 - Feb 2003, the
development proceeded with IPA's funding and Niibe's advice.  The
translation of the documents into English was consigned to "HighWell
inc." (Tokyo), and completed with my modifications.  During this
project, a CVS repository and an ftp site were prepared.  V.2.3 was
released as pre-release 1 in August 2002, pre-release 2 in December
2002, and then the release version in February 2003.  After that,
V.2.3 patch 1 was released in March 2003. *

2.13    MCPP was selected again for the "Exploratory Software Project"
of 2003 by the project manager, Hiroshi Ichiji.  During the period of
June 2003 - Feb 2004, the update to V.2.4 proceeded with IPA's funding
and Mr. Ichiji's advice.  In this project, the V.2.4 pre-release was
released in November 2003.  In this version, porting to Visual C++ .net
was added, and a configure script to automate 'make' of MCPP was
created.  Also, MCPP previously had no clear license indication, but a
BSD-style license is included from this version.  The release version
followed in February 2004.  In this version, the processing of
multi-byte characters was enhanced, and MCPP was also ported to Plan 9
/ pcc.  As the documents were updated in the Japanese version, their
translation into English was again consigned to HighWell.

2.14    In March 2004, MCPP V.2.4.1 was released.  This version revises
the recursive macro expansion.

  * The outline of the "Exploratory Software Project" can be seen at the
    following site.

        http://www.ipa.go.jp/jinzai/esp/

    The source code and documentation of MCPP and Validation Suite
    including the developing revision are located at the following CVS
    repository.  You should be able to download a tar-ball from the
    following site.

        http://cvs.m17n.org/cgi-bin/viewcvs/?cvsroot=matsui-cpp

    These new releases are also available by anonymous ftp on the
    following site.

        ftp://ftp.m17n.org/pub/mcpp/

    Here is a web page about MCPP.

        http://www.m17n.org/mcpp/

    MCPP V.2.2 and Validation Suite V.1.2 are also located on Vector's
    web site, at the addresses below.  They were included in the CD-ROM
    "PACK for WIN GOLD" as well.  They are in the directory called dos/
    prog/c, but they are not for MS-DOS exclusively.  Sources are
    available for UNIX, WIN32 and MS-DOS.

        http://download.vector.co.jp/pack/dos/prog/c/cpp22src.lzh
        http://download.vector.co.jp/pack/dos/prog/c/cpp22bin.lzh
        http://download.vector.co.jp/pack/dos/prog/c/cpp12tst.lzh

        http://download.vector.co.jp/
    and
        ftp://ftp.vector.co.jp/
    seem to be the same.

    For DOS/Windows systems, the text files within the archive files on
    Vector are encoded with <newline> as [CR] + [LF] and Kanji
    characters as shift-JIS.  For Unix systems, those on m17n.org are
    encoded with <newline> as [LF] and Kanji characters as EUC-JP.  For
    use on other OSs, a conversion is needed.  With convf, a tool I
    created, all files can be processed easily in one operation.
    (Binary files are automatically recognized and copied without
    conversion.  Time stamp and mode are preserved.)  However, the MCPP
    package contains files to test a specific multi-byte character
    encoding, and they must not be converted.  It is better to convert
    only <newline> in all files first, then to convert both <newline>
    and Kanji encoding in the doc directory only.  convf itself can be
    compiled on the same compiler systems to which MCPP is ported.
    Note that when files are transferred from DOS/Windows systems to
    other OSs, the case distinction of file names is lost if the
    archive is extracted on MS-DOS or Windows 95.  Transfer the archive
    file itself, then extract and convert.  convf is in the following
    location.  (Unfortunately, the document is only available in
    Japanese.)

        http://download.vector.co.jp/pack/dos/util/text/conv/code/
                                                        convf-1.8.lzh
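
    The <newline> part of such a conversion can be sketched as follows.
    This is only an illustration and not the code of convf; the
    function name crlf_to_lf is hypothetical, and no conversion of
    Kanji encoding is attempted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of convf or MCPP: rewrites each
 * CR+LF pair in buf[0..len) as a single LF, in place, and returns
 * the new length.  A lone CR is kept as it is. */
static size_t crlf_to_lf(char *buf, size_t len)
{
    size_t in, out = 0;

    assert(buf != NULL || len == 0);
    for (in = 0; in < len; in++) {
        if (buf[in] == '\r' && in + 1 < len && buf[in + 1] == '\n')
            continue;               /* Drop the CR of a CR+LF pair  */
        buf[out++] = buf[in];
    }
    return out;
}
```

    A real tool would also have to read and write the files and to
    leave binary files alone, as convf is described to do above.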


     3   How to port MCPP to different compiler systems : Overview

The source of MCPP consists of four header files and eight *.c files.
The parts which are dependent on OS or compiler system are contained in
the four sources configed.H, noconfig.H, system.H and system.c.  C
source for some library functions is in lib.c.  When MCPP is compiled
by any compiler system, these source files need to be modified to match
that compiler system.

There are two ways to compile MCPP.  The first way is to generate a
header file named config.h and a Makefile automatically by executing
the configure script.  After generating them, just run 'make; make
install'.  The header file named configed.H is used in this case.
However, the configure script can only be used on Unix systems and
CygWIN.

The other way is to run make with the makefile for each compiler
system, after modifying the header files with the difference files (and
editing them further if required).  noconfig.H is used in this case.
Difference files and makefiles are in the 'noconfig' directory.  Even
on systems which can use the configure script, editing header files and
makefiles directly allows one to control compilation in detail.
However, difference files are only available for supported compiler
systems.


3.1     Already supported compiler systems

In this section, I explain how to compile MCPP using the difference
files.  Please refer to the README for the configure script.

The C compiler systems I could use are the following, and MCPP has been
ported to all of these.  It has therefore been verified that this
source code can be compiled, and that the generated preprocessors run
correctly.  In all cases the CPU is of the i386 type.

    FreeBSD 4.7                     GNU C V.2.95.4, V.3.2R
    Vine Linux 2.6 (kernel 2.4.20)  GNU C V.2.95.3, V.3.2R
    CygWIN 1.3.10                   GNU C V.2.95.3
    GO32 / DJGPP V.1.12-M4          GNU C V.2.7.1
    WIN32                           LCC-WIN32 V.3.2
    WIN32                           Visual C++ .net 2003
    WIN32, MS-DOS                   Borland C++ V.4.02J, V.5.5J
    MS-DOS                          LSI C-86 V.3.3 Trial Version
    Plan 9 ed.4                     pcc

Although the following were supported up to MCPP V.2.2, I have stopped
using them.  In V.2.3, their configurations were left in the source and
documents.  V.2.4, however, no longer has them.

    MS-DOS                          Turbo C V.2.0
    OS-9/6x09 level 2               Microware C

Configuring the source to create a Standard C mode preprocessor for
these compiler systems is quite easy.  One only needs to change some
macro definitions in noconfig.H and system.H.  There is no need to
change the system.c file.

The *.dif files in the noconfig directory are difference files which
modify noconfig.H and system.H, as distributed for FreeBSD 4.7 / GNU C
2.95.4, for use with each of the other compiler systems.

For Visual C++ .net 2003, for example, entering the following command
in the src directory modifies these files.

    patch -c < ..\noconfig\vc2003.dif

patch is a standard UNIX command, which has also been ported to DJGPP
and MS-DOS.  It is not the same as the patch command which was standard
in MS-DOS 5.0.

Apart from the modifications made by the difference file, users have
to make the modifications specific to their own systems, such as
specifying the include directories.

Makefiles to compile these modified sources are also attached for each
compiler system.  (See Sec.3.7)

Copy the makefile into the src directory, for example with the
following command:

    copy ..\noconfig\visualc.mak Makefile

The following operations are also done in the src directory.  These are
all modifications of noconfig.H unless it is otherwise mentioned.


3.1.1       Common Configurations

For any of the following compiler systems, the default include
directories should be written in the macros C_INCLUDE_DIR1 and
C_INCLUDE_DIR2.  If the include directories of C++ are different from
those of C, they should be written in CPLUS_INCLUDE_DIR1 and
CPLUS_INCLUDE_DIR2.  (These can also be specified by environment
variables or the -I option at the time of execution.)

Include directories are also set up in system.c.  In UNIX terms, those
set up by system.c are OS-specific (usually /usr/include) and
site-specific (usually /usr/local/include).  As for DOS/Windows
systems, nothing is set up for include directories in system.c.

If required, one must also change built-in macro names such as CPU_STD1
or CPU_STD2.

The default multi-byte character encoding is EUC-JP on Unix, shift-JIS
on DOS/Windows and UTF-8 on Plan 9.  If required, modify the macro
called MBCHAR to change the encoding.  On systems of 32 bits or more,
the multi-byte character encoding can also be changed by environment
variables, options or #pragma.

Certain compiler systems do not support encodings such as shift-JIS or
Big5, and tokenize incorrectly when a multi-byte character contains a
byte with the value 0x5c, the same value as '\\'.  For these systems,
MCPP needs special settings to compensate for the defect of the
compiler.  Please refer to sec 4.1.1.5 for this configuration.
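
The problem can be illustrated with shift-JIS, where the second byte
of a two-byte character may be 0x5c (for example, the sequence 0x95
0x5c).  The following is only a sketch and not MCPP's own routine; the
function names are hypothetical, and the lead-byte ranges are the
commonly used ones for shift-JIS.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if c is the first byte of a two-byte shift-JIS
 * character (commonly used ranges), otherwise 0. */
static int is_sjis_lead(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9f) || (c >= 0xe0 && c <= 0xfc);
}

/* Counts the bytes equal to 0x5c that are real backslashes, that is,
 * not the second byte of a shift-JIS character.  Hypothetical helper
 * for illustration only. */
static size_t count_real_backslashes(const unsigned char *s, size_t len)
{
    size_t i, n = 0;

    assert(s != NULL || len == 0);
    for (i = 0; i < len; i++) {
        if (is_sjis_lead(s[i]) && i + 1 < len) {
            i++;                    /* Skip second byte, even if 0x5c */
            continue;
        }
        if (s[i] == 0x5c)
            n++;
    }
    return n;
}
```

A tokenizer which is unaware of the encoding would treat every 0x5c as
a backslash, and would therefore see an escape sequence or a line
splice where there is none.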

With regard to the attached makefiles, you may need to rewrite the
setting of the directory where the binary files of the compiler system
are kept.


3.1.2       FreeBSD / GNU C V.2.*

The source as distributed is set up to be compiled by GNU C V.2.95.4 on
FreeBSD 4.7 and to generate cpp for FreeBSD 4.7.  Just compile it as
is.

For the other versions of GNU C V.2.*, modify the version number in the
following places in system.H.

#define VERSION_MSG     \
    "GNU C compiled by GNU C 2.95"

and

#define COMPILER_EX2_VAL    "95"

For the latter, write the minor version number of GNU C.  If the version
of FreeBSD is not 4.*, then change the following value.

#define SYSTEM_EXT_VAL  "4" /* V.2.*: 2, V.3.*: 3, V.4.*: 4, V.5.*: 5 */

Furthermore, depending on the versions of FreeBSD and GNU C, you need to
change the following directory.

#define C_INCLUDE_DIR1  \
    "/usr/lib/gcc-lib/i386-unknown-freebsd4.7/2.95.4/include"

If the version of GNU C is V.2.6.* or earlier, further changes are
needed.  Within the block which begins with

#if     COMPILER_FAMILY == GNUC

change

#define HAVE_DIGRAPHS       TRUE
       to
#define HAVE_DIGRAPHS       FALSE

and

#define STDC_VERSION        199409L
       to
#define STDC_VERSION        0L
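
Digraphs were added to C by Amendment 1 (1995), the same amendment
which introduced the __STDC_VERSION__ value 199409L, which is why the
two settings change together.  As a reminder of what HAVE_DIGRAPHS
refers to, the sketch below writes the same function once with
digraphs and once with the ordinary punctuators.  All names in it are
hypothetical, and it assumes a compiler which supports digraphs.

```c
#include <assert.h>

/* Written with digraphs: %: is #, <% %> are { }, <: :> are [ ]. */
%:define N_ELEMS 3

static int sum_with_digraphs(void)
<%
    int a<:N_ELEMS:> = <% 1, 2, 3 %>;
    return a<:0:> + a<:1:> + a<:2:>;
%>

/* The same function written with the ordinary punctuators. */
static int sum_plain(void)
{
    int a[N_ELEMS] = { 1, 2, 3 };
    return a[0] + a[1] + a[2];
}

/* Both spellings must give the same result. */
static void check_digraphs(void)
{
    assert(sum_with_digraphs() == sum_plain());
}
```

A compiler system without digraph support would reject the first
function, which is the case where HAVE_DIGRAPHS is set to FALSE.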

Even for other UNIXes, if the compiler system is GNU C, I suspect one
only needs to change things like these version numbers, the setup of
include directories or OS specific built-in macros.  (See sec 4.1.1)

Depending on how GNU C itself was configured when it was built, some
GNU C executables cannot process the multi-byte character encodings
shift-JIS or ISO-2022-JP.  In that case, MCPP can process them instead.
Please refer to sec 4.1.1.5 regarding this.


3.1.3       Linux / GNU C V.2.*

There seem to be lots of package distributions in Linux, but the
attached linux_gcc2953.dif is for porting on Vine Linux 2.6 (x86).  This
distribution uses kernel 2.4.20 and GNU C V.2.95.3, glibc 2.2.4.

The specification of getopt() in glibc differs from standard ones such
as POSIX, so please use the one in lib.c instead.  noconfig.H assumes
by default that glibc is used on Linux.

For other versions or distributions of Linux, I think macros such as
VERSION_MSG, C_INCLUDE_DIR?, CPLUS_INCLUDE_DIR? and COMPILER_EX2_VAL
need to be rewritten after applying linux_gcc2953.dif.  (Modification
of the makefile is also required.)

In addition, the setup of environment variables is required at execution
time.  Please refer to manual.txt.


3.1.4       FreeBSD, Linux / GNU C V.3.*

To change the setup for GNU C V.2.* to GNU C V.3.2, the following
several macros need to be modified.

#define VERSION_MSG     \
    "GNU C compiled by GNU C v.3.2\n"

#define COMPILER_EXT_VAL    "3"
#define COMPILER_EX2_VAL    "2"
#define COMPILER_CPLUS_VAL  "3"

#define HAVE_INTMAX_T       TRUE
#define STDC_VERSION        0L

In Linux, change the following macros as well.

#define CPLUS_INCLUDE_DIR1  "/usr/include/g++-3"

Also, if another version of GNU C is installed besides the system
standard one, it should have its own include directories for that
specific version.  Specify those directories using the following
macros.

#define CPLUS_INCLUDE_DIR2  \
  "/usr/local/gcc-3.2/lib/gcc-lib/i686-pc-linux-gnu/3.2/include/c++/3.2"
#define C_INCLUDE_DIR1      \
  "/usr/local/gcc-3.2/lib/gcc-lib/i686-pc-linux-gnu/3.2/include"
#define C_INCLUDE_DIR2      "/usr/local/gcc-3.2/include"

For other versions of GNU C V.3.x, match the value of COMPILER_EX2_VAL
to x.  Attached linux_gcc32.dif is the difference file for modifying the
source for FreeBSD/GNU C V.2.95.4 to Linux/GNU C V.3.2.
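
To find the values to write, one can ask the target GNU C itself.  The
sketch below assumes that COMPILER_EXT_VAL and COMPILER_EX2_VAL
correspond to the predefined macros __GNUC__ and __GNUC_MINOR__; this
is inferred from the examples above and is not stated explicitly in
this section.  The function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Major version of the compiling GNU C, or -1 for other compilers. */
static int gnuc_major(void)
{
#ifdef __GNUC__
    return __GNUC__;
#else
    return -1;
#endif
}

/* Minor version of the compiling GNU C, or -1 for other compilers. */
static int gnuc_minor(void)
{
#ifdef __GNUC_MINOR__
    return __GNUC_MINOR__;
#else
    return -1;
#endif
}

/* Formats a line such as:  #define COMPILER_EX2_VAL "2"
 * Returns the length reported by snprintf(). */
static int format_define(char *buf, size_t size, const char *name,
                         int value)
{
    assert(buf != NULL && name != NULL && size > 0);
    return snprintf(buf, size, "#define %s \"%d\"", name, value);
}
```

The program has to be compiled by the GNU C which MCPP is being
configured for.  Note that some other compilers also define __GNUC__
for compatibility, so the result should be checked against the real
version of the compiler.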

In GNU C V.3, the preprocessor is absorbed into the compiler (cc1,
cc1plus).  So, to use MCPP, replace the calls of gcc and g++ with
shell-scripts which execute cpp0 => cc1 and cpp0 => cc1plus in this
order.  For the way of doing this, please see manual.txt [3.9.7].


3.1.5       CygWIN V.1.* / GNU C V.2.*

For CygWIN V.1.3.10 / GNU C V.2.95.3, add the changes in cyg1310.dif to
noconfig.H.

For other versions, porting should be possible by modifying macros such
as VERSION_MSG, C_INCLUDE_DIR1, CPLUS_INCLUDE_DIR1, COMPILER_EXT_VAL,
COMPILER_EX2_VAL and COMPILER_SP3_VAL.  For GNU C 3.*, please refer to
section 3.1.4.


3.1.6       DJGPP V.1.*

For DJGPP V.1.12 / GNU C V.2.7.*, modify noconfig.H as per djg112m4.dif.

For other versions of GNU C V.2.*, change the version numbers in the
macros VERSION_MSG and COMPILER_EX2_VAL.  For versions up to V.2.6.*,
set HAVE_DIGRAPHS to FALSE and STDC_VERSION to 0L.  For DJGPP V.2.*,
change the value of the macro COMPILER_STD2_VAL from "1" to "2".

If the version does not support shift-JIS, change

#define SJIS_IS_ESCAPE_FREE   TRUE
       to
#define SJIS_IS_ESCAPE_FREE   FALSE

In the port to DJGPP, the macros C_INCLUDE_DIR? and CPLUS_INCLUDE_DIR?
of system.H are not set.  Instead, the include directories are set at
execution time through the environment variables C_INCLUDE_PATH and
CPLUS_INCLUDE_PATH.  Either way works; set it up the way you like.
(details in Sec 4.1.1.1)
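
How a directory list held in such an environment variable might be
split can be sketched as follows.  This is not MCPP's own code; the
function name, the limits MAX_DIRS and MAX_DIR_LEN, and the choice of
separator (assumed to be ':' on UNIX-like systems and ';' on DOS-like
systems) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DIRS    8       /* Hypothetical limits for this sketch  */
#define MAX_DIR_LEN 128

/* Splits a separator-delimited directory list into dirs[].  Empty
 * fields and over-long names are skipped.  Returns the number of
 * directories stored. */
static int split_dir_list(const char *list, char sep,
                          char dirs[MAX_DIRS][MAX_DIR_LEN])
{
    int n = 0;

    assert(list != NULL && dirs != NULL && sep != '\0');
    while (*list != '\0' && n < MAX_DIRS) {
        const char *end = strchr(list, sep);
        size_t len = (end != NULL) ? (size_t)(end - list) : strlen(list);

        if (len > 0 && len < MAX_DIR_LEN) {
            memcpy(dirs[n], list, len);
            dirs[n][len] = '\0';
            n++;
        }
        if (end == NULL)
            break;
        list = end + 1;
    }
    return n;
}
```

In use, the list would come from getenv("C_INCLUDE_PATH"), after
checking that the result is not NULL.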


3.1.7       LCC-WIN32 V.3.*

For LCC-WIN32 V.3.2, apply the changes in lcc32.dif.  For other
versions, the VERSION_MSG macro needs to be modified.

long long of LCC-WIN32 used to have a bug and was not usable, but it
works at least in V.3.2 (Aug, 2003).


3.1.8       Visual C++ .net

For Visual C++ .net 2003, apply the modifications in vc2003.dif.  For
other versions of Visual C, besides modifying the VERSION_MSG macro,
change the value of the built-in macro _MSC_VER by modifying
COMPILER_EXT_VAL, and the value of the built-in macro _M_IX86 by
modifying COMPILER_SP2_VAL.  In versions of Visual C later than V.6.0,
the preprocessor seems to be built into the compilers, c1.dll and
c1xx.dll; for other versions, which have a separate preprocessor, the
macro ONE_PASS should be set to FALSE.  (Refer to sec 4.1.1.3)


3.1.9       Borland C++ V.4.*, V.5.*

For Borland C V.5.5 / bcc32, apply bc55_w32.dif.  For Borland C V.4.0 /
bcc / large memory model, apply bc40_16l.dif.

For other versions of Borland C, C++ or Turbo C++, besides the
VERSION_MSG macro, the values of the built-in macros __TURBOC__,
__BORLANDC__ and __BCPLUSPLUS__ should be changed by modifying the
macros COMPILER_STD2_VAL, COMPILER_EXT_VAL and COMPILER_CPLUS_VAL in
noconfig.H.  (Refer to Sec 4.1.1.1)  If the version supports digraphs
and __STDC_VERSION__, the settings of HAVE_DIGRAPHS and STDC_VERSION
need to be changed.  (Refer to Sec 3.1.1 and 3.1.2)

For the versions up to Borland C 4.*, use

#define SEARCH_INIT         CURRENT

For the MS-DOS version (bcc), as opposed to the WIN32 version (bcc32),
do not use

#define SYSTEM              SYS_WIN32

but use

#define SYSTEM              SYS_MSDOS


3.1.10      LSI C-86 V.3.3

To port to LSI C-86 V.3.3, apply lsic33.dif and compile with LSI C-86.
Please specify stack size to be more than 8KB.  (lcc -k '-s 2000')


3.1.10.1        Compile MCPP for LSI C-86 by Borland C

The compiler system targeted by the port and the compiler system which
compiles MCPP do not need to be the same.  For example, it is no
problem to compile MCPP for Borland C by LSI C-86; in this way, the code
size becomes compact.  Conversely, MCPP for LSI C-86 can be compiled by
Borland C.  LSI C-86 Trial Version can compile with only the small
memory model, but the translation limits of MCPP can be made bigger by
compiling with another compiler system with the large model or WIN32.

To compile MCPP for LSI C-86 V.3.3 by Borland C V.5.5 (bcc32),
noconfig.H should be:

#define SYSTEM          SYS_MSDOS
#define COMPILER        LSIC
#define HOST_SYSTEM         SYS_WIN32
#define HOST_COMPILER       BORLANDC
#define VERSION_MSG     \
    "LSI C-86 compiled by Borland C V.5.5"

PART 2 of noconfig.H should be the setup for Borland C (host-compiler).

The attached lsi_bc55.dif is the difference file to compile MCPP for
LSI C-86 by Borland C V.5.5.

On the other hand, when compiling MCPP for Borland C by LSI C-86 V.3.3,
noconfig.H should look like:

#define SYSTEM              SYS_WIN32
#define COMPILER            BORLANDC
#define HOST_SYSTEM         SYS_MSDOS
#define HOST_COMPILER       LSIC
#define VERSION_MSG     \
    "Borland C compiled by LSI C-86 V.3.3"

#define STDC_VERSION   0L

PART 2 should be set for LSI C (host-compiler).  At the time of
compiling, add option -D__BORLANDC__.
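
The separation of target and host can be pictured as below.  This is
only a sketch: the numeric values given to SYS_MSDOS, SYS_WIN32, LSIC
and BORLANDC are invented for illustration (the real values are
defined in MCPP's headers), and the function names are hypothetical.

```c
#include <assert.h>

/* Illustrative values only; MCPP's headers define the real ones. */
#define SYS_MSDOS       1
#define SYS_WIN32       2
#define LSIC            10
#define BORLANDC        20

/* MCPP for LSI C-86, compiled by Borland C (the first case above). */
#define SYSTEM          SYS_MSDOS
#define COMPILER        LSIC
#define HOST_SYSTEM     SYS_WIN32
#define HOST_COMPILER   BORLANDC

/* Name of the compiler system whose source MCPP will preprocess. */
static const char *target_name(void)
{
#if COMPILER == LSIC
    return "LSI C-86";
#else
    return "Borland C";
#endif
}

/* Name of the compiler system which compiles MCPP itself. */
static const char *host_name(void)
{
#if HOST_COMPILER == LSIC
    return "LSI C-86";
#else
    return "Borland C";
#endif
}

/* Returns 1 when host and target differ (a cross configuration). */
static int is_cross_build(void)
{
    assert(COMPILER == LSIC || COMPILER == BORLANDC);
    assert(HOST_COMPILER == LSIC || HOST_COMPILER == BORLANDC);
    return SYSTEM != HOST_SYSTEM || COMPILER != HOST_COMPILER;
}
```

Settings which describe the generated preprocessor's behavior depend
on the target macros, while settings which describe how MCPP's own
source is compiled depend on the host macros.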


3.1.11    Plan 9 ed.4 / pcc

On Plan 9, apply the changes in plan9r4.dif and compile.  The format of
plan9r4.dif is different from the other *.dif files, because Plan 9 has
diff but not patch.  plan9r4.dif contains the output of diff -e, not
diff -c, with the following one-character line appended.

    w

Therefore, to modify the source using this, run

    ed noconfig.H < ../noconfig/plan9r4.dif

Compilation on Plan 9 is quite different from other systems.  Please
see section 3.7.1 regarding this.


3.2     Compiler systems to which the DECUS cpp had been ported

The DECUS cpp seems to have supported RT-11/DECUS C and RSX/DECUS C on
PDP-11, VMS/VAX-11C, PDP-11/UNIX and VAX/ULTRIX - some kind of C (pcc?)
on VAX.  It also seems to have supported quite old versions of
Microsoft C and Lattice C on MS-DOS.  I removed these, as I suppose they
are no longer required and I can no longer maintain them.


3.3     noconfig.H, configed.H, system.H

system.H includes configed.H when the macro HAVE_CONFIG_H is defined to
1, otherwise it includes noconfig.H.  PART 1 and PART 2 are in
configed.H and noconfig.H, and PART 3 is in system.H.

In these files, some macros which are required to port to each compiler
system are defined.  When porting to compiler systems which have not
been ported to yet, one needs to add from a few lines to a dozen lines
in Part 1.  Also, to generate MCPP for the modes that are not Standard C,
a few macros in Part 3 need to be re-written.

Part 1 is the definition dependent on OS and target compilers, Part 2 is
the definition dependent on host systems, and Part 3 is the definition
of the MCPP behavior specification.

When porting with different configurations from the default, please make
sure to look through these files thoroughly.


3.4     system.c, mbchar.c

system.c absorbs the differences between OSs or compilers which cannot
be absorbed solely by the macros in configed.H (noconfig.H) or system.H.
To port to a new compiler system, adding from a few lines to tens of
lines of source to this file may be required.

This file includes items such as options for MCPP invocation, usage
output, include directory, the handling of OS unique directory paths
when opening header files or source files, processing of #pragma, and
processing of compiler system unique extension directives.  Most of them
are set up for the target OS and target systems.

Since the V.2.4 release, there is also a source file called mbchar.c.
It contains the multi-byte character processing routines and the table
of character types.  To implement a multi-byte character encoding which
is not yet supported, add the processing routines and the character type
table to this source file.


3.5     lib.c

Among the library functions, C source code for getopt() and stpcpy(),
which are not in the Standard, is written in this file.  Also, C source
code for memmove(), memcmp(), memcpy(), strstr() and strcspn() is
provided, as some old compiler systems do not support them, even though
they are in the Standard.  (Actually, for CPUs with a so-called
block-transfer instruction, you should rewrite these five functions and
stpcpy() in assembler.)

There are some compiler systems which do not have memmove(), but whose
memcpy() has the functionality of memmove().  (The old BSD types seem to
be so.)  In that case, you only need to add

    #define memmove(dst,src,size) memcpy(dst,src,size)

in noconfig.H, or strings.h of the system.
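
To illustrate what memmove() has to do beyond a plain memcpy(), here is
a minimal sketch which handles overlapping areas by choosing the copy
direction.  The name sketch_memmove is made up for this illustration,
and this is not the code in lib.c.

```c
#include <stddef.h>

/*
 * Hypothetical illustration only; not the implementation in lib.c.
 * Comparing pointers into unrelated objects is not strictly portable,
 * which is acceptable here because only overlap within one object
 * matters.
 */
void *sketch_memmove(void *dst, const void *src, size_t size)
{
    unsigned char       *d = (unsigned char *) dst;
    const unsigned char *s = (const unsigned char *) src;

    if (d == s || size == 0)
        return dst;
    if (d < s) {                    /* Destination is lower: forward */
        while (size--)
            *d++ = *s++;
    } else {                        /* Destination is higher: backward */
        d += size;
        s += size;
        while (size--)
            *--d = *--s;
    }
    return dst;
}
```

A memcpy() which always copies forward would corrupt the data in the
second case, which is why the #define above is safe only when the
system's memcpy() is known to handle overlap.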

I also include C source code for fgets().  It is available in every
compiler system, but its details sometimes differ from the Standard.
Using the one included with the compiler system should not be a problem
either.  In the Standard, nothing should be written to the buffer when
EOF is reached without reading a single byte (therefore, the last line
is preserved in the buffer); however, some systems write '\0' (e.g.
LSIC, MWC09).  With that kind of fgets(), only the diagnostic messages
for a source file which ends with <backslash><newline> or without
<newline> go wrong.
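
The behavior of the host's fgets() at EOF can be checked with a small
probe such as the following.  This is only a sketch: the function name
is hypothetical, and it assumes that tmpfile() is available on the
host, which may not be the case on old systems.

```c
#include <stdio.h>
#include <string.h>

/*
 * Returns 1 if fgets() leaves the buffer untouched when it meets EOF
 * without reading a byte (the Standard behavior), 0 if the buffer was
 * modified, and -1 if the temporary file could not be set up.
 */
int sketch_fgets_keeps_buffer_at_eof(void)
{
    char    buf[32];
    FILE    *fp = tmpfile();
    int     preserved;

    if (fp == NULL)
        return -1;
    fputs("last line", fp);         /* No terminating <newline>     */
    rewind(fp);
    if (fgets(buf, sizeof buf, fp) == NULL) {   /* Reads the line   */
        fclose(fp);
        return -1;
    }
    /* The second call meets EOF at once and should return NULL.    */
    preserved = fgets(buf, sizeof buf, fp) == NULL
            && strcmp(buf, "last line") == 0;
    fclose(fp);
    return preserved;
}
```

If this returns 0 on your host, use the fgets() in lib.c to keep those
diagnostic messages correct.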

The library functions other than fgets() do not depend on differences
in specification between compiler systems, and they will not cause a
problem on any compiler system unless there is a bug.

The functions used are very common ones.  For example, the evaluation of
an integer constant in a #if expression uses my own code rather than
strtoul() or atoll().  This is because strtoul() is not normally
available in older systems, and atoll() cannot do out-of-range checks.
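
To show the kind of out-of-range check meant here, the following is a
minimal sketch of converting an integer constant by hand.  It is not
the code in eval.c; the function name is hypothetical, integer suffixes
(u, l) are not handled, and the ASCII character set is assumed.

```c
#include <limits.h>

/*
 * Hypothetical sketch: convert a decimal, octal (0...) or hexadecimal
 * (0x...) integer constant to unsigned long.  Returns 1 on success,
 * 0 on an invalid digit or when the value does not fit.
 */
int sketch_eval_num(const char *s, unsigned long *result)
{
    unsigned long   value = 0;
    unsigned long   base = 10;
    int             ndigits = 0;

    if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) {
        base = 16;
        s += 2;
    } else if (s[0] == '0') {
        base = 8;
    }
    for ( ; *s != '\0'; s++, ndigits++) {
        unsigned long   digit;

        if (*s >= '0' && *s <= '9')
            digit = (unsigned long) (*s - '0');
        else if (*s >= 'a' && *s <= 'f')
            digit = (unsigned long) (*s - 'a') + 10;
        else if (*s >= 'A' && *s <= 'F')
            digit = (unsigned long) (*s - 'A') + 10;
        else
            return 0;               /* Not a digit                  */
        if (digit >= base)
            return 0;               /* Digit invalid for this base  */
        if (value > (ULONG_MAX - digit) / base)
            return 0;               /* Would go out of range        */
        value = value * base + digit;
    }
    if (ndigits == 0)
        return 0;                   /* No digits at all             */
    *result = value;
    return 1;
}
```

The range test is done before the multiplication, so the value never
wraps around silently.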

To use the function called xyz in lib.c, the macro HOST_HAVE_XYZ, in
PART 2 of noconfig.H (configed.H), should be defined to FALSE.  The
target compiler system is assumed to be the same as the host, but PART 2
needs to be modified when it is different.
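
The pattern can be pictured as follows.  This is a sketch only: it
assumes a host without stpcpy(), and the names sketch_stpcpy and STPCPY
are made up so that the example does not clash with a library which
does have stpcpy().  lib.c itself defines the function under its real
name.

```c
#include <string.h>

#define TRUE    1
#define FALSE   0

/* Assumption for this sketch: the host library lacks stpcpy().     */
#define HOST_HAVE_STPCPY    FALSE

#if ! HOST_HAVE_STPCPY
/* Fallback in the manner of lib.c; the name is hypothetical.       */
char *sketch_stpcpy(char *dest, const char *src)
{
    while ((*dest = *src++) != '\0')
        dest++;
    return dest;                    /* Points at the final '\0'     */
}
#define STPCPY  sketch_stpcpy
#else
#define STPCPY  stpcpy
#endif
```

Since the return value points at the terminating '\0', successive calls
such as STPCPY(STPCPY(buf, "abc"), "de") append without rescanning.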


3.6     Standard headers

In the source code of MCPP, stdio.h, ctype.h and errno.h are included
unconditionally.  There should not be a compiler system which does not
have these.  These header files are not necessarily required to follow
the C Standard rules.  Though stdlib.h, string.h, stddef.h and time.h
are specified in the Standard, they are not available in old systems.
For that case, all the necessary function declarations and macro
definitions are written in PART 2 of noconfig.H (configed.H), ready to
be used.  The case of the same functions with different names is also
taken into account.

These sections are enclosed by #if 1 - #else - #endif or #if 0 - #else -
#endif, so please reverse 1 and 0 if required.


3.7     Makefile and recompile using MCPP

*.mak are the makefiles for individual compiler systems, and a detailed
setup is possible.  make itself is assumed to be the one which is
attached to each compiler system or the standard one for the system.  In
Visual C, nmake should be used instead of make.

Except for FreeBSD/GNU C, modify the noconfig.H and system.H as follows:
(Assume the system is xyz)

    patch -c < ../noconfig/xyz.dif

Then, using an editor, edit macros such as C_INCLUDE_DIR? in noconfig.H
to suit your own system.  After copying the required noconfig/xyz.mak to
Makefile, and setting up the target directory to match your system, run
as

    make
    make install
    make clean

In lsic33.mak, the setup for compiling MCPP for Borland C by LSI C-86 is
written in the comments.

Also, in borlandc.mak, the setup for compiling MCPP for LSI C-86 by
Borland C is written in the comments.

For other compiler systems, please write the necessary makefile
referring to these files.  The dependencies of the source files are
simple, and the relationships are

    main.c, control.c, eval.c, expand.c, support.c, system.c, mbchar.c
        depend on system.H, internal.H
    lib.c depends on system.H
    system.H depends on configed.H or noconfig.H

    system.H needs to be included before internal.H.

The stack size needs to be the size which the system itself uses plus
NMACWORK + (NEXP * 30) + (sizeof(int) * 100).  Furthermore, for MODE >=
STANDARD, (sizeof(char *) * 12 * RESCAN_LIMIT) needs to be added.
(NMACWORK, NEXP and RESCAN_LIMIT are macros defined in system.H.)
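
The arithmetic can be written out as below.  The three macro values are
placeholders chosen only for this sketch and are not the values in
system.H; the function name is hypothetical as well.  Substitute the
values of your own configuration.

```c
#include <stddef.h>

/* Placeholder values for illustration only, not those of system.H. */
#define NMACWORK        0x4000
#define NEXP            128
#define RESCAN_LIMIT    64

/*
 * Returns the number of bytes to add to the stack size which the
 * system itself uses.  standard_mode is non-zero for MODE >= STANDARD.
 */
size_t sketch_extra_stack(int standard_mode)
{
    size_t  extra = NMACWORK + (NEXP * 30) + (sizeof (int) * 100);

    if (standard_mode)
        extra += sizeof (char *) * 12 * RESCAN_LIMIT;
    return extra;
}
```

The result depends on sizeof(int) and sizeof(char *), so it has to be
calculated for the host which actually runs MCPP.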

In systems like MS-DOS, the shell (command processor) does not expand
wild cards, and it is safe to compile MCPP so that it does not expand
them either.  (Unless the -o option is specified, the second argument
will be taken as the specification of the output file.)

To recompile MCPP using MCPP itself, place the executable program into
the location where the cpp of the compiler system should be.  For
instance, in the cases of FreeBSD, Linux or CygWIN, rename the cpp0
attached to the compiler system to something like cpp0_gnuc and name
MCPP e.g. cpp_std.  Name the preprocessors of the other modes, for
example, cpp_poststd, cpp_prestd and cpp_old, and link cpp0 to
whichever one you use at the time.  Therefore, if cpp_std is the
preprocessor you are going to use, you need to do

    ln -sf cpp_std cpp0

For DOS/Windows, you need to copy the one you are going to use to
cpp.exe. *  In this case, if you run

    make NAME=cpp_std

from the beginning, you do not need to rename later.  (The same thing
needs to be done in TC, and BC make requires make -DNAME=cpp_std.  For
UCB make, -D can be either added or not.  For GNU make, -D should not be
added.)

Using the attached makefiles, make install does not do any detailed
processing.  Except for freebsd.mak, linux.mak and cygwin.mak, please
complete the installation manually.  Please copy the preprocessor
attached to the compiler system to another name beforehand, so as to
prevent it from being deleted by make install.

When you recompile MCPP using a one-pass compiler system such as
Visual C or Borland C, you should supply the output file of MCPP to the
compiler as the source file.  (For instance, output the preprocessed
source of main.c as main.i, and compile that using cl or bcc32.)

When recompiling using MCPP, if the "pre-preprocess" functionality
within the header file is used, the preprocess time will be reduced
dramatically.  When you use the attached makefile, for UCB make, GNU
make or MS nmake, you run

    make PREPROCESSED=1

for BC make, you run

    make -DPREPROCESSED=1

which automatically pre-preprocesses the header file, then compiles.
With the make of LCC-Win32 or of the LSI C-86 trial version, these
cases cannot be distinguished by an if statement, so you need to modify
the makefile and recompile.  The details of the modification are written
in the makefile itself.

In BSD make, GNU make or MS nmake, if you run make with the option
MALLOC=KMMALLOC, this links to the malloc() which I wrote.  About this,
please refer to [4.extra].  For BC make, the same thing can be done by
the option -DKMMALLOC.  To link my malloc() with the make of LCC-Win32
or LSI C trial version, you need to modify the makefile.

  * In FreeBSD, the directory in which cpp should be located is
    /usr/libexec.  See manual.txt Sec 2.1.
    In Vine Linux 2.6 (i386), it should be located in the deeply nested
    directory /usr/lib/gcc-lib/i386-redhat-linux/2.95.3.  In
    Linux/GNU C, this directory setting in the makefile needs to be
    modified according to the distribution or the version.  There are
    various different include directories, for which you may need to
    set up the environment variable C_INCLUDE_PATH?.
    Also, in GNU C 2.95 or 3.2, there is cpp0 besides cpp, and cpp0 is
    called by gcc.
    For further information, see manual.txt Sec 3.9.5 and 3.9.7.  In GNU
    C V.3, the preprocessor is absorbed in the compiler (cc1, cc1plus),
    so the call of gcc, g++ needs to be replaced by a shell-script if
    you want to use MCPP.


3.7.1.  Compile by Plan 9 / pcc

In Plan 9, there are the preprocessor written by Dennis Ritchie and the
compiler by Ken Thompson.  The compiler has a built-in preprocess
functionality.  (However, #if is not implemented.)

There are two compiler drivers - cc and pcc.  cc calls the compiler
without calling the preprocessor.  pcc, portable cc for ANSI & POSIX
environment, calls the preprocessor, and then calls the compiler.
Include directories are also different with these two.

Since the source code of MCPP uses #if, it has to use pcc to compile.
The following explanation is the way for MCPP to be compiled by pcc and
to be used as a replacement of Ritchie's cpp.

Also, in Plan 9, the make command is called mk, so it is usual practice
to name the makefile mkfile.  The format of mkfile is very different
from other makes and unique to Plan 9.  Also, if and else cannot be
used, so the mkfile needs to be rewritten when you want to change the
compiling method.

To compile MCPP, add changes to noconfig.H in MCPP's src directory as
described in sec 3.1.11, then do

    cp ../noconfig/plan9.mk mkfile
    mk
    mk install
    mk clean

This copies the executable called cpp_std to /$objtype/bin/ directory,
so do

    cp /$objtype/bin/cpp /$objtype/bin/cpp_ritchie

to save Ritchie cpp, then do

    cp /$objtype/bin/cpp_std /$objtype/bin/cpp

The environment variable $objtype is 386 when the CPU is x86, so you
can also simply type 386 in its place.

To recompile MCPP using the "pre-preprocess" functionality of MCPP,
replace the following line in mkfile

    PREPROCESSED=0

to

    PREPROCESSED=1

then, do

    mk preprocessed

After the 'mk install', it is the same as above.


3.8     Compiler systems which can compile MCPP

Though some configuration is required to port to each compiler system,
compiling MCPP's source code can be done by any compiler system which
satisfies K&R 1st specification.  To be exact, if it satisfies the
common spec of K&R 1st and Standard C, it will be fine.

MCPP can be compiled by C++ too.  (Whether C++ is used is decided by
#ifdef __cplusplus.)  Compile with the following steps.

  1. Rename all *.c, except lib.c, to *.cc or *.cpp.
  2. Run 'make'.  When "pre-preprocessing" by using MCPP, add -+ option.
    Invoke make using attached *.mak with an option of CPLUS=1 or
    -DCPLUS=1 depending on the compiler system.  However, there is no
    merit in compiling with C++.

The char type can be either signed or unsigned.

Floating point operation is not necessary.

The same goes for the preprocessor: even the so-called Reiser model cpp
will do.  (It is close to the OLD_PREPROCESSOR mode of MCPP.)  Even one
for which only int constants can be used in #if expressions, or one
with poor evaluation, should be ok.  When recompiling by using MCPP
itself, it does not matter in which mode of system.H the MCPP executable
was generated.  (Except in the case where the standard headers require
a specification which that MCPP does not have.)  In DECUS cpp, there was
even source code to support a preprocessor which cannot use macros
with arguments, but obviously I have removed this.

This source code is written so as not to be affected by the minor
discrepancies of the compiler systems.  The newly added features of
Standard C have not been used.  I had to adopt this approach to write
the preprocessor's source portably, in order to support Standard C
completely. *

Of course, to actually compile with each compiler, it is necessary to
avoid the compiler system's own bugs.  These cannot be found until you
actually try.  When I was porting to some compiler systems, there were
a few cases in which it took me a long time to trace the bug and to find
the workaround.

  * However, the concatenation of string literals, which is a Standard
    C feature, is used to write the length modifier for long long in
    printf().  I had to take this method as the length modifier differs
    per compiler system.  There should be no compiler system which has
    long long but cannot concatenate string literals.
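
The technique in the note above looks like the following sketch.  It
assumes a C99-style host where the length modifier is "ll" and
snprintf() exists; the function name is hypothetical.  For Visual C or
Borland C 5.5, LL_FORM would be "I64" and the argument type __int64.

```c
#include <stdio.h>

/* Assumption: C99-style host.  See the lead-in for other systems.  */
#define LL_FORM     "ll"

/* Formats value into buf and returns the number of characters.     */
int sketch_format_ll(char *buf, size_t size, long long value)
{
    /* "%" LL_FORM "d" is concatenated into "%lld" by the compiler. */
    return snprintf(buf, size, "%" LL_FORM "d", value);
}
```

Only the one #define changes per compiler system; every format string
which uses it stays the same.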


3.9     Host compiler system and target compiler system

There is no need for the compiler system which compiles the MCPP source
code (host) and the compiler system which will use the generated MCPP
execution module (target) to be the same.  If these are different,
select the target by SYSTEM and COMPILER and the host by HOST_SYSTEM and
HOST_COMPILER within noconfig.H (configed.H).  Also, the definitions in
PART1 are the settings for the target, and the ones in PART2 are for the
host.  system.c is mainly for the target.  lib.c should be compiled
using the settings for the host.

However, there are the following limitations.

  1. The host compiler system should be the same OS as the target
    compiler system, otherwise a cross-compiler has to be used.
  2. long (unsigned long) in the host compiler system has to be an equal
    or wider range than the one in the target compiler system.  This is
    also the condition defined in the Standard.  The same thing can be
    said for the long long (unsigned long long) in C99.

By the way, the host and the target stated here have nothing to do with
the ones in the cross-compiler.  Cross-compiling is the job of the
compiler itself, and in principle the preprocessor is not concerned
about that.  When MCPP is ported to a "cross-compiler", this cross-
compiler is the target compiler system in here.  As for the host
compiler, you need to use the one which is not the cross-compiler.  When
MCPP is compiled by a "cross-compiler", the cross-compiler is the host
compiler system, and the target of the cross-compiler becomes the target
compiler system.

When making a cross-compiler type of MCPP which runs on UNIX and
processes programs for DOS/Windows, MBCHAR in PART 1 of noconfig.H is
set to SJIS, not EUC_JP.  That is, the setting of multi-byte character
encoding has to match that of the target.

However, in MCPP, the character set of the host system, which compiles
MCPP, and the character set of the target system, which uses MCPP, are
both presumed to be ASCII.  Ditto for the character sets of the host and
the target of a cross-compiler.  If the character sets of the two
differ, I think that type_*[] of mbchar.c and, if required, the macros
ALERT and VT of noconfig.H should be set to those of the target system,
and that the concatenation of string literals should be turned off (set
CONCAT_STRINGS to FALSE), as long as the compiler of the cross-compiler
system can convert all the characters within string literals and
character constants, except numeric escape sequences, and concatenate
the adjoining string literals.  I am not certain about these, as I have
never used character sets other than ASCII or any cross-compiler system.


3.10    Unsupported compiler systems

The compiler systems which MCPP does not support are those with special
character sets or special CPU.

EBCDIC is not supported either.  I think EBCDIC can be used by just
modifying the character type table called type_*[] in mbchar.c, but I do
not have the environment to test it.

The CPUs for which integer operation is not two's complement are also
not supported.  If it is not two's complement, MCPP may run incorrectly
when an overflow has occurred in a #if expression.
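
Whether the host uses two's complement can be checked with a one-line
test.  The following is a minimal sketch; the function name is
hypothetical and is not part of MCPP.

```c
/*
 * The low two bits of -1 are 11 in two's complement, 10 in ones'
 * complement and 01 in sign-and-magnitude representation.
 */
int sketch_is_twos_complement(void)
{
    return (-1 & 3) == 3;
}
```

The same expression can also be written as #if (-1 & 3) == 3, in which
case it tests the arithmetic of the preprocessor which evaluates it.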


3.11    Memory model of MS-DOS

When MCPP is ported by a compiler system on MS-DOS, an adequate memory
model needs to be chosen, depending on the purpose.  This is because the
translation limits, the memory consumption and the speed are different
depending on the memory model.

The small data memory model can be used for processing small programs.
However, if you need to process special or huge source code, such as
one with more than several hundred macro definitions, with enormous
macro definitions of over 1KB, or with nested macro calls of enormous
length, please compile by the large data model.  (In this case, MCPP
will be 50-60 percent slower than when compiled by the small data
model.)

The large code model is required in order to use the full functionality
of MCPP.  With the small model, as in the LSI C-86 trial version, some
of the functionality should be left out by #define, for example by
setting OK_MAKE to FALSE in system.H.


      4   How to port MCPP to different compiler systems : Details

4.1     Configuration of noconfig.H, configed.H, system.H

I think you should be able to understand most of what is written in
these header files if you read them.  I include lots of comments as well.
Just in case, I add the following notes.

noconfig.H (configed.H) contains PART 1 and PART 2, and PART 3 is in
system.H.

Firstly, select the target system (the system for which MCPP is to be
built) and the host system (the system which compiles MCPP).

SYSTEM
    Select the OS which the target compiler will be operated on.  The
    name of the OS is defined right after this.  Define accordingly for
    the OS which is not defined.
COMPILER
    Select the target compiler system.  The name of the compiler is
    defined right after this.  Define accordingly for the compiler
    systems which are not defined.
VERSION_MSG
    Write the version information as a string literal to be displayed by
    the -v option or in the usage() message.
HOST_SYSTEM, HOST_COMPILER
    Select the host OS and the host compiler system.  If these are the
    same as the target, set as

        #define HOST_SYSTEM     SYSTEM
        #define HOST_COMPILER   COMPILER

There is a certain naming convention for SYSTEM and COMPILER, which is
easiest to understand by looking at the source code.  Though this is
overstating it a bit, SYSTEM is only used for the type of path list of
include files or to know the standard include directory of the OS, so
one does not need to be concerned with it too much.


4.1.1       PART 1 Configuration of Target system

4.1.1.1     Predefined macros

CPU_OLD, CPU_STD1, CPU_STD2, SYSTEM_OLD, SYSTEM_STD1, SYSTEM_STD2,
    SYSTEM_EXT, SYSTEM_EX2, COMPILER_OLD, COMPILER_STD1, COMPILER_STD2,
    COMPILER_EXT, COMPILER_EX2
    Specify the unique macro name of the compiler system, which will be
    pre-defined in MCPP, in a string literal.  Leave undefined any
    unnecessary ones (do not define them as zero tokens).  *_OLD
    generate old style macros which do not begin with '_' (underscore);
    these won't be pre-defined at MCPP execution time if more than 1 is
    specified for <n> of the -S<n> option.  In *_STD?, *_EXT and *_EX2,
    always specify a macro name beginning with '_'.  *_STD1 starts with
    __, and *_STD2 starts with __ and ends with __.  For SYSTEM_EXT,
    SYSTEM_EX2, COMPILER_STD1, COMPILER_STD2, COMPILER_EXT and
    COMPILER_EX2, the values of the macros are also specified, by
    SYSTEM_EXT_VAL, SYSTEM_EX2_VAL, COMPILER_STD1_VAL, COMPILER_STD2_VAL,
    COMPILER_EXT_VAL and COMPILER_EX2_VAL.  Each value is written as a
    string literal, that is, an integer enclosed by "".  A macro that
    expands to zero tokens is defined as "".  If nothing is specified,
    the value of the macro becomes 1.  All other predefined macros (the
    ones specified by CPU_*, SYSTEM_OLD, SYSTEM_STD1, SYSTEM_STD2,
    COMPILER_OLD) have a value of 1.
CPU_SP_OLD, CPU_SP_STD
    Write the compiler system unique special predefined macro name as a
    string literal. All values should be 1.
SYSTEM_SP_OLD, SYSTEM_SP_STD
    Write the compiler system unique special predefined macro name as a
    string literal, and define the values at SYSTEM_SP_OLD_VAL and
    SYSTEM_SP_STD_VAL.
COMPILER_SP1, COMPILER_SP2, COMPILER_SP3
    Write the compiler system unique special predefined macro name as a
    string literal, and define the values at COMPILER_SP1_VAL,
    COMPILER_SP2_VAL and COMPILER_SP3_VAL.
COMPILER_CPLUS, COMPILER_CPLUS_VAL
    Specify, by string literals as above, the name and the value of the
    compiler system's unique predefined macro which is defined when the
    -+ option (C++ preprocess) is specified.  If COMPILER_CPLUS_VAL is
    not specified, the macro value becomes 1.  The name has to begin
    with '_'.  If not required, leave COMPILER_CPLUS itself undefined.

All the above macros become disabled by the -N option.

4.1.1.2     Include directories and others

C_INCLUDE_DIR1, C_INCLUDE_DIR2, CPLUS_INCLUDE_DIR1, CPLUS_INCLUDE_DIR2
    Specify the include directory of the standard header file searched
    by MCPP.  CPLUS_INCLUDE_DIR? is setup when the include directory of
    C++ is different from that of C. (When invoking MCPP, this is
    enabled by the -+ option.)  As /usr/include, /usr/local/include in
    UNIX, are set in system.c, compiler-system specific directories
    should be set in C_INCLUDE_DIR?.
ENV_C_INCLUDE_DIR, ENV_CPLUS_INCLUDE_DIR
    Define the environment variable name, for when the include directory
    for the standard header file searched by MCPP is specified by the
    environment variables at execution time.

ENV_CPLUS_INCLUDE_DIR is the name of the environment variable which
specifies the include directory of C++.  By default, these are defined
as "INCLUDE" and "CPLUS_INCLUDE" respectively.  When implementing in
GNU C, "C_INCLUDE_PATH" and "CPLUS_INCLUDE_PATH" are the defaults.

Other search paths are the ones set up in system.c and those set up by
the -I option. (About the priority of these, see manual.txt /[4.2])

ENV_SEP
    When multiple paths are written in this environment variable, define
    the separator as a character constant.  This is ':' as in
    /usr/local/abc/include:/usr/local/xyz/include, or ';' as in
    C:DJ112/INCLUDE;C:DJ112/LOCAL/INCLUDE.
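
How a path list is broken up at ENV_SEP can be sketched as follows.
This is not the code in system.c; the function name and the callback
are hypothetical, and ENV_SEP is set to the UNIX value for the example.

```c
#include <stddef.h>
#include <string.h>

#define ENV_SEP ':'                 /* ';' for DOS/Windows          */

/*
 * Hypothetical sketch: call (*add_dir)(start, length) once for each
 * non-empty path in list (add_dir may be NULL).  Returns the number
 * of paths found.
 */
int sketch_split_paths(const char *list,
        void (*add_dir)(const char *start, size_t length))
{
    int     count = 0;

    while (*list != '\0') {
        const char  *end = strchr(list, ENV_SEP);
        size_t      length = end ? (size_t) (end - list) : strlen(list);

        if (length > 0) {
            if (add_dir != NULL)
                (*add_dir)(list, length);
            count++;
        }
        list += length;
        if (*list == ENV_SEP)
            list++;                 /* Skip the separator           */
    }
    return count;
}
```

With the UNIX example above, the callback would be called twice, once
for each directory.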

4.1.1.3     The output specifications of line number information
                    and others

LINE_PREFIX
    Specify the format for passing the file name and the line number
    information from MCPP to the compiler-proper.
        #line 123 "fname"
    The format of the above Standard C source code is set as default.
    Write an alternative sequence into the string literal to replace
    this "#line " for compilers which use other formats.
        # 123 "fname"
    If the above is the format, define "# ".  If it is a unique format,
    which is not any of the above, define the format to match.  (In some
    cases, these may need to be added to sharp() or other functions in
    system.c)
    In some compiler systems like GNU C, there are cases where the
    output of the attached preprocessor is not in the first format, but
    the compiler-proper recognizes the first format as well.  Generally,
    the first one is preferred, because it is more standard and is the
    same format as C source code.  If the compiler-propers recognize
    it, a single preprocessor might even serve different compiler
    systems on the same OS.
    However, it is actually not that easy because the invoking options
    differ.  Also, since some GNU tools depend on the output format
    unique to GNU C/cpp, it is better to select that format for GNU C.
    (For GNU C, this is set as the default.)
    When MCPP is used as the front end of a one-pass compiler, such as
    Visual C++ .net or Borland C, the output of MCPP has to be Standard
    C source code so that it can be passed to the built-in preprocessor.
    Hence, the line number information has to be in the first format.

EMFILE
    If the macro in <errno.h> for the errno value which means "too many
    open files (for the process)" is not named EMFILE, define EMFILE as
    that macro name.  (Of course, you can add it to <errno.h> itself.)

ONE_PASS
    If the target compiler is a so-called one-pass compiler, in which
    the preprocessor is not separated, set this to TRUE; otherwise set
    it to FALSE.  If this is set to TRUE, all the predefined macros of
    the compiler system are output enclosed in comments by #pragma
    __put_defines (#put_defines).  This is to prevent duplicate
    definitions, as the output of MCPP will be preprocessed again when
    it is passed to the compiler.
    However, though GNU C V.3 can be called a one-pass compiler, this
    macro should be set to FALSE as the independent preprocessor can
    also be used.

FNAME_FOLD
    Define this as TRUE for the OS which cannot distinguish upper and
    lower case in file names, otherwise set this to FALSE.

FOLD_CASE
    Set this to FALSE for an OS which passes command-line options to
    MCPP with their case preserved.  Set this to TRUE for an OS which
    does not preserve the case.  (I do not currently know which OSs are
    the latter.)
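
For the LINE_PREFIX setting described at the top of this section, the
effect on the output can be sketched as below.  This is not the actual
sharp() of system.c; the function name is hypothetical and snprintf()
is assumed to exist on the host.

```c
#include <stdio.h>

#define LINE_PREFIX     "#line "    /* "# " for the GNU C style     */

/* Writes the line number information for fname into buf.           */
int sketch_put_line(char *buf, size_t size, long line, const char *fname)
{
    return snprintf(buf, size, "%s%ld \"%s\"\n", LINE_PREFIX, line, fname);
}
```

Changing only the string literal of LINE_PREFIX switches between the
two formats; a compiler with a unique format would need a change to the
format string as well.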

4.1.1.4  Configurations corresponding to the system's language
                specifications

HAVE_PRAGMA
    If the compiler-proper can recognize #pragma, set this to TRUE,
    otherwise set it to FALSE.
EXPAND_PRAGMA
    Set this to TRUE for a compiler system which expands macros in the
    argument of a #pragma line, unless the argument begins with STDC,
    when invoked by -S1 -V199901L in MODE >= STANDARD.  This is set to
    FALSE by default.  In Visual C, set this to TRUE as the argument of
    a #pragma line is always subject to macro expansion.  However, MCPP,
    even for Visual C, performs macro expansion only when the argument
    of the #pragma line does not start with STDC.

HAVE_DIGRAPHS
    Set this to TRUE when Digraphs processing is implemented, otherwise
    set this to FALSE.

CAN_CONCAT_STRINGS
    Set this to TRUE for systems in which adjoining string literals are
    concatenated by compilers.

STDC
    This defines the default value of the predefined macro __STDC__.  If
    __STDC__ is not defined, set this to 0.
STDC_VERSION
    This defines the default value of the predefined macro
    __STDC_VERSION__ for the target system.  If __STDC_VERSION__ is not
    defined, set to  0L.

CHARBIT, UCHARMAX, LONGMAX, ULONGMAX
    Write the values of CHAR_BIT, UCHAR_MAX, LONG_MAX and ULONG_MAX in
    <limits.h> of the target compiler system.  Use the value of LONG_MAX
    for ULONGMAX for compiler systems without unsigned long.  These are
    easy to define even without <limits.h>.

HAVE_C_BACKSLASH_A
    If the compiler proper of the target compiler system can recognize
    escape sequence '\a' and '\v', then set this to TRUE otherwise set
    to FALSE.

4.1.1.5     Multi-byte characters

The macro called MBCHAR is used to specify the type of encoding for
multi-byte characters.  With 16-bit systems, only the one specified
encoding can be used.  With systems of 32 bits or more, all the
following encodings are implemented at the same time; MBCHAR only
specifies the default encoding, which can be changed by environment
variables, options or #pragma at execution time (refer to manual.txt
sec 2.3, 2.8, 3.4 for how to use them).

MBCHAR
    Define the encoding for multi-byte characters, that is Kanji in
    Japanese, of the target.

Encodings which can be selected on both 16-bit systems and systems of
32 bits or more are the following.

   EUC_JP  : Japanese extended UNIX code (UJIS)
   SJIS    : Japanese shift-JIS (MS-Kanji)
   GB2312  : Chinese EUC-like GB2312 (simplified-Chinese)
   BIGFIVE : Taiwanese Big Five (traditional-Chinese)
   KSC5601 : Korean EUC-like KSC-5601 (KSX 1001)

These are all encodings in which a character occupies 2 bytes and which
have no shift-states.  Though wchar_t is a 4-byte type in some compiler
systems, even when the encodings of multi-byte characters and wide
characters are 2-byte, the preprocessor is not concerned with what type
wchar_t is.  As multi-byte or wide characters occupy 2 bytes in source
code, it processes them accordingly.

Systems of 32 bits or more can also select the following encodings.

   ISO2022_JP :  International standard ISO-2022-JP1 Japanese
   UTF8       :  A type of encoding of Unicode, UTF-8

ISO-2022-* is an encoding with shift-states.  UTF-8 encodes 2-byte
Unicode into 1 to 3 bytes.  Kanji (Chinese characters) become 3 bytes.

When MBCHAR is defined to 0, multi-byte character processing is not
implemented in 16-bit systems.  However, for systems of 32 bits or more,
this only makes the processing without multi-byte characters the
default, and environment variables, options or #pragma can change it at
execution time.

SJIS_IS_ESCAPE_FREE
    Set this to TRUE when the compiler-proper processes shift-JIS.  If
    the compiler-proper does not process it, then set to FALSE.

    In Shift-JIS, there are cases where the second byte of Kanji is the
    value of 0x5c which is the same as '\\'.  If the compiler-proper
    does not recognize shift-JIS, it interprets it as an escape sequence
    and gets an error at tokenization.

    If SJIS_IS_ESCAPE_FREE is set to FALSE, MCPP processes shift-JIS.
    That is, when 0x5c is the second byte of shift-JIS Kanji within the
    string literal or character constant at the final MCPP output time,
    it adds one more 0x5c.  This tentatively makes the English version
    compiler support characters such as Shift-JIS.

BIGFIVE_IS_ESCAPE_FREE
    Same as above, set this to TRUE when the compiler-proper processes
    Big Five, and set to FALSE if not.

ISO2022_JP_IS_ESCAPE_FREE
    Same as above, set this to TRUE if the compiler-proper processes
    ISO-2022-JP and set to FALSE if not.  With ISO-2022-*, there may be
    bytes which match not only '\\', but also '\'' or '"'.  If
    ISO2022_JP_IS_ESCAPE_FREE is FALSE, MCPP inserts a 0x5c byte before
    all bytes matching '\\', '\'' or '"'.
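
For the Shift-JIS case (SJIS_IS_ESCAPE_FREE set to FALSE), the doubling
of 0x5c can be sketched as follows.  This is not the code of MCPP; the
function names are hypothetical, and the lead-byte ranges are those
commonly given for Shift-JIS.

```c
#include <stddef.h>

/* Lead-byte ranges of Shift-JIS two-byte characters.               */
static int is_sjis_lead(unsigned char c)
{
    return (0x81 <= c && c <= 0x9f) || (0xe0 <= c && c <= 0xfc);
}

/*
 * Hypothetical sketch: copy the contents of a string literal from in
 * to out, adding one more 0x5c after every two-byte character whose
 * second byte is 0x5c.  out needs room for 3 bytes for every 2 bytes
 * of input, plus the terminating '\0'.  Returns the output length.
 */
size_t sketch_escape_sjis(const unsigned char *in, unsigned char *out)
{
    size_t  n = 0;

    while (*in != '\0') {
        if (is_sjis_lead(*in) && in[1] != '\0') {
            out[n++] = *in++;       /* First byte                   */
            out[n++] = *in;         /* Second byte                  */
            if (*in++ == 0x5c)
                out[n++] = 0x5c;    /* Double the 0x5c              */
        } else {
            out[n++] = *in++;       /* Single-byte character        */
        }
    }
    out[n] = '\0';
    return n;
}
```

A single-byte 0x5c is left alone, since there it is a real backslash
which begins an escape sequence.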

By the way, the behavior of the compiler as regards multi-byte
characters may vary depending on the environment at execution time.  Set
these macros to match your environment.  Regarding this, please refer to
manual.txt sec 2.8.

4.1.1.6     Target and host system common configurations

The next four are written in PART 1 for convenience.  Set these TRUE
when both target and host systems have the nominated type, otherwise set
to FALSE.  HAVE_LDBL will only be used when OK_SIZE, which will be
explained later, is set to TRUE.

HAVE_UNSIGNED_LONG
    Set this to TRUE for the compiler system with the data type of
    unsigned long.
HAVE_LONG_LONG
    Set this to TRUE for the compiler system with the data type of long
    long.  Also set this to TRUE for compilers such as Visual C or
    Borland C 5.5, which do not have long long but have a data type of
    the same size, __int64, and provide a length modifier to display it
    in printf().
HAVE_INTMAX_T
    If the data type called intmax_t is defined, set this to TRUE.
HAVE_LONG_DOUBLE
    Set this to TRUE for the compiler system with long double.

LL_FORM
    If the target system has long long, define the length modifier by
    the string literal for displaying the maximum integer type value of
    the compiler system in printf().  This is "j" in C99.  Also, the
    length modifier of long long is "ll" (ell-ell) in C99.  In Visual C
    and Borland C 5.5, use "I64" to display the value of  __int64.
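
How a length modifier given as a string literal is combined into a
printf() format can be sketched as follows.  This assumes a C99 host and
so defines LL_FORM as "ll"; the function name format_ll() is
hypothetical and is not a part of MCPP.

```c
#include <stdio.h>

/* Assumption for this sketch: a C99 host, where "ll" is the length
 * modifier of long long.  On Visual C or Borland C 5.5 this would be
 * "I64" and the type would be __int64. */
#define LL_FORM "ll"

/* Format val in decimal into buf.  Adjacent string literals are
 * concatenated, so the format becomes "%lld" here.
 * Returns what snprintf() returns. */
static int format_ll(char *buf, size_t size, long long val)
{
    return snprintf(buf, size, "%" LL_FORM "d", val);
}
```

Because only the modifier is configured, the same source line serves
every compiler system, and only the definition of LL_FORM differs.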


4.1.2       PART2 Configuration of Host system

In configed.H, the target system is assumed to be the same as the host
system.  If not, PART2 needs to be rewritten.

PROTO
    If the compiler proper can declare the prototype of the function,
    set this to TRUE.  Otherwise, set to FALSE.
HOST_HAVE_GETENV
    Set this to TRUE if the compiler system has a function called
    getenv(), otherwise set to FALSE.  Obviously, the target OS also has
    to have environment variables.
HOST_HAVE_GETOPT, HOST_HAVE_STPCPY, HOST_HAVE_MEMMOVE, HOST_HAVE_MEMCPY,
    HOST_HAVE_MEMCMP, HOST_HAVE_STRSTR, HOST_HAVE_STRCSPN
    If getopt(), stpcpy(), memmove(), memcpy(), memcmp(), strstr() and
    strcspn() exist in the library of the host system, define each of
    the corresponding macros to TRUE.  If not, define to FALSE.  For the
    functions which are set to FALSE, those in lib.c are used.

FILENAMEMAX
    The value of FILENAME_MAX of <stdio.h> of the host system.  If there
    is no FILENAME_MAX, it is fine to set this to 80 for MS-DOS and
    BUFSIZ for other OSs.

HOST_HAVE_C_BACKSLASH_A
    If the compiler proper of the host system cannot recognize escape
    sequences '\a' and '\v', set this definition to FALSE.

The rest are items such as library function declarations or the size_t
type definition, which you should be able to work out.


4.1.3       PART3 Configuration of the MCPP behavior specification

4.1.3.1         Selection of various new and old modes

MODE
    Set the specification of behavior, which is the base of the
    preprocessor, such as the macro expansion method, directives to be
    used or predefined macros to be used.  There are 3 types:
    PRE_STANDARD, STANDARD and POST_STANDARD.  Each macro has values 0,
    3, 9, respectively.  PRE_STANDARD is slightly different in
    tokenization from the rest of the modes.

PRE_STANDARD
    Preprocessing specifications before C90, such as K&R 1st.

STANDARD
    Preprocessing specifications for Standard (C90, C99, C++98).

POST_STANDARD
    It is almost the same as STANDARD, but I made the following changes.

    1. Does not recognize trigraphs.  Digraphs are converted at
    translation phase 1, that is, the beginning of preprocessing, and
    are not dealt with as tokens.
    2. Simplified tokenization according to a completely token-based
    rule.  When there is no white space as a token separator between
    preprocessing tokens in the source code, a space is inserted
    automatically.  (However, no space is inserted between a macro name
    and the following "(" within a macro definition.)  Therefore, even
    for stringizing by # operator, the tokens are stringized after a
    space is inserted between all the preprocessing tokens.  Also, at
    the re-definition of macros, it does not matter whether there is a
    token separator or not.
    3. At the re-definition of function-like macros, the difference of
    the parameter name is not relevant.
    4. The evaluation of character constants within #if expressions is
    not implemented as there is not much portability (it will cause an
    error).
    5. I removed the irregular "function-unlike" rules for function-like
    macro expansion.  Hence, rescanning targets only the replacement
    list of the macro, and not the sequence after it.
    6. Normally, the header name with the format of #include <stdio.h>
    is accepted, but it gets a warning (by the class 2 warning option).
    If the header name with the format of <stdio.h> is used in a macro,
    it can get an error in particular cases.  I recommend using the
    format of #include "stdio.h".
    7. The rule that a space is required between the macro name and the
    replacement list in a macro definition was added in C99, but this
    rule is not applied.  (A space is inserted automatically at
    tokenization.)
    8. UCN (universal-character-name) is not recognized.  Multi-byte
    characters in identifier are not recognized.
    9. In C++, the eleven identifier-like operators are not treated as
    operators.
    10. -a (-lang-asm, -x assembler-with-cpp) option cannot be
    implemented.

    This is a simple and clear preprocessing specification which no
    other current preprocessor has.  I implemented it in the hope that
    it will be adopted in a future version of the C Standard.  Well-
    written source code should be processed with the same result as in
    STANDARD.
    Also, the specifications in STANDARD which are non-standard but
    implemented as a compromise to the current compiler systems are not
    available in POST_STANDARD.  (However, for GNU C, MCPP is virtually
    unusable if you cannot use #include_next, so this is implemented as
    a double compromise).
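
Item 5 above concerns a case like the following, shown here as Standard
C treats it.  The macro and function names are hypothetical.  In
STANDARD, the rescan of the replacement list of PLUS continues into the
tokens which follow it in the source, so ADD finds its arguments there.
As I read item 5, POST_STANDARD confines the rescan to the replacement
list itself, so source code relying on this would be processed
differently; please take that reading as an assumption.

```c
#define ADD(x, y)   ((x) + (y))
#define PLUS        ADD     /* object-like alias of a function-like macro */

/* In Standard C, PLUS is replaced by ADD, and the rescan then takes the
 * following "(1, 2)" from the source as the arguments of ADD. */
static int sum_via_alias(void)
{
    return PLUS(1, 2);
}
```

Source code which calls ADD(1, 2) directly does not depend on this rule
and is processed the same way in both modes.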

Further, there are more differences in the macro expansion methods
between MODE >= STANDARD and MODE == PRE_STANDARD.  Roughly speaking,
this difference is the difference between Standard C and pre-Standard.
The biggest difference is the expansion of the function-like macros
(macros with arguments).  When an argument contains macros, MODE >=
STANDARD expands the argument completely and then substitutes it for
the parameter within the replacement list of the original macro, while
PRE_STANDARD substitutes the argument for the parameter without
expanding it, then expands it at rescan time.
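
The order "expand the argument first, then substitute" of Standard C can
be observed with the # operator, whose operand is the one kind of
parameter which is substituted without prior expansion.  The following
is a minimal sketch in Standard C; all the names are hypothetical.

```c
#define VERSION     2
#define STR(x)      #x          /* the operand of # is not expanded */
#define XSTR(x)     STR(x)      /* x is fully expanded, then substituted */

/* The argument reaches # directly, so the macro name itself is
 * stringized. */
static const char *raw_name(void)
{
    return STR(VERSION);
}

/* The argument is expanded to 2 before being substituted into STR(x),
 * so the value is stringized. */
static const char *expanded_name(void)
{
    return XSTR(VERSION);
}
```

Here raw_name() returns "VERSION" and expanded_name() returns "2".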

Also, in MODE >= STANDARD, macros are not expanded recursively, either
directly or indirectly.  If there is a recursive macro definition in
PRE_STANDARD, it becomes an error at expansion time.
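
Standard C source code sometimes relies on this rule to wrap a function
with a macro of the same name.  A minimal sketch, with hypothetical
names:

```c
/* An ordinary function. */
static int scale(int x)
{
    return x + 1;
}

/* A macro of the same name.  The scale inside the replacement list is
 * not expanded again, so it refers to the function above. */
#define scale(x)    (scale(x) * 2)

static int call_scale(int v)
{
    return scale(v);    /* expands to (scale(v) * 2), and stops there */
}
```

With the definitions above, call_scale(3) calls the function once and
returns (3 + 1) * 2.  Such a definition is what becomes an error at
expansion time in PRE_STANDARD.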

Handling of \ at line end is also different by MODE.  In MODE >=
STANDARD, after processing the trigraph, the sequence of <backslash>
<newline> gets deleted before tokenization, but in MODE == PRE_STANDARD,
these only get deleted when they are within the string literals or in a
#define line.
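
The deletion of <backslash><newline> in MODE >= STANDARD can be sketched
as follows.  This is an illustration only, not the routine in MCPP; the
name splice_lines() is hypothetical, and the whole text is assumed to be
in one modifiable buffer.

```c
#include <stddef.h>

/* Delete every <backslash><newline> sequence from the string s in
 * place.  Returns the new length. */
static size_t splice_lines(char *s)
{
    const char *r = s;      /* read position */
    char *w = s;            /* write position */

    while (*r != '\0') {
        if (r[0] == '\\' && r[1] == '\n')
            r += 2;         /* splice: drop both characters */
        else
            *w++ = *r++;
    }
    *w = '\0';
    return (size_t)(w - s);
}
```

Since this is done before tokenization, a token may be divided at any
position by <backslash><newline> in these modes.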

There is a subtle difference in so-called tokenization (token parsing,
decomposition to tokens).  In MODE >= STANDARD, it tokenizes faithfully
as "token based behavior".  To put it concretely, in MODE == STANDARD, a
space will be inserted in front and after the expanded macros to prevent
the unexpected merging with the other tokens before and after.  In MODE
== POST_STANDARD, a space is inserted between all the preprocessing
tokens while reading the source code.  In MODE == PRE_STANDARD, traces
remain of the traditional, convenient and tacit tokenization and of the
macro expansion methods of "character based text replacement".  About
these, please see cpp-test.txt Sec 1.
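
Why a separator around an expanded macro matters can be seen in the
following sketch, written in Standard C with hypothetical names.

```c
#define NEG     -

/* The preprocessing tokens here are '-', '-' and 'x'.  They stay two
 * separate unary minus operators, so the result is -(-x). */
static int negate_twice(int x)
{
    return -NEG x;
}
```

If the expansion were written out by simple text replacement without a
separator, the output would read --x, which the compiler-proper would
tokenize as a decrement operator.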

In MODE >= STANDARD, it handles the numeric token, called preprocessing
number, according to the specification.  In PRE_STANDARD, the numeric
tokens are the same as integer constant tokens or floating point tokens.
The suffixes 'U', 'u', 'LL' and 'll' of the integer constant and the
suffixes 'F', 'f', 'L' and 'l' of floating point are not recognized as a
part of the tokens in PRE_STANDARD.
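
The preprocessing number of Standard C is wider than the integer and
floating constants.  The following scanner is a minimal sketch of the
C99 grammar, not the code of MCPP: the name pp_number_len() is
hypothetical, and UCNs in a preprocessing number are not handled.

```c
#include <ctype.h>
#include <stddef.h>

/* Return the length of the preprocessing number at the beginning of s,
 * or 0 if s does not begin with one. */
static size_t pp_number_len(const char *s)
{
    size_t i;

    if (isdigit((unsigned char)s[0]))
        i = 1;                          /* digit */
    else if (s[0] == '.' && isdigit((unsigned char)s[1]))
        i = 2;                          /* . digit */
    else
        return 0;

    for (;;) {
        unsigned char c = (unsigned char)s[i];
        char prev = s[i - 1];

        if ((c == '+' || c == '-')
                && (prev == 'e' || prev == 'E'
                    || prev == 'p' || prev == 'P'))
            i++;                        /* sign after e, E, p or P */
        else if (isalnum(c) || c == '_' || c == '.')
            i++;                        /* digit, letter, _ or . */
        else
            break;
    }
    return i;
}
```

For example, the whole of 0x1E+1 is one preprocessing number of length
6, although it is not a valid constant for the compiler-proper.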

The string literals and character constants of wide characters are
recognized as single tokens only in MODE >= STANDARD.

Digraph, #error, #pragma, and _Pragma() operator can only be used in
MODE >= STANDARD.  -S <n> option (strict-ansi mode) and -+ option (to
run as a C++ preprocessor) are only used in MODE >= STANDARD.
Predefined macros __STDC__, __STDC_VERSION__ are defined in MODE >=
STANDARD, and they don't get defined in PRE_STANDARD.  Also, OK_DIGRAPHS
and CONCAT_STRINGS, which will be mentioned in Sec 4.1.3.2, can be
defined TRUE only at MODE != PRE_STANDARD.

UCN (universal character name) can be only used in MODE == STANDARD.

Trigraphs can only be implemented when MODE == STANDARD.

#if defined, #elif cannot be used in PRE_STANDARD.  Macros cannot be
used within argument of #include or #line in PRE_STANDARD.  Pre-defined
macros, __FILE__, __LINE__, __DATE__, __TIME__ cannot be defined at
PRE_STANDARD.

On the other hand, #assert, #asm (#endasm), #put_defines and #debug are
only implemented at MODE == PRE_STANDARD.  OK_IF_JUNK, OK_SIZE,
COMMENT_INVISIBLE, STRING_FORMAL and OK_UNTERM_STRING are also defined
to TRUE only at PRE_STANDARD.  OLD_PREPROCESSOR is defined to TRUE only
at PRE_STANDARD.

In OLD_PREPROCESSOR, the form of #123 is handled the same as #line 123.

The output of diagnostic messages is also slightly different between
modes.  Please see sec 5 of manual.txt for details.

Any other items, which do not have any distinct rules between K&R 1st
and Standard, besides the aforementioned, are according to the C90 rules
in MODE == PRE_STANDARD (e.g. the type of #if expression).

4.1.3.2         Specifying the details of the behavioral mode

OLD_PREPROCESSOR
    If this is set to TRUE, COMMENT_INVISIBLE, STRING_FORMAL, OK_IF_JUNK
    and OK_UNTERM_STRING become TRUE.  This is close to the cpp of old
    "Reiser" model.  (The -traditional option of GNU C/cpp is close to
    this.  However, it is still unsatisfactory.  You can see this if
    you process Part 10 of misc.t, the test sample attached to the
    Validation Suite.  On the other hand, -traditional can use Standard
    C constructs such as #elif or defined, so it is a mixed
    specification of the Reiser model and Standard C.  Since there is no
    base standard for this, neither GNU C/cpp nor MCPP is perfect.)
    OLD_PREPROCESSOR can be defined to TRUE only at MODE == PRE_STANDARD.

The following five are defined TRUE only at MODE == PRE_STANDARD.

COMMENT_INVISIBLE
    Convert comment to 0 space, not 1 space.  However, this conversion
    is done in the output at the end.
STRING_FORMAL
    When there are string literals or character constants in the
    replacement list of the macro definition, and if any of the
    parameter names match to any part of these, that part will be
    substituted with the argument corresponding to the parameter when
    calling the macro.
OK_IF_JUNK
    If this is set to TRUE, you can write anything you like in the lines
    of #else, #endif. (One usually writes MACRO of corresponding #if
    MACRO or #ifdef MACRO.)
OK_UNTERM_STRING
    If this is set to TRUE, it stops "unterminated string literal" and
    "unterminated character constant" errors.  Hence, if there is no
    closure of the literal " or ', it assumes the close at line end.
OK_SIZE
    If this is set to TRUE, sizeof(type) can be used in #if, #elif lines.
    When compiling with GNU C, with this set to TRUE, do not set -ansi,
    -pedantic, -pedantic-errors options at least when compiling eval.c.

CPLUS
    When operating as a C++ preprocessor by -+ option, the standard
    macro __cplusplus is predefined to this value.  This is 199711L for
    ISO C++ Standard.  This can be changed at execution time by -V
    option.
OK_TRIGRAPHS
    Set to TRUE when implementing trigraphs processing, otherwise set to
    FALSE.  However, trigraphs cannot be implemented for MODE !=
    STANDARD.
TFLAG_INIT
    Specify the initial state of trigraph processing when OK_TRIGRAPHS
    == TRUE.  If this is set to TRUE, trigraphs are recognized by
    default, and recognition is turned off by the -3 option.  When this
    is set to FALSE, it is the other way around: trigraphs are not
    recognized by default, and are recognized when the -3 option is
    given.
OK_DIGRAPHS
    Set this to TRUE when implementing digraphs processing. In the MODE
    == STANDARD, if the system is HAVE_DIGRAPHS == FALSE, MCPP converts
    digraphs to normal tokens at the end of preprocessing.  In the MODE
    == POST_STANDARD, digraphs get converted at the beginning of
    preprocessing.
DIGRAPHS_INIT
    Specify the initial state of digraph processing when OK_DIGRAPHS ==
    TRUE.  If this is set to TRUE, digraphs are recognized by default,
    and recognition is turned off by the -2 option.  When this is set to
    FALSE, it is the other way around: digraphs are not recognized by
    default, and are recognized when the -2 option is given.
OK_PRAGMA_OP
    If this is set to TRUE, _Pragma() operator becomes valid when
    __STDC_VERSION__ >= 199901L.  This operator is a C99 one.  (See
    manual.txt [3.7] for the specifications.)
    You can only set this to TRUE for MODE >= STANDARD.
OK_UCN
    Set this to TRUE for making UCN (universal character name) effective
    when invoked by -S1 -V199901L or -+ options in MODE == STANDARD.
    Default is set to TRUE.
OK_MBIDENT
    Set this to TRUE to be able to use multi-byte characters in
    identifiers when invoked by -S1 -V199901L in MODE == STANDARD.
    Default is set to FALSE.
CONCAT_STRINGS
    If this is set to TRUE, MCPP goes as far as concatenating adjacent
    string literals.  To do that, the necessary minimum conversion of
    escape sequences is also done beforehand.  This is for compiler
    systems with CAN_CONCAT_STRINGS == FALSE.  However, even if both
    CONCAT_STRINGS and OK_UCN are set to TRUE, UCNs are not converted to
    multi-byte characters.
    This is only set to TRUE when MODE <= STANDARD.
    Also, in this concatenation, if there is a risk of the beginning of
    a latter string becoming a part of the escape sequence, when the end
    of the previous string is escape sequence of hexadecimal or octal at
    the concatenation, this has been worked around by concatenating
    after the latter strings are converted to a three-digit octal escape
    sequence.  However, there are odd specifications or a slight bug for
    octal escape sequences in Borland C.  In MCPP, the concatenation of
    string literals is thought out to work around these bugs.

expr_t, uexpr_t
    Typedef to the maximum integer types.  If there are intmax_t,
    uintmax_t types, define to them.  If the compiler system has long
    long, unsigned long long, define to them.  If the compiler system
    has __int64, unsigned __int64, define to these.  For compiler
    systems without even unsigned long, define both to long.  Note that
    long long and unsigned long long are required in C99.
EXPR_MAX
    Define the maximum value of uexpr_t.
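
The selection described above can be sketched as follows.  This is not
the text of system.H: the two HAVE_* values are assumptions standing for
a C99 host, the __int64 case is omitted, and the last branch assumes
that unsigned long exists.

```c
#include <limits.h>

/* Assumptions for this sketch: a C99 host.  In MCPP these settings
 * come from noconfig.H or configed.H. */
#define HAVE_INTMAX_T   1
#define HAVE_LONG_LONG  1

#if HAVE_INTMAX_T
#include <stdint.h>
typedef intmax_t            expr_t;
typedef uintmax_t           uexpr_t;
#define EXPR_MAX            UINTMAX_MAX
#elif HAVE_LONG_LONG
typedef long long           expr_t;
typedef unsigned long long  uexpr_t;
#define EXPR_MAX            ULLONG_MAX
#else
typedef long                expr_t;
typedef unsigned long       uexpr_t;
#define EXPR_MAX            ULONG_MAX
#endif
```

Whichever branch is taken, expr_t and uexpr_t should be of the same
size, and EXPR_MAX should be equal to (uexpr_t) -1.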

  * UCN is a C++98, C99 specification, conversion of unicode character
    value to the hexadecimal escape sequence beginning with \u or \U.
    (See manual.txt [3.7], cpp-test.txt [1.8], [3.5]).


4.1.3.3         Special Configuration

DOLLAR_IN_NAME
    If this is set to TRUE, $ within identifiers becomes usable.

TOP_SPACE
    If this is set to TRUE, white spaces (including comments) at the
    beginning of a line are output (after being compressed to one space,
    in principle).  If set to FALSE, they are deleted.  As white spaces
    at the beginning of lines do not mean anything in C source code, it
    should be fine to set this to FALSE.  However, when processing
    non-conforming source code, for example C source with an embedded
    assembler program, MCPP needs to be compiled with this set to TRUE.
    This may also be needed to use MCPP as a preprocessor for languages
    other than Standard C.  The default is TRUE for MODE <= STANDARD and
    FALSE for POST_STANDARD.  When #asm is implemented in PRE_STANDARD,
    regardless of TOP_SPACE, white spaces at the beginning of a line are
    not deleted and no tokenization is done within an #asm block.

SEARCH_INIT
    Specify the default rule when searching the include file.  When
    processing the directive such as #include "../dir/header.h", the
    rule of which directory should be searched first.  If this is
    specified to CURRENT, it starts to search the relative path from the
    current directory of MCPP invocation.  If specified as SOURCE, it
    starts searching from the directory with the source file (includer).
    If specified to (CURRENT & SOURCE), it starts searching the relative
    path from the current directory first, then the directory with the
    source file.
    If the file is not found there, the directories specified by -I
    option or environment variables are searched.  If it is still not
    found, the system directories are searched.  If these directories
    themselves are not specified by absolute paths, they are interpreted
    as relative paths from the current directory, regardless of the
    rules above.
    This rule can also be changed by -I1, -I2, -I3 options at execution
    time.  The default is set to SOURCE for compiler systems on UNIX,
    GNU C or LSI C.  The default for other compiler systems is generally
    CURRENT.  For Turbo C or Borland C, it is CURRENT up until BC4.0 and
    (CURRENT & SOURCE) for 5.5.
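
The order of the first search can be sketched with bit flags as follows.
All the names here are hypothetical, and the values 1 and 2 and their
combination by | are assumptions of this sketch, not the definitions in
system.H.

```c
/* Hypothetical flags for this sketch only. */
enum {
    SEARCH_CURRENT = 1,     /* the current directory of invocation */
    SEARCH_SOURCE  = 2      /* the directory of the includer */
};

/* Fill out[] with the directories to try first, in order, for a header
 * name in the "..." form.  Returns the number of entries (0 to 2).
 * The -I directories and the system directories would follow these. */
static int search_order(int rule, const char *cwd, const char *srcdir,
                        const char *out[2])
{
    int n = 0;

    if (rule & SEARCH_CURRENT)
        out[n++] = cwd;
    if (rule & SEARCH_SOURCE)
        out[n++] = srcdir;
    return n;
}
```

With both flags set, the current directory comes first and the
directory of the source file second, as in the description of
(CURRENT & SOURCE) above.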

OK_MAKE
    If this is set to TRUE, it implements -M* option to output the
    dependency relation line for makefile.

The next two are for debugging of MCPP itself, but it can also trace
tokenization, macro expansion mechanism or evaluation mechanism of #if
line.

DEBUG
    If this is set to TRUE, it implements the debug routine.
DEBUG_EVAL
    If this is set to TRUE, it implements the debug routine for #if,
    #elif lines.


4.1.3.4         Configuration of translation limits

RESCAN_LIMIT
    Defines the limit on the number of rescans at macro expansion time.
    In MODE == PRE_STANDARD, an infinite loop can occur by recursive
    macro expansion, but this limit stops that.  In MODE >= STANDARD, it
    does not need to be set to a large value as the number of rescans is
    small.
NBUFF
    Define the maximum length +1 of the logical line (the line spliced
    deleting \ at the end of physical line of source code).  The line
    after the comment converted to a space (it can spread out to
    multiple logical lines depending on comments) has to be within this
    length, too.
NMACWORK
    Define the internal buffer size of macro expansion.  Hence, the
    result of expanding macros within one logical line (when calling
    macro spreads out to multiple lines, the result of expansion), has
    to be within this size.  This is also used for the maximum length
    for memorizing the replacement list of one macro definition
    internally.
NWORK
    Defines the maximum length for output of MCPP.  This cannot be more
    than the maximum length +1 of what the compiler-proper can accept.
    Also, this cannot be more than the values of NBUFF and NMACWORK.
    When the line length after the macro expansion exceeds this, in the
    case of NWORK < NMACWORK, MCPP divides it into lines shorter than
    this value, then outputs them.  The length of string literal (in the
    case of CONCAT_STRINGS == TRUE, the string literal after the
    concatenation) has to be within the range of NWORK-2.  (The length
    of the string literal is not the number of elements of the char
    array, but the length of the string literal token in the source code.
    For example, \n is counted as 2 bytes including " on both sides.  L
    in the front is also included for wide string literals.)
IDMAX
    Defines the maximum length of an identifier.  A name longer than
    this value is not an error, but is cut down to this length.
NMACPARS
    Defines the maximum number of arguments of function-like macros.
    This cannot be bigger than UCHARMAX.
NEXP
    Defines the limit of the nest level bound by parentheses in #if
    expression (in reality, the nest level is not directly decided by
    this.  Specifically, the number of constant tokens within an
    expression can be used up to two times of this, and the number of
    operator tokens that can be used is three times this value.  A pair
    of parentheses is counted as 2).
BLK_NEST
    Defines the limit of the nest level of #if (#ifdef, #ifndef)
    sections (how many levels #if and so on can be nested).
NINCLUDE
    Define the maximum number of the include directories to be searched.
    (Does not mean the level of #includes.  There is no limitation to
    the nest level).
SBSIZE
    Defines the number of elements for the hash table when macros are
    internally classified by a hash and are stored.  This has to be a
    power of 2.  It operates correctly when the number is smaller than
    the number of macros, but the process is slightly quicker when this
    is set to be bigger.
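
The reason SBSIZE has to be a power of 2 is presumably that the hash
value is reduced to a table index by masking.  The following is a sketch
under that assumption; the hash function itself and the name hash_name()
are hypothetical and are not those of MCPP.

```c
#define SBSIZE  256     /* assumed value for this sketch; a power of 2 */

/* Hypothetical hash: combine the characters of the name, then mask the
 * result to an index into a table of SBSIZE elements.  The mask
 * (SBSIZE - 1) covers every index only when SBSIZE is a power of 2. */
static unsigned hash_name(const char *name)
{
    unsigned h = 0;

    while (*name != '\0')
        h = h * 31 + (unsigned char)*name++;
    return h & (SBSIZE - 1);
}
```

Macros whose names fall on the same index share one chain, so a table
smaller than the number of macros still works, only with longer chains
to search.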

The specification becomes better with bigger sizes for each, but bigger
values of NWORK, NBUFF, NMACWORK or SBSIZE use more memory.  As a
result, the number of macro definitions is limited for systems with a
small address space.  (Strictly speaking, the problem is not the number
of macro definitions itself, but the total length of all the macro
definitions.  The internal format of macro definitions is written in
struct defbuf of internal.H.)  With settings such as NWORK at the
default values in system.H, when MCPP is compiled by the small model of
MS-DOS, it may be ok with up to several hundred macro definitions, if
these are simple ones.  However, if there are lots of lengthy macro
definitions, the limit will be much less.

NMACWORK, NEXP and RESCAN_LIMIT consume stack.

Other settings do not need much memory, but setting them above the
default values in system.H may be meaningless in operation.

The minimal translation limits required by C90 or C99 are written
toward the end of system.H.  The translation limits of the C++ Standard
are also written there, but these are not required specifications,
unlike those of the C Standard.


4.2     system.c

Mainly, some configurations for the target compiler systems are
implemented here.

PATH_DELIM
    Defines the path-delimiter of the OS.  PATH_DELIM must not be \ (for
    the program's convenience).  This is set to / for MS-DOS systems,
    too.  Of course, you can use \ in user programs, but it is converted
    to / internally.
OBJEXT
    Defines the suffix of the object file, generated by the compiler
    system, in a string literal.  These are "o" for compilers on UNIXes
    or "obj" for compilers on DOS/Windows.  This is to be used for the
    output of the dependency lines for the makefile when -M* option is
    specified.
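
The conversion of \ to PATH_DELIM can be sketched as follows.  This is
an illustration only: the name norm_path() is hypothetical, and a 0x5c
which is the second byte of a multi-byte character in a file name is
not considered here.

```c
#define PATH_DELIM  '/'

/* Convert every \ in path to PATH_DELIM in place, so that the rest of
 * the program has to deal with only one delimiter. */
static void norm_path(char *path)
{
    for (; *path != '\0'; path++) {
        if (*path == '\\')
            *path = PATH_DELIM;
    }
}
```

After this, for example, a directory and a file name can be joined and
compared without caring which delimiter the user wrote.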

do_options()
    The options at MCPP invocation time are implemented.  When building
    this into compiler systems for which it hasn't been implemented yet,
    you may need to add a few lines to match the compiler driver of the
    system.  When you add to do_options(), you also need to add to
    set_opt_list() and usage() to match.
    do_options() calls getopt(), so you have to decide whether the
    option character is with or without arguments.  As a basic rule, the
    options like -P and -P- cannot be used.  (However, if this is
    necessary for compatibility with the compiler system's attached
    preprocessor, this can be done.  Refer to the implementation of -M
    option.)  Also, for the longer options such as -trigraphs, you have
    to implement by t as an option character and rigraphs as an argument.
set_opt_list()
    Sets the option characters of MCPP.
usage()
    The usage message is written here.
set_sys_dirs()
    Sets the include directories.  Besides the system-specific
    directories specified by the noconfig.H (configed.H) macros
    C_INCLUDE_DIR? or CPLUS_INCLUDE_DIR?, /usr/include and
    /usr/local/include on UNIX OSs are also set here.  (The include
    directories specified by the environment variables, whose names are
    defined by the noconfig.H (configed.H) macros ENV_C_INCLUDE_DIR and
    ENV_CPLUS_INCLUDE_DIR, are set up in set_env_dirs().)
do_pragma()
    The processing of #pragma is implemented.  When the noconfig.H
    (configed.H) macro HAVE_PRAGMA is TRUE, a #pragma sub-directive
    which MCPP does not process is output as is and passed to the
    compiler-proper.  Otherwise, it is discarded with a warning.  Those
    which MCPP processes by itself, such as #pragma __debug_cpp, are
    processed by the functions called from this function.  After that,
    they are handled the same as the #pragmas which are not processed
    by MCPP, except that #pragma __put_defines and #pragma __once are
    not output.  This is because the output may be preprocessed again
    when using the "pre-preprocess" functionality of the header files.
    (Refer to [3.1] of manual.txt.)  In Standard C, the extension
    directives of individual compiler systems have to be implemented as
    #pragma sub-directives.  This function is enclosed by #if MODE >=
    STANDARD - #endif.
do_old()
    If you require preprocessing directives which do not conform to
    Standard C (the ones which are not #pragma sub-directives, such as
    #assert, #asm, #endasm, #include_next, #warning, #put_defines,
    #debug), add the function which processes them and call it from
    here.  This function should be enclosed by #if MODE == PRE_STANDARD
    - #endif.  (However, for GNU C, #include_next and #warning can also
    be used in MODE >= STANDARD.)

The rest of them are for particular compiler systems and OSs.  Please
check the code itself.


4.3     mbchar.c

The processing routines for multi-byte characters are here.  Usually,
you should not need to change this file for the porting of MCPP.
However, when you implement multi-byte character encodings which are
not implemented yet, the character type table and the processing
routine need to be added into this file.

The outline of multi-byte character processing differs between 16-bit
systems and systems of 32 bits or more.  On 16-bit systems, which do
not have much memory, there is a limitation on the encodings that can
be implemented.

char type[]
    The character type table for 16-bit systems.  EUC-JP, shift-JIS,
    KS C 5601, GB 2312-80 and Big Five, with the prerequisite of ASCII
    being the basic character set, are implemented.  These include the
    area of the user-defined characters (external characters).  For
    other character sets, you need to add to this.  For a 2-byte
    encoding without shift states, you only need to define MBCHAR of
    system.H and add to this table.

short type_*[]
    The character type table for 32-bit or more systems.  There are the
    following four tables, and each of the character types of the
    following encoding are defined.  In any case, the basic character
    set is ASCII as a prerequisite.  At execution time, one of these
    will be assigned to a pointer named short * type.
        type_euc[]  :   EUC_JP, GB2312, KSC5601
        type_bsl[]  :   SJIS, BIGFIVE
        type_iso2022_jp[]   :   ISO2022_JP
        type_utf8[] :   UTF8

char * mb_name[][]
    A table of names of Multi-byte character encodings.

set_encoding()
    Routine to process #pragma __setlocale, -m option, and environment
    variables LC_ALL, LC_CTYPE, LANG.

mb_init()
    Routine to set the initial settings according to multi-byte
    character encodings.

mb_read_*()
    Routines to read multi-byte characters. There are the following
    three, and one of them will be assigned to the function pointer
    named mb_read.  With 16-bit systems, mb_read always becomes
    mb_read_2byte. These functions are each one of the following
    encodings for multi-byte characters.
        mb_read_2byte() :   EUC_JP, GB2312, KSC5601, SJIS, BIGFIVE
        mb_read_iso2022_jp()    :   ISO2022_JP
        mb_read_utf8()  :   UTF8

mb_eval()
    Routine to evaluate the value of character constants for multi-byte
    characters and wide characters.  Used at the evaluation of #if
    expression.
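
The switching of the reading routine through a function pointer, as
described for mb_read, can be sketched as follows.  Every name and
signature here is hypothetical and much simpler than those of mbchar.c:
each reader only returns the byte length of the character at s, the
2-byte reader treats any byte with the high bit set as a first byte, and
the UTF-8 reader does not validate the continuation bytes.

```c
#include <stddef.h>

typedef size_t (*mb_reader)(const unsigned char *s);

/* Simplified EUC-style reader: a byte with the high bit set begins a
 * 2-byte character. */
static size_t read_2byte(const unsigned char *s)
{
    return ((s[0] & 0x80) && s[1] != '\0') ? 2 : 1;
}

/* Simplified UTF-8 reader: the length is decided by the first byte. */
static size_t read_utf8(const unsigned char *s)
{
    if (s[0] < 0x80)            return 1;
    if ((s[0] & 0xe0) == 0xc0)  return 2;
    if ((s[0] & 0xf0) == 0xe0)  return 3;
    if ((s[0] & 0xf8) == 0xf0)  return 4;
    return 1;       /* invalid first byte: step over one byte */
}

enum encoding { ENC_2BYTE, ENC_UTF8 };

/* Choose the reader once, when the encoding is set. */
static mb_reader select_reader(enum encoding enc)
{
    return enc == ENC_UTF8 ? read_utf8 : read_2byte;
}
```

The caller keeps the returned pointer and calls it for every character,
so the tokenizer itself does not need to know which encoding is in use.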

With regard to the encoding of multi-byte characters, I used some
materials, including the following references.

    Ken Lunde "Understanding Japanese Information Processing",
        1993/09, first edition, O'Reilly & Associates, Inc.

    Kouichi Yasuoka/Motoko Yasuoka "The world of character code",
        1999/09, Tokyo Electric University Publications


4.4     lib.c

Source code for some library functions, which some compiler systems may
not have or have a problem using, is written here.  As each of them is
enclosed with #if ! HOST_HAVE_XYZ - #endif, the function XYZ here is
used at HOST_HAVE_XYZ == FALSE.
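
The pattern can be sketched as follows, taking stpcpy() as XYZ.  This is
not the code in lib.c: the function is named xstpcpy() here, a
hypothetical name, so that the sketch does not collide with a stpcpy()
which the host library may already declare, and HOST_HAVE_STPCPY is
assumed to be FALSE.

```c
#include <string.h>

#ifndef HOST_HAVE_STPCPY
#define HOST_HAVE_STPCPY    0   /* assumption for this sketch: FALSE */
#endif

#if ! HOST_HAVE_STPCPY
/* Copy the string src to dest, and return a pointer to the terminating
 * '\0' in dest. */
static char *xstpcpy(char *dest, const char *src)
{
    size_t len = strlen(src);

    memcpy(dest, src, len + 1);     /* the '\0' is copied, too */
    return dest + len;
}
#endif
```

When the macro is TRUE, the block is skipped entirely and the function
of the host library is used instead.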


4.extra malloc()

"kmmalloc -- malloc() with debugging functions" is a portable source of
malloc(), free(), realloc() and calloc() which I wrote.  I wrote this to
improve memory efficiency, with debugging in mind.  The debug routines
are also attached.  Unexpected bugs can be caught if this is linked.
*1, *2

The reason why I provide -DKMMALLOC -D_MEM_DEBUG -DXMALLOC options in
noconfig/*.mak is to link my versions of malloc() and so on, which have
the debug routines.  If MCPP linked with them exits with the error
number EFREEP, EFREEBLK, EALLOCBLK, EFREEWRT or ETRAILWRT, it indicates
an MCPP bug.

In particular, when compiling by the large data memory model of MS-DOS/
Borland C, please avoid using the compiler's attached malloc() as it
(especially free()) is extremely slow and MCPP uses these functions
heavily.  Since the malloc() of WIN32 / Visual C++ is also fairly slow,
you may want to avoid using it.

If you define any of BSD_MALLOC, DB_MALLOC or MALLOC_DBG to 1 and
compile MCPP, the corresponding debugging malloc() will be used instead
of mine.  In any case, to use a malloc() not attached to the compiler,
you have to make the library before you compile MCPP.  About this,
please see the document of kmmalloc.  (This document is only written in
Japanese, sorry.)

  *1 kmmalloc is in the following location.
        http://download.vector.co.jp/pack/dos/prog/c/kmmalloc-2.5.lzh
  *2 In CygWIN, my malloc() is not used as other malloc() are not
    allowed to be used.


                  5   Bug reporting and porting report

5.1     Is this a bug?

The Validation Suite for the Standard C conformance of preprocessing is
also made public with MCPP.  I tried to make it able to verify all the
specifications of Standard C preprocessing.  Of course, MCPP is checked
by this suite.  MCPP was also compiled by the above mentioned compiler
systems and verified.  Therefore, I don't think there are many bugs or
wrong specifications, but some may remain.  Notably, when porting to
compiler systems never ported to before, problems may also come from
bugs of the compiler systems themselves.

If you find unusual behavior, please contact me.  Please check the
following points.

  1. For Standard mode, use the Validation Suite first to make sure your
    understanding of the Standard is correct.  For the system with which
    GNU C / testsuite can be used, automatic testing can be done by
    executing the configure script first then using 'make check'.
  2. Check the document to make sure there are no mistakes in porting
    your MCPP.
  3. Extract the sample source to reproduce the bug.
  4. Trace the behavior of MCPP by enclosing the place where you get the
    bug with  #pragma __debug_cpp <args> and #pragma __end_debug_cpp.
    Increase these <args> and trace in detail.

If the diagnostic message of "Bug: ..." is displayed, that is definitely
a bug of MCPP or of the compiler system (more likely MCPP).  Even if
MCPP goes out of control only by processing jumbled "source", that is
also a bug.

Of course, MCPP of modes other than Standard C behaves "incorrectly" in
the Validation Suite, as that is the specification.  (Even so, it
should not run out of control.)  Please see [4.1.3] for details of the
specifications.


5.2     Check for malloc() related bugs

There is a library called kmmalloc which I wrote, with functions such
as malloc().  (Refer to 4.extra.)

If MCPP linked with my malloc() exits with the error number 120-124
(or 2120-2124 for some compilers), that is definitely an MCPP or
compiler bug.  (Possibly the library functions'.)  Also, if you write,

    #pragma __debug_cpp __memory

somewhere in the sample source used in the test, information on the
heap memory will be output at that point and at the end of processing.
If the error message "Heap error: ..." is shown there, that is also a
bug of MCPP or of the compiler system.

If a bug is found, please repeat the test while enclosing parts of the
sample source in #if 0 and #endif, to narrow down where the bug is.
Please link my version of malloc() and its related functions for this
testing.
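
The narrowing-down can be sketched as follows.  The sample and its
names (TWICE, part1 to part3) are hypothetical; the point is only that
one part at a time is disabled and the test is repeated.

```c
/* Hypothetical sample source, split into three parts.  Disable one
 * part at a time: the part whose removal makes the error disappear
 * contains the trigger. */
#define TWICE(x)    ((x) + (x))

/* Part 1: kept enabled. */
int part1(void)
{
    return TWICE(2);
}

#if 0   /* Part 2: disabled for this run. */
int part2(void)
{
    return TWICE(TWICE(2));
}
#endif

/* Part 3: kept enabled. */
int part3(void)
{
    return TWICE(3);
}
```

If the error persists with part 2 disabled, re-enable it and disable
another part, then split the guilty part further in the same way.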


5.3     Bug report

Please attach the following data to the bug report.

    1. The compiler system to which MCPP has been ported.
    2. Porting method (e.g. the configuration of system.H, etc.)
    3. Sample source to reproduce the problem.
    4. The results.


5.4     Porting report

I have tried to write MCPP so that it can be ported relatively easily
to any compiler system.  However, I have only a small number of compiler
systems at hand.  I am interested in how porting goes on the compiler
systems I do not have, and I look forward to receiving porting reports
for them.  I would like to perfect MCPP by improving the source with
this feedback.

Please make the porting report in the following format.

  1. Compiler system.
  2. The configuration of noconfig.H (configed.H), system.H and system.c.
    A diff against the original is best, but a short note is fine for a
    simple change.  If the diff is large, the files themselves are
    fine.

To check whether MCPP has been ported correctly, it may be easiest to
replace the preprocessor with MCPP first and then recompile MCPP itself,
using the "pre-preprocessing" facility for the header files.

Furthermore, use the Validation Suite for the Standard mode.  Since it
consists of many files, however, running all of it in every debugging
cycle takes a lot of effort.  While debugging, first compile n_std.c
and i_std.c to see whether they compile and execute correctly.  Some
compiler drivers supplied with the system have no option for passing
options to MCPP; for those, please refer to manual.txt [2.1], which
describes a method of preprocessing with MCPP before compiling.  If
these two sources are processed correctly, the port is a success.

If they fail, find out by hand where the problem is, using the samples
n_std.t and i_std.t.  Once these succeed, check e_std.t, unspcs.t,
warns.t and misc.t.  In "POST_STANDARD" mode, n_post.t, i_post.t and
e_post.t should be used.

Process these with the cpp -QCi23 options.  If MCPP has been compiled
with STDC==0, add the -S1 -V199409L options as well.  As comments are
also output by the -C option, you can see whether the result of
processing is the expected one.

As the diagnostic messages are output to the file called cpp.err by the
-Q option, read it using a pager or similar.  The -i option omits the
output of the header files.

Digraphs and trigraphs are enabled by -2 and -3.  -S1 and -V199409L set
__STDC__ to 1 and __STDC_VERSION__ to 199409L.

To test C99 compatibility, check n_std99.t and e_std99.t with the
-S1 -V199901L options.
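
Whether the -S1 and -V options took effect can also be observed from
the compiled side.  The following is a minimal sketch with hypothetical
function names; it only reports the two macros as the compiler proper
saw them.  With -S1 -V199409L it would be expected to report 1 and
199409, according to the description above.

```c
#include <stdio.h>

/* Value of __STDC__, or 0 when the macro is not defined. */
long stdc_flag(void)
{
#ifdef __STDC__
    return (long) __STDC__;
#else
    return 0L;
#endif
}

/* Value of __STDC_VERSION__, or 0 when the macro is not defined. */
long stdc_version(void)
{
#ifdef __STDC_VERSION__
    return (long) __STDC_VERSION__;
#else
    return 0L;
#endif
}

/* Print both values on one line; returns what printf() returns. */
int print_stdc(void)
{
    return printf("__STDC__ = %ld, __STDC_VERSION__ = %ld\n",
            stdc_flag(), stdc_version());
}
```

Calling print_stdc() from main() once per option combination shows at a
glance which values reached the compiler proper.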

If you use the program called cpp_test.c in the Validation Suite, you
can run the sample tests of n_*.c and i_*.c automatically.  (However,
this only reports pass or fail for each item and gives no details.
Also, other tests such as e_*.?, u_*.?, unspcs.?, warns.? are not
included.  To test MCPP itself, it is quicker to compile n_std.c and
i_std.c.)

Validation Suite V.1.3 or later has testcases for GNU C / testsuite.
Therefore, when MCPP is ported to one of the versions of GNU C, MCPP
can be tested automatically by replacing the preprocessor of GNU C with
MCPP, if GNU C / testsuite is installed.  On this, please see
cpp-test.txt [2.2.3] and manual.txt [3.9.5], [3.9.7].  The test can
also be run automatically by configuring MCPP with the Validation Suite
and then running 'make check'.


5.5 Information for the Configure Script of other compiler systems
            than GNU C

MCPP V.2.4 provides a configure script for UNIX systems.  However, I
know nothing about compiler systems other than GNU C on UNIX systems,
so for them some options need to be specified to the configure script.

Someone who uses these compiler systems should know, or be able to find
out, the details needed to specify these options.  If you do, please
let me know.  I would like to develop this configure script further.

Please refer to README for the configure script.


5.6     I will try to port if you send me the data.

If you cannot port MCPP successfully, please let me know what is
happening.  If you attach the following data, I may be able to return
the ported source to you.

In environments where the configure script can be used, much of this
data can be obtained by running it.

  1. OS and the format of the path list.  (I only know UNIX, DOS/Windows
    and OS-9.)
  2. The compiler system name and the version.
  3. Whether the basic character set is ASCII and, if not, what kind of
    character set it is.  Whether the encoding of multi-byte characters
    (Kanji characters) is Shift-JIS, EUC-JP or something else.  If, as
    in Shift-JIS, a multi-byte character can contain a code such as
    <backslash>, whether the compiler proper recognizes this.
  4. Whether the shell (command processor) is case sensitive.
  5. Whether file names are case sensitive.
  6. The execution options you want implemented: the options passed
    from the compiler driver, and the options for running the
    preprocessor alone.  (Options that cannot be handled by getopt()
    cannot be implemented.)
  7. Whether the compiler system has a separate preprocessor or is a
    so-called one-pass compiler.
  8. The predefined macros of the compiler system and their values, and
    how they differ for C++.  (Distinguish the macros passed from the
    compiler driver by options such as -D from the macros predefined by
    the preprocessor itself.)
  9. Whether prototype declarations of functions are possible.
  10. Whether there is an unsigned long type.
  11. Whether there are long long and long double.  If there is long
    long, what the length modifier for long long in printf() is.  If
    there is no long long, whether there is another type of the same
    size.
  12. If there is <limits.h>, the values of CHAR_BIT, UCHAR_MAX, LONG_MAX
    and ULONG_MAX.  If there is no <limits.h>, the values equivalent to
    these four.  (If 1 byte is 8 bits, they should be the same as the
    default values of system.H.)
  13. If there is FILENAME_MAX in <stdio.h>, its value.
  14. Whether the compiler system accepts #pragma.  Whether the argument
    of a #pragma line is subject to macro expansion.
  15. Whether there is a getenv() function.  What name should be used
    for the environment variable that specifies the include
    directories, and what separator should be used for writing multiple
    paths in it.
  16. The standard include directories.  The rule for searching header
    files by #include.
  17. Whether \a and \v can be used.
  18. Whether the const qualifier can be used.
  19. Are there the header files <string.h>, <stdlib.h>, <stddef.h>,
    <time.h>?  How about the definition of size_t, time_t types.
  20. Is any necessary function missing from libraries?
  21. Can concatenation of string literals be done by the compiler
    proper?
  22. Does the compiler proper recognize digraphs?
  23. Does $ need to be allowed in identifiers?
  24. Are there #asm and #endasm?  How about the passing format of the
    block enclosed by these directives to the compiler proper?  What are
    the other non-standard directives?
  25. Which #pragma sub-directive should be processed by preprocessor?
  26. How long is the maximum length that the compiler proper can
    receive?  (You can find out by compiling test_l/l_37_8.c in the
    Validation Suite.)
  27. How many bytes of an identifier are significant in the compiler
    proper?
  28. What size is the memory space?
  29. What is the suffix of the "object file" produced by compiling,
    before linking?  (The equivalent of .o on UNIX compiler systems or
    .obj on MS-DOS ones.)
  30. The result of processing the following sample t_line.c by the
    preprocessor only.  (Use the separate preprocessor, or specify the
    output after preprocessing by an option.)  This is to see how the
    line number and file name information is passed to the compiler
    proper.  As the contents of <stdio.h> are long, the first 10-20
    lines and the last 10-20 lines of the output are enough.
    Also, on a compiler system where the processed result of #line 1000
    is not #line 1000 "t_line.c" but another format such as
    #1000 "t_line.c", change it to #line 1000  "t_line.c" and pass the
    result to the compiler proper, then check whether the compiler
    proper recognizes it.  (If #line 1000  "t_line.c" itself does not
    cause an error, there should be an error message for the line of
    "error line;".  Check what line number is displayed in that error
    message.)

/* t_line.c */
#include    <stdio.h>

#line 1000

    error line;

main(void)
{
    return  0;
}

If the host compiler and the target compiler are different, I need all
the above data for both systems.
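
What item 30 probes is the renumbering done by #line.  As a minimal
sketch in Standard C (the function names are hypothetical), the line
that follows the directive is numbered 1000 and the presumed file name
is changed:

```c
#include <string.h>

/* Returns the line number the preprocessor assigns to the return
 * statement below. */
int line_after_directive(void)
{
#line 1000 "t_line.c"
    return __LINE__;    /* numbered 1000 by the directive above */
}

/* Returns nonzero when __FILE__ has been renamed by the directive. */
int file_is_renamed(void)
{
    return strcmp(__FILE__, "t_line.c") == 0;
}
```

If the compiler proper understands the line information it receives,
an error in such a file is reported against the renumbered line, which
is what the "error line;" in t_line.c is meant to reveal.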

Listed like this, there are a great many things to check.  However,
most compiler systems should share characteristics with those to which
MCPP has already been ported, so getting MCPP just to run should not
cause too many problems.  Implementing the execution options, #pragma
and the non-standard specifications takes relatively more time, but
these can be done gradually once MCPP runs.  The only annoying thing is
getting caught by compiler bugs.
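
Some of the items above, notably 12 and 13, can be collected by a small
probe program.  The following is only a sketch for a hosted Standard C
environment; report_limits() is a hypothetical name, and on a system
without <limits.h> the values would have to be found in another way.

```c
#include <limits.h>
#include <stdio.h>

/* Write the values asked for in items 12 and 13, plus a hint for
 * item 11, to 'out'.  Returns the number of lines written. */
int report_limits(FILE *out)
{
    int lines = 0;

    /* fprintf() returns a positive count on success, so each
     * successful call adds 1 to 'lines'. */
    lines += fprintf(out, "CHAR_BIT     = %d\n", CHAR_BIT) > 0;
    lines += fprintf(out, "UCHAR_MAX    = %u\n",
            (unsigned) UCHAR_MAX) > 0;
    lines += fprintf(out, "LONG_MAX     = %ld\n", (long) LONG_MAX) > 0;
    lines += fprintf(out, "ULONG_MAX    = %lu\n",
            (unsigned long) ULONG_MAX) > 0;
#ifdef FILENAME_MAX
    lines += fprintf(out, "FILENAME_MAX = %d\n",
            (int) FILENAME_MAX) > 0;
#else
    lines += fprintf(out, "FILENAME_MAX is not defined\n") > 0;
#endif
    /* LLONG_MAX is a C99 macro; its absence does not prove that the
     * compiler lacks long long, so treat this line only as a hint. */
#ifdef LLONG_MAX
    lines += fprintf(out, "LLONG_MAX is defined\n") > 0;
#else
    lines += fprintf(out, "LLONG_MAX is not defined\n") > 0;
#endif
    return lines;
}
```

Calling report_limits(stdout) from main() on both the host and the
target system gives output that can be pasted into the report as is.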


5.7     Please report the test of other compiler systems by the
                Validation Suite.

The Validation Suite results for the preprocessors of the compiler
systems I have are summarized in cpp-test.txt Sec 5.

Please let me know the results of testing other compiler systems.  It
may take some effort, as there are so many items.

The test by cpp_test.c does not take long, so please send me at least
its result.  In the case of GNU C, the automatic test can be done by the
Validation Suite.


5.8     The feedback for improvement

Besides bug reports, please send me feedback on anything, such as the
usability of MCPP, the diagnostic messages, the MCPP source code, the
Validation Suite, my interpretation of Standard C or the way the
documents are written.

This preprocessor was created as a hobby, but it is the result of six
and a half years of devoted work and many ideas, counting only up to
V.2.0.  After so much work, I want to make it as good as I possibly
can.  As for the C preprocessor, I think I have done everything I
should and everything meaningful, except testing and porting on the
compiler systems I do not have.  If any problems remain, I would like
to fix them.

The code of Martin Minow was very clear, free of bad habits and easy to
understand, and I learned a lot just by reading it.  I am a bit worried
that many parts of this revised version have lost that unity of style
and become somewhat messy.

The people who are interested in this may be very limited, but I am
looking forward to feedback and information.

Please send information and feedback to the newsgroups comp.std.c or
fj.comp.lang.c, or by e-mail.


                          6   Long way to MCPP

6.1     Three days to plan and six years to develop

When I started tinkering with DECUS cpp in January 1992, I never
dreamed it would take this long.  I just meant to change it a little
over the New Year break.

Once I started, I realized I had to read the source properly, and it
took me about two months to read it through.  I did so partly because
the source was worth reading.  Then I upgraded parts of the
specification to adapt it to C90.  Up to this point everything went as
planned.

However, I realized I did not know the preprocessor specification of
C90 precisely.  When I read P. J. Plauger & Jim Brodie "Standard C"
(1989), its description of function-like macro expansion completely
overturned my preconceptions.  (This part was mistranslated in the
Japanese translation.)  So I bought a copy of the C Standard and
repeatedly read the difficult parts related to preprocessing.  As a
result, I found that the preprocessing of C90 differs from the
traditional one in many points.  The addition of the # and ## operators
is only a small part of them.

Above all, I racked my brain over the function-like macro expansion
routine.  I thought it over for 2-3 weeks, consulting the cpp source of
E. Ream, and then I wrote a new macro expansion routine for C90.  I had
never thought so hard about the algorithm of a program.  That was
April, 1992.
     Well, I thought I was over the hump and done playing with cpp, but
it took almost six more years from there.  Few of the remaining
problems really troubled me; nevertheless, it took that long.  That was
partly because I got tired of thinking and couldn't concentrate on
tinkering with cpp.  But that wasn't all.  I did the following things.

  1. Made the specification clearer.  In Standard mode, adapted it
    completely to the Standard.
  2. Re-structured the program and data structures to focus on the
    Standard C mode.
  3. Changed the style of the source to improve portability.
  4. Debugged.  Prepared for bugs and imperfections of the compiler
    systems.
  5. Created the test programs that make up the Validation Suite.
  6. Tested other compiler systems.
  7. Wrote documentation.
  8. As I bought a new PC in July 1997, I spent time installing and
    learning WindowsNT/95, the X Window System and their software.
    Meanwhile the C99 draft of 1997/11 was released, and MCPP had to be
    adapted to it.

Of the items in this list, the documentation took a long time.  In the
last four years especially, I spent only a little time changing the
source and most of the time writing the documentation.  That is why the
documentation grew to such a volume, but volume alone does not explain
the time taken.  While I was writing the documents, uncertain parts of
the specification kept coming up.  Each time, I re-read the Standards
and changed the source code.  Each change took little time, but there
were many of them.  I read not only the preprocessing part of the
Standards but the rest as well, including the Rationale of ANSI C.  It
was as if I learned C90 by creating the preprocessor.  I also came to
understand the problems of the C90 Standard through this.

At first, I wrote a few simple test programs as samples.  However, each
time I wrote one and ran it through MCPP, I found unexpected bugs.  So
I decided to write the Validation Suite, which would test every
specification of C90 preprocessing.  Writing this Validation Suite made
the problems of C90 obvious.  Complying with the irregular parts of C90
was troublesome and felt somewhat meaningless to me, but I am sure more
of the work was meaningful.

What I learned through this work is the following.

  1. The specification of a program cannot be settled until the
    detailed document is finished.
  2. The debugging of a program cannot be completed until the samples
    that test every specification are completed.

This thinking is a sort of perfectionism.  Things in the world mostly
cannot be achieved by perfectionism, and software is not an exception.
However, there are some areas for which perfectionism has a very
important role.  The language processing systems may be one of them.

I could keep at this for so many years only because it is my hobby.
But six and a half years is too long.  I kept wondering who would ever
use this after I had spent so many years on creating a perfect program.
I think this must be the limit of size for a program made as a hobby.
I will try not to take on a large-scale project as a hobby again.

However, as I have already made MCPP, I will keep maintaining it.
So please send me feedback, bug reports or porting reports.


6.2     V.2.3

After releasing V.2.0, I updated MCPP to V.2.1, V.2.2 and then V.2.3.
These updates adapted it to C99 and to the officially approved ISO C++
Standard, increased the supported systems and fixed bugs.

I could update quite easily until V.2.2.  It only took three months from
V.2.0 to V.2.2.  However, it took nearly four years from V.2.2 to V.2.3.
The main reason was that I became busy and didn't have enough time to
spend.  After turning 60 in July 2000, I cut my working days down to
four a week and went back to playing with cpp.

V.2.3 not only took time but took quite a lot of work as well.  When I
ported MCPP to GNU C V.2.9x, I found that I had to modify a lot to keep
compatibility with GNU C/cpp.  I added some options and implemented the
extended specifications.  I also eased restrictions of the Standard by
downgrading some errors to warnings and removing the most frequent
warnings from the default warning class.

Many of those modifications were backward-looking and not enjoyable.
In particular, maintaining both the C99 specification and part of the
"traditional" specification earlier than C90 was very much against my
will.  Unfortunately, this is a reality of the "open source" world, and
I had to meet certain expectations.

I think that relaxing the restrictions of the Standard also made MCPP
easier to use on the other compiler systems, as a replacement for the
preprocessor supplied with the system.


6.3     Selected to "Exploratory Software Project"

During the update to V.2.3, MCPP and the Validation Suite were selected
for the 2002 "Exploratory Software Project" of the Information-
Technology Promotion Agency, Japan (IPA).  I found out about this
project by chance and applied.  Then the project manager, Yutaka Niibe,
selected me.  That is how the development proceeded from July 2002 to
Feb 2003, with IPA's funding and based on Niibe's advice.  The
translation of the documents was also undertaken by HighWell.

Though this is a relatively small piece of software, it became my
life's work after I had spent so much time on it.  I was confident of
its quality, but frustrated at having no opportunity to publish it.
Finally, the opportunity was given.  To accomplish this project, I cut
my job down to three days a week.

These are the things that I intended to do in this project at the
beginning.

  1. Create English versions of the documents and, with these, release
    MCPP and the Validation Suite to international sites.  Since most C
    compilers are made in the US, English versions of the documents are
    vital for spreading MCPP and having it evaluated.
  2. Until now, the main targets of evaluation and porting have been
    free compiler systems.  From now on, test the major compiler
    systems on the market with the Validation Suite and port MCPP to
    them.
  3. Support later versions of the compiler systems that are already
    supported.

However, MCPP is just a C preprocessor, only one part of a C compiler
system, and it lacked a selling point as "Exploratory Software".  To
overcome this, Project Manager Niibe suggested the following points:

  1. Support GNU C 3.x
  2. Make the Validation Suite usable in the testsuite of GNU C 3.x.
  3. Make everything public during the development.

As I wanted to do these things too, I gratefully added these points to
the project.

However, my project suffered delay after delay for various reasons.
First, I was hit by a disk crash.  Whenever I did something new, it
took a long time because I had to use software I had never used before.
It was also my first time compiling GNU C from the source, and I ran
into a few problems there.  Updating the massive volume of documents
and reviewing and correcting the English version also took considerable
time.  Furthermore, my mother was admitted to hospital.  As a result,
part of the project, such as the support of commercial compiler
systems, had to be given up in the end.

I had always worked as if digging a single hole deeper and deeper, so
it took a long time when I had to widen the hole.  For an amateur
programmer who wants to dig deep into a matter, this is the only way to
work.  However, to send the result out into the world, the hole had to
be widened to some extent.

In the process of widening the hole, I managed to learn some new
software and to work at the front line of development, with advice and
encouragement from Project Manager Niibe.  I was also delighted to see
my documents come back in fluent English.  Though being pressed for
time was painful, each experience was fresh and fun.

This "Exploratory Software Project" did not finish there.  Project
Manager Ichiji selected MCPP as a continuing project for the year 2003.
This is how I came to take up some tasks left unfinished from the
previous year, and also some areas in which I had no experience
before.

This time, my six-year-old PC developed some problems, and there were
further problems during the upgrade of the hardware and OS.  It also
took time to learn the new software, and of course the development fell
behind schedule.  The condition of my mother, who had been out of
hospital and relatively well, worsened as the end of the project
approached.  This was also a source of my anxiety.(*)  However, thanks
to Project Manager Ichiji setting the due date within a reasonable
timeframe, I could work through the tasks thoroughly without rushing.

I accomplished tasks such as the porting to Visual C++ and Plan 9, the
creation of the configure script and the support of various multi-byte
character encodings including ISO-2022-JP and UTF-8.  I also managed to
clean up the source code, which is inconspicuous work but something I,
as the author, could not ignore.  The time-consuming work of updating
the Japanese and English documents was accomplished with the
cooperation of HighWell.

Thanks to the "Exploratory Software Project", which took nearly two
years, I think MCPP has become a C/C++ preprocessor of the world's best
quality which supports multiple compiler systems.  As a middle-aged
amateur programmer, I am satisfied that I have done my best.

  * My mother died on February 17, 2004.

                                                                   [eof]
