GnuWin

Notes on compilation

Prerequisites

If you wish to compile yourself: get the source package <package>-<version>-src.zip. You need GNU Bash, GNU Make and Mingw32 GCC and BinUtils. In these notes it is assumed that you are familiar with Bash, Make and GCC. Win32 implementations of Bash can be found in the CygWin tools, in the Msys tools, and in the DJGPP tools. Win32 implementations of Make can be found in CygWin, Msys, DJGPP, and Mingw. CygWin Bash and Make work quite well. Note that, when you mix CygWin or Msys Bash with a native Make, problems may occur because CygWin and Msys have their own notation for absolute filenames (for example c:/tools becomes /cygdrive/c/tools in CygWin and /c/tools in Msys).
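
The difference in notation can be illustrated with a small sketch. The function name native_to_posix_style and its prefix argument are hypothetical; the function only covers the simple drive-letter case mentioned above and is not how CygWin or Msys perform the translation internally.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Convert a native absolute filename such as "c:/tools" into the form used
   by CygWin ("/cygdrive/c/tools") or Msys ("/c/tools").  The prefix argument
   ("/cygdrive" or "") selects the convention.  Hypothetical helper.
   Returns 0 on success, -1 if the input has no drive letter or the output
   buffer is too small. */
int native_to_posix_style (const char *native, const char *prefix,
                           char *out, size_t outsize)
{
  int n;
  size_t i;

  if (!native || !isalpha ((unsigned char) native[0]) || native[1] != ':')
    return -1;
  n = snprintf (out, outsize, "%s/%c%s", prefix,
                tolower ((unsigned char) native[0]), native + 2);
  if (n < 0 || (size_t) n >= outsize)
    return -1;
  for (i = 0; out[i]; i++)      /* accept backslashes in the input as well */
    if (out[i] == '\\')
      out[i] = '/';
  return 0;
}
```

Calling it with the prefix "/cygdrive" gives the CygWin form, and with an empty prefix the Msys form.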

You can install Cygwin and its basic utilities (Autoconf, Automake, Bash, Bison, Coreutils, Diffutils, Findutils, Flex, Gawk, Grep, Libtool, M4, Make, Patch, Sed, Which) from any Cygwin mirror by using the setup program.

Then install Mingw; you'd best use the latest regular release ("Current"). Mingw can be downloaded from its Sourceforge site. You'll need GCC, Binutils and Windows API. Do not install these into the Cygwin directory. Make sure the directory with the GCC and Binutils executables comes before the Cygwin ones in your Path. You cannot use the Cygwin GCC and Binutils, because the executables they create are not native Windows ones, but depend on the Cygwin emulation layer (cygwin1.dll).

Configure and Make

If you use the sources from GnuWin, then these have already been patched and configured and there is no need to execute configure. Remove any .deps directories, because they contain the dependencies, mostly header files, for the sources and these may be different for your machine; then execute ./config.status to recreate the default .deps directories.

If you use the original sources, the configuration and ad hoc changes needed to compile are done in Makefile.mingw; type make -f Makefile.mingw at the Bash prompt. General configure options have been set in a config.site; make sure that the environment variable CONFIG_SITE points to this file. If there is no Makefile.mingw, then type ./configure.

When configure has finished, type make. Sometimes you need additional libraries and include files. Usually the line export LIBS = ... and other lines with -l... in Makefile.mingw show which additional libraries are needed. If you have these libraries, then you will also have the include files needed by these libraries. Rarely you need more include files; if on compiling you get an error message about a missing include file, then these might be found somewhere in the CygWin, Msys, or DJGPP distributions, but be careful not to replace any native declarations. If you make from the original sources, then you may need to apply patches from the patches directory in the GnuWin sources, in particular when make exits prematurely with an error message.

In Makefiles, you may have to change ln -s to cp, or use a version of ln that actually copies instead of making soft links.

More and more packages use LibTool for compiling, linking and installing. When installing into a directory with ~ in its name, such as c:/progra~1, it gets confused; so change all occurrences of ~ in libtool into some other character, e.g. !

Compiler options

There have been reports that GnuWin executables have crashed on systems with processors other than Intel, e.g. on systems with an AMD processor. These crashes can be avoided by compiling with options specific to Win32 systems, e.g. by using -mms-bitfields -march=i386 as options to GCC.

Unix functions

Several packages use functions that are standard on Unix, for example for obtaining the user name. Some have MS-Windows equivalents, others don't. You will have to provide an MS-Windows equivalent that does something sensible; usually a dummy that does nothing also works. Equivalents for several functions are in the LibGw32C library, which is an extension of the Msup and Mstubs libraries. Source code, e.g. from LibGw32C, for the needed functions can be copied to the package sources; you'll also have to adapt your own Makefiles and include files. Examples of code conversion between Unix and MS-Windows can also be found in Chapter 9 of the Unix Application Migration Guide on MSDN.
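
As an illustration of such an equivalent, here is a minimal sketch of a replacement for obtaining the user name. The name gw_user_name is hypothetical and this is not the LibGw32C implementation; it assumes that the environment variable USERNAME is set on MS-Windows and falls back on a dummy value otherwise. A fuller version might call the Win32 API function GetUserName instead.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the Unix call getlogin().  It looks at the
   environment variables USERNAME (MS-Windows) and USER or LOGNAME (Unix)
   and falls back on a fixed dummy value, so callers always get a string. */
const char *gw_user_name (void)
{
  static const char *const vars[] = { "USERNAME", "USER", "LOGNAME" };
  size_t i;

  for (i = 0; i < sizeof vars / sizeof vars[0]; i++)
    {
      const char *value = getenv (vars[i]);
      if (value && *value)
        return value;
    }
  return "unknown";             /* dummy that does something sensible */
}
```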

Dynamic libraries

Packages that contain a library usually build only a static library (with extension .a). A dynamic link library (DLL) with corresponding import library can be built from this static library with the linker ld, by dlltool or by dllwrap (provided in the Mingw BinUtils collection). The shell scripts a2dll and o2dll show more details.

If a package has originally been configured by means of autoconf (shown by the existence of the file configure.in or configure.ac), then it might be reconfigured to make dynamic libraries, but very often this does not seem to be worth the trouble.

When you have built the DLL, you can rebuild the executables such that they use the DLL. Delete or rename the executable first. Since often the Makefile calls the library explicitly (for example ../.libs/foo.a) rather than with the -L/-l-options (in the example: -L ../.libs -lfoo), either change the Makefile or temporarily rename the import library to the name of the static library. Then run make again.
For libraries that are called in the standard way with -L/-l, Mingw automatically chooses the import library for the DLL rather than the static library if the import library has extension .dll.a.

For packages that use LibTool, this will not work, since LibTool then remakes the static library. Instead change in libfoo.la (in the directory just above the .libs directory that contains libfoo.a), the term libfoo.a to libfoo.dll.a, and run make again. In principle, LibTool will build dynamic libraries if the option --enable-shared to configure has been set, but in practice only the latest versions of LibTool can handle this and even then you may still end up with a static library. Some helper scripts, latool and rctool may be used instead of LibTool when dynamic libraries are to be created.

It is possible to create import libraries for use with MSVC and BCC.

On Unix it is common practice to add a version number to the names of shared libraries; releases of a shared library that have the same version number have compatible interfaces, i.e. functions are called in the same way. On MS-Windows this also seems useful, so GnuWin dynamic libraries have a version number attached, usually computed from the LibTool interface version number.

Be careful not to mix different versions of the same library, since this may lead to crashes. In particular do not mix different run-time libraries, such as crtdll.dll, msvcrt.dll, msvcrtnn.dll, where nn denotes the version number (20, 30, ...); see the MS Knowledge Base and MSDN. Nor should you mix CygWin dlls and native dlls.

Auto-import

Mingw versions 2.95.3-3 and earlier cannot import static data from a DLL in the standard way, i.e. by using the extern declaration. This shows as an auto-import warning when linking an executable that uses the DLL: Warning: resolving vvv by linking to __imp__vvv (auto-import) where vvv is the name of the static variable. It may also show as an Undefined reference to _nm__vvv or Undefined reference to dllname_dll_a_iname where dllname is the name of the dll to be created. If this occurs you may have to change some include files that declare these static data. Include at the start of the source:

#ifndef __GNUC__
# define __DLL_IMPORT __declspec(dllimport)
#else
# define __DLL_IMPORT __attribute__((dllimport)) extern
#endif

#if defined (BUILD_ddd_DLL) || !defined (__WIN32__)
# define DLL_IMPORT extern
#else
# define DLL_IMPORT __DLL_IMPORT
#endif

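Replace extern by DLL_IMPORT in all relevant places in the include files, and add -DBUILD_ddd_DLL=1, where ddd indicates the DLL, as flag to the compiler when compiling the code of the DLL itself. With the definitions above, code that imports from the DLL must be compiled without this flag, so that DLL_IMPORT expands to the dllimport declaration.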
Versions 2.95.3-4 and higher circumvent this auto-import problem when the option --enable-auto-import is given to the linker; for versions 2.95.3-6 and higher this is the default behaviour, so you need not set the option. Very occasionally you still get an error message; solve this in the above manner (see also the documentation of the GNU linker ld, section 2.1.1).

Text files and binary files

On MS-Windows there is a difference between text filemode and binary filemode. Normal text files are files where CR-LF signifies a line ending. Text files with LF as line endings can be correctly read by the input functions of the runtime library; the only error occurs with ftell in the last part of the file (see below).

Unless you are sure that a file is always a text file, it is best to open it in binary mode; so add "b" to the mode when using fopen and O_BINARY when using open. For O_BINARY to be defined, you may have to include fcntl.h. After a file has been opened, its mode may be changed by calling setmode before any output or input has occurred.
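
A minimal sketch of both approaches follows. The function names open_binary_for_reading and make_stream_binary are hypothetical; the sketch assumes the msvcrt names _setmode, _fileno and _O_BINARY, and does nothing on Unix, where there is only one mode.

```c
#include <stdio.h>
#include <fcntl.h>
#ifdef _WIN32
# include <io.h>
#endif

/* Open a file for reading in binary mode; the "b" is ignored on Unix. */
FILE *open_binary_for_reading (const char *path)
{
  return fopen (path, "rb");
}

/* Switch an already opened stream to binary mode; call this before any
   input or output has been done on the stream.
   Returns 0 on success, -1 on failure. */
int make_stream_binary (FILE *stream)
{
#ifdef _WIN32
  return _setmode (_fileno (stream), _O_BINARY) == -1 ? -1 : 0;
#else
  (void) stream;                /* nothing to do on Unix */
  return 0;
#endif
}
```

A typical use of the second function is make_stream_binary (stdin) at the start of main in a filter that must pass CR characters through unchanged.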

Standard input, output and error can be opened in binary mode by adding

#include <fcntl.h>
int _CRT_fmode = _O_BINARY;

to the beginning of the main program file, or by including stdbin.h. Alternatively, you can compile stdbin.h into a small library and link it to the executable.

Similarly, all other files will be opened in binary mode, even when "b" has not been specified in the mode parameter of fopen, when

#include <fcntl.h>
int _fmode = _O_BINARY;

is added to the beginning of the main program file, when binmode.h is included, or when binmode.h has been compiled into a library and linked to the executable.

The result of ftell when a file with LF characters as line endings is opened as a text file may differ from the result when the same file is opened as binary file. When a file containing CR-LF characters is opened as text file, the CR's are deleted while reading; this is done when characters from the file are transferred to the read buffer. Ftell correctly computes the number of bytes for a position in this file by doubling the number of LF's that are still in the read buffer. When a file with LF's as line endings is opened as a text file, then ftell again doubles the number of LF's still in the read buffer when computing the number of bytes, but now this is of course incorrect. Because of the particular way the CR's in a CR-LF text file are deleted, this error only matters when the last part of the file is in the read buffer, so that normally positions in the last 512 bytes of the file are incorrectly determined by ftell. This does not matter when the result of ftell is only used as input for fseek to return to a previous position, but it does matter when ftell is used to determine the absolute position in a file.
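
The safe use of ftell, as input for fseek only, might look like the following sketch. The function name count_remaining_lines is hypothetical; the point is merely that the value returned by ftell is never interpreted as a byte count.

```c
#include <stdio.h>

/* Count the newline characters from the current position to the end of the
   file and then return to the starting position.  The value of ftell is
   used only as an argument to fseek.
   Returns -1 if the position cannot be saved or restored. */
long count_remaining_lines (FILE *stream)
{
  long start = ftell (stream);  /* remember where we are */
  long lines = 0;
  int c;

  if (start == -1L)
    return -1;
  while ((c = getc (stream)) != EOF)
    if (c == '\n')
      lines++;
  if (fseek (stream, start, SEEK_SET) != 0)   /* go back; clears EOF */
    return -1;
  return lines;
}
```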

Filenames

The path separator on Unix is the colon (:) and the directory separator is the forwardslash (/); on MS-Windows these are the semicolon (;) and the backslash (\). Filenames with forwardslashes are understood by MS-Windows, but you will have to change colons to semicolons when used as path separator. Tests for absolute filenames (on Unix filenames starting with /, on MS-Windows filenames starting with x:/ or \\) must also be changed, as well as absolute filenames such as /tmp/..., /usr/..., /dev/..., /etc/.... These filename issues may also occur in shell scripts provided with the package. Often they are also the cause of failure in tests or checks with make test or make check.
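
The adapted tests might be sketched as follows. All names (PATH_LIST_SEPARATOR, is_absolute_unix, is_absolute_windows, IS_ABSOLUTE_FILENAME) are hypothetical, and the MS-Windows test only recognizes the two forms mentioned above, x:/ and \\, with either kind of slash.

```c
#include <ctype.h>

#ifdef _WIN32
# define PATH_LIST_SEPARATOR ';'
# define IS_ABSOLUTE_FILENAME(name) is_absolute_windows (name)
#else
# define PATH_LIST_SEPARATOR ':'
# define IS_ABSOLUTE_FILENAME(name) is_absolute_unix (name)
#endif

/* MS-Windows accepts both kinds of slashes as directory separator. */
static int is_windows_separator (char c)
{
  return c == '/' || c == '\\';
}

/* Unix rule: a filename is absolute if it starts with a slash. */
int is_absolute_unix (const char *name)
{
  return name != 0 && name[0] == '/';
}

/* MS-Windows rule: x:/... (drive letter) or \\server\share (UNC name). */
int is_absolute_windows (const char *name)
{
  if (!name || !*name)
    return 0;
  if (isalpha ((unsigned char) name[0]) && name[1] == ':'
      && is_windows_separator (name[2]))
    return 1;
  return is_windows_separator (name[0]) && is_windows_separator (name[1]);
}
```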

Temporary file names may either be hardcoded (/tmp/...) or created with the help of an environment variable, usually TMP or TMPDIR. On Windows, the temporary file directory is Temp or Windows/Temp; and on Win9x the corresponding environment variable is TEMP. You will have to change the Unix names, set the Unix environment variables, or adapt the source to look also for the Windows environment variable TEMP.
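
Adapting the source to look also for the Windows variable could be done along these lines. The function name temp_directory is hypothetical, and the fallback values ("." on MS-Windows, /tmp elsewhere) are assumptions chosen for the sketch.

```c
#include <stdlib.h>

/* Return the directory for temporary files: the first of the environment
   variables TMPDIR, TMP and TEMP that is set and not empty, or else a
   fixed fallback.  Hypothetical helper. */
const char *temp_directory (void)
{
  static const char *const vars[] = { "TMPDIR", "TMP", "TEMP" };
  size_t i;

  for (i = 0; i < sizeof vars / sizeof vars[0]; i++)
    {
      const char *value = getenv (vars[i]);
      if (value && *value)
        return value;
    }
#ifdef _WIN32
  return ".";                   /* assumption: use the current directory */
#else
  return "/tmp";
#endif
}
```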

Filename globbing

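When a program is started from the MS-Windows command-line interpreter, wildcards on the command line are expanded by the startup code of the Mingw runtime, not by the interpreter itself. If you wish to disable this filename globbing, then add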

int _CRT_glob = 0;

to the beginning of the main program file.

Default locations

On Unix, executables usually are installed into /usr/bin and implementation-independent files, such as configuration and language files, in /usr/share or in /usr/etc, whose names are often hard coded in the executable; see the File System Hierarchy Standard. On MS-Windows there is no default location, and instead most packages go into a directory of their own, e.g. E:/Program Files/<package>. When the name of the implementation-independent directory is hard-coded in the program, packages with implementation-independent files must be installed in their default installation directory, which for GnuWin is always C:/Progra~1/<package>.

It is not very difficult to change a program such that it also looks into the implementation-independent directory relative to the directory where the executable is installed; for example, when the program has been installed into D:/Applic/<package>, it looks for its configurations in say C:/Progra~1/<package>/share and when nothing has been found there, it looks in D:/Applic/<package>/share. This solution has been followed in the later ports on GnuWin, which thus may be installed in any directory provided the subdirectory structure is maintained. Native language support (NLS) in LibIntl has also been adapted in this way. An alternative solution would have been to let the program read an initialization file in its program directory or let it read the registry.
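
The second step of this lookup, deriving the share directory from the location of the executable, might be sketched as below. The function name share_dir_from_exe is hypothetical, and the sketch assumes that the executable resides in a bin subdirectory of the installation directory; the full name of the executable would come from argv[0] or, on MS-Windows, from the Win32 API function GetModuleFileName.

```c
#include <stdio.h>
#include <string.h>

/* Return a pointer to the last directory separator in the first len
   characters of path, or NULL if there is none. */
static const char *last_separator (const char *path, size_t len)
{
  while (len > 0)
    {
      len--;
      if (path[len] == '/' || path[len] == '\\')
        return path + len;
    }
  return NULL;
}

/* From e.g. "D:/Applic/pkg/bin/foo.exe" compute "D:/Applic/pkg/share".
   Returns 0 on success, -1 if the name has fewer than two directory
   levels or the output buffer is too small. */
int share_dir_from_exe (const char *exe_path, char *out, size_t outsize)
{
  const char *end;
  int n;

  end = last_separator (exe_path, strlen (exe_path));   /* strip foo.exe */
  if (end)
    end = last_separator (exe_path, (size_t) (end - exe_path)); /* strip bin */
  if (!end)
    return -1;
  n = snprintf (out, outsize, "%.*s/share", (int) (end - exe_path), exe_path);
  return (n < 0 || (size_t) n >= outsize) ? -1 : 0;
}
```

The program would first try the hard-coded default directory and call this function only when nothing has been found there.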

For this so-called run-time relocation it is best to use Gnulib. You'll need the source files error.c, progname.c, progreloc.c, relocatable.c, and the header files areadlink.h, error.h, progname.h, relocatable.h. Add the additional source files to the files to be compiled either in the package library, usually in the directory gl or lib, or to the sources, usually in the directory src. You must also define the macros INSTALLPREFIX equal to the original installation directory, INSTALLDIR equal to the original installation directory of the executables, EXEEXT equal to the extension of the executable, as well as NO_XMALLOC (unless you have a function xmalloc, in which case you must use xreadlink.h instead of areadlink.h). In the language of Autoconf, this usually amounts to

-DINSTALLPREFIX="$(prefix)" -DINSTALLDIR="$(bindir)" -DEXEEXT="$(EXEEXT)" -DNO_XMALLOC

In the source files you must replace each occurrence of filenames to be relocated by relocate(<filename>); in each source file where you do this, you must include the header file relocatable.h, preferably in the form

#ifdef ENABLE_RELOCATABLE
# include <relocatable.h>
#else
# define relocate(path) (path)
#endif

In the main source file, usually main.c, you must add the statement set_program_name(argv[0]); and include the header file progname.h. If in main.c, a variable program_name has already been declared, you must remove this declaration as well as the statement that assigns a value to program_name, usually argv[0].

Large-file support

Normally the functions of the MS-Windows C-runtime library (msvcrt.dll) can access files up to 2^31-1 bytes, i.e. 2 GB. In particular this holds for the group of stat and seek functions: stat, fstat, seek, fseek, lseek, tell, and ftell as well as the related types ino_t and off_t. Special msvcrt-functions and types, indicated by the addition of i64 to their name, can access files up to 2^63-1 bytes, i.e. 9 EB (exabyte) = 9,000,000 TB (terabyte) = 9,000,000,000 GB. Large-file support (LFS) has been implemented by redefining the stat and seek functions and types to their 64-bit equivalents. For fseek and ftell, separate LFS versions, fseeko and ftello, based on fsetpos and fgetpos, are provided in LibGw32C.
More information about LFS on Unix can be found at Freshmeat, in the Single Unix Specification, and the documents of the Large File Summit.
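
As a small illustration of the i64 variants, the following sketch returns the size of a file as a 64-bit number. The function name file_size64 is hypothetical, and the code is not taken from the GnuWin sources; it assumes the msvcrt function _stati64 with its structure of the same name on MS-Windows, and plain stat elsewhere.

```c
#include <stdint.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Return the size of a file in bytes as a 64-bit number, or -1 on error. */
int64_t file_size64 (const char *path)
{
#ifdef _WIN32
  struct _stati64 st;           /* msvcrt variant with a 64-bit st_size */

  if (_stati64 (path, &st) != 0)
    return -1;
#else
  struct stat st;               /* st_size has type off_t */

  if (stat (path, &st) != 0)
    return -1;
#endif
  return (int64_t) st.st_size;
}
```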

Subprocesses

fork is the function that implements subprocesses on Unix. It does not exist on MS-Windows, and has to be replaced by a series of different API calls, such as spawn or CreateProcess. Chapter 9 of the Unix Application Migration Guide, topics Interprocess Communication and Appendixes E and F, gives some examples.
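
For the common case where fork is immediately followed by exec and wait, the replacement might be sketched as follows. The function name run_and_wait is hypothetical; on MS-Windows the sketch uses the msvcrt function _spawnvp with mode _P_WAIT, which starts the child and waits for it in a single call. Uses of fork that continue running the same program in the child cannot be converted this simply.

```c
#ifdef _WIN32
# include <process.h>
#else
# include <sys/types.h>
# include <sys/wait.h>
# include <unistd.h>
#endif

/* Run the program argv[0] with arguments argv (terminated by NULL) and
   wait until it has finished.  Returns its exit status, or -1 if it could
   not be started or did not exit normally. */
int run_and_wait (char *const argv[])
{
#ifdef _WIN32
  /* No fork on MS-Windows: create the child process in one step. */
  return (int) _spawnvp (_P_WAIT, argv[0], (const char *const *) argv);
#else
  int status;
  pid_t pid = fork ();

  if (pid < 0)
    return -1;
  if (pid == 0)
    {
      execvp (argv[0], argv);
      _exit (127);              /* only reached when exec has failed */
    }
  if (waitpid (pid, &status, 0) < 0)
    return -1;
  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
#endif
}
```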

Inode numbers

The MS-Windows equivalent of the Unix inode number is the FileIndex from the BY_HANDLE_FILE_INFORMATION structure, returned by the Win32 API function GetFileInformationByHandle. The FileIndex is a 64-bit number that on WinNT systems (NT, 2000, XP, 2003, Vista, 2008) indicates the position of the file in the Master File Table (MFT). On Windows XP and higher, one can also obtain this number by using the command fsutil usn readdata <path>. It is stable between successive starts of the system, provided the MFT does not overflow and therefore has to be rebuilt. It is not stable for files on network drives; successive calls to GetFileInformationByHandle return different values. For FAT file systems, the MSDN documentation for BY_HANDLE_FILE_INFORMATION says: "In the FAT file system, the file ID is generated from the first cluster of the containing directory and the byte offset within the directory of the entry for the file. Some defragmentation products change this byte offset. (Windows in-box defragmentation does not.) Thus, a FAT file ID can change over time. Renaming a file in the FAT file system can also change the file ID, but only if the new file name is longer than the old one." Because of this, on FAT systems the file index for directories is zero. Note that in the Windows FileId API Library, the file index is named FileId.

The FileIndex consists of two parts: the low 48 bits are the so-called file reference number and contain the actual index in the MFT; the high 16 bits are the so-called sequence number: each time an entry in the MFT is reused for another file, the sequence number is increased by one. This behavior of the sequence number can be observed by creating a file, printing its FileIndex, deleting it, creating a new file and printing its FileIndex; the FileIndex of the newest file is equal to that of the first file, with the sequence number, in the leftmost part of the FileIndex, increased by one. So the file reference number appears to be the equivalent of the Unix inode.
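
The split into the two parts can be written down directly. The names REFNUM_BITS, file_reference_number and file_sequence_number are hypothetical; the sketch only performs the bit arithmetic described above on a FileIndex that has already been obtained.

```c
#include <stdint.h>

#define REFNUM_BITS 48          /* low 48 bits: index in the MFT */

/* Return the file reference number, i.e. the low 48 bits. */
uint64_t file_reference_number (uint64_t file_index)
{
  return file_index & ((UINT64_C (1) << REFNUM_BITS) - 1);
}

/* Return the sequence number, i.e. the high 16 bits. */
unsigned int file_sequence_number (uint64_t file_index)
{
  return (unsigned int) (file_index >> REFNUM_BITS);
}
```

For a FileIndex of 0x0003000000000005 this gives reference number 5 and sequence number 3.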

Linux-NTFS has some documentation about NTFS as well as some programs that can be used to investigate the MFT and which show the described behavior of the FileIndex. For example, the docs say, and the programs confirm this, that the root directory of a volume always has a file reference number of 5, because that is its index in the MFT.

An inode number for regular files, and for directories on WinNT, might be created as follows:

#include <sys/stat.h>
#include <io.h>
#include <stdint.h>
#include <windows.h>
#define LODWORD(l) ((DWORD)((DWORDLONG)(l)))
#define HIDWORD(l) ((DWORD)(((DWORDLONG)(l)>>32)&0xFFFFFFFF))
#define MAKEDWORDLONG(a,b) ((DWORDLONG)(((DWORD)(a))|(((DWORDLONG)((DWORD)(b)))<<32)))

#define INOSIZE (8*sizeof(ino_t))
#define SEQNUMSIZE (16)

ino_t getino (char *path)
{
BY_HANDLE_FILE_INFORMATION FileInformation;
HANDLE hFile;
uint64_t ino64, refnum;
ino_t ino;
if (!path || !*path) /* path is NULL or empty */
    return 0;
if (access (path, F_OK)) /* path does not exist */
    return -1;
/* obtain handle to "path"; FILE_FLAG_BACKUP_SEMANTICS is used to open directories */
hFile = CreateFile (path, 0, 0, NULL, OPEN_EXISTING,
        FILE_FLAG_BACKUP_SEMANTICS | FILE_ATTRIBUTE_READONLY,
        NULL);
if (hFile == INVALID_HANDLE_VALUE) /* file cannot be opened */
    return 0;
ZeroMemory (&FileInformation, sizeof(FileInformation));
if (!GetFileInformationByHandle (hFile, &FileInformation)) { /* cannot obtain FileInformation */
    CloseHandle (hFile);
    return 0;
}
ino64 = (uint64_t) MAKEDWORDLONG (
    FileInformation.nFileIndexLow, FileInformation.nFileIndexHigh);
refnum = ino64 & ((~(0ULL)) >> SEQNUMSIZE); /* strip sequence number */
/* transform the 64-bit ino into a 16-bit one by hashing */
ino = (ino_t) (
        ( (LODWORD(refnum)) ^ ((LODWORD(refnum)) >> INOSIZE) )
    ^
        ( (HIDWORD(refnum)) ^ ((HIDWORD(refnum)) >> INOSIZE) )
    );
CloseHandle (hFile);
return ino;
}

An inode for fstat can be implemented similarly, by obtaining the handle from the file descriptor:

/* obtain handle to file descriptor "fd" */
hFile = (HANDLE) _get_osfhandle (fd);

Do not close the handle after obtaining the FileInformation, since otherwise fd will also be closed.

For directories on Win9x and for network files, one might use a hashed value of the full path of the file.
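
Such a hashed value might be computed as in the following sketch. The choice of hash is an assumption: the sketch uses the 32-bit FNV-1a hash, folded to 16 bits, and treats upper and lower case and both kinds of slashes as equal because MS-Windows does; the function name path_hash16 is hypothetical. Different files may of course receive the same value.

```c
#include <ctype.h>
#include <stdint.h>

/* Hash the full path of a file to a 16-bit number with the 32-bit FNV-1a
   hash, folding the result by xor-ing its two halves. */
unsigned short path_hash16 (const char *full_path)
{
  uint32_t h = UINT32_C (2166136261);           /* FNV offset basis */

  for (; *full_path; full_path++)
    {
      unsigned char c = (unsigned char) *full_path;

      if (c == '\\')
        c = '/';                                /* both separators are equal */
      h ^= (uint32_t) tolower (c);              /* ignore case */
      h *= UINT32_C (16777619);                 /* FNV prime */
    }
  return (unsigned short) ((h >> 16) ^ (h & 0xFFFF));
}
```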

Cross compilation

For cross-compiling on a Linux system, see Volker Grabsch's cross-compiling pages.