organization | general topics

building software | generating documents | data workflows

large projects | setup targets | portable projects | shell scripts | references

A project is a directory of source files under version control with a build file at the root. The build file is used to generate target files which are not under version control.

A build file is an automation tool. As a side effect it documents how to use a project.

One has a choice of build tools and build file formats. Developers are well-served by being proficient with the build tool make, and in particular GNU make.

make is an appropriate build tool when:

  • The artifacts to be built and their prerequisites are local files.
  • The tools which create target files can be invoked from the command line.
  • The build system does not need to be highly portable.

ORGANIZATION

I organize my Makefiles into 4 sections, separated by empty lines:

includes

In a small project includes are not needed and the includes section is absent.

Putting includes before the prologue makes the prologue a file local declaration. make even prevents the --warn-undefined-variables flag from being added multiple times to MAKEFLAGS. This is a property of the special variable MAKEFLAGS and not the += operator.

An include to bring in automatically generated header dependencies should be put in the body because it depends on the variable which contains the list of sources.

prologue

I put the following boilerplate at the top of my makefiles:

MAKEFLAGS += --warn-undefined-variables
SHELL := bash
.SHELLFLAGS := -eu -o pipefail -c
.DEFAULT_GOAL := all
.DELETE_ON_ERROR:
.SUFFIXES:

I ask make to warn me when I use an undefined variable so I catch misspelled variables. If I need to use an environment variable or a variable in an included makefile that might not be defined, I set it to empty with the conditional assignment ?= operator.
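
A minimal sketch of both cases. The names EXTRA_FLAGS, echo_flags, and echo_typo are hypothetical and exist only for illustration:

```make
MAKEFLAGS += --warn-undefined-variables

# EXTRA_FLAGS (hypothetical) might not be set in the environment.
# Defaulting it to empty keeps make from warning when it is expanded.
EXTRA_FLAGS ?=

.PHONY: echo_flags
echo_flags:
	echo flags: $(EXTRA_FLAGS)

# EXTRA_FLAG is a deliberate misspelling. Running this target makes make
# warn about an undefined variable when it expands the recipe.
.PHONY: echo_typo
echo_typo:
	echo flags: $(EXTRA_FLAG)
```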

The .SHELLFLAGS variable was introduced with GNU Make 3.82. It has no effect on the version of make installed on Mac OS X, which is GNU Make 3.81.

I set the shell to bash so I can use the pipefail option. I set the pipefail option so if any of the commands in a pipeline fail, the entire pipeline fails. Otherwise the return value of the pipeline is the return value of the last command. Without this precaution, make would think the following pipeline succeeded:

cat /not_a_file | wc

The -e flag causes bash, with qualifications, to exit immediately if a command it executes fails. Strictly speaking it is not necessary, since make executes a separate shell for each line of a recipe and stops if any of them fails. I use it to be consistent with our shell script prologue. The shell scripts section describes that prologue and the circumstances under which bash -e exits.

The -u flag causes bash to exit with an error message if a variable is accessed without being defined.

The -c flag is in the default value of .SHELLFLAGS and we must preserve it, because it is how make passes the script to be executed to bash.

The default target can be declared by assigning its name to the .DEFAULT_GOAL variable. Otherwise the default target is the first target in the makefile that doesn't start with a period. I prefer .DEFAULT_GOAL so targets can be listed in the makefile in the order they execute. Using all as the default target is a GNU convention. I always use the same name for the default target so the prologue section is always the same.

I set .DELETE_ON_ERROR so that a target is removed if its recipe fails. This prevents me from re-running make and using an incomplete or invalid target. When debugging it may be necessary to comment this line out so the incomplete or invalid target can be inspected. Also be aware that make will not delete a directory if the target that creates it fails.

I set .SUFFIXES to nothing because I prefer to define rules explicitly.

environment variables

Variables inherited from the environment should be all-caps. This style is used in the GNU Make Manual. There are some variables special to make which are also in all-caps.

I don't put empty lines in between variable declarations.

Environment variables should be declared with conditional assignment ?= operator. The value after the operator is the value that is used if the environment variable is not set. If the default value is the empty string, then make allows the declaration to be omitted, but I still declare the environment variable so all environment variables used by the makefile are documented.

If an environment variable is required, the makefile should throw an error when it is undefined:

ifeq ($(PASSWORD),)
$(error PASSWORD not set)
endif

The above code requires that the environment variable be defined to run any of the targets in the makefile. If only one target uses the environment variable, the check can be performed in the recipe:

echo_env_foo:
	if [ -z "$(ENV_FOO)" ]; then echo 'ENV_FOO not defined'; false; fi
	echo $(ENV_FOO)

body

The last, and usually longest, section of a makefile contains target and rule declarations, as well as any variable declarations not in the special and environment variable section.

internal variables | rules and targets | phony targets | intermediate targets | declaration order | target names

internal variables

Variables which are not special to make or inherited from the environment should be in lowercase.

I declare variables with the immediate := assignment operator instead of the delayed evaluation = assignment operator. The only use I have found for the delayed evaluation assignment operator is with the wildcard variable function to pick up file names that are created during execution of the makefile. However, I don't use variables defined using delayed evaluation in target and prerequisite lists since make must evaluate these before any recipes are executed to build the dependency graph.
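
The difference between the two operators can be seen without wildcard. This is only a sketch; the variable names greeting, immediate, and delayed and the target show are made up for the example:

```make
greeting := hello
# Immediate: $(greeting) is expanded right here, so the value is fixed.
immediate := $(greeting) world
# Delayed: $(greeting) is expanded each time $(delayed) is used.
delayed = $(greeting) world
greeting := goodbye

.PHONY: show
show:
	echo $(immediate)
	echo $(delayed)
```

Invoking make show echoes hello world and then goodbye world.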

I don't put empty lines in between variable declarations.

By default leading whitespace is trimmed from a literal value when it is assigned to a variable; trailing whitespace is kept. Here is how to define a variable which holds a single space:

empty :=
space := $(empty) $(empty)

Commas can cause a problem in variable functions. Here is how to use a comma in a patsubst argument:

comma := ,
rcs_archive := foo.c,v
src := $(patsubst %$(comma)v,%,$(rcs_archive))

rules and targets

A rule or target declaration should be set off by empty lines from any declarations before or after it. The exception is when a target is being declared phony or intermediate, as described below.

make issues a warning if a target recipe is redefined.

Prerequisites for a target can be declared in multiple places, however, and make will use the union. This is useful with the special targets .PHONY and .INTERMEDIATE. It can be used to prevent long lines when a target has lots of prerequisites. It can also be used with the include directive; for example, each included makefile could make its own test target a prerequisite of the test target in the main makefile.
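
A small sketch of prerequisites accumulating across declarations. The target names test.parser and test.lexer are hypothetical:

```make
.PHONY: test
test:
	echo all tests passed

# Each of these lines adds to the prerequisites of test.
# Neither has a recipe, so make issues no redefinition warning.
test: test.parser
test: test.lexer

.PHONY: test.parser
test.parser:
	echo testing parser

.PHONY: test.lexer
test.lexer:
	echo testing lexer
```

Invoking make test runs test.parser and test.lexer before the recipe for test.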

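If more than one pattern rule can apply to a goal, the recipe of only one of them is executed. GNU Make 3.81 picks the first matching rule in the makefile; since GNU Make 3.82 the rule with the shortest stem is preferred, and makefile order only breaks ties. If the pattern is exactly the same, however, the later rule replaces the earlier one. Try invoking make what.ever on this makefile: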

what.%:
	echo what

what.%:
	echo what v2

%.ever:
	echo ever

I like patterns in rules to be anchored on the right side: i.e. %.c and foo.%.c more than on the left side: i.e. foo.%. This makes rules specific to a file type when files have appropriate suffixes. I like rules where the percent sign % is set off from the rest of the pattern with periods.

A goal is a target which the user specifies on the command line: e.g. make foo. The user can specify more than one goal: e.g. make foo bar baz. Or the user can specify no goals at all, in which case make builds the target named by the .DEFAULT_GOAL variable. If .DEFAULT_GOAL is not defined, make builds the first target in the makefile that doesn't start with a period.

I set .DEFAULT_GOAL to all and I declare all to be a phony target. This is a GNU convention.

The all target normally has all the files used by the end user as direct or indirect prerequisites. However, if a full build takes a long time, consider having the all target echo the available goals.

phony targets

Targets which don't create a file with the same name as the target are called phony targets. Other targets are sometimes called file targets. Whether a target is a phony target is a property of the recipe. make is not able to infer this property, but there is a way to explicitly declare them:

.PHONY: clean
clean:
	rm -rf build

When a phony target is declared, make will execute the recipe regardless of whether a file with the same name exists.

The following rules are recommended:

  • All phony targets should be declared by making them prerequisites of .PHONY.
  • Add each phony target as a prerequisite of .PHONY immediately before the target declaration, rather than listing all the phony targets in a single place.
  • No file targets should be prerequisites of .PHONY.
  • Phony targets should not be prerequisites of file targets.

The last rule is advised because a target with a phony prerequisite will always be rebuilt.

See the section on phony targets with an argument for how to handle them.

intermediate targets

Declaring a file target as intermediate tells make that it can be removed when it is no longer needed. This is done by making the file a prerequisite of the .INTERMEDIATE target. I do this immediately before the target rather than declaring all intermediate files in a single place:

.INTERMEDIATE: foo.txt
foo.txt:
	echo foo > $@

bar.txt: foo.txt
	cp $< $@

It is good style to declare all file targets which are not used by the end user as intermediate so there is less clutter in the directory. On the other hand, one might not want to declare a file target as intermediate when the recipe takes a long time to execute, especially during development.

There is no need to declare files which are generated by % pattern rules as intermediate. When such a file is created only as a step in a chain of pattern rules, and is not mentioned elsewhere in the makefile or on the command line, make removes it automatically. If you would prefer that some of these files were kept, make them prerequisites of the .SECONDARY target. If the .SECONDARY target is present but has no prerequisites, all targets are treated as secondary and no intermediate files are removed.
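
A sketch of keeping one file from a chain of pattern rules. It assumes a file data.txt exists and that the goal is data.uniq; the file names and suffixes are hypothetical:

```make
.SUFFIXES:

%.sorted: %.txt
	sort $< > $@

%.uniq: %.sorted
	uniq $< > $@

# Without this line, make deletes data.sorted after building data.uniq.
.SECONDARY: data.sorted
```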

declaration order

I put the declaration for a prerequisite of a target before the declaration of the target. This way the makefile tends to read in the order that things happen. If a set of targets are prerequisites of only one other target, put them immediately before that target.

I put variables and prerequisites before the rules and targets that use them. Variables or prerequisites which are used only once should be declared immediately before the rule or target which uses them. On the other hand, variables which are used by more than one target or rule might profitably be collected at the top of the body in a common variables subsection.

If I have a setup target I put it first. It is usually run once as sudo and is not part of the dependency graph. I put the check and clean targets after the all target. Here is a suggested order:

  • common variables
  • setup targets
  • build targets
  • test and lint targets
  • documentation targets
  • clean and install targets

When a makefile is large, it might be desirable to divide the body into sections in which all of the variables and rules share a common prefix.

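make does not warn when a variable is redefined, unlike when a target recipe is redefined. Maybe this is for the benefit of includes, or maybe it is so the makefile writer can redefine predefined variables such as SHELL and CC or variables inherited from the environment. It can cause bugs in a large makefile. In the following makefile, make warns about the second echo.foo recipe but says nothing about the second assignment to foo. Invoking make echo.foo prints 4, because the recipe is expanded after the whole makefile has been read: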

foo := 3

echo.foo:
	echo foo

echo.foo:
	echo $(foo)

foo := 4

target names

Unfortunately, make does not provide a command line option for listing the available targets like rake --tasks or ant -p. One must read the makefile to discover the tasks. As a consequence, one should strive to keep the makefile as easy to understand as possible. Another alternative is to see to it that all useful work is performed either directly or indirectly by a single target and to make that target the default.

A third alternative is to use standard task names. The closest thing I have found to a de facto standard is the GNU Standard Targets. Most of the GNU standard targets are only relevant to projects consisting of source code distributed in the GNU manner, however. The most generic target names are

  • all: the name of the default target
  • check: runs tests, linters, and style enforcers
  • clean: removes files created by all
  • install:
  • uninstall: undoes what install did

prefixes to group targets; suffixes to classify file types.

Use of periods, commas, underscores, and hyphens.

GENERAL TOPICS

automatic variables | whitespace | breaking long lines | comments | directory layout | making directories | recipes with multiple output files | phony targets with an argument | debugging | cleanup tasks | shell scripts

automatic variables

The automatic variables $<, $^, $@, and $* should be used whenever possible. Their use helps ensure that prerequisites are declared, which in turn ensures that the dependency graph isn't missing edges. They also aid maintainability; without them file names would appear in the prerequisites and be repeated one or more times in the recipe.

I list a file as a prerequisite and use $< to refer to it even when the file is under version control and not a target. However, I only do this when the file is an input, not an executable.

When a target has one prerequisite, I use $< in preference to $^.

When targets share a recipe, $@ refers to the target being built, not the entire list.

$* refers to the "stem" in a pattern rule; i.e. what was matched by %.

The word variable function can be used to refer to the 2nd, 3rd, and so on prerequisites in isolation:

echo_prereqs: foo bar baz quux
	echo $<
	echo $(word 2,$^)
	echo $(word 3,$^)
	echo $(word 4,$^)

The lastword variable function can be used to refer to the last prerequisite:

echo_last_prereq: foo bar baz
	echo $(lastword $^)

The other automatic variables are less useful and should probably be regarded as cryptic.

$^ is a de-duplicated list of the prerequisites, but the original list is available in $+.

Neither $^ nor $+ contain the order only prerequisites, which are available in $|.

$? refers to the prerequisites which are newer than the target. This can be used to write a recipe which only adds components which have changed to a library.
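
A sketch of such a recipe. The library name libfoo.a and the object file names are hypothetical, and the object files are assumed to be built by rules not shown here:

```make
objects := foo.o bar.o baz.o

# $? expands to only the object files which are newer than libfoo.a,
# so ar replaces just those members. When libfoo.a does not exist yet,
# $? expands to all of the prerequisites.
libfoo.a: $(objects)
	ar rcs $@ $?
```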

whitespace

I separate the makefile sections with empty lines. I also separate the rules and targets that do not start with a period with empty lines.

I do not use spaces after commas in variable function invocations which use commas as argument separators because any whitespace gets included in the argument.
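
A sketch of the effect using subst. The variable names list, tight, and spaced and the target show_subst are made up for the example:

```make
list := a-b-c
# No space after the comma: each hyphen is replaced by a plus sign.
tight := $(subst -,+,$(list))
# A space after the first comma becomes part of the replacement text.
spaced := $(subst -, +,$(list))

.PHONY: show_subst
show_subst:
	echo '$(tight)'
	echo '$(spaced)'
```

Invoking make show_subst echoes a+b+c and then a +b +c.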

breaking long lines

In recipes long lines can be broken with a backslash. The continuation line should start with a tab.

I prefer to put the break after a space, but be aware that the backslash can be put in the middle of a shell word:

echo_foo:
	ec\
	ho foo

Break up the right side of a long variable declaration by using +=:

metavariables := foo bar baz
metavariables += quux wombat wumpus

The += operator will insert a space in between the two parts.

If a target has a lot of prerequisites, they can be split over multiple lines like this:

clean: clean.foo clean.bar clean.baz
clean: clean.quux clean.wombat

Or like this:

clean: clean.foo clean.bar clean.baz \
 clean.quux clean.wombat

comments

I use comments sparingly. I don't use them to mark off sections of the makefile. Although it is tempting to document a project in the makefile, I prefer to put this documentation in a separate README file.

directory layout

I prefer to put files which are created by the makefile at the root of the project directory. This is Rule 3 from Paul's Rules of Makefiles.

To remove generated files I use this task:

clean:
	rm -f [a-z]*

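This is a bit dangerous: it removes every file at the root whose name starts with a lowercase letter, so it is only safe when no source files are kept there.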

As a consequence, we put our source code and non-generated prerequisites in subdirectories. We put subdirectories containing non-generated prerequisites in VPATH to keep filenames short.

VPATH uses a colon delimited list of paths:

VPATH := src:lib

making directories

If a subdirectory must be created, it should be an "order-only" prerequisite. This is achieved by listing it after a pipe symbol in the prerequisites:

build/foo.txt: | build
	touch $@

The reason is that the last modification time of a directory is the last time a file was added, removed, or renamed. Usually we don't want to rebuild everything in the directory when this happens.

Directory targets can share a recipe:

build data tmp:
	mkdir $@

A nested directory can be created without mkdir -p by making its parent an order-only prerequisite:

build:
	mkdir $@

build/foo: | build
	mkdir $@

recipes with multiple output files

Targets can share a recipe:

foo.txt bar.txt baz.txt:
	touch $@

This is distinct from a recipe which generates multiple files. The following is incorrect:

foo.tab.cpp foo.tab.hpp: foo.ypp
	bison -d $<

In a serial build the above recipe may be run more than once needlessly. In a parallel build the recipe can be run twice at the same time, corrupting the output.

There are two correct ways to do this. One uses a dummy file:

foo.tab.cpp foo.tab.hpp: parser.dummy

parser.dummy: foo.ypp
	bison -d $<
	touch $@

When the output filenames share a common stem, a pattern rule can be used instead of a dummy file:

%.tab.cpp %.tab.hpp: %.ypp
	bison -d $<

phony targets with an argument

Targets with arguments are a way to make a makefile more general and hence reusable. Pattern rules and $* can be used to implement targets with arguments:

echo.%:
	echo $*

We recommend using a period to set off the argument from the rest of the target.

Because the arguments that might be used are not known in advance, it is not possible to make these phony targets prerequisites of .PHONY. Here is a mechanism for achieving the same effect:

echo.%: FORCE
	echo $*
FORCE:

debugging

Because make echoes commands before it runs them, debugging recipes is usually trivial.

A common source of errors is macro substitution. Two common mistakes are not doubling the dollar signs which are meant for the underlying shell script, and not using parentheses to access variables with names longer than a single character.
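
A sketch of both mistakes and their fixes. The variable name and the target greet are hypothetical:

```make
name := world

# Wrong: make reads $name as $(n) followed by the letters ame, and it
# reads $f as the make variable f before the shell ever sees the line.
#
#	echo hello $name
#	for f in a b; do echo $f; done

# Right: parentheses for the make variable, $$ for shell variables.
.PHONY: greet
greet:
	echo hello $(name)
	for f in a b; do echo $$f; done
	echo home is $$HOME
```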

Otherwise debugging recipes is the same as debugging shell scripts.

Another class of problems is problems in the dependency graph. A job that should run doesn't, or a job runs when it doesn't need to.

A possible cause of dependency graph problems is that there are variables in the targets or the prerequisites, and those variables don't contain the expected values. There are opportunities for error when populating variables with values using wildcard, shell, or patsubst. Here is a generic task which can be used to inspect any variable:

inspect.%:
	@echo $($*)

We have encountered two situations which cause recipes to run when they don't need to. One is having directories as prerequisites and not declaring them as order only prerequisites. The other is misusing the shared recipe syntax for a recipe which creates multiple files.

cleanup tasks

It is standard to have a clean task.

We often see two levels of clean task. One reason is that some resources might take a long time to download or to compile.

In GNU projects which use autoconf, the distclean target will remove files created by configure but clean will not.

Another convention is for clobber to remove everything that is not under version control, and for clean to remove a subset of the files removed by clobber per the judgement of the makefile author. Files which might not be removed by clean are the final output of the workflow or files which are expensive to generate.

BUILDING SOFTWARE

repositories | testing | installing | header files

repositories

why they must be a separate setup target and not part of the regular build DAG

testing

The GNU standard stipulates check as the standard target for running tests. We often have a separate test target for running tests, and use check as a target which runs test as well as targets for linters and style enforcers.

We often separate test.unit and test.harness targets. The former runs traditional xUnit style tests; the latter might require that services are running and might take longer to run than test.unit. check usually does not run test.harness.
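
A sketch of how these targets might fit together. The commands and the directory names test/unit, test/harness, and src are placeholders, not a recommendation for particular tools:

```make
# Fast xUnit style tests.
.PHONY: test.unit
test.unit:
	python3 -m unittest discover -s test/unit

# Slower tests which may need services to be running.
.PHONY: test.harness
test.harness:
	python3 -m unittest discover -s test/harness

.PHONY: test
test: test.unit

.PHONY: lint
lint:
	python3 -m pyflakes src

# check runs the tests and the linter, but not test.harness.
.PHONY: check
check: test lint
```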

installing

What about the install command? Don't require root to install.

Makefiles which install software are often generated using the ./configure command. This takes an optional --prefix flag which the user can use to change the installation location from /usr/local to another location on the file system. A simple implementation of ./configure would set a prefix variable in the makefile. If there is an install target but no ./configure script, use DESTDIR as an environment variable for setting the installation location.
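
A sketch of an install target without a ./configure script. The executable name foo is hypothetical and is assumed to be built by a rule not shown here; prefix and bindir follow GNU naming, and DESTDIR is prepended in the usual GNU manner:

```make
# DESTDIR comes from the environment; it defaults to empty.
DESTDIR ?=
prefix := /usr/local
bindir := $(DESTDIR)$(prefix)/bin

.PHONY: install
install: foo
	install -d $(bindir)
	install -m 755 foo $(bindir)/foo

.PHONY: uninstall
uninstall:
	rm -f $(bindir)/foo
```

Invoking make install with DESTDIR=$HOME/stage in the environment installs under $HOME/stage/usr/local/bin and does not require root.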

Filesystem Hierarchy Standard.

header files

When building a language like C which has header files, don't manage the dependencies between headers and source files manually in the makefile.

In a small project, make all source files dependent on all headers. Whenever a header file changes, the entire project is recompiled. This is acceptable as long as the total build time is not very long.

sources := $(wildcard *.c)
headers := $(wildcard *.h)
objects := $(patsubst %.c,%.o,$(sources))

$(objects): $(headers)

In a large project, use gcc -M to compute a dependency file for each source file, and use sed to convert it to a format that can be included into the makefile:

sources := $(wildcard *.c)

-include $(subst .c,.d,$(sources))

%.d: %.c
	$(CC) -M $(CPPFLAGS) $< > $@.$$$$; \
	sed 's,\($*\)\.o[ :]*,\1.o $@ : ,g' < $@.$$$$ > $@; \
	rm -f $@.$$$$

Putting a hyphen in front of the include directive quiesces warning messages when the .d files don't exist. This technique exploits a feature of GNU make in which the argument of an include directive gets built if it is missing, provided make can find a target to build it.

GENERATING DOCUMENTS

Sometimes human readable documents are prepared by editing a plain text file in some form of markup and then running a tool on the markup file to create the final format. For example, the source format might be XML or Markdown.

Common target formats are HTML, PDF, or EPUB. It might be desirable to support multiple target formats.

%.html: %.md
	markdown $< > $@

html:

%.pdf: %.md

pdf:

%.epub: %.md

epub:


all:
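
The skeleton above could be filled in along these lines. This is only a sketch: it assumes pandoc is installed (PDF output also needs a LaTeX engine) and that every .md file in the directory is a document to convert. The variable names are mine:

```make
sources := $(wildcard *.md)
html_files := $(patsubst %.md,%.html,$(sources))
pdf_files := $(patsubst %.md,%.pdf,$(sources))
epub_files := $(patsubst %.md,%.epub,$(sources))

# pandoc infers the output format from the suffix of the -o argument.
%.html: %.md
	pandoc -o $@ $<

%.pdf: %.md
	pandoc -o $@ $<

%.epub: %.md
	pandoc -o $@ $<

.PHONY: html
html: $(html_files)

.PHONY: pdf
pdf: $(pdf_files)

.PHONY: epub
epub: $(epub_files)

.PHONY: all
all: html pdf epub
```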

DATA WORKFLOWS

A data workflow makefile implements data processing. Data is kept in files, often in a relational format, and executables are invoked on the files, transforming them in stages to the desired output.

Data workflows are different from source code builds in that the tools they use are often newly written and hence buggy. Also data workflows are more likely to benefit from parallelization.

source files

The source files of a data workflow are data, not source code. Especially if the data is large, it may be undesirable to keep them under version control. Instead an approach is to define targets with no prerequisites which download the data files.
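
A sketch of such a target. The URL is a placeholder, not a real data source, and the variable name data_url is made up:

```make
# Placeholder URL; substitute the real location of the data.
data_url := https://example.com/data.csv

# No prerequisites: the file is downloaded only when it is missing.
# -f makes curl exit with an error on an HTTP failure, so that make
# does not treat an error page as the data.
data.csv:
	curl -fsS -o $@ $(data_url)
```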

parallelization

In order for parallelization to work, dependencies must be declared. Jobs which are not dependent on each other must be isolated. There should not be any resources accessed by jobs which make is not aware of.

It is best if tools which are run by make accept the names of all files which they read from or write to as arguments, or if the tools read from standard input and write to standard output. The file system is then completely managed by and documented by the makefile.

Tools which read from or write to a hard-coded file name are maintenance problems when invoked by make because the path must also be hard-coded in the makefile. This violates the DRY principle.

When tools need temporary files, they should use a library which returns an unused file name. Tools with a hard-coded path for a temporary file can't be invoked in parallel.

Tools which access databases might not be parallelizable. make is a poor tool for implementing a database workflow because make expects targets to be a file with a last modified timestamp. We are not aware of a good tool for managing a database workflow. Makefiles should restrict themselves to reading from databases at the start of the workflow and writing to a database at the end.

We put the onus on the user of specifying the number of simultaneous jobs when invoking make with the -j flag.

The alternative would be to hardcode a value in the MAKEFLAGS variable, but choosing a portable value is difficult and the user might not want to use all the cores.

If a makefile can run jobs in parallel, it should be documented in the README.

If -j is used without an argument, there is no limit on the number of jobs make will run in parallel.

The -l flag can be used to put an upper bound on the load average as reported by uptime that make will put on the system, but we have not experimented with it. Note that for a box with 16 cores, a load average of 16 does not suggest contention, but it does suggest contention on a box with 4 cores.

splitting large files

It is often desirable to split a large file so that the parts can be processed in parallel.

Ideally we would use split to split the file and the wildcard variable function to read the parts into a variable. Doing it this way prevents make from building the entire graph of dependencies at invocation, however. The result is that the user will have to invoke make two or more times to run the entire workflow.

The alternative is to calculate the names of the files that will be created by split. Here is an example:

jobs := 5
jobs_minus_one := $(shell echo '$(jobs) - 1' | bc)
first_job := 000
last_job := $(shell printf %03d $(jobs_minus_one))
input_files := $(addprefix input.,$(shell seq -w $(first_job) $(last_job)))
count_tasks := $(patsubst input.%,count.%,$(input_files))

$(input_files): input_files.dummy

input_files.dummy: input
	split -a 3 -d -n l/$(jobs) $< input.
	touch $@

count.%: input.%
	wc $<

count_tasks: $(count_tasks)

The approach has at least 3 disadvantages: (1) it is error-prone to compute the file names that will be created by split, (2) it requires the creation of an empty dummy file, and (3) we are using flags -d and -n which are not available in all implementations of split. They are available on Ubuntu 12.04 but not Ubuntu 11.10. As of August 2013, they are not available on any version of Mac OS X.

We think the importance of implementing a workflow with a single target outweighs the disadvantages.

file names

Well chosen file names make a project easier to understand. The benefit is experienced both by a user navigating the file system and a user reading the makefile.

The best choice of names is often not apparent until the end of development, so refactoring is necessary. Using automatic variables in recipes makes renaming files easier. Furthermore, variables can be defined for files which appear in multiple target declarations. As previously noted, we prefer file names to be specified in the makefile and passed as arguments to executables invoked by the makefile.

File suffixes should be used to declare the format of the data in a file. Consistent use of file suffixes makes it possible to define rules. We use a period to separate a suffix from the root.

The most obvious convention for file names is they should describe what is in the file. However, in a workflow with a long chain of dependencies, this naming convention can result in long file names. An alternative convention is for files to be named after the executable that produced them.

We prefer file names which match this regular expression: [a-zA-Z0-9_.-]+.

Spaces are discouraged because makefile programming is shell programming. We use underscores where spaces would occur in natural language.

We use hyphens where hyphens would occur in natural language.

We use periods when we intend to parse the name in a pattern rule. Unfortunately it is sometimes also desirable to insert periods into file names, say to encode version numbers or floating point numbers.

file formats

Debugging is easiest if each file has a well-defined format, and each tool fails with an informative error message if any of its input was not in the expected format. This approach makes it easy to find the component which is at fault.

suffix  description                                   test

utf-8                                                 $ iconv -f utf-8 -t utf-8

csv     RFC 4180

json    Often one JSON object per line. Note that     $ python -mjson.tool
        whitespace that does not occur inside
        strings is optional.

tab     We use this suffix for tab-delimited data
        with no header. We prefer a header for the
        documentation it provides, but headers are
        inconvenient when sorting or joining files.

tsv     A header should always be present. Tab and
        EOL delimited with no method of escaping or
        quoting those characters. Every row must
        have the same number of fields.

        the IANA specification

xml                                                   $ xmllint FILE

A way to test whether all the rows in a tab-delimited file have the same number of fields:

$ if [ 1 -ne $(awk 'BEGIN{FS="\t"} {print NF}' FILE | sort -u | wc -l) ]
  then false
  fi

TODO: trimming whitespace in a tsv

non-local prerequisites and targets

make can be used to generate artifacts which are not local files, but this is not ideal. Consider defining a target to create a database table and insert data into it. Because there is no last modified timestamp associated with the database table that make is aware of, it does not know to update the database table when the prerequisites are newer. Furthermore, if the database table were a prerequisite of other targets, make would not know to update the targets when the database table is newer.

LARGE PROJECTS

multiple makefiles in a directory | partitioned make | shared make | recursive make | inclusive make

multiple makefiles in a directory

One way to deal with the variable name and target name collision problem is to have multiple makefiles in the directory, and to make the user choose a makefile with the -f flag each time make is invoked. This makes make unpleasant to use, however.

An alternative is to introduce subdirectories, each with a makefile.

Another option is to keep everything in a large makefile with the variables and targets of the body grouped into sections. The variables and targets in each section share a common prefix.

partitioned make

We don't think there is much value using the include directive just to split a large makefile, even one thousands of lines long, into multiple files. The include directive performs simple text substitution like the C preprocessor #include directive; hence it does not solve the variable name or target name collision problem. Note that make gives a warning if a target is redefined, but not if a variable is redefined.

shared make

An application of the include directive is for makefiles to share common variables and target definitions. A project with subdirectories that contain makefiles is a good application of this. We do not put the prologue section in an included makefile.

recursive make

We avoid recursive make so the complete dependency graph is available to a single invocation of make.

describe how to do it

describe the drawbacks

Perhaps it is okay when subdirectories are loosely related. For example when a small project with a pre-existing Makefile is incorporated into a project.

inclusive make

as described by Peter Miller

Root makefile includes information from subdirectories.

What about ARG_MAX.

SETUP TARGETS

Setup targets perform actions such as:

  • installing host packages: e.g. apt-get, yum, port, brew, ...
  • installing language packages: e.g. pip, gem

Two guiding principles here are (1) we don't want the makefile to contain file targets which are outside of the project directory, and (2) we don't want to write make recipes which prompt the end user for information such as a password.

Ideally, we install packages inside the project without using elevated permissions. This way the project is insulated from other projects on the machine. Other build targets can have the setup target as a prerequisite. The build target can be the directory inside the project in which the packages were installed. Alternatively, we can touch a dummy file at the end of the setup recipe.

If elevated permissions are required, the setup task should be a phony target. The recipe should not invoke sudo. Instead the task should be invoked by the end user with the correct permissions, i.e.:

$ sudo make setup

One advantage of this approach is that it gives the end user some flexibility. The user can use virtualenv and install pip packages as a regular user, or use sudo and install pip packages as root.

A disadvantage of a phony setup task is that it should not be a prerequisite of other build targets, because a phony prerequisite causes them to be rebuilt every time. The end user must invoke the setup task separately. If ease-of-use is critical, test for the presence of necessary packages:

ifeq ($(shell which xmllint),)
$(error run "make setup" to install xmllint)
endif

ifeq ($(shell python -c 'import jinja2; print("ok")'),)
$(error run "make setup" to install python jinja2 lib)
endif

The end user might want to install system packages as root and language packages as an unprivileged user. There should be separate targets for each.

PORTABLE PROJECTS

The autoconf manual is about twice as long as the make manual. autoconf is perhaps evidence that make is not a good solution for portable builds. Perhaps portable builds are always difficult. Making definitive statements involves evaluating other build systems, and is out of scope of this document.

Even if it is decided that a project is only going to target one architecture and doesn't need to be highly portable, it is still worthwhile to write makefiles in a portable way. Machines and user environments are rarely configured identically. There have been cases where only one developer was able to build the project.

Here are some things to think about.

  • make version
  • shell version
  • external commands
  • environment variables
  • paths outside the project directory

make version

I always use GNU Make. The choice of make is made by the person invoking make and not the makefile author, however. The makefile author can inspect the variable MAKE_VERSION if the version of GNU Make is critical. This only gives a major and minor version number. Here is code which tests for GNU Make:

VERSION := $(shell $(MAKE) --version)
ifneq ($(firstword $(VERSION)),GNU)
$(error Use GNU Make)
endif

shell version

The shell I use is Bash. In particular I assume that the first executable named bash in PATH is the Bash shell. The version of bash that is distributed with Mac OS X is quite old: version 3.2 circa 2006. Personally I install a current version of Bash, but most Mac users probably don't.

The default Make shell is /bin/sh. For maximum portability, one should use this as the shell and not use any features that are not listed in the POSIX standard. FreeBSD does not come with Bash installed by default, and FreeBSD systems which have Bash do not install it at /bin/bash. To verify you are not using Bash-specific features, run the script or recipe with dash.

external commands

Shell scripts can fail because external commands are missing. Even when run on the same system, the script may fail because it was run by a user with a different PATH.

The GNU Coding Standards prohibit using any external commands except for the following:

awk cat cmp cp diff echo egrep expr false grep install-info ln ls
mkdir mv printf pwd rm rmdir sed sleep sort tar test touch tr true

Even if the external command is present, the option might not be. The following options are available on recent versions of Linux but not Mac OS X:

  • sort -R: randomly shuffles input
  • grep -P: grep using a Perl-style regular expression
  • split -n l/N: splits input into N files
  • du --max-depth=1: show disk usage
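
One way to cope with such differences is to probe for an option before relying on it. The following is a minimal sketch, not a robust feature test. The helper names has_option and shuffle_lines are made up for illustration, and the probe assumes that a command exits with a non-zero status when given an option it does not recognize.

```shell
#!/usr/bin/env bash
set -eu -o pipefail

# Hypothetical helper: succeed if the given command line runs cleanly on
# empty input. Assumes unknown options cause a non-zero exit status.
has_option() {
    "$@" < /dev/null > /dev/null 2>&1
}

# Shuffle lines with sort -R where it is available; otherwise fall back to
# tagging each line with a random number and sorting on the tag.
shuffle_lines() {
    if has_option sort -R; then
        sort -R
    else
        awk 'BEGIN { srand() } { print rand() "\t" $0 }' | sort -n | cut -f 2-
    fi
}

printf 'a\nb\nc\n' | shuffle_lines
```

The fallback branch uses only awk, sort and cut, so it does not depend on any of the options listed above.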

Here are the POSIX mandated options.

External commands which are not reliably present should be installed in the bin subdirectory of the project and invoked from there in the Makefile. Such external commands should be implemented as shell scripts, or in a widely available scripting language such as Perl or Python.

What about defining all external utilities (i.e. outside of the repository) in one place in shell variables and invoking them via the variables? This makes an audit of the script easy.
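
As a rough sketch of that idea, the script below declares its external utilities in one block of variables, checks that each one is in PATH, and invokes them only through the variables. The choice of awk and sort and the function count_distinct_lines are placeholders for illustration.

```shell
#!/usr/bin/env bash
set -eu -o pipefail

# Every external utility this script relies on, declared in one place.
# The particular utilities are placeholders for illustration.
readonly AWK=awk
readonly SORT=sort

# Fail early, with a clear message, if a utility is not in PATH.
for cmd in "$AWK" "$SORT"; do
    if ! command -v "$cmd" > /dev/null; then
        echo "missing external command: $cmd" >&2
        exit 1
    fi
done

# Hypothetical task: count the distinct lines on standard input.
count_distinct_lines() {
    "$SORT" -u | "$AWK" 'END { print NR }'
}

printf 'b\na\nb\n' | count_distinct_lines   # prints 2
```

An audit then only has to read the variable block, and a user with an unusual system can point a variable at a different implementation.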

environment variables

  • COLUMNS
  • HOME
  • LANG
  • LINES
  • LOGNAME
  • PATH
  • PWD
  • SHELL
  • TERM

bash doesn't read any startup files, does it?

Shell scripts can't parse any of the commonly used configuration file formats. Write a utility to parse a configuration file format and then write a shell script that can be sourced? One could write a "shell only" configuration file format which is a bunch of variable assignments or exports which is intended to be sourced. Really, the way to configure shell scripts is by environment variables.
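
A minimal sketch of configuration by environment variables with defaults follows. The variable names REPORT_DIR and REPORT_FORMAT and the function load_settings are hypothetical. The function stands in for a sourceable settings file so that the example is self-contained.

```shell
#!/usr/bin/env bash
set -eu -o pipefail

# Stand-in for a hypothetical sourceable settings file. In a project these
# assignments would live in their own file, loaded with "source".
load_settings() {
    # ":=" assigns the default only when the variable is unset or empty,
    # so a value from the environment takes precedence.
    : "${REPORT_DIR:=reports}"
    : "${REPORT_FORMAT:=csv}"
}

# Start from a known state for the demonstration, then simulate a caller
# who set one of the variables in the environment.
unset REPORT_DIR
export REPORT_FORMAT=json

load_settings
echo "$REPORT_DIR $REPORT_FORMAT"   # prints: reports json
```

Because the defaults use parameter expansion, the settings are safe to read under set -u even when the caller has set nothing.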

paths outside the project directory

  • installing files
  • /tmp directories
  • getting the containing directory of a makefile or a shell script (MAKEFILE_LIST)

SHELL SCRIPTS

Makefile recipes are in effect shell scripts. At times it may be beneficial to move code out of a recipe and into a separate shell script.

Extracting a shell script from a recipe is easier when the recipe has few Makefile variables.

The shell script should not know about file names. Ideally, the shell script should read from standard input and write to standard output. The dependency DAG is in the makefile. Yes, this means sometimes passing lots of arguments to the script. If you need to move or rename files in the directory, you can do it all in the Makefile.
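
A sketch of such a filter is below. The script name bin/normalize.sh and the file names in the comment are hypothetical. The point is only that the file names appear in the recipe and not in the script.

```shell
#!/usr/bin/env bash
set -eu -o pipefail

# Body of a hypothetical bin/normalize.sh. It reads standard input and
# writes standard output; a recipe would supply the file names, e.g.
#     bin/normalize.sh < raw.txt > clean.txt
normalize() {
    # Lowercase the text, then squeeze runs of spaces into one space.
    tr '[:upper:]' '[:lower:]' | tr -s ' '
}

# Demonstration with inline input in place of a file.
printf 'Hello   World\n' | normalize   # prints: hello world
```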

tmp files.

Extracted shell scripts should be kept in the project directory under source control. If a subdirectory is desired for extracted shell scripts, bin is a good choice.

There is a tool called shellcheck which finds a lot of errors and risky idioms. It is available in package managers such as brew and apt.

The shell script should have a prologue to make its behavior agree with the SHELL and .SHELLFLAGS variables in the prologue of the Makefile:

#!/usr/bin/env bash

set -eu -o pipefail

Here is an excerpt from the bash man page describing the -e flag:

Exit immediately if a simple command exits with a non-zero status. The shell does not exit if the command that fails is part of the command list immediately following a while or until keyword, part of the test in an if statement, part of a && or || list, or if the command's return value is being inverted via !. A trap on ERR, if set, is executed before the shell exits.

Some commands exit with a non-zero status in conditions which should not always be treated as errors. For example, grep when no lines match, and diff when the files being compared are different. The || true idiom handles this:

# remove comment lines, if any:
grep -v '^#' foo.py > bar.py || true
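
Note that || true also hides genuine failures, such as an unreadable input file. grep exits with status 1 when no lines are selected and with a greater status when an error occurred, so a stricter variant can tolerate status 1 only. The wrapper name grep_allow_no_match is hypothetical; this is a sketch of the idea rather than a drop-in replacement.

```shell
#!/usr/bin/env bash
set -eu -o pipefail

# Hypothetical wrapper: run grep, treating "no lines selected" (status 1)
# as success while still reporting real errors (status 2 or greater).
grep_allow_no_match() {
    local status=0
    grep "$@" || status=$?
    if [ "$status" -gt 1 ]; then
        return "$status"
    fi
    return 0
}

# Every input line is a comment, so grep -v selects nothing and exits
# with status 1, yet the script carries on.
printf '# one\n# two\n' | grep_allow_no_match -v '^#'
echo "still running"
```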

The -u flag causes bash to treat unset variables as an error when encountered in parameter expansion. To source a file which references unset variables:

set +u
source ./common.sh
set -u

Use "$@" to pass the command line or function parameters to a command when you want the command to get the same number of parameters. If you want to combine the command line or function parameters into a single parameter, use "$*". Without the double quotes, $@ and $* undergo word splitting: if any of the parameters contains whitespace, the command will receive more parameters than the shell or function did.
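
The difference is easy to see by counting the arguments that arrive. In this small sketch the function names are made up for illustration, and IFS is assumed to have its default value.

```shell
#!/usr/bin/env bash
set -eu -o pipefail

# Print the number of arguments received.
count_args() { echo "$#"; }

# Three ways to forward parameters (function names are for illustration).
forward_at()       { count_args "$@"; }   # one argument per parameter
forward_star()     { count_args "$*"; }   # all parameters joined into one
forward_unquoted() { count_args $@; }     # parameters re-split on whitespace

forward_at 'a b' c         # prints 2
forward_star 'a b' c       # prints 1
forward_unquoted 'a b' c   # prints 3
```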

Filenames which contain spaces or which start with hyphens present hazards when shell scripting. In a makefile project they are avoided by renaming any files with external provenance as soon as they are acquired. When iterating over files with for, do not use command substitution to generate the list of files. Use the built-in shell globbing operator * instead. When passing a list of files generated by a glob, the double hyphen will prevent any files with hyphens at the start from being interpreted as flags:

cat -- * > all.txt

The above will work with file names with spaces. The following code gives another example:

cp -- "$source" "$target"

Some further guidelines:

  • use $( ) instead of ` `
  • double quote all variables in [ ]; or use [[ ]] instead of [ ]
  • use readonly and local
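
The advice about iterating with a glob instead of command substitution can be checked with a short experiment. This is a sketch: the function names and the two file names are invented for the demonstration, and mktemp -d is assumed to be available.

```shell
#!/usr/bin/env bash
set -eu -o pipefail

# Count loop iterations when the file list comes from a glob.
count_with_glob() {
    local dir="$1" n=0 f
    for f in "$dir"/*; do
        [[ -e "$f" ]] || continue   # an unmatched glob stays a literal pattern
        n=$(( n + 1 ))
    done
    echo "$n"
}

# The same loop fed by command substitution, shown only for contrast.
count_with_ls() {
    local dir="$1" n=0 f
    for f in $(ls "$dir"); do
        n=$(( n + 1 ))
    done
    echo "$n"
}

demo_dir="$(mktemp -d)"
touch "$demo_dir/two words.txt" "$demo_dir/-n.txt"
count_with_glob "$demo_dir"   # prints 2: one iteration per file
count_with_ls "$demo_dir"     # prints 3: "two words.txt" was split in two
rm -rf -- "$demo_dir"
```

Prefixing the glob with a directory, as in "$dir"/*, also means no expanded name can begin with a hyphen.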

Bash scripts should not depend on the working directory of the invoker. If the bash script calls other executables in the same directory, here is a reliable way to get that directory:

bin_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"

An example of how to write an error message:

msg='the export failed'
echo "[ERROR] $(date) $0: $msg" >&2
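
If several places in a script report errors, the message format can be wrapped in a pair of helpers. This is a sketch: the names err and die are a common convention but are not taken from this document, and export_data in the comment is a placeholder command.

```shell
#!/usr/bin/env bash
set -eu -o pipefail

# Write a message to standard error in the format shown above.
err() {
    echo "[ERROR] $(date) $0: $*" >&2
}

# Report an error and stop the script with a non-zero status.
die() {
    err "$@"
    exit 1
}

# Usage sketch; "export_data" is a placeholder command:
#     export_data || die 'the export failed'
err 'the export failed'
```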

Here is an example of how to perform cleanup after an error condition:

function cleanup {
    rm -rf -- "$tmp_dir"
}

trap cleanup ERR

trap is used to register a signal handler. The ERR condition is a pseudo-signal which fires when a command fails. It fires in the same situations in which a command failure would cause the shell to exit when running under the -e option.
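
Here is a sketch of a complete create-and-clean-up cycle for a temporary directory. It departs from the snippet above in one respect: it traps EXIT instead of ERR, so the directory is removed on success as well as on failure. The function name run_with_tmp_dir is hypothetical, and mktemp -d, though widely available, is not specified by POSIX.

```shell
#!/usr/bin/env bash
set -eu -o pipefail

# Hypothetical wrapper: run a command with a scratch directory appended as
# its last argument, and remove the directory afterwards. The body is in
# ( ) so that it runs in a subshell and the trap stays local to it.
run_with_tmp_dir() (
    tmp_dir="$(mktemp -d)"
    trap 'rm -rf -- "$tmp_dir"' EXIT
    echo "$tmp_dir"                  # report the location for the demonstration
    status=0
    "$@" "$tmp_dir" || status=$?     # the real work; it may fail
    exit "$status"
)

# Runs: test -d <scratch directory>, then removes the directory.
run_with_tmp_dir test -d
```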

The shellcheck tool looks for errors in shell scripts. A package of the same name exists in both Homebrew and Apt:

    $ apt install shellcheck

    $ shellcheck foo.sh

Comments can be used to prevent certain checks on certain code:

# shellcheck disable=SC2016
echo 'Modifying $PATH'

REFERENCES