Bisect is a code coverage tool for the OCaml language. Its name stems from the following acronym: Bisect is an Insanely Short-sized and Elementary Coverage Tool. The shortness of the source files can be seen as a tribute to the camlp4 tool and API bundled with the standard OCaml distribution. Over time, features have been added and the source code is not so small anymore; however, the core functionality of Bisect is still implemented in only a few hundred lines.
Code coverage is a means of software testing. Associated with unit or functional testing, the goal of code coverage is to measure the portion of the application source code that has actually been exercised by tests. To achieve this goal, the code coverage tool defines points in the source code and records at runtime (that is, when tests are run) whether the execution path of the program passes through these points. The so-called points are places of interest in the source code (for example, the branches of an if or match construct), chosen to ensure that all alternatives have been tested. In practice, code coverage is often performed in three steps: instrumentation, execution, and report.
Bisect can be seen as an improved version of the ocamlcp/ocamlprof pair (both of these tools being part of the standard OCaml distribution). In this respect, Bisect performs statement and condition coverage, but not path coverage. This means that it only counts how many times the application passed through each point, independently of which statement was executed previously (that is, the previously visited point). By contrast, path coverage is interested not only in points but also in paths, the goal being to ensure that every possible execution path has been followed.
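The point-counting model described above can be sketched in a few lines of OCaml. This is only a conceptual, hand-written illustration: the names counters, mark, and abs_value are hypothetical, and the code is neither what Bisect generates nor its runtime API.

```ocaml
(* One counter per point. Three points are assumed for this example. *)
let counters = Array.make 3 0

(* Record one visit of the point identified by [id]. *)
let mark id = counters.(id) <- counters.(id) + 1

(* Hand-instrumented function: point 0 is the function body,
   points 1 and 2 are the two branches of the conditional. *)
let abs_value x =
  mark 0;
  if x < 0 then (mark 1; -x) else (mark 2; x)

let () =
  (* A "test suite" that only exercises non-negative inputs. *)
  ignore (abs_value 5);
  ignore (abs_value 7);
  Array.iteri (fun id n -> Printf.printf "point %d: %d\n" id n) counters
```

With these two calls, point 1 keeps a count of 0, which signals that the negative branch was never exercised. The counters say nothing about the order in which points were visited, which is why such a scheme yields statement and condition coverage but not path coverage.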
Code coverage is a useful software metric but, being based on tests, it cannot ensure that a program is correct. It only gives hints about where to look for untested, possibly dead, code. To establish program correctness, one should consider more involved tools and formalisms such as model checking or proof systems. Code coverage is still convenient in practice because it is a much simpler method that requires no particular knowledge from the developer. Bisect provides several output modes (ranging from bare text to Jenkins-compatible xml) in order to allow easy integration with an existing toolchain.
Bisect is released under the GPL version 3. This licensing scheme should not cause any problem, as instrumented applications are intended to be used during development but should not be released publicly. The GPL contamination has thus no consequence here.
In order to improve the project, I am primarily looking for testers and bug reporters. Pointing out errors in the documentation and indicating where it should be enhanced is also very helpful.
Bug reports and feature requests can be filed at http://bugs.x9c.fr.
Other requests can be sent to firstname.lastname@example.org.
Before starting to build Bisect, one first has to check that dependencies are already installed. The following elements are needed in order to build Bisect:
The configuration of Bisect is done by executing ./configure. One can specify elements if they are not correctly inferred by the configure script; the following switches are available:
Moreover, two command-line switches make it possible to choose which version(s) of the instrumenter should be built:
The Java version will be built only if the ocamljava compiler is present and located by the makefile. The syntax extension will be compiled only to bytecode.
The actual build of Bisect is launched by executing make all. When the build is finished, it is possible to run some simple tests by running make tests. Documentation can be generated by running make doc.
Bisect is installed by executing make install. Depending on local settings, it may be necessary to acquire elevated privileges, for example by running sudo make install. The actual installation directory depends on the use of ocamlfind: if present, the files are placed inside the Findlib hierarchy; otherwise they are placed in the directory `ocamlc -where`/bisect (i.e. $PREFIX/lib/ocaml/bisect).
As previously stated, using a code coverage tool usually requires following three steps: instrumentation, execution, and report. Bisect is no exception in this respect; the following sections discuss each of these three steps.
Bisect instruments the application at compile-time using either a camlp4- or a ppx-based preprocessor. Relying on preprocessors allows the user to choose exactly which modules (i.e. source files) of the application should be instrumented. Code samples 3.1 and 3.2 show how to instrument a file named source.ml during compilation (the very same effect can be achieved using either ocamlopt or ocamljava in place of ocamlc). Code sample 3.3 does the same through ocamlfind. During this step, Bisect will produce a file named source.cmp. Files with the cmp extension contain point information for a given source file, that is: identifiers, positions, and kinds of points. Of course, the usual cmi, cmo, cmx, and cmj files are also produced, depending on the compiler actually invoked. It is necessary to pass the -I +bisect option to the compiler because instrumentation adds calls to functions defined in the runtime modules of Bisect.
ocamlc -c -I +bisect -pp 'camlp4o str.cma /path/to/bisect_pp.cmo' source.ml
ocamlc -c -I +bisect -ppx '/path/to/bisect_ppx.byte' source.ml
ocamlfind ocamlc -package bisect -syntax camlp4o -c source.ml
Note: the use of camlp4o implies that the OCaml grammar is slightly modified. Most notably, camlp4o enables quotations by default. In practice, this means that character sequences such as << and >> now delimit quotations. This mechanism can be disabled by passing the -no_quot command-line switch to camlp4o.
Since version 1.1, it is possible to select an instrumentation mode through the -mode command-line switch followed by one of these values:
It is possible to choose which language constructs should be instrumented by passing -enable and/or -disable command-line switches to either bisect_pp.cmo or bisect_ppx.byte. Both switches are followed by a string describing the kinds of points the user wants to either enable or disable. The possible characters are:
By default, all point kinds are enabled. As an example, -disable cdev will disable instrumentation of all class constructs.
Since version 1.1, the -exclude command-line switch makes it possible to exclude top-level values from instrumentation. It should be followed by a comma-separated list of patterns. Any top-level function matching one of the patterns will not be instrumented.
Since version 1.2, the -exclude-file command-line switch makes it possible to exclude top-level values whose list is stored in a file. The contents of the file should respect the following grammar:
contents ::= file_list
file_list ::= file_list file | ε
file ::= file string [ exclusion_list ] opt_separator
opt_separator ::= ; | ε
exclusion_list ::= exclusion_list exclusion | ε
exclusion ::= name string opt_separator | regexp string opt_separator
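As an illustration of the grammar above, the following OCaml sketch models an exclusion file as a data structure and prints it back as text. It rests on several assumptions that the grammar alone does not confirm: that file, name, regexp, the square brackets, and the semicolon are literal tokens, that string denotes a double-quoted string literal, and that whitespace and line breaks are insignificant. The type and function names (exclusion, file_entry, print_file, print_contents) are hypothetical.

```ocaml
(* One entry of an exclusion list. The distinction between an exact
   name and a regular expression is assumed from the two keywords. *)
type exclusion =
  | Name of string
  | Regexp of string

(* One "file" production: a source path and its exclusions. *)
type file_entry = { path : string; exclusions : exclusion list }

let print_exclusion = function
  | Name s -> Printf.sprintf "  name %S" s
  | Regexp s -> Printf.sprintf "  regexp %S" s

(* Print a single file entry; optional separators are omitted. *)
let print_file entry =
  let header = Printf.sprintf "file %S [" entry.path in
  let body = List.map print_exclusion entry.exclusions in
  String.concat "\n" ((header :: body) @ ["]"])

(* Print a whole exclusion file, one entry after the other. *)
let print_contents entries =
  String.concat "\n" (List.map print_file entries)

let () =
  (* "a.ml", "f1" and "test_.*" are made-up values for the example. *)
  print_endline
    (print_contents
       [ { path = "a.ml"; exclusions = [Name "f1"; Regexp "test_.*"] } ])
```

Under these assumptions, the program prints a four-line file: an opening line file "a.ml" [, one name line, one regexp line, and a closing bracket.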
Since version 1.1, it is also possible to use special comments in order to precisely control instrumentation on a code area basis. The following comments are recognized:
When compiling in unsafe mode, the -unsafe switch should be passed to camlp4 instead of the compiler. Indeed, as camlp4 builds a syntax tree that is passed to the compiler, passing the -unsafe switch to the compiler has no effect because it is too late: the code has already been built by camlp4 in safe mode. In such a case, the compiler warns the user with the following message: Warning: option -unsafe used with a preprocessor returning a syntax tree. The correct command-line invocations are shown by code samples 3.4 and 3.5.
ocamlc -c -I +bisect \
  -pp 'camlp4o str.cma -unsafe /path/to/bisect_pp.cmo' \
  source.ml
ocamlfind ocamlc -package bisect -syntax camlp4o -ppopt -unsafe -c source.ml
Linking a program containing instrumented modules is not different from classical linking, except that one should link the Bisect library to the produced executable. This is usually done by adding one of the following to the linking command-line:
In order to use Bisect in multithreaded applications, it is necessary to also link with the BisectThread module. This also requires passing the -linkall option to the compiler.
Running an instrumented application is not different from running any application compiled with an OCaml compiler. However, Bisect will produce runtime data in a file each time the application is run. A new file will be created at each invocation, the first one being bisect0001.out, the second one bisect0002.out, and so on. It is also possible to define the scheme used for file names by setting the BISECT_FILE environment variable. If BISECT_FILE is equal to file, files will be named filen.out, where n is a natural number padded with zeroes to 4 digits (i.e. “0001”, “0002”, and so on).
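The naming scheme can be reproduced with a short OCaml function, which may be handy in scripts that need to predict or match data file names. This is a sketch of the scheme as described, not code taken from the Bisect runtime; data_file_name and base_from_env are hypothetical names.

```ocaml
(* Build the name of the n-th data file: the base name, followed by
   the number padded with zeroes to 4 digits, followed by ".out". *)
let data_file_name ?(base = "bisect") n =
  Printf.sprintf "%s%04d.out" base n

(* Read the base name from BISECT_FILE, falling back to "bisect". *)
let base_from_env () =
  match Sys.getenv_opt "BISECT_FILE" with
  | Some base -> base
  | None -> "bisect"

let () =
  print_endline (data_file_name ~base:(base_from_env ()) 1)
```

For instance, data_file_name 2 yields bisect0002.out, and data_file_name ~base:"coverage" 12 yields coverage0012.out.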
Bisect can also be parametrized using another environment variable: BISECT_SILENT. If this variable is set to either “YES” or “ON” (defaulting to “OFF”, case being ignored), then Bisect will not output any message at runtime. If not silent, Bisect will output a message on the standard error in various situations such as:
In order to generate the coverage report for the instrumented application, it is sufficient to invoke the bisect-report executable (alternatively either bisect-report.opt, or bisect-report.jar). This program recognizes the following command-line switches:
Wherever a destination file is expected, the use of - (i.e. the minus sign) is interpreted as the standard output. The user should also provide on the command-line the list of the runtime data files that should be used to produce the report. As a result, a typical invocation is: bisect-report -html report bisect*.out to process all data files in the current directory and generate an html report into the report directory.
If relative file paths are used at the instrumentation step, the report executable should be launched from the same directory. Another option is of course to use absolute paths. Using absolute paths is also useful when using the -pack option. Indeed, it is possible in this case to have several source files with the same name in different directories and packed into different enclosing modules. In the case of packed modules, absolute paths avoid ambiguities but are not necessary. It is in fact sufficient to have discriminating paths, that is: paths that always make it possible to distinguish files packed in different enclosing modules. It is also possible to use the -I command-line switch to specify a search path for source files.
When the html output mode is chosen, a set of files is produced: one index.html file, and one html file per instrumented module. The index.html file provides application-wide statistics about coverage, as well as links to the other files. The module files provide module-wide statistics, as well as a copy of the module source, enhanced with point information. Points are represented in the source as special comments having the form (*[n]*) where n indicates how many times the point was passed at runtime. For easier reading, colors are also used to annotate source lines:
When another output mode is chosen, only one file is produced (or none, if - is used) containing the whole coverage information. The appendix details the various file formats.
Since version 1.2, it is possible to perform some computation on data files. The aforementioned command-line bisect-report -html report bisect*.out combines the data of all files matching the bisect*.out pattern, but it may be useful to specify how data should be combined. This is done through the -combine-expr command-line switch that should be followed by an expression. This switch is intended to replace the list of files to process, leading to the command-line bisect-report -html report -combine-expr 'expr'.
The expression should be well-formed according to the following grammar:
expr ::= expr binop expr | ( expr ) | func_name ( expr ) | value
binop ::= + | - | * | /
func_name ::= sum | nonnull
value ::= single_file | file_set | integer
Using -combine-expr permits sophisticated analysis of program runs, thus allowing fine-grained debugging. Suppose that you are able to produce two runs of a program, one exhibiting a bug and the other one not exhibiting it. The expression
will produce a report where:
It is then far easier to spot the area where the bug stems from.
Code sample 4.1 shows the makefile used for the compilation (with instrumentation), run, and report phases for a one-file application: source.ml. Code sample 4.2 shows the same information when relying on ocamlfind.
default: clean compile run report

clean:
	rm -fr report
	rm -f *.cm* *.out bytecode

compile:
	ocamlc -c -I +bisect \
	  -pp "camlp4o str.cma `ocamlc -where`/bisect/bisect_pp.cmo" source.ml
	ocamlc -o bytecode -I +bisect bisect.cma source.cmo

run:
	BISECT_FILE=coverage ./bytecode

report:
	bisect-report -dump - -html report coverage*.out
default: clean compile run report

clean:
	rm -fr report
	rm -f *.cm* *.out bytecode

compile:
	ocamlfind ocamlc \
	  -package bisect -linkpkg -syntax camlp4o -o bytecode source.ml

run:
	BISECT_FILE=coverage ./bytecode

report:
	ocamlfind bisect/bisect-report -html report coverage*.out
It is also possible to compile the source.ml file through the ocamlbuild tool. The most convenient way is to first define a new bisect tag in a myocamlbuild.ml plugin. This tag will add the necessary elements when compiling or linking a file using the Bisect features, as shown by code sample 4.3. Then, it is sufficient to use the newly introduced tag in the _tags file to use bisect, as shown by code sample 4.4.
open Ocamlbuild_plugin
open Ocamlbuild_pack

let () =
  dispatch begin function
    | After_rules ->
      flag ["bisect"; "pp"]
        (S [A"camlp4o"; A"str.cma"; A"/path/to/bisect/bisect_pp.cmo"]);
      flag ["bisect"; "compile"]
        (S [A"-I"; A"/path/to/bisect"]);
      flag ["bisect"; "link"; "byte"]
        (S [A"-I"; A"/path/to/bisect"; A"bisect.cma"]);
      flag ["bisect"; "link"; "native"]
        (S [A"-I"; A"/path/to/bisect"; A"bisect.cmxa"]);
      flag ["bisect"; "link"; "java"]
        (S [A"-I"; A"/path/to/bisect"; A"bisect.cmja"])
    | _ -> ()
  end
Finally, ocamlbuild can also leverage ocamlfind, leading to the following command-line invocation: ocamlbuild -use-ocamlfind -tag 'package(bisect)' -tag 'syntax(camlp4o)' -tag 'syntax(bisect_pp)' source.byte.
Bisect suffers from the following issues:
The csv mode outputs statistics line by line: first for the whole application, and then for each file. Each line has the following format: first the path of the source file (- being used for the overall application), then 14 × 2 integer values (13 for the various point kinds, plus one for the total). Each integer couple consists, for each point kind, of (i) the number of visited points and (ii) the total number of points. The point kinds are output in the following order: let bindings, sequence, for loops, if/then constructs, try/with constructs, while loops, match/function constructs, class expressions, class initializers, class methods, class values, top level expressions, lazy operators. Code sample A.1 shows such an output.
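The line format described above can be read back with a short OCaml function. This is a minimal sketch rather than code taken from Bisect: parse_csv_line and pairs are hypothetical names, the field separator is passed as a parameter because the text does not specify it, and the kind labels are borrowed from the text output mode.

```ocaml
(* Point kinds in the order given for the csv mode, plus the total. *)
let kinds = [
  "binding"; "sequence"; "for"; "if/then"; "try"; "while";
  "match/function"; "class expression"; "class initializer";
  "class method"; "class value"; "toplevel expression";
  "lazy operator"; "total";
]

(* Group a flat list [v1; t1; v2; t2; ...] into (visited, total) pairs. *)
let rec pairs = function
  | visited :: total :: rest -> (visited, total) :: pairs rest
  | [] -> []
  | [_] -> invalid_arg "pairs: odd number of values"

(* Split one csv line into the source path and the per-kind pairs.
   [sep] is an assumption: use whatever separator the file contains. *)
let parse_csv_line ~sep line =
  match String.split_on_char sep line with
  | [] -> invalid_arg "parse_csv_line: empty line"
  | path :: fields ->
    let values = List.map (fun s -> int_of_string (String.trim s)) fields in
    let counts = pairs values in
    if List.length counts <> List.length kinds then
      invalid_arg "parse_csv_line: expected 14 pairs of integers";
    (path, List.combine kinds counts)
```

The result associates each kind with its (visited, total) pair, so that List.assoc "total" gives the overall figures for the line. A path of - denotes the whole application, as stated above.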
The text mode outputs statistics first for the overall application, and then for each file. The statistics always take the same form, that is the ratio number of visited points over total number of points for each point kind, followed by the ratio for all point kinds. Code sample A.2 shows such an output.
Summary:
 - 'binding' points: 3/3 (100.00 %)
 - 'sequence' points: 5/5 (100.00 %)
 - 'for' points: 1/1 (100.00 %)
 - 'if/then' points: none
 - 'try' points: none
 - 'while' points: none
 - 'match/function' points: 2/2 (100.00 %)
 - 'class expression' points: none
 - 'class initializer' points: none
 - 'class method' points: none
 - 'class value' points: none
 - 'toplevel expression' points: none
 - 'lazy operator' points: 2/2 (100.00 %)
 - total: 13/13 (100.00 %)
File 'source.ml':
 - 'binding' points: 3/3 (100.00 %)
 - 'sequence' points: 5/5 (100.00 %)
 - 'for' points: 1/1 (100.00 %)
 - 'if/then' points: none
 - 'try' points: none
 - 'while' points: none
 - 'match/function' points: 2/2 (100.00 %)
 - 'class expression' points: none
 - 'class initializer' points: none
 - 'class method' points: none
 - 'class value' points: none
 - 'toplevel expression' points: none
 - 'lazy operator' points: 2/2 (100.00 %)
 - total: 13/13 (100.00 %)
The xml mode outputs both statistics and information for each of the points in the source files. Code sample A.3 shows the dtd for produced xml files (it can be generated using the -dump-dtd command-line option). Statistics are output for the whole application and for each file inside <summary> elements, while information relative to each point is encoded into <point> elements. Code sample A.4 shows an xml output.
<!ELEMENT bisect-report (summary,file*)>
<!ELEMENT file (summary,point*)>
<!ATTLIST file path CDATA #REQUIRED>
<!ELEMENT summary (element*)>
<!ELEMENT element EMPTY>
<!ATTLIST element kind CDATA #REQUIRED>
<!ATTLIST element count CDATA #REQUIRED>
<!ATTLIST element total CDATA #REQUIRED>
<!ELEMENT point EMPTY>
<!ATTLIST point offset CDATA #REQUIRED>
<!ATTLIST point count CDATA #REQUIRED>
<!ATTLIST point kind CDATA #REQUIRED>
<?xml version="1.0" encoding="iso-8859-1"?>
<bisect-report>
  <summary>
    <element kind="binding" count="1" total="1"/>
    <element kind="sequence" count="0" total="0"/>
    <element kind="for" count="0" total="0"/>
    <element kind="if/then" count="0" total="0"/>
    <element kind="try" count="0" total="0"/>
    <element kind="while" count="0" total="0"/>
    <element kind="match/function" count="0" total="0"/>
    <element kind="class expression" count="0" total="0"/>
    <element kind="class initializer" count="0" total="0"/>
    <element kind="class method" count="0" total="0"/>
    <element kind="class value" count="0" total="0"/>
    <element kind="toplevel expression" count="0" total="0"/>
    <element kind="lazy operator" count="0" total="0"/>
    <element kind="total" count="1" total="1"/>
  </summary>
  <file path="source.ml">
    <summary>
      <element kind="binding" count="1" total="1"/>
      <element kind="sequence" count="0" total="0"/>
      <element kind="for" count="0" total="0"/>
      <element kind="if/then" count="0" total="0"/>
      <element kind="try" count="0" total="0"/>
      <element kind="while" count="0" total="0"/>
      <element kind="match/function" count="0" total="0"/>
      <element kind="class expression" count="0" total="0"/>
      <element kind="class initializer" count="0" total="0"/>
      <element kind="class method" count="0" total="0"/>
      <element kind="class value" count="0" total="0"/>
      <element kind="toplevel expression" count="0" total="0"/>
      <element kind="lazy operator" count="0" total="0"/>
      <element kind="total" count="1" total="1"/>
    </summary>
    <point offset="11" count="1" kind="binding"/>
  </file>
</bisect-report>
This mode outputs only overall statistics, in a format that is compatible with EMMA. This compatibility makes it possible to use Bisect output in tools that support EMMA, notably giving an easy way to use Bisect with continuous integration servers like Jenkins.
EMMA defines only four categories for coverage: classes, methods, blocks, and lines. Since Bisect defines more point kinds, the following mapping is used:
Another element should be noted regarding this output mode: for all the categories, any 0/0 value is replaced by a 1/1 value. This replacement is justified by the fact that 0/0 may result in 0% while 1/1 results in 100%, and one would not want to have a build failure in Jenkins due to low coverage. Code sample A.5 shows an EMMA-compatible xml output.
<?xml version="1.0" encoding="iso-8859-1"?>
<report>
  <stats>
    <packages value="1"/>
    <classes value="1"/>
    <methods value="1"/>
    <srcfiles value="1"/>
    <srclines value="1"/>
  </stats>
  <data>
    <all name="all classes">
      <coverage type="class, %" value="100% (1/1)"/>
      <coverage type="method, %" value="100% (1/1)"/>
      <coverage type="block, %" value="100% (1/1)"/>
      <coverage type="line, %" value="100% (1/1)"/>
    </all>
  </data>
</report>
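The replacement of 0/0 by 1/1 in this output mode can be expressed as a small OCaml function. The sketch below is an illustration of the rule, not Bisect's own code: emma_ratio and emma_value are hypothetical names, and the use of truncating integer division for the percentage is an assumption, since only the 100% case appears in the sample.

```ocaml
(* Replace an empty category by 1/1 so that it reads as fully covered.
   A total of 0 implies 0 visited points, hence the test on [total]. *)
let emma_ratio (visited, total) =
  if total = 0 then (1, 1) else (visited, total)

(* Render a category value in the "P% (v/t)" shape seen in the sample. *)
let emma_value ratio =
  let visited, total = emma_ratio ratio in
  Printf.sprintf "%d%% (%d/%d)" (visited * 100 / total) visited total

let () =
  print_endline (emma_value (0, 0));
  print_endline (emma_value (1, 2))
```

The first call prints 100% (1/1), matching the values of the sample, and the second prints 50% (1/2).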
The dump format is mainly used for debugging, only displaying the various points and their associated counts for each file. Code sample A.6 shows such a dump.
file "source.ml"
  point sequence at offset 17: 1
  point sequence at offset 42: 1
  point for at offset 64: 5
  point sequence at offset 118: 1
  point sequence at offset 144: 1
  point for at offset 166: 3
  point match/function at offset 253: 1
  point match/function at offset 278: 1
  point match/function at offset 297: 0