Making legacy code unit-testable

This guide describes how to make legacy code testable and how to test it. The following techniques do not require any API change: they concern testing each single module as a unit without touching modules depending on it. These techniques are applicable to legacy code: you do not need to have an exact knowledge of every implementation detail of a module to test it.

The goals are:

  • to gain confidence in a module as a developer (or even part of test-driven development)

  • to document it: tests are examples

  • to have a safety net as a maintainer (non-regression testing), e.g., when: - adding/changing features - performing various forms of refactoring (changing the API or not)

In the rest of this page, “testable” means “unit-testable”, for conciseness.

Here “legacy” is a loose term, encompassing:

  • existing code written by other developers, not necessarily containing a right amount of built-in unit tests

  • code that is only tested by heavy/slow/high level tests, like Tezt or Python tests - this is often a symptom of code that is hard to unit-test

  • tightly-coupled code (e.g. module A directly depends on functions of modules B, C and D, thus unit-testing A requires setting everything up for B, C and D too, which not only makes unit testing harder, but is also complex and brittle)

  • code that calls system functions (e.g. Unix.chmod, which requires modifying actual files on the filesystem, also making unit testing harder) or more generally any kind of side-effecting function

Solutions

Testing a module and all its dependencies

Sometimes, legacy modules depending on other modules may be “unit-tested” without modifying them at all.

This approach is technically not “unit” testing, as the module/function is not tested in isolation from its dependencies. However as the intention is close to unit testing, we include it here.

Pros:

  • No change at all in business code (not even an Internal_for_tests module)

Cons:

  • Code not tested in isolation

  • Tests are harder to write and maintain

Works best when:

  • Code and its dependencies are pure, or “pure enough” (e.g. code may contain logging and other unharmful/unimportant side effects)

  • Dependency tree is not too deep (e.g. if A depends on B which depends on C which depends on D, then it becomes hard to write/read/maintain unit tests)

Using function records

This approach decouples the business code from its dependency functions, either by storing dependency functions in the state - if any - or by taking them as additional parameters in each module function.

Note that this does not decouple types, only functions! If your module depends on a dependency type that can only be built by having side-effects, then you are forced to have those side effects too in your tests.

As an example, consider the following Counter module (signature and implementation):

module Counter : sig
  type t
  val create_from_file : unit -> t
  val increment_and_save_to_file : t -> t
end = struct
  type t = {
    counter : int;
  }

  let create_from_file () =
    let counter = int_of_string (Files.read "/tmp/counter.txt") in
    {counter}

  let increment_and_save_to_file t =
    let t = {counter = t.counter + 1} in
    Files.write "/tmp/counter.txt" (string_of_int t.counter);
    t
end

Counter is hardly unit-testable because of the calls to Files.read and Files.write which have the side effect of reading from/writing into /tmp/counter.txt file. Thus we extract Files.read and Files.write into an argument that can be changed with a dumb (“mock”) function in tests.

Store dependencies in state

One way to achieve this - when your module has a state, usually a type t - is to store all dependency functions in a field, here dependencies:

module Counter : sig
  type t
  val create_from_file : unit -> t
  val increment_and_save_to_file : t -> t

  module Internal_for_tests : sig
    type dependencies = {
      files_read : string -> string;
      files_write : string -> string -> unit;
    }
    val create_from_file : dependencies -> unit -> t
  end
end = struct
  type dependencies = {
    files_read : string -> string;
    files_write : string -> string -> unit;
  }

  type t = {
    counter : int;
    dependencies : dependencies;
  }

  let create_from_file_internal dependencies () =
    let counter = int_of_string (dependencies.files_read "/tmp/counter.txt") in
    {counter; dependencies}

  let create_from_file = create_from_file_internal {files_read = Files.read; files_write = Files.write}

  let increment_and_save_to_file t =
    let t = {t with counter = t.counter + 1} in
    t.dependencies.files_write "/tmp/counter.txt" (string_of_int t.counter);
    t

  module Internal_for_tests = struct
    type nonrec dependencies = dependencies = {
      files_read : string -> string;
      files_write : string -> string -> unit;
    }

    let create_from_file = create_from_file_internal
  end
end

Note that the direct calls to Files.read and Files.write were replaced with indirect calls to dependencies.files_read and field t.dependencies.files_write:

  • They are set to Files.[read|write] in the business constructor Counter.create_from_file

  • They are changed at will in the test constructor Counter.Internal_for_tests.create_from_file

Also note that while the API was extended with test artifacts under the Internal_for_tests sub-module, the public API is otherwise unchanged, thus keeping this refactoring local - you do not need to change any call sites!

Now we can test this module without any side effect:

let test () =
  let counter_value_written = ref "" in
  let fake_files_read file_name = "41" in
  let fake_files_write file_name text =
    counter_value_written := text
  in
  let counter = Counter.Internal_for_tests.create_from_file {files_read = fake_files_read; files_write = fake_files_write} () in
  let _ = Counter.increment_and_save_to_file counter in
  Alcotest.(check string) "counter value was incremented in file" !counter_value_written "42"

Taking dependencies in function argument

An alternative solution, more verbose but not requiring any “state” value available in each function, is to take the dependencies directly as an additional function argument:

module Counter : sig
  type t
  val create_from_file : unit -> t
  val increment_and_save_to_file : t -> t

  module Internal_for_tests : sig
    type dependencies = {
      files_read : string -> string;
      files_write : string -> string -> unit;
    }

    val create_from_file : dependencies -> unit -> t
    val increment_and_save_to_file : dependencies -> t -> t
  end
end = struct
  type dependencies = {
    files_read : string -> string;
    files_write : string -> string -> unit;
  }

  type t = {
    counter : int;
  }

  let business_dependencies = {
    files_read = Files.read;
    files_write = Files.write;
  }

  let create_from_file_internal dependencies counter =
    let counter = int_of_string (dependencies.files_read "/tmp/counter.txt") in
    {counter}

  let create_from_file = create_from_file_internal business_dependencies

  let increment_and_save_to_file_internal dependencies t =
    let t = {counter = t.counter + 1} in
    dependencies.files_write "/tmp/counter.txt" (string_of_int t.counter);
    t

  let increment_and_save_to_file t = increment_and_save_to_file_internal business_dependencies t

  module Internal_for_tests = struct
    type nonrec dependencies = dependencies = {
      files_read : string -> string;
      files_write : string -> string -> unit;
    }

    let create_from_file = create_from_file_internal
    let increment_and_save_to_file = increment_and_save_to_file_internal
  end
end

Note that the direct calls to Files.read and Files.write were replaced with indirect calls to arguments dependencies.files_read and dependencies.files_write:

  • They are set to Files.[read|write] in each business function (create_from_file and increment_and_save_to_file)

  • They are changed at will in the test function Counter.Internal_for_tests.[create_from_file|increment_and_save_to_file]

As in the previous solution, notice that the public API has not changed - save for additional APIs in Internal_for_tests.

Now we can test this module without any side effect:

let test () =
  let counter_value_written = ref "" in
  let fake_files_read file_name = "41" in
  let fake_files_write file_name text =
    counter_value_written := text
  in
  let mock_dependencies = Counter.Internal_for_tests.{
    files_read = fake_files_read;
    files_write = fake_files_write;
  } in
  let counter = Counter.Internal_for_tests.create_from_file mock_dependencies () in
  let _ = Counter.Internal_for_tests.increment_and_save_to_file mock_dependencies counter in
  Alcotest.(check string) "counter value was incremented in file" !counter_value_written "42"

Pros and Cons for Function records

Works best when:

  • Dependency types are not too hard to build

Pros:

  • No side-effecting function is called (they are replaced with mocks)

  • Enables validating the arguments passed to mock functions (e.g. counter_value_written) have the right value

  • Independent of the dependency depth (for functions): if A calls B.f which calls C.g, your mock of B.f will never call C.g

Cons:

  • All dependency types remain, so if it is difficult/side-effectful to create those values, testing remains difficult/not so unitary

  • Adds a bit of boilerplate in Internal_for_tests module

  • Adds a bit of indirection, by introducing indirect calls to dependency functions. The associated performance overhead should be negligible in most practical cases. There also is a slight decrease in code readability, but documenting this unit-testability pattern should avoid many headaches.

To choose between the field and the argument:

  • If your module already has a kind of “state” (usually a type t), then add a dependencies field

  • Else add a dependencies argument - but this requires duplicating each function, which ends up being very verbose if you have several functions

  • If your “state” value (usually a value of type t) is passed to a polymorphic function like = or compare (which throw on function fields, and are famous for being an anti-pattern), and it is not possible for you to fix this anti-pattern, then either switch to function arguments, or wrap in an object.

Using functors

This approach decouples the business code from its dependency modules. Note that unlike the Function records solution, this decouples both dependency functions and abstract types!

Consider the following code: it is similar to the previous Counter example but this time, the Files dependency module (which could be another module, a third party library, or even the Stdlib) also has an abstract type t:

(* The dependency *)
module Files : sig
  type t
  val openf : string -> t
  val write : t -> string -> unit
  val close : t -> unit
  (* Many other functions and types *)
end = struct (* omitted implementation *) end

module Counter : sig
  type t
  val create : int -> t
  val increment_and_save_to_file : t -> t
end = struct
  type t = {
    counter : int;
  }

  let create counter = {counter}

  let increment_and_save_to_file t =
    let t = {counter = t.counter + 1} in
    let file = Files.openf "/tmp/counter.txt" in
    Files.write file (string_of_int t.counter);
    Files.close file;
    t
end

The technique is to transform Counter into a functor that takes a module looking like Files in argument - but which can now be changed in tests!

module Counter : sig
  module type S = sig
    type t
    val create : int -> t
    val increment_and_save_to_file : t -> t
  end

  include S

  module Internal_for_tests : sig
    module type FILES = sig
      type t
      val openf : string -> t
      val write : t -> string -> unit
      val close : t -> unit
    end
    module Make (Files : FILES) : S
  end
end = struct
  module type S = sig
    type t
    val create : int -> t
    val increment_and_save_to_file : t -> t
  end

  module type FILES = sig
    type t
    val openf : string -> t
    val write : t -> string -> unit
    val close : t -> unit
  end

  module Make (Files : FILES) = struct
    type t = {
      counter : int;
    }

    let create counter = {counter}

    let increment_and_save_to_file t =
      let t = {counter = t.counter + 1} in
      let file = Files.openf "/tmp/counter.txt" in
      Files.write file (string_of_int t.counter);
      Files.close file;
      t
  end

  (* Do not be mistaken: here [Files] refers to the real, business [Files] module! *)
  include Make (Files)

  module Internal_for_tests = struct
    module type FILES = FILES

    module Make = Make
  end
end

As you can see, this is significantly more verbose!

However, now we can freely change/mock not only the dependency functions Files.[openf|close|write], but also the implementation of type Files.t!

let test () =
  let written_content = ref "" in
  let module Counter = Counter.Internal_for_tests.Make (struct
    type t = unit
    let openf _ = ()
    let close _ = ()
    let write _ content = written_content := content
  end) in
  let counter = Counter.create 41 in
  let _ = Counter.increment_and_save_to_file counter in
  Alcotest.(check string) "counter value was incremented in file" !written_content "42"

While the real Files.t type probably contained a file descriptor, our mock module has no side effect outside of the test!

Note on verbosity: some things are duplicated because of OCaml MLI syntax, e.g. module type declaration. This can be partially mitigated by using the _intf trick but this in turn induces a bit more complexity, use with caution.

Works best when:

  • You need to decouple (abstract) types, not only functions. For example, because building values of those types adds too much complexity, or requires side-effects.

  • There are linear dependencies in modules (A depends on B which depends on C, but A does not depend on C)

Pros:

  • Everything but exposed and private dependency types are mocked

  • Enables validating the arguments passed to mock functions (e.g. counter_value_written) have the right value

  • Independent of the dependency depth (for functions): if A calls B which calls C, your mock of B will never call C nor refer to its abstract types

Cons:

  • Verbosity

  • Additional complexity (functors, module types)

  • Does not scale well in more complex dependencies (A depends on B and C types, and B also depends on C types) as it induces a lot of destructive substitutions and module noise to convince the typechecker that C.t in A is the same as C.t in B

  • Does not work for exposed and private (exposed in read-only) dependency types