========
 Design
========

Stressant is shipped in the Debian GNU/Linux distribution. It is also
part of the `Grml <http://grml.org/>`__ project `since August 2017
<https://github.com/grml/grml-live/pull/34>`_, so it benefits from
Grml's extensive `list of utilities <http://grml.org/files/#debian>`__,
which covers most of what other rescue systems such as Debian Live and
`Debirf <http://cmrg.fifthhorseman.net/wiki/debirf>`__ provide (see
below for a more thorough comparison).

The Grml distribution ships as an ISO image that can be burned to a
CD/DVD, copied to a USB drive, or used as a net-bootable image. Grml
can perform:

-  memory tests with ``memtest86``
-  hardware detection and inventory with
   `HDT <http://hdt-project.org/>`__

There are also many more options, for example loading the system to RAM
or making it read-only; see the
`cheatcodes <http://git.grml.org/?p=grml-live.git;a=blob_plain;f=templates/GRML/grml-cheatcodes.txt;hb=HEAD>`__
list for more details.

The stressant tool
~~~~~~~~~~~~~~~~~~

Stressant itself is a Python program that calls other UNIX
utilities, displays their output on the screen, collects it in a
logfile, and/or sends it over email.

The objective of this software is to automate a basic stress-testing
suite that, once started, runs through CPU, memory, disk and network
tests and reports any errors or failures.
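The wrapper pattern described above can be sketched as follows. This is
a minimal illustration under stated assumptions, not stressant's actual
code: the ``run_test`` helper and its log prefixes are made up for the
example (though they mimic the ``DEBUG``/``OUTPUT``/``ERROR`` lines of
the sample run below).

```python
import subprocess
import sys


def run_test(command, log=sys.stdout):
    """Run one external utility and collect its output.

    Minimal sketch of the pattern: call a UNIX tool, prefix each
    output line, and report failures. The real program also writes
    to a logfile and can send the result by email.
    """
    log.write("DEBUG: Calling %s\n" % " ".join(command))
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, check=True
        )
    except subprocess.CalledProcessError as error:
        log.write("ERROR: Command failed: %s\n" % error)
        return False
    for line in result.stdout.splitlines():
        log.write("OUTPUT: %s\n" % line)
    return True
```

For instance, ``run_test(["lshw", "-short"])`` would reproduce the
hardware inventory step of the example run shown below.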

This is done through the ``stressant`` script, which performs the
following tests:

-  ``lshw`` and ``smartctl`` for hardware inventory
-  ``dd``, ``hdparm``, ``fio`` and ``smartctl`` for disk testing;
   ``fio`` can also overwrite disk drives with the proper options
   (``--overwrite`` and ``--size=100%``)
-  ``stress-ng`` for CPU testing
-  ``iperf3`` for network testing
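To make the destructive disk mode concrete, here is a hedged sketch of
the kind of fio invocation the ``--overwrite`` and ``--size=100%``
options imply. The device path and job name are placeholders, and the
exact command stressant builds may differ; running such a command
destroys all data on the target drive.

```python
# Hypothetical sketch: assemble an fio command line using the
# destructive options mentioned above. /dev/sdX is a placeholder for
# a drive whose contents you intend to destroy.
device = "/dev/sdX"
fio_command = [
    "fio",
    "--name=stressant-wipe",  # made-up job name
    "--filename=" + device,
    "--readwrite=write",      # sequential writes over the whole drive
    "--overwrite=1",          # allow fio to overwrite existing data
    "--size=100%",            # cover the entire device
]
print(" ".join(fio_command))
```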

Here is an example test run::

    $ sudo ./stressant --email anarcat@anarc.at --writeSize 1M --cpuBurnTime 1s --iperfTime 1
    INFO: Starting tests
    INFO: CPU cores: 4
    INFO: Memory: 16 GiB (16715816960 bytes)
    INFO: Hardware inventory
    DEBUG: Calling lshw -short
    OUTPUT: H/W path             Device     Class          Description
    OUTPUT: ==========================================================
    OUTPUT: system         Desktop Computer
    OUTPUT: /0                              bus            NUC6i3SYB
    OUTPUT: /0/0                            memory         64KiB BIOS
    OUTPUT: /0/22                           memory         64KiB L1 cache
    OUTPUT: /0/23                           memory         64KiB L1 cache
    OUTPUT: /0/24                           memory         512KiB L2 cache
    OUTPUT: /0/25                           memory         3MiB L3 cache
    OUTPUT: /0/26                           processor      Intel(R) Core(TM) i3-6100U CPU @ 2.30GHz
    OUTPUT: /0/27                           memory         16GiB System Memory
    OUTPUT: /0/27/0                         memory         16GiB SODIMM DDR4 Synchronous 2133 MHz (0.5 ns)
    OUTPUT: /0/27/1                         memory         [empty]
    OUTPUT: /0/100                          bridge         Skylake Host Bridge/DRAM Registers
    OUTPUT: /0/100/2                        display        HD Graphics 520
    OUTPUT: /0/100/14                       bus            Sunrise Point-LP USB 3.0 xHCI Controller
    OUTPUT: /0/100/14/0          usb1       bus            xHCI Host Controller
    OUTPUT: /0/100/14/0/1        scsi3      storage        USB to ATA/ATAPI Bridge
    OUTPUT: /0/100/14/0/1/0.0.0  /dev/sdb   disk           500GB 00ABYS-01TNA0
    OUTPUT: /0/100/14/0/1/0.0.1  /dev/sdc   disk           500GB 00ABYS-01TNA0
    OUTPUT: /0/100/14/0/3                   input          Dell USB Keyboard
    OUTPUT: /0/100/14/0/4                   input          Kensington Expert Mouse
    OUTPUT: /0/100/14/0/7                   communication  Bluetooth wireless interface
    OUTPUT: /0/100/14/1          usb2       bus            xHCI Host Controller
    OUTPUT: /0/100/14.2                     generic        Sunrise Point-LP Thermal subsystem
    OUTPUT: /0/100/16                       communication  Sunrise Point-LP CSME HECI #1
    OUTPUT: /0/100/17                       storage        Sunrise Point-LP SATA Controller [AHCI mode]
    OUTPUT: /0/100/1c                       bridge         Sunrise Point-LP PCI Express Root Port #5
    OUTPUT: /0/100/1c/0                     network        Wireless 8260
    OUTPUT: /0/100/1e                       generic        Sunrise Point-LP Serial IO UART Controller #0
    OUTPUT: /0/100/1e.6                     generic        Sunrise Point-LP Secure Digital IO Controller
    OUTPUT: /0/100/1f                       bridge         Sunrise Point-LP LPC Controller
    OUTPUT: /0/100/1f.2                     memory         Memory controller
    OUTPUT: /0/100/1f.3                     multimedia     Sunrise Point-LP HD Audio
    OUTPUT: /0/100/1f.4                     bus            Sunrise Point-LP SMBus
    OUTPUT: /0/100/1f.6          eno1       network        Ethernet Connection I219-V
    OUTPUT: /0/1                 scsi2      storage
    OUTPUT: /0/1/0.0.0           /dev/sda   disk           500GB WDC WDS500G1B0B-
    OUTPUT: /0/1/0.0.0/1         /dev/sda1  volume         511MiB Windows FAT volume
    OUTPUT: /0/1/0.0.0/2         /dev/sda2  volume         244MiB EFI partition
    OUTPUT: /0/1/0.0.0/3         /dev/sda3  volume         465GiB EFI partition
    INFO: SMART information for /dev/sda
    DEBUG: Calling smartctl -i /dev/sda
    OUTPUT: smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.9.0-1-amd64] (local build)
    OUTPUT: Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
    OUTPUT: 
    OUTPUT: === START OF INFORMATION SECTION ===
    OUTPUT: Device Model:     WDC WDS500G1B0B-00AS40
    OUTPUT: Serial Number:    XXXXXXXXXXXX
    OUTPUT: LU WWN Device Id: XXXXXXXXXXXX
    OUTPUT: Firmware Version: XXXXXXXXXXXX
    OUTPUT: User Capacity:    500,107,862,016 bytes [500 GB]
    OUTPUT: Sector Size:      512 bytes logical/physical
    OUTPUT: Rotation Rate:    Solid State Device
    OUTPUT: Form Factor:      M.2
    OUTPUT: Device is:        Not in smartctl database [for details use: -P showall]
    OUTPUT: ATA Version is:   ACS-2 T13/2015-D revision 3
    OUTPUT: SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
    OUTPUT: Local Time is:    Fri Mar 17 10:24:52 2017 EDT
    OUTPUT: SMART support is: Available - device has SMART capability.
    OUTPUT: SMART support is: Enabled
    OUTPUT: 
    INFO: Basic disk bandwidth tests
    INFO: Writing 1MB file
    DEBUG: Calling dd bs=1M count=512 conv=fdatasync if=/dev/zero of=test
    OUTPUT: 512+0 records in
    OUTPUT: 512+0 records out
    OUTPUT: 536870912 bytes (537 MB, 512 MiB) copied, 1.39591 s, 385 MB/s
    INFO: Reading 1MB file
    DEBUG: Calling dd bs=1M count=512 of=/dev/null if=test
    OUTPUT: 512+0 records in
    OUTPUT: 512+0 records out
    OUTPUT: 536870912 bytes (537 MB, 512 MiB) copied, 0.0848588 s, 6.3 GB/s
    INFO: Hdparm test
    DEBUG: Calling hdparm -Tt /dev/sda
    OUTPUT: 
    OUTPUT: /dev/sda:
    OUTPUT: Timing cached reads:   12406 MB in  2.00 seconds = 6207.39 MB/sec
    OUTPUT: Timing buffered disk reads: 1504 MB in  3.00 seconds = 501.13 MB/sec
    INFO: Disk stress test
    DEBUG: Calling fio --name=stressant --readwrite=randrw --numjob=4 --sync=1 --direct=1 --group_reporting --size=1M --output=/tmp/tmpo2QJnR
    OUTPUT: stressant: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
    OUTPUT: ...
    OUTPUT: fio-2.16
    OUTPUT: Starting 4 processes
    OUTPUT: 
    OUTPUT: stressant: (groupid=0, jobs=4): err= 0: pid=978: Fri Mar 17 10:25:07 2017
    OUTPUT:   read : io=2160.0KB, bw=5669.3KB/s, iops=1417, runt=   381msec
    OUTPUT:     clat (usec): min=141, max=2197, avg=486.59, stdev=344.70
    OUTPUT:      lat (usec): min=141, max=2198, avg=486.96, stdev=344.71
    OUTPUT:     clat percentiles (usec):
    OUTPUT:      |  1.00th=[  145],  5.00th=[  153], 10.00th=[  161], 20.00th=[  175],
    OUTPUT:      | 30.00th=[  189], 40.00th=[  217], 50.00th=[  278], 60.00th=[  676],
    OUTPUT:      | 70.00th=[  748], 80.00th=[  828], 90.00th=[  916], 95.00th=[  980],
    OUTPUT:      | 99.00th=[ 1384], 99.50th=[ 1800], 99.90th=[ 2192], 99.95th=[ 2192],
    OUTPUT:      | 99.99th=[ 2192]
    OUTPUT:   write: io=1936.0KB, bw=5081.4KB/s, iops=1270, runt=   381msec
    OUTPUT:     clat (usec): min=618, max=6602, avg=2566.13, stdev=1029.39
    OUTPUT:      lat (usec): min=619, max=6602, avg=2566.71, stdev=1029.39
    OUTPUT:     clat percentiles (usec):
    OUTPUT:      |  1.00th=[  732],  5.00th=[  900], 10.00th=[  964], 20.00th=[ 1672],
    OUTPUT:      | 30.00th=[ 1976], 40.00th=[ 2384], 50.00th=[ 2640], 60.00th=[ 3152],
    OUTPUT:      | 70.00th=[ 3312], 80.00th=[ 3440], 90.00th=[ 3568], 95.00th=[ 3856],
    OUTPUT:      | 99.00th=[ 4704], 99.50th=[ 4960], 99.90th=[ 6624], 99.95th=[ 6624],
    OUTPUT:      | 99.99th=[ 6624]
    OUTPUT:     lat (usec) : 250=24.41%, 500=4.88%, 750=7.91%, 1000=19.34%
    OUTPUT:     lat (msec) : 2=10.84%, 4=30.66%, 10=1.95%
    OUTPUT:   cpu          : usr=0.80%, sys=2.39%, ctx=1945, majf=0, minf=35
    OUTPUT:   IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
    OUTPUT:      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
    OUTPUT:      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
    OUTPUT:      issued    : total=r=540/w=484/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
    OUTPUT:      latency   : target=0, window=0, percentile=100.00%, depth=1
    OUTPUT: 
    OUTPUT: Run status group 0 (all jobs):
    OUTPUT:    READ: io=2160KB, aggrb=5669KB/s, minb=5669KB/s, maxb=5669KB/s, mint=381msec, maxt=381msec
    OUTPUT:   WRITE: io=1936KB, aggrb=5081KB/s, minb=5081KB/s, maxb=5081KB/s, mint=381msec, maxt=381msec
    OUTPUT: 
    OUTPUT: Disk stats (read/write):
    OUTPUT:     dm-3: ios=207/493, merge=0/0, ticks=120/296, in_queue=416, util=58.78%, aggrios=540/1527, aggrmerge=0/0, aggrticks=288/752, aggrin_queue=1040, aggrutil=75.30%
    OUTPUT:     dm-0: ios=540/1527, merge=0/0, ticks=288/752, in_queue=1040, util=75.30%, aggrios=540/1326, aggrmerge=0/201, aggrticks=264/704, aggrin_queue=968, aggrutil=74.49%
    OUTPUT:   sda: ios=540/1326, merge=0/201, ticks=264/704, in_queue=968, util=74.49%
    INFO: CPU stress test for 1s
    DEBUG: Calling stress-ng --timeout 1s --cpu 0 --ignite-cpu --metrics-brief --log-brief --tz --times --aggressive
    OUTPUT: dispatching hogs: 4 cpu
    OUTPUT: cache allocate: default cache size: 3072K
    OUTPUT: successful run completed in 1.05s
    OUTPUT: stressor       bogo ops real time  usr time  sys time   bogo ops/s   bogo ops/s
    OUTPUT: (secs)    (secs)    (secs)   (real time) (usr+sys time)
    OUTPUT: cpu                 453      1.04      2.80      0.00       437.62       161.79
    OUTPUT: cpu:
    OUTPUT: acpitz   27.80 °C
    OUTPUT: pch_skylake   32.77 °C
    OUTPUT: acpitz   31.78 °C
    OUTPUT: x86_pkg_temp   34.40 °C
    OUTPUT: for a 1.05s run time:
    OUTPUT: 4.22s available CPU time
    OUTPUT: 2.81s user time   ( 66.64%)
    OUTPUT: 0.01s system time (  0.24%)
    OUTPUT: 2.82s total time  ( 66.87%)
    OUTPUT: load average: 0.34 0.58 2.52
    INFO: Running network benchmark
    DEBUG: Calling iperf3 -c iperf.he.net -t 1
    OUTPUT: iperf3: error - the server is busy running a test. try again later
    ERROR: Command failed: Command 'iperf3 -c iperf.he.net -t 1' returned non-zero exit status 1
    INFO: all done
    INFO: sent email to ['anarcat@anarc.at'] using anarc.at

Note that the actual console output is nicely colorized; the above is
just a dump of the logfile.

We currently use the ``iperf.he.net`` server from `Hurricane
Electric <https://he.net/>`__ as the default server for our tests, but
users are encouraged to point the ``--iperfServer`` argument at a local
server to get more accurate results. Notice how the performance test
failed above because the HE server was busy: this is just another hint
that you should use your own server.

A number of public iPerf servers are available; here are a few lists:

* https://iperf.fr/iperf-servers.php
* https://github.com/R0GGER/public-iperf3-servers
* https://proof.ovh.ca/

Background
~~~~~~~~~~

This project emanates from a `packaging
effort <http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=707178>`__ of a
custom Linux distribution called 'breakin'. It turned into a simple
Python program that reuses existing stress-testing programs packaged in
Debian.

Stressant *used* to be built as a standalone Debian Derivative, a `Pure
blend <https://wiki.debian.org/DebianPureBlends>`__ based on
`Debirf <http://cmrg.fifthhorseman.net/wiki/debirf>`__. But in 2017, the
project was rearchitectured to be based on Grml and to focus on
developing a standalone stress-testing tool. While Stressant could still
become its own `Debian
derivative <https://wiki.debian.org/Derivatives>`__, it seems futile
for now to make yet another Debian derivative. Instead, we focus our
energy into contributing to the Grml project
without needing to create the heavy infrastructure for another Linux
distribution.

The `Is your Computer
Stable? <https://blog.codinghorror.com/is-your-computer-stable/>`__ post
from Jeff Atwood was a motivation to get back into the project. It
outlines a few basic tools to use to make sure your computer is stable:

-  memtest86 - shipped with Grml
-  install Ubuntu - we assume you'll do that anyway
-  `MPrime <http://www.mersenne.org/download/>`__ to stress the CPU -
   not free software; ``stress-ng`` was chosen instead and gives similar
   results here
-  `badblocks <https://en.wikipedia.org/wiki/Badblocks>`__ test (with
   ``-sv``) - this is covered by the ``fio`` test
-  ``smartctl -i/-a/-t`` to identify and test hard drives
-  ``dd`` and ``hdparm`` to get quick stats - done
-  `bonnie++ <https://en.wikipedia.org/wiki/Bonnie%2B%2B>`__ for more
   extensive benchmarks - Grml people suggested we use
   `fio <http://fio.readthedocs.org/>`__ instead
-  `iperf <https://en.wikipedia.org/wiki/Iperf>`__ for network testing -
   this assumes a local server, we instead use
   `iperf3 <http://software.es.net/iperf/>`__ and `public
   servers <https://iperf.fr/iperf-servers.php>`__
-  `furmark <http://www.ozone3d.net/benchmarks/fur/>`__ for testing the
   GPU - Windows-only, no Linux equivalent, the `Phoronix test
   suite <https://en.wikipedia.org/wiki/Phoronix_Test_Suite>`__ uses
   `ffmpeg tests <https://openbenchmarking.org/test/pts/ffmpeg>`__ for
   that purpose

The idea was to combine all of this into a single tool that would
perform those tests, without reinventing the wheel of course.

Stressant was also tightly coupled with
`Koumbit <https://koumbit.org>`__'s infrastructure, as this is where the
Debirf recipes were originally developed. It needed a CI system to build
the images, which was originally done with Jenkins and later with
Gitlab CI, but the latter failed to build images because of issues with
debirf and, ultimately, Docker itself. This is why Grml was used as a
basis for future development.

Remaining work
~~~~~~~~~~~~~~

Stressant could run in a tmux or screen session that would show the
current task in one pane and syslog (or journalctl) in another. This
would allow more information to be crammed into a single display while
also making it easier to switch to remote access (e.g. through SSH).

.. note:: Parallelism is discussed as part of a larger redesign in
          `issue #3 <https://gitlab.com/anarcat/stressant/issues/3>`_.

Finally, we need clear and better documentation on the various testing
tools out there, a bit like TAILS is doing. For example, we used to
ship with ``diskscan``, but I didn't even remember that and I am not
sure what to use it for or when to use it. A summary description of the
available tools, maybe through a menu system or at least a set of HTML
files, would be useful. I use Sphinx and RST for this
because of the simplicity and availability of tools like readthedocs.org
and the ease of creation for offline documentation (PDF and ePUB). A
rendered copy of the documentation is available on
`stressant.readthedocs.io <https://stressant.readthedocs.io/>`_ and in
the ``stressant-doc`` package. The metapackage (``stressant-meta``)
lists the relevant recovery tools and some of those are documented in
the :doc:`usage`.

Similar software
~~~~~~~~~~~~~~~~

In the meantime, here's a list of software that's similar to stressant
or that could be used by stressant.

Test suites
^^^^^^^^^^^

These tools are fairly similar to stressant in that they perform multiple benchmarks:

- `Breakin`_ - stress-test and hardware diagnostics tool
- `Checkbox`_ - Ubuntu's certification tool, shipped with Debian
  stretch but `removed because upstream switched to snaps <926953_>`_,
  `new RFP <987674_>`_
- `Inquisitor`_ - hardware testing suite
- `OpenBenchmarking.org`_ - a good source of benchmarking tools
- `PerfKit Benchmarker`_ - GCP's benchmarking tool
- `Phoronix test suite`_ - far-ranging benchmarking suite
- `Stressapptest`_ - Stressful Application Test, userspace memory and
  IO test - similar to stressant
- `bench-scripts`_ - a review of many benchmarking scripts that
  provide a nice and simple interface for basic benchmarks
- `sys\_basher`_ - another stress-testing tool
- `Ars Technica`_ - has an interesting post detailing a few key fio
  commands that should be run
- `hardware`_ - Python module for hardware inventory; reuses ``lshw``,
  ``pciutils``, ``smartmontools``, etc., `not in Debian yet <1032137_>`_
- `fio-plot`_ - Python wrapper around fio that draws graphs
- `disk-burnin-and-testing`_ - simple shell script that (1) runs a
  SMART short test, (2) runs badblocks and (3) runs a SMART extended
  test

.. _Breakin: http://www.advancedclustering.com/products/software/breakin/
.. _Checkbox: https://checkbox.readthedocs.io/en/latest/
.. _Inquisitor: https://en.wikipedia.org/wiki/Inquisitor_(hardware_testing_software)
.. _OpenBenchmarking.org: https://openbenchmarking.org/
.. _PerfKit Benchmarker: https://github.com/GoogleCloudPlatform/PerfKitBenchmarker
.. _Phoronix test suite: https://en.wikipedia.org/wiki/Phoronix_Test_Suite
.. _Stressapptest: https://github.com/stressapptest/stressapptest
.. _bench-scripts: https://github.com/haydenjames/bench-scripts
.. _sys\_basher: http://www.polybus.com/sys_basher_web/
.. _926953: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=926953
.. _987674: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=987674
.. _Ars Technica: https://arstechnica.com/gadgets/2020/02/how-fast-are-your-disks-find-out-the-open-source-way-with-fio/
.. _hardware: https://github.com/redhat-cip/hardware
.. _1032137: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1032137
.. _disk-burnin-and-testing: https://github.com/Spearfoot/disk-burnin-and-testing

Purpose-specific tools
^^^^^^^^^^^^^^^^^^^^^^

* `chipsec`_ - framework for analyzing the security of PC platforms
  including hardware, system firmware (BIOS/UEFI), and platform
  components

* `FWTS`_ - Ubuntu's Firmware Test Suite - performs sanity checks on
  Intel/AMD PC firmware. It is intended to identify BIOS and ACPI
  errors and, if appropriate, will try to explain the errors and give
  advice to help work around or fix firmware bugs

* `Power Stress and Shaping Tool (PSST)`_ - a "controlled
  power tool for Intel SoC components such as CPU and GPU. PSST
  enables very fine control of stress function without its own
  process overhead", packaged in Debian as `psst`_

* `The stress terminal (s-tui)`_ - mostly for testing CPU, temperature
  and power usage, now included in the meta-package

* `tinymembench`_ - memory bandwidth userland tester

* `stressdisk`_ - "Stress test your disks / memory cards / USB sticks
  before trusting your valuable data to them"
  
.. _FWTS: https://wiki.ubuntu.com/FirmwareTestSuite/
.. _Power Stress and Shaping Tool (PSST): https://01.org/power-stress-and-shaping-tool
.. _The stress terminal (s-tui): https://amanusk.github.io/s-tui/
.. _chipsec: https://github.com/chipsec/chipsec
.. _psst: https://tracker.debian.org/pkg/psst
.. _tinymembench: https://github.com/ssvb/tinymembench
.. _stressdisk: https://github.com/ncw/stressdisk

Building images by hand
~~~~~~~~~~~~~~~~~~~~~~~

.. note:: Starting from August 2017, stressant is part of Grml, and
          it's usually superfluous to build your own image unless
          you're into that kind of kinky stuff. Those notes are kept
          mostly for historical purposes.

There is a handy ``build-iso.sh`` script that will set up APT
repositories and run all the right commands to build a Grml Stressant
ISO image on a recent Debian release (tested on jessie and stretch). Note
that you can pass extra flags to the
`grml-live <http://grml.org/grml-live/>`__ command with the
``$GRML_LIVE_FLAGS`` environment variable. What follows is basically a
description of what that script does.

To build an image by hand, you will need to first install the
``grml-live`` package which is responsible for building Grml images. For
this, you will need to add the `Grml Debian
repository <http://deb.grml.org/>`__ to your ``sources.list`` file.
Instructions for doing so are available in the `files section of the
Grml site <http://grml.org/files/>`__.

Once this is done, you should be able to build an image using::

    sudo grml-live -c DEBORPHAN,GRMLBASE,GRML_FULL,RELEASE,AMD64,IGNORE,STRESSANT \
        -s unstable -a amd64 \
        -o $PWD/grml -U $USER \
        -v $(date +%Y.%m) -r gossage -g grml64-full-stressant

This will build a "full" Grml release (``-c``) based on Debian unstable
on a 64 bit architecture (``-a``) in the ``./grml`` subdirectory
(``-o``). The files will be owned (``-U``) by the current user
(``$USER``). The version number (``-v``), the release name (``-r``) and
flavor (``-g``) are just cargo-culted from the `upstream official
release <http://jenkins.grml.org/job/grml64-full_Release/26/console>`__.
See the `grml-live <http://grml.org/grml-live/>`__ documentation for
further options, but do note the ``-u`` option, which can be used to
rerun the build if you only want to update the image to the latest
release, for example.

The resulting ISO will be in
``./grml/grml_isos/grml64-full_$(date +%Y.%m%d).iso``. To make a
multi-arch ISO, you should use the ``grml2iso`` command. For example,
this is how `upstream
builds <http://jenkins.grml.org/job/grml96-full_Release/lastSuccessfulBuild/console>`__
the ``96`` ISO, which features both the 32-bit and 64-bit architectures::

    grml2iso -o grml96-small_2014.11.iso grml64-small_2014.11.iso grml32-small_2014.11.iso

Build system review
~~~~~~~~~~~~~~~~~~~

The following is a summary evaluation of the different options
considered by the Stressant project to build live images. This problem
space was in flux at the time of writing: the tools used to build the
Debian Live images were changing and the future of the project was
`uncertain`_. Keep this in mind when you read this in the
future. Here are the options that were considered, with a detailed
evaluation below:

.. _uncertain: https://lists.debian.org/msgid-search/20170626140821.noixwidcadj4rphr@einval.com

.. contents::
   :local:

The `Debian cloud team <https://wiki.debian.org/Teams/Cloud>`__ also
considered a few tools to generate their cloud images, and some are
relevant here (`FAI <https://wiki.debian.org/FAI>`__ and
`vmdebootstrap <https://vmdebootstrap.alioth.debian.org/>`__), see `this
post <https://lists.debian.org/debian-cloud/2016/11/msg00100.html>`__
for details. There's also this more `exhaustive list of
tools <https://wiki.debian.org/SystemBuildTools>`__ to build Debian
systems.

DebIRF
^^^^^^

Debirf stands for Debian InitRamFs and builds the live image into the
``initrd`` file. It was originally used by Stressant because it was
simple and easy to modify. It also allowed booting from the network
easily, as we only had to load the kernel and didn't have to bother
with loading ISO images or NFS, unlike other options.

In the end, however, Debirf proved to be too limited for our needs: it
doesn't provide a way to embed boot-level, arbitrary binaries like
memtest86 because it is too tightly coupled with the Linux kernel.
Furthermore, we were having serious issues building debirf images in
newer releases, either in Debian 8 (`bug #806377`_) or 9 (`bug
#848834`_).

.. _bug #806377: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=806377
.. _bug #848834: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=848834

FAI
^^^

I have tried to use FAI to follow the lead of the `Debian cloud team
<https://wiki.debian.org/Teams/Cloud>`__. Unfortunately, I stumbled
upon a few bugs. First, `fai-diskimage
<http://fai-project.org/doc/man/fai-diskimage.html>`__ would fail to
build an image if the host uses LVM. This was `fixed in FAI 5.3.3
<https://lists.debian.org/debian-cloud/2017/01/msg00003.html>`__ (or
maybe 5.3.4?). Also, FAI seems to fetch base files from a `cleartext
URL <http://fai-project.org/download/basefiles/>`__, which seems like
a dubious security choice.

After finding `this
tutorial <https://noah.meyerhans.us/blog/2017/02/10/using-fai-to-customize-and-build-your-own-cloud-images/>`__,
I figured I would give it a try again. Unfortunately, after asking on
IRC (``#debian-cloud`` on OFTC), I was told (by Noah!) that
"*fai-diskimage is probably not what you want for an iso image*\ " and
they suggested I use ``fai-cd``. Unfortunately, ``fai-cd`` works
completely differently: it doesn't support the ``--class`` system that
``fai-diskimage`` was built with, so we can't reuse those already
mysterious recipes. ``fai-cd`` seems to be geared towards creating
installation media rather than live images.

All this seems to make FAI mostly unusable for the task at hand,
although it *should* be noted that
`grml-live <http://grml.org/grml-live/>`__ uses FAI to build their
images...

vmdebootstrap
^^^^^^^^^^^^^

`vmdebootstrap <https://vmdebootstrap.alioth.debian.org/>`_ is a
minimal image building tool, written in Python, used for `Debian live
<https://www.debian.org/devel/debian-live/>`__ images. It requires
root (to create loop filesystems) and no longer supports shipping the
Debian installer.

We have had good results with ``vmdebootstrap``, but the fact that it
requires a loop device has made it difficult to use Gitlab's CI
system. Docker has a `bug
<https://github.com/docker/docker/issues/27886>`__ that makes it
impossible to use loop devices and ``kpartx`` commands in it. So to
build images through Gitlab's CI would require full virtualization
instead of just Docker, something that's not provided by Gitlab.com
right now. This problem is probably shared by all image building
tools, however.

Worse: VirtualBox did not `make it to stretch at all
<https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=794466>`__, which
makes it difficult to deploy new builders for it.

live-build and live-wrapper
^^^^^^^^^^^^^^^^^^^^^^^^^^^

`live-build <https://tracker.debian.org/pkg/live-build>`__ is a set of
tools used by `Debian live
<https://www.debian.org/devel/debian-live/>`__ and other blends
(e.g. `PGP cleanroom
<https://wiki.debian.org/OpenPGP/CleanRoomLiveEnvironment>`__ `uses
live-build
<https://anonscm.debian.org/cgit/collab-maint/make-pgp-clean-room.git/tree/scripts>`__). It
used to be a set of shell scripts, but it now uses `live-wrapper
<https://live-wrapper.readthedocs.io/en/latest/>`__, which in turn uses
`vmdebootstrap <https://vmdebootstrap.alioth.debian.org/>`__.

It is unclear if I am better off using live-build or vmdebootstrap
directly. The PGP cleanroom build uses live-build, so maybe I should do
that as well...

Note: live-wrapper is where the idea of `using
HDT <https://anonscm.debian.org/cgit/debian-live/live-wrapper.git/tree/lwr/run.py>`__
comes from. Unfortunately, it looks like the boot menus don't actually
work yet (`bug #813527 <https://bugs.debian.org/813527>`__).
Furthermore, live-wrapper doesn't support `initrd-style
netboot <https://bugs.debian.org/849015>`__, so we would need
documentation on how to boot from ISO files over PXE.

Grml
^^^^

`Grml <http://grml.org/>`__ is a project quite similar to the original
goal of stressant:

-  based on Debian
-  provides rescue tools
-  `live CD/USB image <http://grml.org/grml-live/>`__
-  also provides support for netboot through
   `grml-terminalserver <http://grml.org/terminalserver/>`__, which can
   `use remote
   squashfs <http://www.pro-linux.de/kurztipps/2/1432/grml-small-200811-via-pxe-booten-ohne-nfs.html>`__

The project is really interesting and we have therefore switched the
focus of Stressant towards creating an integrated stress-testing tool on
*top* of Grml instead of trying to fix all the issues its developers are
already struggling with... We use `grml-debootstrap
<http://grml.org/grml-debootstrap/>`__, a tool similar to
vmdebootstrap, to build stressant images.

Grml has most of the packages we had in our dependencies, except
the following::

    blktool
    bonnie++
    chntpw
    diskscan
    e2tools
    fatresize
    foremost
    hfsplus
    i7z
    lm-sensors
    mtd-utils
    scrub
    smp-utils
    stress-ng
    tofrodos
    u-boot-tools
    wodim

Of those, only ``stress-ng`` is *actually* required by ``stressant``.

The remaining issues with Grml integration are:

1. add stressant to the Grml build (`pull request
   #34 <https://github.com/grml/grml-live/pull/34>`__ - done!)
2. review the above packages we collected from various rescue modes and
   see if they are relevant
3. hook stressant into the magic Grml menu to start directly from the boot
   menu - we can use the ``scripts=path-name`` argument for this; it
   looks in the "``DCS`` dir", which is ``/`` or whatever ``myconfig=``
   points at
4. figure out how to `chain into
   memtest86 <https://github.com/grml/live-boot-grml/issues/5>`__ to
   complete the test suite

Because ``stressant`` has been accepted into Debian, we should not need
to set up our own build system, unless Grml refuses to integrate the
package directly. In any case, we *may* want to set up our own Continuous
Integration (CI) system to build feature branches and the like. The
proper way to do this seems to be to add the ``.deb`` file directly at
the root (``/``) of the live filesystem with the `install local
files <http://grml.org/grml-live/#install-local-files>`__ technique and
the ``debs`` boot-time argument.

mkosi
^^^^^

`mkosi <https://github.com/systemd/mkosi>`_ is another tool from the
systemd folks which we may want to consider.
