.. Copyright 2011 Obsidian Research Corp. GPLv2, see COPYING.

.. _ibtool:

The ``ibtool`` Program
**********************

``ibtool`` collects a number of pre-existing diagnostic utilities, rewritten
to use ``python-rdma`` under one program. The rewrites serve as both test cases
for the library and programming examples for proper library usage. In general
the programs strive to be close to the originals but there are many subtle
differences.

.. note::
  'libib' in the context of this document refers to the C/Bash/Perl/etc
  versions of the ibtools commands based on libib that are being cloned in
  this implementation.

Addressing
==========

``ibtool`` uses the functions in `python-rdma` for processing address
arguments, and removes much of the artificial specificity required by the
original tools.

End Port to Use
---------------

Most tools require an end port to operate on. The libib mechanism involves
specifying the device name and port number with two separate arguments. This
is still supported but the port argument now accepts a full end port
specifier:

      =========== ===================
      Format      Example
      =========== ===================
      device      mlx4_0  (defaults to the first port)
      device/port mlx4_0/1
      Port GID    fe80::2:c903:0:1491
      Port GUID   0002:c903:0000:1491
      =========== ===================

So the device argument is not required, and its use discouraged. The device
argument will also accept a node GUID to reference the device.

As with the legacy tools the default end port is the first end port in the
system.

Target End Port Address
-----------------------

Most of the commands will operate on a remote port, this requires doing a MAD
RPC from the local end port to the remote. Commands accept a uniform format
for specifying the target end port

      ============= =====================================
      Format        Example
      ============= =====================================
      Port GID      fe80::2:c903:0:1491
      Scoped GID    fe80::2:c903:0:1491%mlx4_0/1
      Port GUID     0002:c903:0000:1491
      LID           12 (decimal)
      DR Path       0,1  (the thing connected to port 1)
      Path Spec	    IBPath(SLID=8,DLID=8,SL=2,pkey=0xFFF)
      Hex Port GUID 0x0002c90300001491
      	       	    (requires -G)
      ============= =====================================

The directed route (DR) path option allows specifying a directed route path,
the value ``0,`` is the local end port, ``0,1`` is the thing connected to port
1 of the local end port, etc.

The formats for each type are unambiguous, so the program simply determines
the correct entry automatically, legacy options specifying the type are
supported and the command fails if the provided argument does not match.

When a directed route or LID is specified it is used as-is for sending SMPs.
If a GMP is required then the address is resolved to a full path using the SA.
Resolving a directed route to a GMP path is done using
a series of :class:`rdma.IBA.SALinkRecord` RPCs.

When using a full path specification refer to the documentation for
:class:`rdma.path.IBPath`, the format is the classes :func:`repr` format. If a
complete path, with source and destination, is specified then it is used as
is, otherwise the SA will be used to resolve it. This is true even for GMP
paths. This extended format can be used to specify all parameters, including
pkey, presence and content of a GRH, packet_life_time, etc.

Error Handling
==============

The tool relies on `python-rdma`'s exception system for end user error
reporting. This system provides a great level of detail for most user visible
errors. The -v option is used to increase the error diagnostic output,
up to including packet dumps for failing MADs.

Discovery
=========

Tools which require topology discovery use the :class:`rdma.subnet.Subnet`
information database, which has several different methods for collecting the
data, including DR, LID SMP and from the SA. Everything uses the parallel
MAD scheduler for loading the databases.

Currently the discovery has no error recovery, so it will blow up ungracefully
if MADs can't traverse links they are supposed to, or LIDs don't work.

Notable Differences
===================

Compared to the libib versions:

* The internal library relies less on DR queries to the local port to get
  information - instead this comes from sysfs. This makes MKey enforcement more
  usable.
* Greater consistency. All GIDs are printed and accepted in IPv6 format, GUIDs
  are printed and accepted in three colon GUID format (eg
  0002:c903:0000:1492), and unicast LIDs are printed in decimal format. LIDs
  are accepted as explicit hex (0x1a) or decimal arguments. Hex is uniformly
  lower case and when zero justified the number of zeros used is correct for
  the width of the type.
* The output of some commands is subtly different, ie commands that used to
  print inconsistent output (hex LIDs, hex GUIDs, etc) print in canonical
  format
* Path record queries are always done for LID target end ports when using GMP,
  the correct SL to use is never assumed.
* A broader range of input is accepted for most arguments (ie GIDs, GUIDs, DR
  paths, etc) and the input argument type is unambiguously determined by
  format.
* Debug output is quite different and dramatically better.
* Error output is different and dramatically better. More -v's provide more
  detail down to decoded packet dumps of the erroring MAD::

    E: RPC MAD_METHOD_GET(1) SMPFormatDirected(129.1) SMPPortInfo(21) timed out to 'DR Path (0, 1, 4)'

  vs::

    ibwarn: [2018] mad_rpc: _do_madrpc failed; dport (DR path slid 65535; dlid 65535; 0,1,4)
    ibaddr: iberror: failed: can't resolve requested address

* All commands support the `--sa` option which causes SMPs to be converted
  into SA record queries and sent to the SA. (see
  :class:`rdma.satransactor.SATransactor`) In `--sa` mode no SMPs are
  issued. Some commands have `SubnAdmGetTable` support when in `--sa` mode
  which makes them run faster. (Be warned, opensm has various bugs in its
  \*Record support)
* None of the commands unconditionally write files into /var/cache/ or otherwise
  do file IO by default.

Discovery:

* All the discovery shell scripts are native Python and integrate properly with
  the command line system and support all the standard common options.
* The builtin discovery engine supports `--sa` which will rely entirely on SA
  Record queries for the data.
* All discovery using commands support the `--discovery` argument which can be
  LID, SMP or DR. DR exclusively uses directed route SMPs and can discover and
  unconfigured subnet. LID primarily uses LID routed SMP packets, except for a
  few DR SMPs to determine the connectivity. SA exclusively uses record data
  from the SA.
* By default discovery is done using LID mode, unless the connected end port
  is not active, then DR is used.
* Discovery data is stored in memory and re-used during the tool run,
  redundant queries are not issued.
* Everything is built on the parallel MAD scheduler
* The node name map file isn't implemented
* No chassis grouping functions are implemented
* Since no commands rely on frail text parsing, all node descriptions are
  supported in all tools, including putting " and other characters in them.
* All discovery commands support caching the result through the `--cache`
  option. The cache file is stored as a Python pickle and can be loaded
  by things other than `ibtool`. Use something like::

     --cache ~/.ibtools.cache-$A

  (FIXME support a config file or environment var or something for this)

Specific commands:

* `sminfo` gets the LID using a `SMPPortInfo` RPC when using directed route.
* `sminfo` has a `--sminfo_smkey` argument that is used for `SubnSet()` and
  `SubnGet()` RPCs. `SubnSet()` can send a 0 attribute modifier.
* `ibroute` uses the parallel MAD scheduler, displays LIDs in decimal and
  displays escaped node descriptions that are treated as UTF-8
* `ibroute` -M does not skip the last multicast LID.
* `ibroute` forgot how to limit by LID ranges (FIXME)
* `dump_lfts.sh` and `dump_mfts.sh` are internal commands that don't do
  duplicative work and are much faster.
* `ibhosts`, `ibswitches`, `ibrouters` and `ibnodes` display their output
  sorted by nodeGUID.
* `smpquery` sl2vl on a CA shows the CA port number not 0.
* `perfquery` supports directed route as an argument. The DR path is resolved
  to a LID path via a `SMPNodeInfo` RPC and a PR lookup to the SA
* `perfquery` uses the SA to get the `NodeInfo` (if needed) rather than using a
  SMP. It also uses the parallel MAD scheduler when looping over ports.
* `perfquery -l` works like `perfquery -a -l` instead of trying to request
  port 0 and often failing.
* `perfquery` gives a failure message if it is asked to loop over ports on
  a CA (which can't be done by simple port select) (FIXME: We could ask the SM
  how to reach the other ports)
* `perfquery` uses the `SMPNodeInfo.localPortNum` for the target as the default
  port number if none is given - this 'does the right thing' for CA ports
  and returns a result instead of an error for switch ports.
* `perfquery` will also handle `PMPortFlowCtlCounters`, `PMPortFlowCtlCounters`,
  `PMPortVLXmitFlowCtlUpdateErrors`, `PMPortVLXmitWaitCounters`,
  and `PMSwPortVLCongestion`
* `smpdump` has a `--decode` option to pretty print the MAD
* `smpdump` returns an error on timeout
* `smpdump` is joined by `decode_mad` which takes MADs in various formats
  and pretty prints them
* `saquery` supports all record types and supports all component masks via
  an enhanced syntax::

    saquery NR nodeInfo.portGUID=0017:77ff:feb6:2ca4

  This is done using Python dynamic introspection and codegen of the component
  mask layout.
* The inconsistent names from `saquery` are less inconsistent but don't match
  100% what `saquery` produces.  The `--int-names` option uses the names
  described in this document.
* `saquery` forgot how to do --node-name-map (FIXME)
* `saquery` options that have an associated Selector don't set the selector.
  (FIXME)
* `saquery` -g and -m do not work, -g sets smkey to 0 (FIXME)
* The command `query` is added which can issue any RPC, with any packet
  content entirely using the symbolic names in this document. This is done
  with Python introspection. Eg::

   $ ibtool query SubnAdmGet MADClassPortInfo -d
   debug: GMP Path 8 -> 8 SL=0 PKey=65535 DQPN=1
   debug: RPC MAD_METHOD_GET(1) SAFormat(3.2) MADClassPortInfo(1) completed to 'Path 8 -> 8 SL=0 PKey=65535 DQPN=1' len 256.
   BaseVersion......................1
   ClassVersion.....................2
   CapabilityMask...................0x2602
   CapabilityMask2..................0x0000000
   RespTimeValue....................16
   RedirectGID......................::
   RedirectTC.......................0x00
   RedirectSL.......................0
   RedirectFL.......................0
   RedirectLID......................0
   RedirectPKey.....................0x0000
   RedirectQP.......................0x000001
   RedirectQKey.....................0x80010000
   TrapGID..........................::
   TrapTC...........................0
   TrapSL...........................0
   TrapFL...........................0
   TrapLID..........................0
   TrapPKey.........................0x0000
   TrapHL...........................0
   TrapQP...........................0x000000
   TrapQKey.........................0x80010000

* `ibnetdiscover` prints the listing in a BFS order, not randomly.
* `ibfindnodesusing` only fetches subnet information actually used during
  output and supports more ways to specify the source switch.
* `ibfindnodesusing` learned the --all (show switches too) and -v (show
  LID and port GUID) options.
* `ibprintca/rt/switch` supports --sa which does limited SA queries to return
  the information instead of having to load a full topology.
* `ibprintca/rt/switch` displays the complete node stanza, instead of just a
  truncated version.
* `ibportstate` can work with CA ports if --sa is used (FIXME: Just do the
  --sa action for all CA ports..)
* `set_nodedesc` got the -C and -P options to set a single device. Also works
  with UTF-8 properly.
* `ibtracert` supports 1 or 2 arguments, with the single argument form
  meaning start at the current node, ala IP trace route
* `ibtracert` can resolve all address forms for the two arguments, and
  will use the SA to fill missing details.
* `ibtracert` supports all discovery options including caching and LID/SA
  discovery. When used with LID routing the tool is no longer bound by the
  64 hop DR limit.
* `ibcheck*` forgot how to colourize
* `ibcheckport` checks the localPortNum if it isn't a switch, and
  checks that a	port is not at a degraded speed and degraded width based
  on link*Supported.
* `ibcheck*` commands that iterate over the subnet are discovery commands
  and use the MAD parallelizer to do their checks. For this reason
  verbose output may be out of order, so we also show the end port LID
  and CA port number.
* `ibcheck*` discovery commands treat a 'node check' as an `end port check`
  and checks all end ports on a CA. It also checks switch port 0.
* `ibcheck*` discovery commands can use the subnet discovery database
  to check peer ports for link speed and link width. No warnings
  are generated if the max capability is being used. (eg SDR connected
  to DDR).
* `vendstat` only supports -N (FIXME)
* `ibsysstat` has different output. This is a fairly pointless program,
  it is included to illustrate/test a vendor OUI MAD server.
* `ibping` uses `ibsysstat` as a server. I could not bring myself to implement
  another ping class particularly when it used an attribute ID of 0..
* `ibswportwatch` has all the same options as `perfquery` and can watch
  all kinds of counters. The output format is different, but
  much more complete.
* `ibswportwatch` by default does the `-b` option, since this is
  less surprising. To get the threshold checking behavior use
  `--threshold`. A limits file identical to `ibcheckerrors`
  is supported.
* `ibidsverify` works like the `ibcheck\*` functions, not something
  unique. Doesn't bother to check nodeGUIDs becuase discovery cannot create
  duplicates. Learned to check LIDs considering LMC as well.
* `iblinkinfo` formats the output with slightly more alignment. Forgot
  how to do `--hops`
* `ibdiscover.pl` is aliased to `subnet_diff` because they do the same
  thing even if they work completely differently.
* `subnet_diff` will compare the set of end ports, nodes, and links between
  two subnet cache files. It also checks the link rates and LID to end port
  mapping for differences.

New Commands
============

* `query` can issue nearly arbitrary SMPs and GMPs
* `subnet_diff` can compute the differences between two subnets
* `set_port_state` will disable or enable a group of ports intelligently
  selecting communication paths that don't cross the affected links using
  directed route. This can be used to partition an IB network.
* `init_all_ports` will set all ports in the network to the INIT state.
  This can be used to try and recover a network that may be locked up due
  to credit loop or otherwise.

Commands
========

Supported:

=================== =================== =================== ===================
dump_lfts.sh        dump_mfts.sh	ibaddr		    ibcheckerrors
bcheckerrs	    ibchecknet	  	ibchecknode	    ibcheckport
ibcheckportstate    ibcheckportwidth 	ibcheckstate	    ibcheckwidth
ibclearcounters     ibclearerrors	ibdatacounters      ibdatacounts
ibdiscover.pl	    ibfindnodesusing.pl ibhosts		    ibidsverify.pl
iblinkinfo[.pl]     ibnetdiscover	ibnodes		    ibping
ibportstate	    ibprintca.pl	ibprintrt.pl	    ibprintswitch.pl
ibroute		    ibrouters		ibstat		    ibstatus
ibswitches	    ibswportwatch.pl	ibsysstat	    ibtracert
ibv_devices	    perfquery		rdma_bw		    saquery
set_nodedesc.sh     sminfo		smpdump		    smpquery
vendstat
=================== =================== =================== ===================

To be completed:

==================== ==================
check_lft_balance.pl ibqueryerrors[.pl]
==================== ==================

* `ibqueryerrors` is nearly identical to `ibcheckerrors`, `ibcheckerr`,
  `ibclearcounters`, and `ibclearcounters`. The `ibtool` version of the
  `ibcheck*` programs already includes all the optimizations, plus more, that
  are in `ibqueryerrors`. Even though the output formatting is much better
  I have not re-implemented it. (FIXME)
* I'm not sure what `check_lft_balance.pl` does.

Verbs examples/tests:

* Review test\_??\_loop in tests/verbs for an example of: `ibv_rc_pingpong`,
  `ibv_uc_pingpong`, `ibv_ud_pingpong`, `ibv_srq_pingpong`
* `rdma_bw` is similar to the same program in `perftest`