Development

Basic instructions

The default cclib files distributed with a release, as described in How to install, include neither the unit tests nor the logfiles necessary to run those tests. This section covers how to download the full source along with all test data and scripts, and how to use these for development and testing.

Cloning cclib from GitHub

cclib is hosted by the fantastic people at GitHub in a git repository. You can download a zipped archive of the current development version (called master) for installation and testing or browse the available releases. In order to contribute any changes, however, you will need to create a local copy of the repository:

git clone https://github.com/cclib/cclib.git

If you would like to contribute fixes, it is more likely that you want to fork the repository (see guidelines below) and clone that.

Installation and running tests

Once you have a copy of cclib,

  1. Create a virtual environment using either venv or conda, then activate it

  2. Install cclib in development mode with all dependencies: cd /location/of/cclib; python -m pip install -e '.[dev]'

  3. To run unit tests, python -m pytest

  4. To run regression tests, python -m pytest test/regression.py (requires cclib-data download)

You may not be interested in running all tests because they take too much time (some calculation methods), require a dependency not immediately available (primarily Psi4), or simply aren’t relevant for the code you’ll touch. Call pytest with the -k flag to select a subset of tests. Using the --co flag will show the names of matching tests without actually running them.
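As a rough illustration of test selection, the sketch below assembles a pytest command line with the -k and --co flags. The helper name pytest_command and the keyword expression in the example are hypothetical and only serve to show how the flags combine; nothing here is part of cclib.

```python
import sys


def pytest_command(keyword=None, collect_only=False, path=None):
    """Build an argument list for running pytest through the current interpreter."""
    cmd = [sys.executable, "-m", "pytest"]
    if path is not None:
        # Restrict the run to one file or directory, e.g. test/regression.py
        cmd.append(path)
    if keyword is not None:
        # -k takes an expression that is matched against test names
        cmd.extend(["-k", keyword])
    if collect_only:
        # --co (short for --collect-only) lists matching tests without running them
        cmd.append("--co")
    return cmd


if __name__ == "__main__":
    # Preview which tests a (hypothetical) keyword expression would select.
    print(" ".join(pytest_command(keyword="Gaussian and not Psi4", collect_only=True)))
```

The resulting list can be passed to subprocess.run from the root of the repository, or the printed command can be pasted into a shell.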

Guidelines

We follow a typical GitHub collaborative model, relying on forks and pull requests. In short, the development process consists of:

Here are some general guidelines for developers who are contributing code:

  • All contributions should be reviewed by at least one core developer.

  • Contributions from a core developer need to be reviewed by another core developer.

  • Run and review the unit tests (see below) before submitting a pull request.

  • There should normally not be more failed tests than before your changes.

  • For larger changes or features that take some time to implement, using branches is recommended.

Releasing a new version

The release cycle of cclib is irregular, with new versions being created as deemed necessary after significant changes or new features. We roughly follow semantic versioning with respect to the parsed attributes.

When creating a new release on GitHub, start by creating a new issue based on the following template that provides a high-level overview of the steps to take.

This is to track the work to be done for release v1.8.1:

- [ ] Migrate non-urgent issues/PRs to the 2.0 label ([2.0 list](https://github.com/cclib/cclib/issues?q=is%3Aopen+is%3Aissue+milestone%3Av2.0))
- [ ] Close out remaining PRs and issues ([PRs](https://github.com/cclib/cclib/pulls?q=is%3Aopen+is%3Apr+milestone%3Av1.8.1), [issues](https://github.com/cclib/cclib/issues?q=is%3Aopen+is%3Aissue+milestone%3Av1.8.1))
- [ ] Create `release-1.8.1` branch ([branch](https://github.com/cclib/cclib/tree/release-1.8.1))
- [ ] Bump version, update changelog, docs and other files (TODO)
- [ ] Create a v1.8.1 tag for the release
- [ ] Update release tag to HEAD on release branch
- [ ] Create a draft release from the v1.8.1 tag
- [ ] Attach source tar and zip archives to release on GitHub
- [ ] When ready, publish the draft release (https://github.com/cclib/cclib/releases/tag/v1.8.1)
- [ ] Upload new version to PyPI (https://pypi.org/project/cclib/1.8.1/)
- [ ] Make sure conda picks up the change (https://github.com/conda-forge/cclib-feedstock/pull/TODO)
- [ ] Update Zenodo if a major or minor version (https://zenodo.org/record/TODO)
- [ ] Make sure Zenodo parts of documentation get updated (https://github.com/cclib/cclib/pull/TODO)
- [ ] Merge the `release-1.8.1` branch back into master (https://github.com/cclib/cclib/pull/TODO)
- [ ] Send email to MLs

An example of a past finished release is 1.8.

  • Update the CHANGELOG, ANNOUNCE and any other files that might change content with the new version

  • Make sure that setup.py has the right version number, as well as __version__ in __init__.py and any other relevant files

  • Update the download and install instructions in the documentation, if appropriate

  • Create a branch for the release, so that development can continue

  • Run all tests for a final time and fix any remaining issues

  • Tag the release (make sure to use an annotated tag using git tag -a) and upload it (git push --tags)

  • Run manifest.py to update the MANIFEST file

  • Create the source distributions (python setup.py sdist --formats=gztar,zip) and Windows binary installers (python setup.py bdist_wininst)

  • Create a release on GitHub using the created tag (see Creating releases) and upload the source distributions and Windows binaries

  • Email the users and developers mailing list with the message in ANNOUNCE

  • Update the Python package index, normally done by python setup.py register

  • For significant releases, if appropriate, send an email to the CCL list and any mailing lists for computational chemistry packages supported by cclib

Testing

The test directory, which is not included in the default download, contains the test scripts that keep cclib reliable, and keep the developers sane. With any new commit or pull request to cclib on GitHub the tests are triggered and run with GitHub Actions.

The input files for tests, which are logfiles from computational chemistry programs, are located in the data directory. These are a central part of cclib, and any progress should always be supported by corresponding tests. When a user opens an issue or reports a bug, it is prudent to write a test that reproduces the bug as well as fixing it. This ensures it will remain fixed in the future. Likewise, extending the coverage of data attributes to more programs should proceed in parallel with the growth of unit tests.

Unit tests

Unit tests check that the parsers work correctly for typical calculation types on small molecules, usually water or 1,4-divinylbenzene (dvb) with \(C_{\mathrm{2h}}\) symmetry. The corresponding logfiles stored in folders like data/NWChem/basicNWChem6.0 are intended to test logfiles for an approximate major version of a program, and are standardized for all supported programs to the extent possible. They are located alongside the code in the repository, but are not normally distributed with the source. Attributes are considered supported only if they are checked by at least one test, and the table of attribute coverage is generated automatically using this criterion.

The job types currently included as unit tests:

  • restricted and unrestricted single point energies for dvb (RHF/STO-3G and B3LYP/STO-3G)

  • geometry optimization and scan for dvb (RHF/STO-3G and/or B3LYP/STO-3G)

  • frequency calculation with IR intensities and Raman activities for dvb (RHF/STO-3G or B3LYP/STO-3G)

  • single point energy for carbon atom using a large basis set such as aug-cc-pCVQZ

  • Møller–Plesset and coupled cluster energies for water (STO-3G basis set)

  • static polarizabilities for tryptophan (RHF/STO-3G)

In addition to the above unit tests for data, there are also unit tests for each bridge, calculation method, IO format, and helper utility, all located inside the test directory, with each category receiving its own subdirectory.

Adding a new attribute

Definitions of attributes (mocoeffs, natom, etc.) are located inside the ccData class. Use existing attributes for guidance.

  1. Add a line containing the attribute name, a short description of the attribute, the type and shape (if not a scalar quantity) of the attribute, and relevant units to the docstring.

  2. Add an entry for the code representation of an attribute.

  3. Some attributes require additional processing into certain container or data types; available processing rules and their descriptions are below the attribute entries.

Without these modifications, saving the parsed attribute will appear to work inside the parser, but will be filtered out by ccData.setattributes before the ccData instance is returned. (This also means that arbitrary attributes can be set on and used from self inside a parser and they will be automatically cleaned up.)
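The filtering behaviour described above can be pictured with a toy model. The class ToyData, its _known registry, and the method set_registered are all hypothetical stand-ins written for this illustration; they are not cclib's ccData implementation, whose details may differ.

```python
class ToyData:
    """Toy stand-in for a parsed-data container; NOT cclib's ccData."""

    # Hypothetical registry: attribute name -> expected type.
    _known = {"natom": int, "charge": int}

    def set_registered(self, attributes):
        """Keep only registered attributes and return the names that were dropped."""
        dropped = []
        for name, value in attributes.items():
            if name in self._known:
                # Registered: coerce to the declared type and store it.
                setattr(self, name, self._known[name](value))
            else:
                # Unregistered: silently filtered out of the final object.
                dropped.append(name)
        return dropped


# A parser collected three values, but only two are registered attributes.
parsed = {"natom": 3, "charge": 0, "mynewattr": [1.0, 2.0]}
data = ToyData()
dropped = data.set_registered(parsed)
print(dropped)
print(hasattr(data, "mynewattr"))
```

In this model, adding "mynewattr" to the registry is what would let the value survive, which mirrors why a new attribute needs its definition added before it shows up in parsed results.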

Once the above is complete, and the new attribute is parsed and saved inside at least one parser, a new unit test should be added.

Adding a new unit test

Navigate to the relevant subdirectory of the test directory. All filenames containing unit tests must start with test. Generally, each file containing an implementation in the cclib source has a matching test file. The exception is parsers, for which there are some program-specific tests, but most relevant are the data tests that are grouped by attribute.

Examples of how unit tests are written are those for population methods or the MOLDEN writer.

  • A class whose name ends in Test is used to hold test methods. Many test files only contain a single test class, but others contain multiple, usually specialized for a specific program or method. An example is having a basic PopulationTest but more specific GaussianBickelhauptTest and GaussianMPA classes for checking the results specific to Bickelhaupt and Mulliken population analyses.

  • Each method in a test class is meant for testing a single logical piece of functionality. Common checks are for whether or not the dimensions of calculated quantities are consistent and for certain chemical or physical invariants to hold, such as the total charge from a population analysis summing to the total formal charge of a system.

Adding a unit test for a new attribute or new methods on an existing data unit test class requires all of the above with the addition of:

  • An entry in the testdata file that matches the output for a program at a specific version with the test class the output should be used with. An output may be used with multiple tests and a test may be used for many different outputs: there are no restrictions.

  • Each method should, after self, take an argument called data that corresponds to a parsed ccData instance.

    • data is a pytest fixture; other test classes may have their own local fixtures defined. All cclib-specific but general fixtures are located in the pytest runtime configuration.
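A minimal sketch of what such a test class might look like follows. The class name ToyPopulationTest and the numbers in fake_data are made up for illustration, and a SimpleNamespace stands in for the parsed object that the data fixture would normally supply, so the example runs without cclib or pytest installed.

```python
import math
from types import SimpleNamespace


class ToyPopulationTest:
    """Sketch of a data test class; each method checks one logical property."""

    def test_dimensions(self, data):
        """There should be one Mulliken charge per atom."""
        assert len(data.atomcharges["mulliken"]) == data.natom

    def test_charge_sum(self, data):
        """Atomic charges should sum to the total charge of the system."""
        total = sum(data.atomcharges["mulliken"])
        assert math.isclose(total, data.charge, abs_tol=1e-6)


# Stand-in for the object the `data` fixture would supply; the numbers are made up.
fake_data = SimpleNamespace(
    natom=3,
    charge=0,
    atomcharges={"mulliken": [-0.66, 0.33, 0.33]},
)

# Outside of pytest, the methods can be called directly with the stand-in.
checker = ToyPopulationTest()
checker.test_dimensions(fake_data)
checker.test_charge_sum(fake_data)
```

Under pytest, the direct calls at the bottom would not be needed: the runner would inject data into each method from the fixture.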

Adding a new program version

There are a few conventions when adding a new supported program version to the unit tests:

  • Two different recent versions are typically used in the unit tests. If there already are two, move the older version(s) to the regression suite (see below).

  • When adding files for the new version, first copy the corresponding files for the last version already in cclib. Afterwards, check in files from the new program version as changes to the copied files. This procedure makes it easy to look at the differences introduced with the new version in git clients.

Regression tests

Regression tests ensure that bugs, once fixed, stay fixed. These are real-life files that at some point broke a cclib parser and are stored in folders like data/regression/Jaguar/Jaguar6.4. The files associated with regression tests are not stored together with the source code as they are often quite large. A separate repository on GitHub, cclib-data, is used to track these files, and we do not distribute them with any releases.

For every bug found in the parsers, there should be a corresponding regression test that tests if this bug stays fixed. The process is automated by regression.py, which runs through all of our test data, both the basic data and regression files, opens them, tries to parse, and runs any relevant regression tests defined for that file.
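The general shape of such a run (walk the data, try to parse each file, record what breaks) can be sketched as below. This is not the contents of regression.py; the function try_parse_all and its parse argument are hypothetical, and any callable that takes a path and raises on failure can be supplied.

```python
from pathlib import Path


def try_parse_all(root, parse):
    """Attempt to parse every file below root; return (parsed, failures)."""
    parsed, failures = {}, {}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            parsed[path] = parse(path)
        except Exception as exc:
            # A crash in the parser is itself a regression worth reporting.
            failures[path] = exc
    return parsed, failures
```

With cclib installed, a wrapper around cclib.io.ccread could plausibly be passed as the parse callable; the successfully parsed objects would then be handed to whatever file-specific checks exist.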

Using both the unit and regression tests, the line-by-line test coverage shows which parts of cclib are touched by at least one test. When adding new features and tests, the GitHub Actions testing script can be run locally to generate the HTML coverage pages and ensure that the tests exercise the feature code.

Adding a new regression test

A regression test consists of one or more output files and optionally a test function or class.

New regression tests are added by creating entries in regressionfiles.yaml. There are three kinds of tests:

  • A regression may be parsed, but specific attributes on the regression are not checked: no test function or class is added.

  • A regression may be parsed and also explicitly tested.

  • A regression may be explicitly tested but not parsed (this is uncommon).

More details, such as where to place regression data, how to control parsing, and what to name the optional tests are available in the pytest config and at the top of regressionfiles.yaml.

Code conventions

  • All aspects of code formatting are handled automatically by a combination of isort and ruff. Formatting is enforced by running pre-commit on all PRs. We encourage contributors to also run pre-commit locally.

  • Non-functional changes to code are ideally in separate PRs. This makes PRs quicker to review and merge.

  • Print output is controlled via Python’s standard logging library and levels. Any output that might be presentable to the user, from detecting that there may have been a problem with the calculation to general debug-type printing, should use the logger with an appropriate log level. Be generous with logging rather than erring on the side of caution. See this issue for historical information.

  • Setting parsed attributes inside a parser’s extract method is fundamentally identical to setting attributes on a basic Python object (self.myattr = thing).
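The logging and attribute-setting conventions above can be illustrated with a toy parser. ToyParser, the logger name "toyparser", and the two sample output lines are invented for this sketch; it is not cclib's parser base class, only the standard logging module applied in the way the conventions describe.

```python
import logging


class ToyParser:
    """Toy line-by-line parser; NOT cclib's actual parser base class."""

    def __init__(self):
        self.logger = logging.getLogger("toyparser")

    def extract(self, line):
        """Inspect one line of output and store anything of interest on self."""
        if line.startswith("Number of atoms:"):
            # Setting a parsed attribute is plain attribute assignment.
            self.natom = int(line.split(":")[1])
            self.logger.debug("parsed natom = %d", self.natom)
        elif "convergence failure" in line.lower():
            # A possible problem with the calculation: surface it to the user.
            self.logger.warning("calculation may not have converged: %s", line.strip())


logging.basicConfig(level=logging.DEBUG)
parser = ToyParser()
for line in ["Number of atoms: 3", "SCF CONVERGENCE FAILURE"]:
    parser.extract(line)
```

Choosing debug for routine progress and warning for suspected problems lets users decide how much output they see by adjusting the log level, rather than by editing print statements.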

Documentation

All new functions should contain a docstring in NumPy style. A docstring should focus on the user-facing interaction and purpose of a function rather than any implementation details. Additional code comments may also be necessary; here are some general guidelines for writing code comments.
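For reference, a NumPy-style docstring on a small function might look like the sketch below. The function count_electrons is a hypothetical example and not part of cclib; note that the docstring describes what the caller provides and receives, not how the result is computed.

```python
def count_electrons(atomic_numbers, charge=0):
    """Count the electrons in a molecule.

    Parameters
    ----------
    atomic_numbers : sequence of int
        Atomic number of each atom in the molecule.
    charge : int, optional
        Net charge of the molecule. Default is 0.

    Returns
    -------
    int
        Total number of electrons.

    Examples
    --------
    >>> count_electrons([8, 1, 1])
    10
    """
    return sum(atomic_numbers) - charge
```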

Larger features, such as adding a new parser, method, or bridge, may also warrant additional high-level documentation in these pages. Please reach out to us about this if you have questions, or we may ask for some as part of discussion on an issue or PR.

Developers

Besides input from a number of people listed in the repository, the following are core developers (in alphabetical order):