Skip to content


The metametric Python package offers a set of tools for quickly and easily defining and implementing evaluation metrics for a variety of structured prediction tasks in natural language processing (NLP) based on the framework presented in the following paper:

A Unified View of Evaluation Metrics for Structured Prediction. Yunmo Chen, William Gantt, Tongfei Chen, Aaron Steven White, and Benjamin Van Durme. EMNLP 2023.

The key features of the package include:

  • A decorator for automatically defining and implementing a custom metric for an arbitrary dataclass.
  • A collection of generic components for defining arbitrary new metrics based on the framework in the paper.
  • Implementations of a number of metrics for common structured prediction tasks.

To install, run:

pip install metametric


Scoring a Pair of Objects

metametric comes with a set of prebuilt metrics for common structured prediction tasks. For example, to compute the standard suite of evaluation metrics for coreference resolution — \(\text{MUC}\) (muc; paper), \(B^3\) [b_cubed_precision, b_cubed_recall; paper) and \(\text{CEAF}_{\phi_4}\) (ceaf_phi4; paper) — on a pair of predicted and reference coreference clusters, one can simply do:

from import Mention, Entity, EntitySet
from metametric.metrics.coref import coref_suite

scorer =
for p, r in zip(predicted_entities, reference_entities):
    scorer.update_single(p, r)

metrics = scorer.compute()

>> {"muc-f1": 0.4, "b_cubed-f1": 0.45, "ceaf_phi4-f1": 0.52, "avg-f1": 0.46, ...}

Defining a Metric

If you want to implement a new metric based on structure (and substructure) matching, you can leverage the built-in matching algorithms and just focus on the structure of interests. For example, if you want to implement s_match for Abstract Meaning Representation (AMR) parsing, you need only define dataclasses for the relevant structures (here, Prop for AMR propositions and AMR for AMR graphs), and then define the matching between them:

# Define structures
@dataclass(eq=True, frozen=True)
class Prop:
    """A Proposition in an AMR."""

    subj: Variable
    pred: str
    obj: Union[Variable, str]

class AMR:
    """Abstract Meaning Representation (AMR) structure."""

    props: Collection[Prop]

# Let metametric derive Smatch for you!
s_match = mm.normalize["f1"]([AMR])

# `s_match` is now ready to use on any pair of `AMR` graphs!


If you use this codebase (package) in your work, please cite the following paper:

    title={A Unified View of Evaluation Metrics for Structured Prediction},
    author={Yunmo Chen and William Gantt and Tongfei Chen and Aaron Steven White and Benjamin {Van Durme}},
    booktitle={Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing},