benchmark
compare
The compare facade: evaluate methods on a task battery and report significance.
compare
compare(
    methods,
    data,
    task="classification",
    *,
    num_classes=None,
    predict_fn=None,
    metrics=None,
    prob_metrics=None,
    test="wilcoxon",
    alpha=0.05,
    ignore_index=None,
    device=None,
)
Compare methods on a standard battery and report significance.
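To illustrate what `test="wilcoxon"` and `alpha` plausibly control, here is a minimal sketch that runs a paired Wilcoxon signed-rank test (`scipy.stats.wilcoxon`) on per-sample scores for every pair of methods. It assumes each method has already been scored on the same samples in the same order. The helper `pairwise_wilcoxon` and its output columns are hypothetical and are not this package's API.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy.stats import wilcoxon


def pairwise_wilcoxon(scores: dict, alpha: float = 0.05) -> pd.DataFrame:
    """Hypothetical helper: paired Wilcoxon signed-rank test for every method pair.

    `scores` maps method name -> per-sample metric values. The arrays must be
    aligned (same samples, same order) because the test is paired.
    """
    rows = []
    for a, b in combinations(scores, 2):
        x, y = np.asarray(scores[a], dtype=float), np.asarray(scores[b], dtype=float)
        if x.shape != y.shape:
            raise ValueError(f"{a} and {b} were scored on different samples")
        # Two-sided test on the paired differences x - y.
        stat, p = wilcoxon(x, y)
        rows.append({
            "method_a": a,
            "method_b": b,
            "mean_diff": float(np.mean(x - y)),
            "statistic": float(stat),
            "p_value": float(p),
            "significant": bool(p < alpha),
        })
    return pd.DataFrame(rows)
```

A paired test fits here because every method sees the same samples. Per-sample difficulty then cancels out of the differences, and it does not enter the comparison as noise.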
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| task | str | Task type (e.g. 'classification'). | 'classification' |
| num_classes | int or None | Required when | None |
| ignore_index | int or None | Label to exclude from segmentation metrics (e.g. a void/boundary class). | None |
| prob_metrics | frozenset[str] or None | Names of the metrics that require probabilities; defaults to the task's set. | None |
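The ignore_index and prob_metrics parameters suggest a split in how per-sample scores are computed. Positions labelled ignore_index are dropped first. Metrics named in prob_metrics then consume class probabilities, and the remaining metrics consume argmax labels. The sketch below assumes that reading. The helper `score_sample`, the metric names "accuracy" and "nll", and `DEFAULT_PROB_METRICS` are all hypothetical illustrations, not taken from this package.

```python
import numpy as np

# Hypothetical default: names of metrics that need probabilities, not labels.
DEFAULT_PROB_METRICS = frozenset({"nll"})


def score_sample(probs, target, metric, prob_metrics=None, ignore_index=None):
    """Hypothetical per-sample scorer.

    probs:  array of shape (..., C) with class probabilities per position.
    target: integer array of shape (...) with the true labels.
    """
    probs = np.asarray(probs, dtype=float)
    target = np.asarray(target)
    if prob_metrics is None:
        prob_metrics = DEFAULT_PROB_METRICS

    # Flatten spatial dims so segmentation maps and single labels share one path.
    probs = probs.reshape(-1, probs.shape[-1])
    target = target.reshape(-1)

    # Drop positions labelled ignore_index (e.g. a void/boundary class).
    if ignore_index is not None:
        keep = target != ignore_index
        probs, target = probs[keep], target[keep]
    if target.size == 0:
        return float("nan")

    if metric in prob_metrics:
        if metric == "nll":
            # Mean negative log-likelihood of the true class; clipping avoids log(0).
            p_true = probs[np.arange(target.size), target]
            return float(-np.mean(np.log(np.clip(p_true, 1e-12, 1.0))))
    elif metric == "accuracy":
        # Label-based metrics only see the argmax prediction.
        return float(np.mean(probs.argmax(axis=-1) == target))
    raise ValueError(f"unknown metric: {metric!r}")
```

Returning NaN when every position is ignored keeps such samples visible downstream. The alternative, silently scoring them as 0 or 1, would hide them.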
BenchmarkResult (dataclass)
Holds benchmark results.
Attributes:

| Name | Type | Description |
|---|---|---|
| data | Dataset | Dims |
| comparisons | DataFrame | Pairwise significance results (see |
| alpha | float | Significance level used. |
summary
Publication-ready table: the mean and confidence interval (CI) for each method and metric, with a "*" marker when the method differs significantly from the reference method (by default, the first method in data).