"This talk introduces the Parabel algorithm for extreme multi-label learning where the objective is to learn classifiers that can annotate a data point with the most relevant subset of labels from an ex- tremely large label set. The state-of-the-art DiSMEC and PPDSparse algorithms are the most accurate but take weeks for training and prediction as they learn and apply an independent linear classifier per label. Consequently, they do not scale to large datasets with millions of labels. Parabel addresses these limitations by: (a) cutting down the training time to a few hours on a single core of a standard desktop by learning a hierarchy of coarse to fine label classifiers, each trained on a small subset of datapoints and (b) by cutting down the prediction time to a few milliseconds per test point by leveraging the classifier hierarchy for logarithmic time prediction in number of labels. This allows Parabel to scale to tasks consid- ered infeasible for DiSMEC and PPDSparse such as predicting the subset of search engine queries that might lead to a click on a given ad-landing page for dynamic search advertising. Experimental results demonstrated that Parabel could train in 80 hours on a proprietary dataset with 7 million labels which is beyond the scale of both DiSMEC and PPDSparse. Results on some of the largest publically available datasets revealed that Parabel could be 1,000x faster at training than both DiSMEC and PPDSparse, as well as 10,000x and 40x faster at prediction than DiSMEC and PPDSparse without any significant loss in prediction accuracy. Moreover, Para- bel was also found to be much more accurate than other tree based extreme classifiers and could be more than 10x faster at training with a 10x smaller model. Finally, Parabel was demonstrated to significantly improve dynamic search advertising on Bing by more than doubling the ad coverage, as well as improving the click- through rate by 20%"