"This talk introduces the Parabel algorithm for extreme multi-label learning where the objective is to learn classifiers that can annotate a data point with the most relevant subset of labels from an ex- tremely large label set. The state-of-the-art DiSMEC and PPDSparse algorithms are the most accurate but take weeks for training and prediction as they learn and apply an independent linear classifier per label. Consequently, they do not scale to large datasets with millions of labels. Parabel addresses these limitations by: (a) cutting down the training time to a few hours on a single core of a standard desktop by learning a hierarchy of coarse to fine label classifiers, each trained on a small subset of datapoints and (b) by cutting down the prediction time to a few milliseconds per test point by leveraging the classifier hierarchy for logarithmic time prediction in number of labels. This allows Parabel to scale to tasks consid- ered infeasible for DiSMEC and PPDSparse such as predicting the subset of search engine queries that might lead to a click on a given ad-landing page for dynamic search advertising. Experimental results demonstrated that Parabel could train in 80 hours on a proprietary dataset with 7 million labels which is beyond the scale of both DiSMEC and PPDSparse. Results on some of the largest publically available datasets revealed that Parabel could be 1,000x faster at training than both DiSMEC and PPDSparse, as well as 10,000x and 40x faster at prediction than DiSMEC and PPDSparse without any significant loss in prediction accuracy. Moreover, Para- bel was also found to be much more accurate than other tree based extreme classifiers and could be more than 10x faster at training with a 10x smaller model. Finally, Parabel was demonstrated to significantly improve dynamic search advertising on Bing by more than doubling the ad coverage, as well as improving the click- through rate by 20%"