I am using GridSearch from sklearn to optimize the parameters of a classifier. There is a lot of data, so the whole optimization process takes a while: more than a day. I would like to watch the performance of the parameter combinations that have already been tried while the search is still running. Is that possible?
Set the verbose parameter of GridSearchCV to a positive number (the larger the number, the more detail you get). For example:

    GridSearchCV(clf, param_grid, cv=cv, scoring='accuracy', verbose=10)
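A self-contained version of such a call might look like the sketch below. The iris dataset, the SVC estimator, and the small parameter grid are assumptions chosen only to keep the example fast; they are not from the original question. In recent scikit-learn versions a verbose value above 2 should also print the score of each fit, but the exact output format differs between versions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Toy data and a tiny grid (assumptions, chosen so the search finishes quickly).
X, y = load_iris(return_X_y=True)
param_grid = {"C": [0.1, 1.0], "kernel": ["linear", "rbf"]}

# verbose > 0 makes the search report progress while it runs.
search = GridSearchCV(SVC(), param_grid, cv=3, scoring="accuracy", verbose=3)
search.fit(X, y)

# After the search has finished, the results are also available as attributes.
print(search.best_params_, round(search.best_score_, 3))
```

With 4 candidates and 3 folds this runs 12 fits and logs each of them as it goes.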
I just want to add to DavidS's answer. To give you an idea, for a very simple case with a low verbose value, the output looks like this:

    Fitting 10 folds for each of 1 candidates, totalling 10 fits
    [Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
    [Parallel(n_jobs=1)]: Done 10 out of 10 | elapsed: 1.2min finished
And this is how it looks with a higher verbose value:

    Fitting 10 folds for each of 1 candidates, totalling 10 fits
    [CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1
    [Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
    [CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.637, total= 7.1s
    [CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1
    [Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 7.0s remaining: 0.0s
    [CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.630, total= 6.5s
    [CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1
    [Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 13.5s remaining: 0.0s
    [CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.637, total= 6.5s
    [CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1
    [Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 20.0s remaining: 0.0s
    [CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.637, total= 6.7s
    [CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1
    [Parallel(n_jobs=1)]: Done 4 out of 4 | elapsed: 26.7s remaining: 0.0s
    [CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.632, total= 7.9s
    [CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1
    [Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 34.7s remaining: 0.0s
    [CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.622, total= 6.9s
    [CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1
    [Parallel(n_jobs=1)]: Done 6 out of 6 | elapsed: 41.6s remaining: 0.0s
    [CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.627, total= 7.1s
    [CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1
    [Parallel(n_jobs=1)]: Done 7 out of 7 | elapsed: 48.7s remaining: 0.0s
    [CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.628, total= 7.2s
    [CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1
    [Parallel(n_jobs=1)]: Done 8 out of 8 | elapsed: 55.9s remaining: 0.0s
    [CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.640, total= 6.6s
    [CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1
    [Parallel(n_jobs=1)]: Done 9 out of 9 | elapsed: 1.0min remaining: 0.0s
    [CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.629, total= 6.6s
    [Parallel(n_jobs=1)]: Done 10 out of 10 | elapsed: 1.2min finished
In my case, setting verbose does the trick.
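If the log output is too noisy and you mainly want a running "best so far" summary, one alternative is to iterate over the grid yourself. This is not what GridSearchCV does internally; it is a plain loop over ParameterGrid with cross_val_score, sketched here under the assumption of a toy dataset and an SVC estimator. It also gives up the parallelism across candidates and the automatic refit that GridSearchCV provides.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import ParameterGrid, cross_val_score
from sklearn.svm import SVC

# Toy data and grid (assumptions for illustration only).
X, y = load_iris(return_X_y=True)
param_grid = {"C": [0.1, 1.0], "kernel": ["linear", "rbf"]}

candidates = list(ParameterGrid(param_grid))
results = []  # (mean score, params) for every candidate evaluated so far

for i, params in enumerate(candidates, start=1):
    # Cross-validate one parameter combination.
    scores = cross_val_score(SVC(**params), X, y, cv=3, scoring="accuracy")
    results.append((scores.mean(), params))

    # Report this candidate and the best one seen up to now.
    best_score, best_params = max(results, key=lambda r: r[0])
    print(f"[{i}/{len(candidates)}] {params} mean={scores.mean():.3f} "
          f"| best so far: {best_score:.3f} {best_params}")
```

Because each result is printed as soon as its cross-validation finishes, the summary can be followed during a long run, and the results list could just as well be written to a file after every iteration.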
Have a look at GridSearchCVProgressBar.
I just found it and I am using it. I am very much into it:

    In [1]: GridSearchCVProgressBar
    Out[1]: pactools.grid_search.GridSearchCVProgressBar

    In [2]: ??GridSearchCVProgressBar
    Init signature: GridSearchCVProgressBar(estimator, param_grid, scoring=None, fit_params=None, n_jobs=1, iid=True, refit=True, cv=None, verbose=0, pre_dispatch='2*n_jobs', error_score='raise', return_train_score='warn')
    Source:
    class GridSearchCVProgressBar(model_selection.GridSearchCV):
        """Monkey patch Parallel to have a progress bar during grid search"""

        def _get_param_iterator(self):
            """Return ParameterGrid instance for the given param_grid"""

            iterator = super(GridSearchCVProgressBar, self)._get_param_iterator()
            iterator = list(iterator)
            n_candidates = len(iterator)

            cv = model_selection._split.check_cv(self.cv, None)
            n_splits = getattr(cv, 'n_splits', 3)
            max_value = n_candidates * n_splits

            class ParallelProgressBar(Parallel):
                def __call__(self, iterable):
                    bar = ProgressBar(max_value=max_value, title='GridSearchCV')
                    iterable = bar(iterable)
                    return super(ParallelProgressBar, self).__call__(iterable)

            # Monkey patch
            model_selection._search.Parallel = ParallelProgressBar

            return iterator
    File:           ~/anaconda/envs/python3/lib/python3.6/site-packages/pactools/grid_search.py
    Type:           ABCMeta

    In [3]: ?GridSearchCVProgressBar
    Init signature: GridSearchCVProgressBar(estimator, param_grid, scoring=None, fit_params=None, n_jobs=1, iid=True, refit=True, cv=None, verbose=0, pre_dispatch='2*n_jobs', error_score='raise', return_train_score='warn')
    Docstring:      Monkey patch Parallel to have a progress bar during grid search
    File:           ~/anaconda/envs/python3/lib/python3.6/site-packages/pactools/grid_search.py
    Type:           ABCMeta
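The core idea in the source above is that the iterable of fit tasks is wrapped (bar(iterable)) before it is handed to Parallel, so that pulling a task advances the bar. The following is a rough standard-library stand-in for that wrapping step, not the pactools implementation; with_progress and the fake task list are hypothetical names introduced only for illustration.

```python
import sys


def with_progress(iterable, total, out=sys.stderr):
    """Yield items unchanged, printing a counter each time one is consumed."""
    for done, item in enumerate(iterable, start=1):
        out.write(f"\r[{done}/{total}] tasks dispatched")
        out.flush()
        yield item
    out.write("\n")


# Stand-in for the (candidate, fold) tasks a grid search would generate:
# 3 candidates x 3 folds = 9 tasks, matching n_candidates * n_splits above.
tasks = ((c, fold) for c in (0.1, 1, 10) for fold in range(3))

# Whoever consumes the wrapped iterable drives the counter.
consumed = [task for task in with_progress(tasks, total=9)]
```

Note that in this sketch the counter ticks when a task is taken from the iterable, not when it has finished, so with parallel workers such a display can run slightly ahead of the work that is actually done.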