I am using GridSearchCV from sklearn to optimize my classifier's parameters. Since there is a lot of data, the whole optimization process takes a while: more than a day. I would like to watch the performance of the parameter combinations that have already been tried while it is still running. Is that possible?
python
logging
scikit-learn
Zweifel
Answers:
Set the verbose parameter of GridSearchCV to a positive number (the larger the number, the more detail you will get). For example:

GridSearchCV(clf, param_grid, cv=cv, scoring='accuracy', verbose=10)
I would just like to add to DavidS's answer.

To give you an idea, for a very simple case this is what it looks like with verbose=1:

Fitting 10 folds for each of 1 candidates, totalling 10 fits
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done 10 out of 10 | elapsed: 1.2min finished
And this is what it looks like with verbose=10:

Fitting 10 folds for each of 1 candidates, totalling 10 fits
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.637, total= 7.1s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 7.0s remaining: 0.0s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.630, total= 6.5s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1
[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 13.5s remaining: 0.0s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.637, total= 6.5s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1
[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 20.0s remaining: 0.0s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.637, total= 6.7s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1
[Parallel(n_jobs=1)]: Done 4 out of 4 | elapsed: 26.7s remaining: 0.0s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.632, total= 7.9s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1
[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 34.7s remaining: 0.0s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.622, total= 6.9s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1
[Parallel(n_jobs=1)]: Done 6 out of 6 | elapsed: 41.6s remaining: 0.0s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.627, total= 7.1s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1
[Parallel(n_jobs=1)]: Done 7 out of 7 | elapsed: 48.7s remaining: 0.0s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.628, total= 7.2s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1
[Parallel(n_jobs=1)]: Done 8 out of 8 | elapsed: 55.9s remaining: 0.0s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.640, total= 6.6s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1
[Parallel(n_jobs=1)]: Done 9 out of 9 | elapsed: 1.0min remaining: 0.0s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.629, total= 6.6s
[Parallel(n_jobs=1)]: Done 10 out of 10 | elapsed: 1.2min finished
In my case, verbose=1 does the trick.
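Since these verbose messages go to standard output (at least with n_jobs=1; parallel joblib workers may write to their own streams), you can also keep a record to inspect while the search runs. A minimal stdlib-only sketch; the file name and the printed line are illustrative assumptions, with the print standing in for a real search.fit(X, y) call:

```python
import contextlib

# Redirect anything printed during the search (e.g. GridSearchCV's
# verbose messages) into a log file you can tail while it runs.
with open("grid_search.log", "w") as log_file:
    with contextlib.redirect_stdout(log_file):
        # stand-in for: search.fit(X, y)
        print("[CV] C=1.0, score=0.95")

# The log file now holds everything that was printed inside the block.
with open("grid_search.log") as log_file:
    print(log_file.read().strip())
```

You could then follow the file from another terminal, e.g. with tail -f grid_search.log.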
Have a look at GridSearchCVProgressBar.

I just found it and am using it. I'm very much into it:

In [1]: GridSearchCVProgressBar
Out[1]: pactools.grid_search.GridSearchCVProgressBar

In [2]: ??GridSearchCVProgressBar
Init signature: GridSearchCVProgressBar(estimator, param_grid, scoring=None, fit_params=None, n_jobs=1, iid=True, refit=True, cv=None, verbose=0, pre_dispatch='2*n_jobs', error_score='raise', return_train_score='warn')
Source:
class GridSearchCVProgressBar(model_selection.GridSearchCV):
    """Monkey patch Parallel to have a progress bar during grid search"""

    def _get_param_iterator(self):
        """Return ParameterGrid instance for the given param_grid"""
        iterator = super(GridSearchCVProgressBar, self)._get_param_iterator()
        iterator = list(iterator)

        n_candidates = len(iterator)
        cv = model_selection._split.check_cv(self.cv, None)
        n_splits = getattr(cv, 'n_splits', 3)
        max_value = n_candidates * n_splits

        class ParallelProgressBar(Parallel):
            def __call__(self, iterable):
                bar = ProgressBar(max_value=max_value, title='GridSearchCV')
                iterable = bar(iterable)
                return super(ParallelProgressBar, self).__call__(iterable)

        # Monkey patch
        model_selection._search.Parallel = ParallelProgressBar

        return iterator

File: ~/anaconda/envs/python3/lib/python3.6/site-packages/pactools/grid_search.py
Type: ABCMeta

In [3]: ?GridSearchCVProgressBar
Init signature: GridSearchCVProgressBar(estimator, param_grid, scoring=None, fit_params=None, n_jobs=1, iid=True, refit=True, cv=None, verbose=0, pre_dispatch='2*n_jobs', error_score='raise', return_train_score='warn')
Docstring: Monkey patch Parallel to have a progress bar during grid search
File: ~/anaconda/envs/python3/lib/python3.6/site-packages/pactools/grid_search.py
Type: ABCMeta