1.13.6. Feature selection as part of a pipeline
Feature selection is usually used as a pre-processing step before doing the actual learning.
The recommended way to do this in scikit-learn is to use a Pipeline:
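from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC
clf = Pipeline([
    # dual=False: LinearSVC supports the L1 penalty only in its primal formulation
    ('feature_selection', SelectFromModel(LinearSVC(penalty="l1", dual=False))),
    ('classification', RandomForestClassifier())
])
clf.fit(X, y)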
In this snippet, an L1-penalized LinearSVC is wrapped in SelectFromModel: the L1 penalty drives the coefficients of many features to zero, and SelectFromModel keeps only the features whose coefficients remain effectively non-zero.
Then, a RandomForestClassifier is trained on the transformed output, i.e. using only relevant features.
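The two steps described above can be checked end to end on a small dataset. The following is a minimal sketch, not the guide's own example: the synthetic data from make_classification, the value C=0.1 and the fixed random_state values are assumptions chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Synthetic data (assumption for illustration): 20 features, 5 of them informative.
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=5, n_redundant=0, random_state=0
)

# L1-penalized linear SVM used only to decide which features to keep.
# dual=False is needed for the L1 penalty; C=0.1 is an arbitrary choice.
selector_model = LinearSVC(penalty="l1", dual=False, C=0.1, random_state=0)

clf = Pipeline([
    ("feature_selection", SelectFromModel(selector_model)),
    ("classification", RandomForestClassifier(n_estimators=50, random_state=0)),
])
clf.fit(X, y)

# Boolean mask over the original columns: True where a feature was kept.
selector = clf.named_steps["feature_selection"]
mask = selector.get_support()
n_selected = int(mask.sum())

# The classifier only ever sees the reduced matrix produced by the selector.
X_reduced = selector.transform(X)
predictions = clf.predict(X)
print(n_selected, X_reduced.shape)
```

Because the selector is fitted inside the pipeline, calling clf.predict on new data applies the same column mask automatically before the random forest is consulted.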
You can do the same with the other feature selection methods, and with any estimator that provides a way to evaluate feature importances.
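As a rough illustration of swapping in other selectors, the sketch below builds two alternative pipelines: one with univariate selection (SelectKBest with f_classif) and one where SelectFromModel relies on the feature_importances_ of a tree ensemble. The dataset, k=5, and the choice of LogisticRegression as the final classifier are assumptions made for this example only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Synthetic data (assumption for illustration).
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=5, n_redundant=0, random_state=0
)

# Univariate selection: keep the 5 features with the highest ANOVA F-score.
univariate = Pipeline([
    ("feature_selection", SelectKBest(f_classif, k=5)),
    ("classification", LogisticRegression(max_iter=1000)),
])
univariate.fit(X, y)

# Model-based selection: SelectFromModel reads feature_importances_ from the
# fitted trees and, by default, keeps features at or above the mean importance.
tree_based = Pipeline([
    ("feature_selection",
     SelectFromModel(ExtraTreesClassifier(n_estimators=50, random_state=0))),
    ("classification", LogisticRegression(max_iter=1000)),
])
tree_based.fit(X, y)

k_best_shape = univariate.named_steps["feature_selection"].transform(X).shape
tree_mask = tree_based.named_steps["feature_selection"].get_support()
print(k_best_shape, int(tree_mask.sum()))
```

Only the first pipeline step changes between the two variants; the fit and predict calls on the pipeline stay the same.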
See the Pipeline examples for more details.