scikit learn pipeline

uni2237 2021. 2. 2.
1.13.6. Feature selection as part of a pipeline

Feature selection is usually used as a pre-processing step before doing the actual learning.

The recommended way to do this in scikit-learn is to use a Pipeline:

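from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC
# penalty="l1" requires dual=False in LinearSVC; X, y are your training data
clf = Pipeline([
    ('feature_selection', SelectFromModel(LinearSVC(penalty="l1", dual=False))),
    ('classification', RandomForestClassifier()),
])
clf.fit(X, y)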

In this snippet we make use of a LinearSVC coupled with SelectFromModel to evaluate feature importances and select the most relevant features.

Then, a RandomForestClassifier is trained on the transformed output, i.e. using only relevant features.
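
To make the two stages concrete, here is a minimal sketch that fits the same kind of pipeline on synthetic data and checks how many features survive the selection step. The dataset from make_classification, the value C=0.1, max_iter=5000, and the random_state values are assumptions chosen for illustration and do not come from the original snippet.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Synthetic data (illustrative assumption): 20 features, 5 of them informative.
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=5, n_redundant=0, random_state=0
)

clf = Pipeline([
    # The L1 penalty drives some coefficients to zero; SelectFromModel keeps the rest.
    # C=0.1 is an assumed value; a smaller C gives a sparser model.
    ('feature_selection',
     SelectFromModel(LinearSVC(C=0.1, penalty="l1", dual=False, max_iter=5000))),
    # The forest only ever sees the columns kept by the previous step.
    ('classification', RandomForestClassifier(random_state=0)),
])
clf.fit(X, y)

# Boolean mask over the original columns: True means the feature was kept.
selector = clf.named_steps['feature_selection']
mask = selector.get_support()
n_selected = int(mask.sum())

# The same transformation the pipeline applies before the classifier.
X_reduced = selector.transform(X)
print("kept", n_selected, "of", X.shape[1], "features; reduced shape:", X_reduced.shape)

# predict() runs the selection step and then the classifier.
predictions = clf.predict(X)
```

Because the selector is fitted inside the pipeline, calling clf.predict on new data applies the same column mask automatically, which is the main reason to prefer a Pipeline over selecting features by hand.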

You can do the same with the other feature selection methods, and with any classifier that provides a way to evaluate feature importances.
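
As a rough illustration of swapping the selection step, the sketch below builds two variants: one using the univariate selector SelectKBest, and one using SelectFromModel with a tree ensemble whose feature_importances_ drive the selection. The synthetic data, k=5, n_estimators=50, and the choice of LogisticRegression as the final classifier are assumptions made for this example only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectFromModel, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Synthetic data (illustrative assumption): 20 features, 5 of them informative.
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=5, n_redundant=0, random_state=0
)

# Variant 1: univariate selection, keeping the k features with the best ANOVA F-score.
kbest_clf = Pipeline([
    ('feature_selection', SelectKBest(score_func=f_classif, k=5)),
    ('classification', LogisticRegression(max_iter=1000)),
])
kbest_clf.fit(X, y)
n_kbest = int(kbest_clf.named_steps['feature_selection'].get_support().sum())

# Variant 2: model-based selection using tree importances.
# With the default threshold, features at or above the mean importance are kept.
tree_clf = Pipeline([
    ('feature_selection',
     SelectFromModel(ExtraTreesClassifier(n_estimators=50, random_state=0))),
    ('classification', RandomForestClassifier(random_state=0)),
])
tree_clf.fit(X, y)
n_tree = int(tree_clf.named_steps['feature_selection'].get_support().sum())

print("SelectKBest kept", n_kbest, "features; tree-based selection kept", n_tree)
```

Only the first step changes between the variants; fitting and predicting go through the same Pipeline interface in both cases.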

See the Pipeline examples for more details.
