Data Science/Machine Learning

Classification 4. Ensemble Learning - Boosting (3. XGBoost)

HJChung 2020. 10. 10. 10:10

This is a write-up of what I studied from instructor 권철민's '파이썬 머신러닝 완벽 가이드' (Python Machine Learning Complete Guide). I'm still learning, so there may be mistakes, and I'll keep revising as I go. :))

3. XGBoost (eXtreme Gradient Boosting)

XGBoost makes up for the weaknesses of GBM covered earlier while offering several strengths of its own.

 

Key advantages of XGBoost)

- Excellent predictive performance on regression as well as classification

- Faster training than GBM thanks to CPU parallelism and GPU support

- Built-in performance features such as regularization and tree pruning

- Various conveniences (early stopping, built-in cross validation, native handling of missing values; see the sketch below)
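
As a quick illustration of the built-in cross validation (and early stopping inside CV), here is a minimal sketch of mine using the native xgboost.cv API; the parameter values are arbitrary choices, not recommendations.

import xgboost as xgb
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)
dtrain = xgb.DMatrix(X, label=y)

# built-in CV: returns a DataFrame of per-round train/test metric means and stds
cv_results = xgb.cv(
    params={'objective': 'binary:logistic', 'eta': 0.1, 'max_depth': 3},
    dtrain=dtrain,
    num_boost_round=200,
    nfold=5,
    metrics='logloss',
    early_stopping_rounds=20,  # CV supports early stopping too
    seed=0,
)
print(cv_results.tail())  # e.g. the 'test-logloss-mean' column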

 

※ XGBoost's early stopping feature

Both XGBoost and LightGBM provide early stopping.

Early stopping means that, even before completing the specified n_estimators rounds, training is cut off once the cost function value (prediction error) stops decreasing for a given number of rounds.
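
Conceptually, the loop looks like this (a minimal Python sketch of the idea, not xgboost's actual internals; boost_one_round and evaluate are hypothetical placeholders):

def train_with_early_stopping(boost_one_round, evaluate, n_estimators, patience):
    # boost_one_round(): fits one more tree; evaluate(): returns validation loss
    best_loss, rounds_since_best = float('inf'), 0
    for round_idx in range(n_estimators):
        boost_one_round()
        loss = evaluate()
        if loss < best_loss:
            best_loss, rounds_since_best = loss, 0   # improvement: reset the patience counter
        else:
            rounds_since_best += 1
            if rounds_since_best >= patience:        # no improvement for `patience` rounds
                print(f'stopping early at round {round_idx}')
                break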

Early stopping usage and parameters in the sklearn wrapper XGBoost)

.fit(X, y, sample_weight=None, base_margin=None, eval_set=None,
     eval_metric=None, early_stopping_rounds=None, verbose=True, xgb_model=None,
     sample_weight_eval_set=None, feature_weights=None, callbacks=None)

 

- early_stopping_rounds: training stops when the evaluation metric has not improved for this many consecutive rounds

- eval_metric: the cost evaluation metric computed at each boosting iteration

- eval_set: a separate validation data set on which the metric is evaluated; typically the cost reduction is checked repeatedly against this validation set

 

Using XGBoost) the Python wrapper and the sklearn wrapper

xgboost.readthedocs.io/en/latest/python/python_api.html#module-xgboost.sklearn

www.kaggle.com/stuarthallows/using-xgboost-with-scikit-learn

 


Python wrapper XGBoost vs. sklearn wrapper XGBoost)

blog.naver.com/PostView.nhn?blogId=gustn3964&logNo=221431714122&from=search&redirect=Log&widgetTypeCall=true&directAccess=false
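
For reference, here is a side-by-side sketch of the two interfaces (my own minimal example, assuming an xgboost version contemporary with this post; the data split mirrors the one used later in this notebook):

import xgboost as xgb
from xgboost import XGBClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=121)

# native Python wrapper: data goes into a DMatrix, hyperparameters into a dict
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)
params = {'objective': 'binary:logistic', 'eta': 0.1, 'max_depth': 3}
booster = xgb.train(params, dtrain, num_boost_round=400,
                    evals=[(dtest, 'test')], early_stopping_rounds=100)

# sklearn wrapper: the familiar estimator interface
clf = XGBClassifier(n_estimators=400, learning_rate=0.1, max_depth=3)
clf.fit(X_train, y_train, eval_set=[(X_test, y_test)],
        eval_metric='logloss', early_stopping_rounds=100)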

XGBoost has a great many hyperparameters, so tuning them can seem daunting. Fortunately, the better the algorithm (like this one, haha), the less its parameters need tuning, and the performance gained rarely justifies the effort spent on tuning. So when overfitting is severe, <파이썬 머신러닝 완벽 가이드> recommends trying just the following five adjustments (a combined example follows the list):

- Lower eta (learning_rate), e.g. into the 0.01~0.1 range; when lowering eta, raise num_boost_rounds (n_estimators) to compensate.

- Lower max_depth.

- Raise min_child_weight.

- Raise gamma.

- Adjust sub_sample (subsample) and colsample_bytree to keep the trees from growing too complex.
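
Putting the five adjustments together, a hedged sketch (the specific values are illustrative choices of mine, not tuned recommendations):

from xgboost import XGBClassifier

xgb_regularized = XGBClassifier(
    n_estimators=1000,       # raised to compensate for the lower learning rate
    learning_rate=0.05,      # eta lowered into the 0.01~0.1 range
    max_depth=2,             # shallower trees
    min_child_weight=3,      # require more instance weight per child node
    gamma=1.0,               # larger minimum loss reduction required to split
    subsample=0.8,           # sample rows per tree
    colsample_bytree=0.8,    # sample columns per tree
)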

 

sklearn wrapper XGBoost code)

Here we use the sklearn wrapper (rather than the Python wrapper), along with its early stopping feature, to analyze the Wisconsin breast cancer data set we have been working with.

※ As a reminder, the early-stopping-related parameters were:

- early_stopping_rounds: training stops when the evaluation metric has not improved for this many consecutive rounds

- eval_metric: the cost evaluation metric computed at each boosting iteration

- eval_set: a separate validation data set on which the metric is evaluated; typically the cost reduction is checked repeatedly against this validation set

So how to choose an appropriate early_stopping_rounds value is itself something worth thinking about.

 
In [1]:
import pandas as pd

from xgboost import XGBClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score
In [2]:
cancer_dataset = load_breast_cancer()

cancer_dataset_df = pd.DataFrame(cancer_dataset.data, columns=cancer_dataset.feature_names)
cancer_dataset_df.head()
Out[2]:
  mean radius mean texture mean perimeter mean area mean smoothness mean compactness mean concavity mean concave points mean symmetry mean fractal dimension ... worst radius worst texture worst perimeter worst area worst smoothness worst compactness worst concavity worst concave points worst symmetry worst fractal dimension
0 17.99 10.38 122.80 1001.0 0.11840 0.27760 0.3001 0.14710 0.2419 0.07871 ... 25.38 17.33 184.60 2019.0 0.1622 0.6656 0.7119 0.2654 0.4601 0.11890
1 20.57 17.77 132.90 1326.0 0.08474 0.07864 0.0869 0.07017 0.1812 0.05667 ... 24.99 23.41 158.80 1956.0 0.1238 0.1866 0.2416 0.1860 0.2750 0.08902
2 19.69 21.25 130.00 1203.0 0.10960 0.15990 0.1974 0.12790 0.2069 0.05999 ... 23.57 25.53 152.50 1709.0 0.1444 0.4245 0.4504 0.2430 0.3613 0.08758
3 11.42 20.38 77.58 386.1 0.14250 0.28390 0.2414 0.10520 0.2597 0.09744 ... 14.91 26.50 98.87 567.7 0.2098 0.8663 0.6869 0.2575 0.6638 0.17300
4 20.29 14.34 135.10 1297.0 0.10030 0.13280 0.1980 0.10430 0.1809 0.05883 ... 22.54 16.67 152.20 1575.0 0.1374 0.2050 0.4000 0.1625 0.2364 0.07678

5 rows × 30 columns

In [3]:
# dataset split
X_train, X_test, y_train, y_test = train_test_split(cancer_dataset.data, cancer_dataset.target, test_size=0.2, random_state=121)

print(X_train.shape, X_test.shape)
 
(455, 30) (114, 30)
In [4]:
# import XGBClassifier, the sklearn-wrapper XGBoost class
from xgboost import XGBClassifier

# NOTE: the test set doubles as the eval set here; see the discussion at the end
evals = [(X_test, y_test)]

xgb_wrapper = XGBClassifier(n_estimators=400, learning_rate=0.1, max_depth=3)
# in recent xgboost releases, early_stopping_rounds/eval_metric moved from fit()
# to the constructor; the fit() form below matches the version used at the time
xgb_wrapper.fit(X_train, y_train, early_stopping_rounds=100, eval_set=evals,
                eval_metric="logloss", verbose=True)

y_preds = xgb_wrapper.predict(X_test)
y_pred_proba = xgb_wrapper.predict_proba(X_test)[:, 1]
 
[0]	validation_0-logloss:0.61074
Will train until validation_0-logloss hasn't improved in 100 rounds.
[1]	validation_0-logloss:0.54330
[2]	validation_0-logloss:0.48703
[3]	validation_0-logloss:0.43807
[4]	validation_0-logloss:0.39739
[5]	validation_0-logloss:0.36164
[6]	validation_0-logloss:0.33155
[7]	validation_0-logloss:0.30455
[8]	validation_0-logloss:0.28063
[9]	validation_0-logloss:0.25836
[10]	validation_0-logloss:0.23880
	... (rounds [11] through [394] omitted; validation_0-logloss keeps decreasing steadily, from 0.2186 down to 0.0116) ...
[395]	validation_0-logloss:0.01162
[396]	validation_0-logloss:0.01161
[397]	validation_0-logloss:0.01161
[398]	validation_0-logloss:0.01158
[399]	validation_0-logloss:0.01155
In [5]:
from sklearn.metrics import confusion_matrix, accuracy_score
from sklearn.metrics import precision_score, recall_score
from sklearn.metrics import f1_score, roc_auc_score

# updated get_clf_eval() function
def get_clf_eval(y_test, pred=None, pred_proba=None):
    confusion = confusion_matrix(y_test, pred)
    accuracy = accuracy_score(y_test, pred)
    precision = precision_score(y_test, pred)
    recall = recall_score(y_test, pred)
    f1 = f1_score(y_test, pred)
    # ROC-AUC added
    roc_auc = roc_auc_score(y_test, pred_proba)
    print('confusion matrix')
    print(confusion)
    # print ROC-AUC as well
    print('accuracy: {0:.4f}, precision: {1:.4f}, recall: {2:.4f}, '
          'F1: {3:.4f}, AUC: {4:.4f}'.format(accuracy, precision, recall, f1, roc_auc))
In [6]:
get_clf_eval(y_test, y_preds, y_pred_proba)
 
confusion matrix
[[44  0]
 [ 0 70]]
accuracy: 1.0000, precision: 1.0000, recall: 1.0000, F1: 1.0000, AUC: 1.0000
In [7]:
# we can also plot feature importances
from xgboost import plot_importance
import matplotlib.pyplot as plt
%matplotlib inline

fig, ax = plt.subplots(figsize=(10, 12))
# passing the sklearn-wrapper estimator directly is fine
plot_importance(xgb_wrapper, ax=ax)
Out[7]:
<AxesSubplot:title={'center':'Feature importance'}, xlabel='F score', ylabel='Features'>
 
Hmm... so why did early stopping never kick in for my run, and why did the results come out as accuracy: 1.0000, precision: 1.0000, recall: 1.0000, F1: 1.0000, AUC: 1.0000? Looking at the log, the validation logloss kept improving all the way through round 399, so the 100-round patience window was never exhausted, which is why training ran the full 400 rounds. And since the eval set here is the test set itself, the reported metrics may well be optimistic. I'll keep experimenting with this.
In any case, now I know how the feature is used!
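
One thing worth trying (my own sketch, not from the book): carve a separate validation set out of the training data, so that the test set is not the one driving early stopping.

from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# split a validation set off the training data (sizes and seed are arbitrary)
X_tr, X_val, y_tr, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=121)

xgb_wrapper2 = XGBClassifier(n_estimators=400, learning_rate=0.1, max_depth=3)
xgb_wrapper2.fit(X_tr, y_tr, early_stopping_rounds=100,
                 eval_set=[(X_val, y_val)], eval_metric='logloss', verbose=False)

# the held-out test set is now used only for the final evaluation
get_clf_eval(y_test, xgb_wrapper2.predict(X_test), xgb_wrapper2.predict_proba(X_test)[:, 1])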

Reference

파이썬 머신러닝 완벽 가이드 (Python Machine Learning Complete Guide), by 권철민