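Because there is a lot of material, I have split the experiment project "Predicting Stock Market Prices with Regression Algorithms" into several articles. This is the fourth article in the series! It mainly covers how the tree-based regression models are trained and evaluated.
Related articles:
- Part 1: Predicting Stock Market Prices with Regression Algorithms (Part 1)
- Part 2: Predicting Stock Market Prices with Regression Algorithms (Part 2)
- Part 3: Predicting Stock Market Prices with Regression Algorithms (Part 3: Estimating with Linear Regression)
- Part 4: Predicting Stock Market Prices with Regression Algorithms (Part 4: Estimating with Tree-based Regression) (you are reading this one)
- Part 5: Predicting Stock Market Prices with Regression Algorithms (Part 5: Estimating with Network-based Regression)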
Implementing Regression with Tree-based Models
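from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor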
pipeline = Pipeline(
    [
        ("scaler", StandardScaler()),
        # Model placeholder, to be replaced by the parameter grid later
        ("model", "passthrough"),
    ]
)
param_grid = [
    # Candidate Model 1: Decision Tree Regression (No Standardization Required)
    {
        "model": [DecisionTreeRegressor()],
        "scaler": ["passthrough"],
        "model__max_depth": [None, 5, 10],
        "model__min_samples_split": [2, 5, 10],
    },
    # Candidate Model 2: Random Forest Regression (No Standardization Required)
    {
        "model": [RandomForestRegressor()],
        "scaler": ["passthrough"],
        "model__n_estimators": [50, 100],
        "model__max_depth": [None, 10],
        "model__min_samples_split": [2, 5],
    },
    # Candidate Model 3: Gradient Boosting Regression (No Standardization Required)
    {
        "model": [GradientBoostingRegressor()],
        "scaler": ["passthrough"],
        "model__n_estimators": [50, 100],
        "model__learning_rate": [0.01, 0.1],
        "model__max_depth": [3, 5],
    },
]
# Time-ordered cross-validation: each fold validates on data later than its training data
tscv = TimeSeriesSplit(n_splits=3)
grid_search = GridSearchCV(
    estimator=pipeline,
    param_grid=param_grid,
    cv=tscv,
    scoring="neg_mean_squared_error",
    n_jobs=-1,
)
# The train/test arrays and evaluate_model are not defined in this part;
# they are assumed to come from the earlier articles in the series.
grid_search.fit(X_train_transformed, y_train_aligned)
# Print the best model and hyperparameters
print("Best model:", grid_search.best_estimator_.named_steps["model"])
print("Best parameters:", grid_search.best_params_)
# Evaluate the best model on the test set
final_model = grid_search.best_estimator_
test_metrics = evaluate_model(final_model, X_test_transformed, y_test_aligned, "Test")
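To get a feel for how large this search is and how the time-series folds are laid out, here is a minimal sketch. It rebuilds the same three-part grid, counts the candidates with `ParameterGrid`, and prints the index ranges that `TimeSeriesSplit(n_splits=3)` produces. The 12-row `X_demo` array is made-up illustration data, not the dataset used in this series.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import ParameterGrid, TimeSeriesSplit
from sklearn.tree import DecisionTreeRegressor

# Same three candidate grids as in the article's listing
param_grid = [
    {
        "model": [DecisionTreeRegressor()],
        "scaler": ["passthrough"],
        "model__max_depth": [None, 5, 10],
        "model__min_samples_split": [2, 5, 10],
    },
    {
        "model": [RandomForestRegressor()],
        "scaler": ["passthrough"],
        "model__n_estimators": [50, 100],
        "model__max_depth": [None, 10],
        "model__min_samples_split": [2, 5],
    },
    {
        "model": [GradientBoostingRegressor()],
        "scaler": ["passthrough"],
        "model__n_estimators": [50, 100],
        "model__learning_rate": [0.01, 0.1],
        "model__max_depth": [3, 5],
    },
]

tscv = TimeSeriesSplit(n_splits=3)

# Each dict is expanded separately, so the total is the sum of the three grids
n_candidates = len(ParameterGrid(param_grid))
n_cv_fits = n_candidates * tscv.get_n_splits()
print("Candidates:", n_candidates)
print("Cross-validation fits:", n_cv_fits)

# Hypothetical toy data: 12 samples in time order
X_demo = np.arange(12).reshape(-1, 1)
folds = [(train.tolist(), val.tolist()) for train, val in tscv.split(X_demo)]
for i, (train, val) in enumerate(folds, start=1):
    print(f"Fold {i}: train={train} validate={val}")
```

The grid expands to 9 + 8 + 8 = 25 candidates, so the search runs 75 cross-validation fits, plus one final refit of the best candidate on the full training set (the `GridSearchCV` default). In every fold the validation indices come strictly after the training indices, which is why `TimeSeriesSplit` is a safer choice than shuffled K-fold for price series.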
The grid search and test-set evaluation above print the following:
Best model: RandomForestRegressor(n_estimators=50)
Best parameters: {'model': RandomForestRegressor(), 'model__max_depth': None, 'model__min_samples_split': 2, 'model__n_estimators': 50, 'scaler': 'passthrough'}
Test Set Performance:
================================================================================
Metric                               Value         Unit
--------------------------------------------------------------------------------
Mean Squared Error (MSE)             4.0841e+00
Root MSE (RMSE)                      2.0209        Price Units
Mean Absolute Error (MAE)            1.3821        Price Units
Mean Abs Percentage Error (MAPE)     0.94          %
R-squared (R2)                       0.9927
Directional Accuracy                 72.59         %
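The `evaluate_model` helper comes from an earlier part of the series and is not reproduced here. As a rough guide to reading the table, the sketch below shows one common way to compute these metrics with NumPy. The function names `regression_metrics` and `directional_accuracy` are hypothetical, and the directional-accuracy definition (the predicted move relative to the previous actual price has the same sign as the actual move) is my assumption; the original helper may define it differently. The toy arrays are made-up numbers.

```python
import numpy as np


def directional_accuracy(y_true, y_pred):
    """Percentage of steps where the predicted direction matches the actual one.

    Assumed definition: both moves are measured from the previous actual value.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    actual_move = np.sign(y_true[1:] - y_true[:-1])
    predicted_move = np.sign(y_pred[1:] - y_true[:-1])
    return float(np.mean(actual_move == predicted_move) * 100)


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, MAPE (%), R2 and directional accuracy (%)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = float(np.mean(err**2))
    ss_res = float(np.sum(err**2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    return {
        "mse": mse,
        "rmse": float(np.sqrt(mse)),
        "mae": float(np.mean(np.abs(err))),
        "mape": float(np.mean(np.abs(err) / np.abs(y_true)) * 100),
        "r2": 1.0 - ss_res / ss_tot,
        "directional_accuracy": directional_accuracy(y_true, y_pred),
    }


# Made-up toy prices, only to exercise the functions
y_true_demo = [100.0, 102.0, 101.0, 105.0, 107.0]
y_pred_demo = [99.0, 101.0, 103.0, 104.0, 108.0]
metrics = regression_metrics(y_true_demo, y_pred_demo)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

One quick sanity check on the reported table: RMSE is the square root of MSE, and the square root of 4.0841 is about 2.0209, so the two rows agree. Keep in mind that a high R2 on price levels is easy to reach because consecutive prices are strongly correlated, so the directional accuracy row is often the more informative one for this kind of task.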