Stock Price Prediction Based on Regression Algorithms (Part 4)

Hakuna 2025-02-27 2025-02-28 452 words 3 minutes Regression

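Because there is a lot of material, I have split the "Stock Price Prediction Based on Regression Algorithms" experiment project into several articles. This is the fourth article in the series! It mainly covers how the tree-based regression models are trained.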



Implementing Regression with Tree-based Models

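from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor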

pipeline = Pipeline(
    [
        ("scaler", StandardScaler()),
        (
            "model",
            "passthrough",
        ),  # model placeholder, to be replaced by parameter grid later
    ]
)

param_grid = [
    # Candidate Model 1: Decision Tree Regression (No Standardization Required)
    {
        "model": [DecisionTreeRegressor()],
        "scaler": ["passthrough"],
        "model__max_depth": [None, 5, 10],
        "model__min_samples_split": [2, 5, 10],
    },
    # Candidate Model 2: Random Forest Regression (No Standardization Required)
    {
        "model": [RandomForestRegressor()],
        "scaler": ["passthrough"],
        "model__n_estimators": [50, 100],
        "model__max_depth": [None, 10],
        "model__min_samples_split": [2, 5],
    },
    # Candidate Model 3: Gradient Boosting Regression (No Standardization Required)
    {
        "model": [GradientBoostingRegressor()],
        "scaler": ["passthrough"],
        "model__n_estimators": [50, 100],
        "model__learning_rate": [0.01, 0.1],
        "model__max_depth": [3, 5],
    },
]

tscv = TimeSeriesSplit(n_splits=3)

grid_search = GridSearchCV(
    estimator=pipeline,
    param_grid=param_grid,
    cv=tscv,
    scoring="neg_mean_squared_error",
    n_jobs=-1,
)

grid_search.fit(X_train_transformed, y_train_aligned)

# Print the best model and hyperparameters
print("Best model:", grid_search.best_estimator_.named_steps["model"])
print("Best parameters:", grid_search.best_params_)

# Evaluate the best model on the held-out test set
final_model = grid_search.best_estimator_
test_metrics = evaluate_model(final_model, X_test_transformed, y_test_aligned, "Test")
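
The grid search above passes `TimeSeriesSplit(n_splits=3)` as `cv`, so each validation fold comes strictly after the data used to fit it. The following is a minimal sketch of what those folds look like, assuming a toy array of 12 chronologically ordered rows (placeholder data, not the stock dataset from this series):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Twelve placeholder observations in chronological order (toy data).
X = np.arange(12).reshape(-1, 1)

tscv = TimeSeriesSplit(n_splits=3)

# Collect the index arrays of every fold as plain lists for easy inspection.
folds = [
    (train_idx.tolist(), val_idx.tolist())
    for train_idx, val_idx in tscv.split(X)
]

for i, (train_idx, val_idx) in enumerate(folds, start=1):
    print(f"Fold {i}: train={train_idx} validate={val_idx}")
```

Each fold extends the training window forward in time and validates on the block that follows it, which avoids fitting on future prices when scoring a candidate model.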

The grid search and test-set evaluation produce the following output:

Best model: RandomForestRegressor(n_estimators=50)
Best parameters: {'model': RandomForestRegressor(), 'model__max_depth': None, 'model__min_samples_split': 2, 'model__n_estimators': 50, 'scaler': 'passthrough'}

Test Set Performance:
================================================================================
Metric                                            Value                Unit
--------------------------------------------------------------------------------
Mean Squared Error (MSE)                     4.0841e+00                    
Root MSE (RMSE)                                  2.0209         Price Units
Mean Absolute Error (MAE)                        1.3821         Price Units
Mean Abs Percentage Error (MAPE)                   0.94                   %
R-squared (R2)                                   0.9927                    
Directional Accuracy                             72.59                    %
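
For readers who want to reproduce the metrics in the table, here is a rough sketch of how they could be computed with NumPy. The function name `regression_metrics` and the sample values are hypothetical, and the definition of directional accuracy used here (comparing the sign of the predicted move against the sign of the actual move, both measured from the previous true value) is an assumption; the `evaluate_model` helper used in this series may define it differently.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Hypothetical helper: compute the metrics named in the table above."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred

    mse = float(np.mean(err ** 2))
    rmse = float(np.sqrt(mse))
    mae = float(np.mean(np.abs(err)))
    # MAPE in percent; assumes no true value equals zero.
    mape = float(np.mean(np.abs(err / y_true)) * 100)

    # R^2 = 1 - SS_res / SS_tot
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot

    # Directional accuracy (assumed definition): share of steps where the
    # predicted move from the previous true value has the same sign as the
    # actual move.
    actual_move = np.sign(np.diff(y_true))
    predicted_move = np.sign(y_pred[1:] - y_true[:-1])
    directional_accuracy = float(np.mean(actual_move == predicted_move) * 100)

    return {
        "mse": mse,
        "rmse": rmse,
        "mae": mae,
        "mape": mape,
        "r2": r2,
        "directional_accuracy": directional_accuracy,
    }


# Made-up prices purely for illustration.
y_true = [100.0, 102.0, 101.0, 105.0]
y_pred = [101.0, 101.0, 103.0, 104.0]
metrics = regression_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Note that a high R-squared on price levels does not by itself imply useful direction forecasts, which is why the two metrics are worth reading side by side.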