Chapter 3: Supervised Learning Models in Qlib





Learning Objectives

After working through this chapter, you will be able to:

  • Understand the basic principles of supervised learning in quantitative investing
  • Apply traditional machine learning models to quantitative investment problems
  • Understand the characteristics and usage of deep learning models
  • Use the various models that ship with Qlib
  • Understand the principles and applications of advanced modeling techniques

3.1 Traditional Machine Learning Models

3.1.1 LightGBM: Principles and Practice

About LightGBM

LightGBM (Light Gradient Boosting Machine) is an efficient gradient boosting framework developed by Microsoft that performs very well in quantitative investing.

Key characteristics:

  • Efficiency: histogram-based training is fast
  • Memory friendly: low memory footprint, scales to large datasets
  • Accuracy: strong results on many benchmarks
  • Interpretability: provides feature importance analysis

How the Model Works

1. Gradient boosted decision trees (GBDT)

# Basic principle of GBDT (gradient boosting with squared-error loss)
import numpy as np
from sklearn.tree import DecisionTreeRegressor

class GBDT:
    def __init__(self, n_estimators=100, learning_rate=0.1):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.trees = []

    def fit(self, X, y):
        # Initialize the predictions at zero
        predictions = np.zeros(len(y))

        for i in range(self.n_estimators):
            # Residuals are the negative gradient of the squared-error loss
            residuals = y - predictions

            # Fit a shallow regression tree to the residuals
            tree = DecisionTreeRegressor(max_depth=6)
            tree.fit(X, residuals)

            # Update the predictions with a shrunken step
            predictions += self.learning_rate * tree.predict(X)
            self.trees.append(tree)

    def predict(self, X):
        predictions = np.zeros(len(X))
        for tree in self.trees:
            predictions += self.learning_rate * tree.predict(X)
        return predictions

2. LightGBM optimization techniques

  • Histogram algorithm: continuous features are bucketed into discrete bins, reducing memory use
  • Leaf-wise growth: the leaf with the largest loss reduction is split first, which is efficient but needs leaf-count limits to keep overfitting in check
  • Native categorical features: categorical columns are handled directly, without one-hot encoding (see the sketch below)
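
The sketch below shows where these three switches live in the native lightgbm API (not the Qlib wrapper); the toy data, column names and parameter values are illustrative only.

import lightgbm as lgb
import numpy as np
import pandas as pd

# Toy data: 500 samples, 10 numeric features plus one categorical column
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(500, 10)), columns=[f'f{i}' for i in range(10)])
X['industry'] = pd.Categorical(rng.integers(0, 5, size=500))
y = rng.normal(size=500)

params = {
    'objective': 'regression',
    'max_bin': 255,          # histogram algorithm: number of bins per feature
    'num_leaves': 64,        # leaf-wise growth: complexity is capped by the leaf count
    'min_data_in_leaf': 20   # guards leaf-wise growth against overfitting
}

# Categorical columns are used directly, no one-hot encoding required
train_set = lgb.Dataset(X, label=y, categorical_feature=['industry'])
booster = lgb.train(params, train_set, num_boost_round=100)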

LightGBM in Qlib

1. Basic usage

import qlib
from qlib.contrib.model.gbdt import LGBModel
from qlib.contrib.data.handler import Alpha158
from qlib.data.dataset import DatasetH

# Initialize Qlib (assumes the daily CN dataset has been downloaded to the default path)
qlib.init(provider_uri='~/.qlib/qlib_data/cn_data', region='cn')

# Prepare the data handler
handler = Alpha158(
    instruments='csi300',
    start_time='2020-01-01',
    end_time='2020-12-31',
    freq='day',
    fit_start_time='2020-01-01',
    fit_end_time='2020-06-30'
)

# Wrap the handler in a dataset with train/valid/test segments
dataset = DatasetH(
    handler,
    segments={
        'train': ('2020-01-01', '2020-06-30'),
        'valid': ('2020-07-01', '2020-09-30'),
        'test': ('2020-10-01', '2020-12-31')
    }
)

# Materialize the segments as DataFrames for the sklearn-style examples later in this chapter
data = {
    seg: dataset.prepare(seg, col_set=['feature', 'label'])
    for seg in ('train', 'valid', 'test')
}

# Create the model
model = LGBModel(
    loss='mse',
    colsample_bytree=0.8879,
    learning_rate=0.2,
    subsample=0.8789,
    max_depth=8,
    num_leaves=210,
    min_child_samples=20
)

# Train the model (Qlib models consume the dataset object directly)
model.fit(dataset)

2. Tuning the model configuration

# Tuned configuration
model_config = {
    'loss': 'mse',                    # loss function
    'colsample_bytree': 0.8879,       # feature subsampling ratio
    'learning_rate': 0.2,             # learning rate
    'subsample': 0.8789,              # row subsampling ratio
    'n_estimators': 100,              # number of boosting rounds
    'max_depth': 8,                   # maximum tree depth
    'num_leaves': 210,                # number of leaves per tree
    'min_child_samples': 20,          # minimum samples per leaf
    'reg_alpha': 0.001,               # L1 regularization
    'reg_lambda': 0.001,              # L2 regularization
    'random_state': 42                # random seed
}

model = LGBModel(**model_config)

3. Feature importance analysis

# Retrieve feature importance (Qlib's LightGBM wrapper returns a pandas Series)
feature_importance = model.get_feature_importance()

# Plot the 30 most important features for readability
import matplotlib.pyplot as plt

plt.figure(figsize=(12, 8))
feature_importance.sort_values(ascending=True).tail(30).plot(kind='barh')
plt.title('Feature Importance')
plt.xlabel('Importance')
plt.tight_layout()
plt.show()

3.1.2 Applying XGBoost

About XGBoost

XGBoost (eXtreme Gradient Boosting) is an optimized, distributed gradient boosting library with an excellent track record in machine learning competitions.

Key characteristics:

  • Regularization: built-in L1 and L2 penalties help prevent overfitting
  • Parallel training: multi-threaded tree construction
  • Missing values: handled automatically during split finding
  • Cross-validation: built-in cross-validation utility (see the sketch below)
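
As a concrete illustration of the last two points, the sketch below uses xgboost's native API (not the Qlib wrapper) on toy data containing missing values and runs the built-in 5-fold cross-validation; all values are illustrative.

import numpy as np
import xgboost as xgb

# Toy regression data with some missing entries (xgboost handles NaN natively)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
X[::50, 0] = np.nan
y = rng.normal(size=500)

dtrain = xgb.DMatrix(X, label=y)
params = {
    'objective': 'reg:squarederror',
    'max_depth': 6,
    'eta': 0.1,
    'lambda': 1.0,   # L2 regularization
    'alpha': 0.0     # L1 regularization
}

# Built-in 5-fold cross-validation with early stopping on the validation RMSE
cv_results = xgb.cv(params, dtrain, num_boost_round=200, nfold=5,
                    metrics='rmse', early_stopping_rounds=10, seed=42)
print(cv_results.tail(1))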

Model Implementation

1. Basic usage

from qlib.contrib.model.xgboost import XGBModel

# Create the XGBoost model (keyword arguments are passed through as xgboost parameters)
model = XGBModel(
    max_depth=6,
    eta=0.1,
    objective='reg:squarederror',
    subsample=0.8,
    colsample_bytree=0.8,
    seed=42
)

# Train on the dataset; the wrapper uses the 'train' and 'valid' segments
# internally for fitting and early stopping
model.fit(dataset)

2. Hyperparameter tuning

import xgboost as xgb
from sklearn.model_selection import GridSearchCV

# Define the parameter grid
param_grid = {
    'max_depth': [3, 6, 9],
    'learning_rate': [0.01, 0.1, 0.2],
    'n_estimators': [50, 100, 200],
    'subsample': [0.8, 0.9, 1.0],
    'colsample_bytree': [0.8, 0.9, 1.0]
}

# Grid search requires scikit-learn's estimator interface, so we use xgboost's
# sklearn wrapper here (Qlib's XGBModel is trained on a dataset, not on X/y arrays).
# Note: plain K-fold shuffles time periods together; for financial data,
# sklearn's TimeSeriesSplit is usually a safer choice of cv.
grid_search = GridSearchCV(
    xgb.XGBRegressor(objective='reg:squarederror'),
    param_grid,
    cv=5,
    scoring='neg_mean_squared_error',
    n_jobs=-1
)

grid_search.fit(data['train']['feature'], data['train']['label']['LABEL0'])
print("Best parameters:", grid_search.best_params_)

3.1.3 Using CatBoost

About CatBoost

CatBoost is a gradient boosting library developed by Yandex that is particularly good at handling categorical features.

Key characteristics:

  • Categorical features: supported natively, without manual encoding
  • Overfitting control: built-in overfitting detector (see the sketch below)
  • Prediction quality: strong results on many benchmarks
  • Ease of use: relatively few parameters to tune
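
As an example of the overfitting control mentioned above, the snippet below configures CatBoost's iteration-based overfitting detector (od_type/od_wait); the values are illustrative, and the detector only takes effect when an eval_set is supplied to fit.

from catboost import CatBoostRegressor

# Overfitting detector: stop when the validation metric has not improved for 50 iterations
model = CatBoostRegressor(
    iterations=1000,
    learning_rate=0.1,
    od_type='Iter',   # iteration-based detector
    od_wait=50,       # patience, in iterations
    loss_function='RMSE',
    verbose=False
)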

Model Implementation

1. Basic usage

from catboost import CatBoostRegressor

# Create the CatBoost model using the catboost library's sklearn-style API
# (Qlib also ships a dataset-aware wrapper in qlib.contrib.model.catboost_model)
model = CatBoostRegressor(
    iterations=100,
    learning_rate=0.1,
    depth=6,
    l2_leaf_reg=3,
    loss_function='RMSE',
    random_seed=42
)

# Train the model with a validation set for early stopping / overfitting detection
model.fit(
    data['train']['feature'],
    data['train']['label']['LABEL0'],
    eval_set=(data['valid']['feature'], data['valid']['label']['LABEL0']),
    verbose=False
)

2. Handling categorical features

# Identify categorical feature columns (Alpha158 features are all numeric,
# so this mainly matters for custom handlers that add e.g. industry codes)
categorical_features = data['train']['feature'].select_dtypes(include=['object', 'category']).columns

# Pass the categorical columns when constructing the model
model = CatBoostRegressor(
    iterations=100,
    learning_rate=0.1,
    depth=6,
    cat_features=categorical_features.tolist()
)

3.1.4 Linear Models and Ensemble Methods

Linear Regression Models

1. Basic linear regression

from sklearn.linear_model import LinearRegression

# Plain OLS on the flattened feature matrix
# (Qlib also ships a dataset-aware linear wrapper in qlib.contrib.model.linear)
model = LinearRegression(fit_intercept=True)

# Train the model (linear models do not tolerate NaNs, so fill or drop them first)
model.fit(
    data['train']['feature'].fillna(0),
    data['train']['label']['LABEL0']
)

# Predict
predictions = model.predict(data['test']['feature'].fillna(0))

2. Regularized linear models

from sklearn.linear_model import Ridge, Lasso, ElasticNet

# Ridge regression (L2 regularization)
ridge_model = Ridge(alpha=1.0)

# Lasso regression (L1 regularization)
lasso_model = Lasso(alpha=0.1)

# ElasticNet (combined L1 + L2 regularization)
elastic_model = ElasticNet(alpha=0.1, l1_ratio=0.5)
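
To make the comparison concrete, the sketch below fits the three regularized models on the Alpha158 data prepared earlier (`data` is the dictionary of DataFrames from the LightGBM section; features are filled with 0 as a simple NaN guard) and counts how many coefficients L1 regularization drives to exactly zero.

X_train = data['train']['feature'].fillna(0)
y_train = data['train']['label']['LABEL0']

for name, reg in [('ridge', ridge_model), ('lasso', lasso_model), ('elastic_net', elastic_model)]:
    reg.fit(X_train, y_train)
    n_zero = int((reg.coef_ == 0).sum())
    print(f'{name}: {n_zero} of {len(reg.coef_)} coefficients are exactly zero')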

Ensemble Methods

1. Random forest

from sklearn.ensemble import RandomForestRegressor

# Create the random forest model
rf_model = RandomForestRegressor(
    n_estimators=100,
    max_depth=10,
    min_samples_split=5,
    min_samples_leaf=2,
    random_state=42
)

# Train the model (features are filled with 0 because sklearn trees reject NaNs)
rf_model.fit(data['train']['feature'].fillna(0), data['train']['label']['LABEL0'])

2. Voting regressor

from sklearn.ensemble import VotingRegressor
from lightgbm import LGBMRegressor
from xgboost import XGBRegressor
from catboost import CatBoostRegressor

# Base models: the scikit-learn-compatible wrappers of the three boosting libraries
# (Qlib's own wrappers take a dataset object and are not sklearn estimators)
models = [
    ('lgb', LGBMRegressor(n_estimators=100)),
    ('xgb', XGBRegressor(n_estimators=100, objective='reg:squarederror')),
    ('cat', CatBoostRegressor(iterations=100, verbose=False))
]

# Create the voting regressor (averages the base models' predictions)
voting_regressor = VotingRegressor(estimators=models)

# Train the ensemble
voting_regressor.fit(data['train']['feature'].fillna(0), data['train']['label']['LABEL0'])

3.2 Deep Learning Models

3.2.1 MLP Neural Networks

About MLPs

The multi-layer perceptron (MLP) is the most basic deep learning model and is widely used in quantitative investing.

Key characteristics:

  • Non-linear modeling: activation functions introduce non-linearity
  • Representation learning: feature representations are learned automatically
  • Flexibility: the network architecture is easy to adjust
  • Interpretability: comparatively easy to reason about

Model Implementation

1. Basic MLP model

import torch
import torch.nn as nn

# Define the MLP network structure
class MLP(nn.Module):
    def __init__(self, input_size, hidden_size=128, num_layers=3):
        super(MLP, self).__init__()

        layers = []
        layers.append(nn.Linear(input_size, hidden_size))
        layers.append(nn.ReLU())
        layers.append(nn.Dropout(0.2))

        for _ in range(num_layers - 2):
            layers.append(nn.Linear(hidden_size, hidden_size))
            layers.append(nn.ReLU())
            layers.append(nn.Dropout(0.2))

        layers.append(nn.Linear(hidden_size, 1))

        self.network = nn.Sequential(*layers)

    def forward(self, x):
        return self.network(x)

# Train the MLP with a plain PyTorch loop on the flattened feature matrix.
# (In recent Qlib versions, the built-in feed-forward wrapper is
#  qlib.contrib.model.pytorch_nn.DNNModelPytorch, trained with model.fit(dataset).)
net = MLP(input_size=data['train']['feature'].shape[1])
optimizer = torch.optim.Adam(net.parameters(), lr=0.001)
criterion = nn.MSELoss()

x_train = torch.tensor(data['train']['feature'].fillna(0).values, dtype=torch.float32)
y_train = torch.tensor(data['train']['label']['LABEL0'].values, dtype=torch.float32).unsqueeze(1)

# Full-batch training loop (mini-batches and validation-based early stopping are omitted for brevity)
for epoch in range(100):
    optimizer.zero_grad()
    loss = criterion(net(x_train), y_train)
    loss.backward()
    optimizer.step()

2. Advanced MLP configuration

# A richer training configuration for the same MLP.  These keys drive a custom
# training loop (they are not arguments of a built-in Qlib class): early stopping
# on the validation loss plus a StepLR learning-rate schedule.
train_config = {
    'hidden_size': 256,
    'lr': 0.001,
    'max_epochs': 200,
    'batch_size': 512,
    'early_stop': 20,     # stop if the validation loss has not improved for 20 epochs
    'lr_step_size': 50,   # StepLR: decay the learning rate every 50 epochs
    'lr_gamma': 0.5       # StepLR: multiply the learning rate by 0.5 at each step
}

net = MLP(input_size=data['train']['feature'].shape[1], hidden_size=train_config['hidden_size'])
optimizer = torch.optim.Adam(net.parameters(), lr=train_config['lr'])
scheduler = torch.optim.lr_scheduler.StepLR(
    optimizer, step_size=train_config['lr_step_size'], gamma=train_config['lr_gamma']
)

3.2.2 LSTM/GRU Sequence Models

About Sequence Models

LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit) are recurrent neural networks designed specifically for sequential data.

Key characteristics:

  • Memory: able to retain long-range dependencies
  • Gating: gates control how information flows through the network (the gate equations are given below)
  • Sequence modeling: well suited to time-series data
  • Gradient control: the gating mechanism mitigates vanishing gradients
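
For reference, these are the standard LSTM gate equations that nn.LSTM implements internally: the input, forget and output gates decide what is written to, kept in, and read from the cell state.

$$
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$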

LSTM Implementation

1. Basic LSTM model

import torch
import torch.nn as nn

class LSTM(nn.Module):
    def __init__(self, input_size, hidden_size=128, num_layers=2, dropout=0.2):
        super(LSTM, self).__init__()

        self.hidden_size = hidden_size
        self.num_layers = num_layers

        self.lstm = nn.LSTM(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=num_layers,
            dropout=dropout,
            batch_first=True
        )

        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):
        # x shape: (batch_size, seq_len, input_size)
        lstm_out, _ = self.lstm(x)

        # Take the output of the last time step
        last_output = lstm_out[:, -1, :]

        # Fully connected layer
        output = self.fc(last_output)
        return output

# Instantiate the network.  An LSTM consumes 3-D input of shape
# (batch_size, seq_len, n_features), so the flat daily feature matrix must first
# be reshaped into rolling windows.  Qlib's built-in wrapper
# (qlib.contrib.model.pytorch_lstm.LSTM) handles this when combined with its
# time-series dataset classes and is trained with model.fit(dataset); the custom
# module above can be trained with the same plain PyTorch loop as the MLP.
net = LSTM(input_size=data['train']['feature'].shape[1])

2. GRU implementation

class GRU(nn.Module):
    def __init__(self, input_size, hidden_size=128, num_layers=2, dropout=0.2):
        super(GRU, self).__init__()

        self.hidden_size = hidden_size
        self.num_layers = num_layers

        self.gru = nn.GRU(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=num_layers,
            dropout=dropout,
            batch_first=True
        )

        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):
        # x shape: (batch_size, seq_len, input_size)
        gru_out, _ = self.gru(x)

        # Take the output of the last time step
        last_output = gru_out[:, -1, :]

        # Fully connected layer
        output = self.fc(last_output)
        return output

# Instantiate the GRU variant; the training procedure is identical to the LSTM above.
# (Qlib's built-in counterpart is qlib.contrib.model.pytorch_gru.GRU.)
net = GRU(input_size=data['train']['feature'].shape[1])

3.2.3 Applying Transformers

About Transformers

The Transformer is an attention-based deep learning architecture that achieved enormous success in natural language processing and has more recently been applied to quantitative investing.

Key characteristics:

  • Attention: the model can focus on the most relevant time steps (the core formula is given below)
  • Parallel computation: training parallelizes across the sequence
  • Long-range dependencies: long sequences are handled without recurrence
  • Scalability: easy to extend and optimize
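
The core operation behind these properties is scaled dot-product attention, which every encoder layer below applies over all time steps in parallel:

$$
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right)V
$$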

Model Implementation

1. Basic Transformer model

import torch
import torch.nn as nn
import math

class PositionalEncoding(nn.Module):
    def __init__(self, d_model, max_len=5000):
        super(PositionalEncoding, self).__init__()

        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))

        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        pe = pe.unsqueeze(0).transpose(0, 1)

        self.register_buffer('pe', pe)

    def forward(self, x):
        return x + self.pe[:x.size(0), :]

class Transformer(nn.Module):
    def __init__(self, input_size, d_model=128, nhead=8, num_layers=6, dropout=0.1):
        super(Transformer, self).__init__()

        self.input_projection = nn.Linear(input_size, d_model)
        self.pos_encoder = PositionalEncoding(d_model)

        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model,
            nhead=nhead,
            dim_feedforward=d_model * 4,
            dropout=dropout
        )

        self.transformer_encoder = nn.TransformerEncoder(encoder_layer, num_layers)
        self.fc = nn.Linear(d_model, 1)

    def forward(self, x):
        # x shape: (batch_size, seq_len, input_size)
        x = self.input_projection(x)  # (batch_size, seq_len, d_model)
        x = x.transpose(0, 1)  # (seq_len, batch_size, d_model)
        x = self.pos_encoder(x)
        x = self.transformer_encoder(x)
        x = x.transpose(0, 1)  # (batch_size, seq_len, d_model)

        # Take the output of the last time step
        last_output = x[:, -1, :]
        output = self.fc(last_output)
        return output

# Instantiate the Transformer.  Like the recurrent models it expects
# (batch_size, seq_len, n_features) input, and it usually benefits from a smaller
# learning rate (e.g. 1e-4) and batch size than the recurrent models.
# (Qlib's built-in counterpart lives in qlib.contrib.model.pytorch_transformer.)
net = Transformer(input_size=data['train']['feature'].shape[1])

3.2.4 Graph Neural Networks (GATs)

About Graph Neural Networks

The graph attention network (GAT) is a type of graph neural network that can capture complex relationships between stocks.

Key characteristics:

  • Graph modeling: relationships between stocks are encoded in an adjacency matrix (one way to build it is sketched below)
  • Attention: the importance of each neighboring stock is learned dynamically
  • Relational effects: interactions between stocks are taken into account
  • Interpretability: the learned attention weights can be inspected to explain relationships
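
The graph itself has to come from somewhere. A common, simple choice is to connect stocks whose recent returns are highly correlated; the sketch below builds such an adjacency matrix from a (stocks x days) return matrix. The 0.6 threshold and the toy data are illustrative.

import torch

# Toy return matrix: 50 stocks, 120 trading days
returns = torch.randn(50, 120)

# Pairwise correlation between stocks
corr = torch.corrcoef(returns)        # (50, 50)

# Connect pairs whose absolute correlation exceeds a threshold; keep self-loops
adj = (corr.abs() > 0.6).float()
adj.fill_diagonal_(1.0)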

Model Implementation

1. Graph attention layer

import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    def __init__(self, in_features, out_features, dropout=0.1, alpha=0.2):
        super(GraphAttentionLayer, self).__init__()

        self.in_features = in_features
        self.out_features = out_features
        self.dropout = dropout
        self.alpha = alpha

        self.W = nn.Linear(in_features, out_features, bias=False)
        self.a = nn.Linear(2 * out_features, 1, bias=False)

        self.leakyrelu = nn.LeakyReLU(self.alpha)

    def forward(self, input, adj):
        # input: (N, in_features)
        # adj: (N, N)

        Wh = self.W(input)  # (N, out_features)

        # Compute pairwise attention coefficients
        a_input = torch.cat([Wh.repeat_interleave(Wh.size(0), dim=0), 
                            Wh.repeat(Wh.size(0), 1)], dim=1)
        a_input = a_input.view(Wh.size(0), Wh.size(0), 2 * self.out_features)
        e = self.leakyrelu(self.a(a_input).squeeze(2))

        # Mask out non-edges before the softmax
        zero_vec = -9e15 * torch.ones_like(e)
        attention = torch.where(adj > 0, e, zero_vec)
        attention = F.softmax(attention, dim=1)
        attention = F.dropout(attention, self.dropout, training=self.training)

        h_prime = torch.matmul(attention, Wh)
        return h_prime

class GAT(nn.Module):
    def __init__(self, input_size, hidden_size=64, num_heads=8, dropout=0.1):
        super(GAT, self).__init__()

        self.dropout = dropout
        self.attention_layers = nn.ModuleList([
            GraphAttentionLayer(input_size, hidden_size, dropout, alpha=0.2)
            for _ in range(num_heads)
        ])

        self.out_att = GraphAttentionLayer(hidden_size * num_heads, 1, dropout, alpha=0.2)

    def forward(self, x, adj):
        # x: (N, input_size)
        # adj: (N, N)

        # Multi-head attention
        x = F.dropout(x, self.dropout, training=self.training)
        x = torch.cat([att(x, adj) for att in self.attention_layers], dim=1)
        x = F.elu(x)

        # Output layer
        x = F.dropout(x, self.dropout, training=self.training)
        x = self.out_att(x, adj)

        return x

# A GAT needs an adjacency matrix in addition to node features, so it cannot be
# dropped into a plain tabular training loop.  A minimal forward pass looks like this
# (the fully connected adjacency below is a placeholder; see the correlation-based
# sketch earlier in this section).  Qlib's built-in implementation lives in
# qlib.contrib.model.pytorch_gats.
num_stocks = 300
features = torch.randn(num_stocks, data['train']['feature'].shape[1])  # one feature row per stock
adj = torch.ones(num_stocks, num_stocks)                               # placeholder: fully connected graph

gat = GAT(input_size=features.shape[1])
scores = gat(features, adj)  # (num_stocks, 1) predicted scores

3.3 Advanced Modeling Techniques

3.3.1 Attention-Based Models (SFM, ALSTM)

About Attention Mechanisms

An attention mechanism lets the model focus on the important parts of the input sequence; in quantitative investing this helps identify the key time steps and features.

Key characteristics:

  • Dynamic weighting: weights are assigned based on the input itself
  • Interpretability: the attention weights show what the model focuses on
  • Long-range dependencies: long sequences can be handled
  • Flexibility: attention can be added to many model families

SFM Implementation

SFM (State Frequency Memory) is a recurrent model that decomposes its hidden state into frequency components for time-series forecasting; the class below is a simplified sketch that approximates the idea with multi-head attention plus a learned frequency gate.

class SFM(nn.Module):
    def __init__(self, input_size, hidden_size=128, num_frequencies=10):
        super(SFM, self).__init__()

        self.hidden_size = hidden_size
        self.num_frequencies = num_frequencies

        # State memory
        self.state_memory = nn.Linear(input_size, hidden_size)

        # Frequency memory
        self.frequency_memory = nn.Linear(input_size, num_frequencies)

        # Project the frequency components onto the hidden dimension so they can gate the states
        self.frequency_projection = nn.Linear(num_frequencies, hidden_size)

        # Attention mechanism
        self.attention = nn.MultiheadAttention(hidden_size, num_heads=8)

        # Output layer
        self.output_layer = nn.Linear(hidden_size, 1)

    def forward(self, x):
        # x shape: (batch_size, seq_len, input_size)
        batch_size, seq_len, _ = x.size()

        # State memory
        state_memory = self.state_memory(x)  # (batch_size, seq_len, hidden_size)

        # Frequency memory
        frequency_memory = self.frequency_memory(x)  # (batch_size, seq_len, num_frequencies)

        # Attention over the state memory
        state_memory = state_memory.transpose(0, 1)  # (seq_len, batch_size, hidden_size)
        attended_output, _ = self.attention(state_memory, state_memory, state_memory)
        attended_output = attended_output.transpose(0, 1)  # (batch_size, seq_len, hidden_size)

        # Gate the attended states with the frequency information
        frequency_gate = torch.sigmoid(self.frequency_projection(frequency_memory))  # (batch_size, seq_len, hidden_size)
        weighted_output = attended_output * frequency_gate

        # Output layer: take the last time step
        output = self.output_layer(weighted_output[:, -1, :])
        return output

ALSTM Implementation

ALSTM (Attention LSTM) combines an LSTM encoder with an attention mechanism over its hidden states.

class ALSTM(nn.Module):
    def __init__(self, input_size, hidden_size=128, num_layers=2, dropout=0.2):
        super(ALSTM, self).__init__()

        self.hidden_size = hidden_size
        self.num_layers = num_layers

        # LSTM layer
        self.lstm = nn.LSTM(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=num_layers,
            dropout=dropout,
            batch_first=True
        )

        # Attention layer over the LSTM outputs
        self.attention = nn.MultiheadAttention(hidden_size, num_heads=8)

        # Output layer
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):
        # x shape: (batch_size, seq_len, input_size)

        # LSTM encoding
        lstm_out, _ = self.lstm(x)  # (batch_size, seq_len, hidden_size)

        # Attention over the LSTM outputs
        lstm_out = lstm_out.transpose(0, 1)  # (seq_len, batch_size, hidden_size)
        attended_output, _ = self.attention(lstm_out, lstm_out, lstm_out)
        attended_output = attended_output.transpose(0, 1)  # (batch_size, seq_len, hidden_size)

        # Output layer: take the last time step
        last_output = attended_output[:, -1, :]
        output = self.fc(last_output)
        return output

3.3.2 Temporal Convolutional Networks (TCN)

About TCNs

A temporal convolutional network (TCN) processes time series with causal convolutions, combining parallel training with a long effective memory.

Key characteristics:

  • Causal convolutions: the model never sees future information
  • Parallel computation: training parallelizes across the sequence
  • Long-range dependencies: dilated convolutions cover long histories (a receptive-field helper follows this list)
  • Residual connections: mitigate vanishing gradients
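
A useful sanity check when choosing num_channels and kernel_size is the receptive field. With two convolutions per level and dilations 1, 2, 4, ..., each additional level roughly doubles how far back the network can see. A small helper for the calculation:

def tcn_receptive_field(num_levels, kernel_size=2):
    """Receptive field of a TCN with two causal convolutions per level and dilation 2**i at level i."""
    return 1 + 2 * (kernel_size - 1) * sum(2 ** i for i in range(num_levels))

# Three levels with kernel_size=2 (as in the example below) see 15 past time steps
print(tcn_receptive_field(num_levels=3, kernel_size=2))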

Model Implementation

class Chomp1d(nn.Module):
    """Trim the extra right-hand padding so the convolution remains causal."""
    def __init__(self, chomp_size):
        super(Chomp1d, self).__init__()
        self.chomp_size = chomp_size

    def forward(self, x):
        return x[:, :, :-self.chomp_size].contiguous()

class TemporalBlock(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride, dilation, padding, dropout=0.2):
        super(TemporalBlock, self).__init__()

        # Dilated convolution followed by Chomp1d: nn.Conv1d pads on both sides,
        # so the trailing `padding` steps are removed to keep the output causal
        # and the same length as the input
        self.conv1 = nn.Conv1d(in_channels, out_channels, kernel_size,
                               stride=stride, padding=padding, dilation=dilation)
        self.chomp1 = Chomp1d(padding)
        self.relu1 = nn.ReLU()
        self.dropout1 = nn.Dropout(dropout)

        self.conv2 = nn.Conv1d(out_channels, out_channels, kernel_size,
                               stride=stride, padding=padding, dilation=dilation)
        self.chomp2 = Chomp1d(padding)
        self.relu2 = nn.ReLU()
        self.dropout2 = nn.Dropout(dropout)

        self.net = nn.Sequential(self.conv1, self.chomp1, self.relu1, self.dropout1,
                                 self.conv2, self.chomp2, self.relu2, self.dropout2)

        # 1x1 convolution so the residual connection matches the channel count
        self.downsample = nn.Conv1d(in_channels, out_channels, 1) if in_channels != out_channels else None
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.net(x)
        res = x if self.downsample is None else self.downsample(x)
        return self.relu(out + res)

class TCN(nn.Module):
    def __init__(self, input_size, num_channels, kernel_size=2, dropout=0.2):
        super(TCN, self).__init__()

        layers = []
        num_levels = len(num_channels)

        for i in range(num_levels):
            dilation_size = 2 ** i
            in_channels = input_size if i == 0 else num_channels[i-1]
            out_channels = num_channels[i]

            layers.append(
                TemporalBlock(in_channels, out_channels, kernel_size, stride=1,
                            dilation=dilation_size, padding=(kernel_size-1) * dilation_size,
                            dropout=dropout)
            )

        self.network = nn.Sequential(*layers)
        self.fc = nn.Linear(num_channels[-1], 1)

    def forward(self, x):
        # x shape: (batch_size, seq_len, input_size)
        x = x.transpose(1, 2)  # (batch_size, input_size, seq_len)
        x = self.network(x)
        x = x.transpose(1, 2)  # (batch_size, seq_len, num_channels[-1])

        # Take the last time step
        last_output = x[:, -1, :]
        output = self.fc(last_output)
        return output

# Instantiate the TCN; input is (batch_size, seq_len, n_features), and the channel
# widths control the depth (and therefore the receptive field) of the network.
# (Qlib's built-in counterpart lives in qlib.contrib.model.pytorch_tcn.)
net = TCN(input_size=data['train']['feature'].shape[1], num_channels=[64, 128, 256])

3.3.3 Adaptive Models (ADARNN, ADD)

About Adaptive Models

Adaptive models adjust themselves as the data distribution changes, which is particularly valuable in quantitative investing.

Key characteristics:

  • Dynamic adaptation: the model keeps up with changing market regimes
  • Online learning: supports incremental updates (a rolling re-training sketch follows this list)
  • Concept drift: shifts in the data distribution are handled explicitly
  • Robustness: more robust to abnormal data
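
One simple way to get the "adapt as the market changes" behaviour without a specialized architecture is rolling re-training: periodically re-fit the model on a sliding window of recent data. A minimal sketch, assuming `features` and `labels` are date-sorted DataFrames and `model` exposes sklearn-style fit/predict:

import pandas as pd

def rolling_refit(model, features, labels, train_window=250, refit_every=20):
    """Re-fit `model` on the most recent `train_window` rows every `refit_every` rows
    and collect the out-of-sample predictions made in between."""
    preds = []
    for start in range(train_window, len(features), refit_every):
        X_train = features.iloc[start - train_window:start]
        y_train = labels.iloc[start - train_window:start]
        X_next = features.iloc[start:start + refit_every]

        model.fit(X_train, y_train)
        preds.append(pd.Series(model.predict(X_next), index=X_next.index))

    return pd.concat(preds)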

ADARNN Implementation

ADARNN (Adaptive RNN) is a recurrent network that adapts its representation as the data distribution shifts. The class below is a simplified sketch of the idea, in which each LSTM layer's output is passed through a learned gate (the full AdaRNN algorithm additionally performs temporal distribution characterization and matching).

class ADARNN(nn.Module):
    def __init__(self, input_size, hidden_size=128, num_layers=2, dropout=0.2):
        super(ADARNN, self).__init__()

        self.hidden_size = hidden_size
        self.num_layers = num_layers

        # One single-layer LSTM per depth level so each level can be gated separately
        self.lstm_layers = nn.ModuleList([
            nn.LSTM(input_size if i == 0 else hidden_size, hidden_size,
                    num_layers=1, batch_first=True)
            for i in range(num_layers)
        ])

        # Adaptive gating applied to each layer's hidden states
        self.adaptive_gates = nn.ModuleList([
            nn.Linear(hidden_size, hidden_size)
            for _ in range(num_layers)
        ])

        self.dropout = nn.Dropout(dropout)

        # Output layer
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):
        # x shape: (batch_size, seq_len, input_size)
        out = x

        for layer_idx in range(self.num_layers):
            # Run the whole sequence through this level's LSTM
            lstm_out, _ = self.lstm_layers[layer_idx](out)

            # Adaptive gate: element-wise sigmoid gate on the hidden states
            gate = torch.sigmoid(self.adaptive_gates[layer_idx](lstm_out))
            out = self.dropout(lstm_out * gate)

        # Output layer: take the last time step
        last_output = out[:, -1, :]
        output = self.fc(last_output)
        return output

ADD Implementation

ADD (Adaptive Deep Network) is an adaptive deep network: a plain MLP whose output is rescaled and shifted by two learnable parameters.

class ADD(nn.Module):
    def __init__(self, input_size, hidden_size=128, num_layers=3, dropout=0.2):
        super(ADD, self).__init__()

        layers = []
        layers.append(nn.Linear(input_size, hidden_size))
        layers.append(nn.ReLU())
        layers.append(nn.Dropout(dropout))

        for _ in range(num_layers - 2):
            layers.append(nn.Linear(hidden_size, hidden_size))
            layers.append(nn.ReLU())
            layers.append(nn.Dropout(dropout))

        layers.append(nn.Linear(hidden_size, 1))

        self.network = nn.Sequential(*layers)

        # Learnable adaptive scale and shift
        self.adaptive_weight = nn.Parameter(torch.ones(1))
        self.adaptive_bias = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        # Output of the base network
        base_output = self.network(x)

        # Adaptive affine adjustment
        adaptive_output = base_output * self.adaptive_weight + self.adaptive_bias

        return adaptive_output

3.3.4 Multi-Task Learning Models

About Multi-Task Learning

Multi-task learning trains several related tasks together; in quantitative investing this allows multiple targets to be predicted at once.

Key characteristics:

  • Knowledge sharing: the tasks share a common representation
  • Regularization: the shared representation reduces overfitting
  • Efficiency: several tasks are learned by a single model
  • Generalization: the shared signal often improves out-of-sample performance

Model Implementation

class MultiTaskModel(nn.Module):
    def __init__(self, input_size, hidden_size=128, num_tasks=3):
        super(MultiTaskModel, self).__init__()

        # Shared feature-extraction layers
        self.shared_layers = nn.Sequential(
            nn.Linear(input_size, hidden_size),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(hidden_size, hidden_size),
            nn.ReLU(),
            nn.Dropout(0.2)
        )

        # Task-specific heads
        self.task_specific_layers = nn.ModuleList([
            nn.Sequential(
                nn.Linear(hidden_size, hidden_size // 2),
                nn.ReLU(),
                nn.Dropout(0.2),
                nn.Linear(hidden_size // 2, 1)
            )
            for _ in range(num_tasks)
        ])

    def forward(self, x):
        # Shared feature extraction
        shared_features = self.shared_layers(x)

        # Task-specific predictions
        task_outputs = []
        for task_layer in self.task_specific_layers:
            task_output = task_layer(shared_features)
            task_outputs.append(task_output)

        return task_outputs

# Create the multi-task model
model = MultiTaskModel(
    input_size=data['train']['feature'].shape[1],
    hidden_size=128,
    num_tasks=3  # predict three different targets
)

# Multi-task loss function
def multi_task_loss(predictions, targets, weights=None):
    """Weighted sum of per-task MSE losses."""
    if weights is None:
        weights = [1.0] * len(predictions)

    total_loss = 0
    for pred, target, weight in zip(predictions, targets, weights):
        loss = F.mse_loss(pred, target)
        total_loss += weight * loss

    return total_loss
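
A quick end-to-end check of the model and the loss with toy tensors (batch of 32 samples, 20 features, three synthetic targets):

import torch
import torch.nn.functional as F

toy_x = torch.randn(32, 20)
toy_targets = [torch.randn(32, 1) for _ in range(3)]   # one target per task

toy_model = MultiTaskModel(input_size=20, hidden_size=64, num_tasks=3)
toy_preds = toy_model(toy_x)                           # list of three (32, 1) tensors

loss = multi_task_loss(toy_preds, toy_targets, weights=[1.0, 0.5, 0.5])
loss.backward()                                        # gradients flow into shared and task-specific layers
print(loss.item())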

Chapter Summary

This chapter covered the supervised learning models used with Qlib, including:

  1. Traditional machine learning models: gradient boosting models such as LightGBM, XGBoost and CatBoost
  2. Deep learning models: neural networks such as MLP, LSTM/GRU, Transformer and GATs
  3. Advanced techniques: attention mechanisms, temporal convolutional networks, adaptive models and multi-task learning

Exercises

Exercise 1: Model comparison

  1. Train several different models on the Alpha158 dataset
  2. Compare their performance metrics
  3. Analyze each model's characteristics and the scenarios it suits

Exercise 2: Model tuning

  1. Tune the hyperparameters of the LightGBM model
  2. Use grid search and Bayesian optimization
  3. Analyze how the parameters affect model performance

Exercise 3: Feature engineering

  1. Design features for the deep learning models
  2. Implement a custom feature-engineering method
  3. Analyze how the features affect model performance

Further Reading

  1. Machine learning theory

    • 《统计学习方法》 (Statistical Learning Methods)
    • 《机器学习》 (Machine Learning)
  2. Deep learning

    • 《深度学习》 (Deep Learning)
    • 《动手学深度学习》 (Dive into Deep Learning)
  3. Quantitative investment models

    • 《量化投资策略与技术》 (Quantitative Investment Strategies and Techniques)
    • 《机器学习在量化投资中的应用》 (Machine Learning Applications in Quantitative Investing)

