Chapter 3: Supervised Learning Models in Qlib





Learning Objectives

After working through this chapter, you will be able to:

  • Understand the basic principles of supervised learning in quantitative investing
  • Apply traditional machine learning models to quantitative investment problems
  • Understand the characteristics and usage of deep learning models
  • Use the various models that ship with Qlib
  • Understand the principles and applications of advanced modeling techniques

3.1 Traditional Machine Learning Models

3.1.1 LightGBM: Principles and Practice

About LightGBM

LightGBM (Light Gradient Boosting Machine) is an efficient gradient boosting framework developed by Microsoft that performs very well in quantitative investing.

Key characteristics:

  • Efficiency: histogram-based training is fast
  • Memory friendly: low memory footprint, scales to large datasets
  • Accuracy: strong results on many benchmarks
  • Interpretability: provides feature importance analysis

How the Model Works

1. Gradient boosted decision trees (GBDT)

# Basic principle of GBDT (gradient boosting with squared-error loss)
import numpy as np
from sklearn.tree import DecisionTreeRegressor

class GBDT:
    def __init__(self, n_estimators=100, learning_rate=0.1):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.trees = []

    def fit(self, X, y):
        # Initialize the predictions at zero
        predictions = np.zeros(len(y))

        for i in range(self.n_estimators):
            # Residuals are the negative gradient of the squared-error loss
            residuals = y - predictions

            # Fit a shallow regression tree to the residuals
            tree = DecisionTreeRegressor(max_depth=6)
            tree.fit(X, residuals)

            # Update the predictions with a shrunken step
            predictions += self.learning_rate * tree.predict(X)
            self.trees.append(tree)

    def predict(self, X):
        predictions = np.zeros(len(X))
        for tree in self.trees:
            predictions += self.learning_rate * tree.predict(X)
        return predictions

2. LightGBM optimization techniques

  • Histogram algorithm: continuous features are bucketed into discrete bins, reducing memory use
  • Leaf-wise growth: the leaf with the largest loss reduction is split first, which is efficient but needs leaf-count limits to keep overfitting in check
  • Native categorical features: categorical columns are handled directly, without one-hot encoding (see the sketch below)
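
The sketch below shows where these three switches live in the native lightgbm API (not the Qlib wrapper); the toy data, column names and parameter values are illustrative only.

import lightgbm as lgb
import numpy as np
import pandas as pd

# Toy data: 500 samples, 10 numeric features plus one categorical column
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(500, 10)), columns=[f'f{i}' for i in range(10)])
X['industry'] = pd.Categorical(rng.integers(0, 5, size=500))
y = rng.normal(size=500)

params = {
    'objective': 'regression',
    'max_bin': 255,          # histogram algorithm: number of bins per feature
    'num_leaves': 64,        # leaf-wise growth: complexity is capped by the leaf count
    'min_data_in_leaf': 20   # guards leaf-wise growth against overfitting
}

# Categorical columns are used directly, no one-hot encoding required
train_set = lgb.Dataset(X, label=y, categorical_feature=['industry'])
booster = lgb.train(params, train_set, num_boost_round=100)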

LightGBM in Qlib

1. Basic usage

import qlib
from qlib.contrib.model.gbdt import LGBModel
from qlib.contrib.data.handler import Alpha158
from qlib.data.dataset import DatasetH

# Initialize Qlib (assumes the daily CN dataset has been downloaded to the default path)
qlib.init(provider_uri='~/.qlib/qlib_data/cn_data', region='cn')

# Prepare the data handler
handler = Alpha158(
    instruments='csi300',
    start_time='2020-01-01',
    end_time='2020-12-31',
    freq='day',
    fit_start_time='2020-01-01',
    fit_end_time='2020-06-30'
)

# Wrap the handler in a dataset with train/valid/test segments
dataset = DatasetH(
    handler,
    segments={
        'train': ('2020-01-01', '2020-06-30'),
        'valid': ('2020-07-01', '2020-09-30'),
        'test': ('2020-10-01', '2020-12-31')
    }
)

# Materialize the segments as DataFrames for the sklearn-style examples later in this chapter
data = {
    seg: dataset.prepare(seg, col_set=['feature', 'label'])
    for seg in ('train', 'valid', 'test')
}

# Create the model
model = LGBModel(
    loss='mse',
    colsample_bytree=0.8879,
    learning_rate=0.2,
    subsample=0.8789,
    max_depth=8,
    num_leaves=210,
    min_child_samples=20
)

# Train the model (Qlib models consume the dataset object directly)
model.fit(dataset)

2. Tuning the model configuration

# Tuned configuration
model_config = {
    'loss': 'mse',                    # loss function
    'colsample_bytree': 0.8879,       # feature subsampling ratio
    'learning_rate': 0.2,             # learning rate
    'subsample': 0.8789,              # row subsampling ratio
    'n_estimators': 100,              # number of boosting rounds
    'max_depth': 8,                   # maximum tree depth
    'num_leaves': 210,                # number of leaves per tree
    'min_child_samples': 20,          # minimum samples per leaf
    'reg_alpha': 0.001,               # L1 regularization
    'reg_lambda': 0.001,              # L2 regularization
    'random_state': 42                # random seed
}

model = LGBModel(**model_config)

3. Feature importance analysis

# Retrieve feature importance (Qlib's LightGBM wrapper returns a pandas Series)
feature_importance = model.get_feature_importance()

# Plot the 30 most important features for readability
import matplotlib.pyplot as plt

plt.figure(figsize=(12, 8))
feature_importance.sort_values(ascending=True).tail(30).plot(kind='barh')
plt.title('Feature Importance')
plt.xlabel('Importance')
plt.tight_layout()
plt.show()

3.1.2 Applying XGBoost

About XGBoost

XGBoost (eXtreme Gradient Boosting) is an optimized, distributed gradient boosting library with an excellent track record in machine learning competitions.

Key characteristics:

  • Regularization: built-in L1 and L2 penalties help prevent overfitting
  • Parallel training: multi-threaded tree construction
  • Missing values: handled automatically during split finding
  • Cross-validation: built-in cross-validation utility (see the sketch below)
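
As a concrete illustration of the last two points, the sketch below uses xgboost's native API (not the Qlib wrapper) on toy data containing missing values and runs the built-in 5-fold cross-validation; all values are illustrative.

import numpy as np
import xgboost as xgb

# Toy regression data with some missing entries (xgboost handles NaN natively)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
X[::50, 0] = np.nan
y = rng.normal(size=500)

dtrain = xgb.DMatrix(X, label=y)
params = {
    'objective': 'reg:squarederror',
    'max_depth': 6,
    'eta': 0.1,
    'lambda': 1.0,   # L2 regularization
    'alpha': 0.0     # L1 regularization
}

# Built-in 5-fold cross-validation with early stopping on the validation RMSE
cv_results = xgb.cv(params, dtrain, num_boost_round=200, nfold=5,
                    metrics='rmse', early_stopping_rounds=10, seed=42)
print(cv_results.tail(1))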

Model Implementation

1. Basic usage

from qlib.contrib.model.xgboost import XGBModel

# Create the XGBoost model (keyword arguments are passed through as xgboost parameters)
model = XGBModel(
    max_depth=6,
    eta=0.1,
    objective='reg:squarederror',
    subsample=0.8,
    colsample_bytree=0.8,
    seed=42
)

# Train on the dataset; the wrapper uses the 'train' and 'valid' segments
# internally for fitting and early stopping
model.fit(dataset)

2. Hyperparameter tuning

import xgboost as xgb
from sklearn.model_selection import GridSearchCV

# Define the parameter grid
param_grid = {
    'max_depth': [3, 6, 9],
    'learning_rate': [0.01, 0.1, 0.2],
    'n_estimators': [50, 100, 200],
    'subsample': [0.8, 0.9, 1.0],
    'colsample_bytree': [0.8, 0.9, 1.0]
}

# Grid search requires scikit-learn's estimator interface, so we use xgboost's
# sklearn wrapper here (Qlib's XGBModel is trained on a dataset, not on X/y arrays).
# Note: plain K-fold shuffles time periods together; for financial data,
# sklearn's TimeSeriesSplit is usually a safer choice of cv.
grid_search = GridSearchCV(
    xgb.XGBRegressor(objective='reg:squarederror'),
    param_grid,
    cv=5,
    scoring='neg_mean_squared_error',
    n_jobs=-1
)

grid_search.fit(data['train']['feature'], data['train']['label']['LABEL0'])
print("Best parameters:", grid_search.best_params_)

3.1.3 Using CatBoost

About CatBoost

CatBoost is a gradient boosting library developed by Yandex that is particularly good at handling categorical features.

Key characteristics:

  • Categorical features: supported natively, without manual encoding
  • Overfitting control: built-in overfitting detector (see the sketch below)
  • Prediction quality: strong results on many benchmarks
  • Ease of use: relatively few parameters to tune
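
As an example of the overfitting control mentioned above, the snippet below configures CatBoost's iteration-based overfitting detector (od_type/od_wait); the values are illustrative, and the detector only takes effect when an eval_set is supplied to fit.

from catboost import CatBoostRegressor

# Overfitting detector: stop when the validation metric has not improved for 50 iterations
model = CatBoostRegressor(
    iterations=1000,
    learning_rate=0.1,
    od_type='Iter',   # iteration-based detector
    od_wait=50,       # patience, in iterations
    loss_function='RMSE',
    verbose=False
)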

Model Implementation

1. Basic usage

from catboost import CatBoostRegressor

# Create the CatBoost model using the catboost library's sklearn-style API
# (Qlib also ships a dataset-aware wrapper in qlib.contrib.model.catboost_model)
model = CatBoostRegressor(
    iterations=100,
    learning_rate=0.1,
    depth=6,
    l2_leaf_reg=3,
    loss_function='RMSE',
    random_seed=42
)

# Train the model with a validation set for early stopping / overfitting detection
model.fit(
    data['train']['feature'],
    data['train']['label']['LABEL0'],
    eval_set=(data['valid']['feature'], data['valid']['label']['LABEL0']),
    verbose=False
)

2. Handling categorical features

# Identify categorical feature columns (Alpha158 features are all numeric,
# so this mainly matters for custom handlers that add e.g. industry codes)
categorical_features = data['train']['feature'].select_dtypes(include=['object', 'category']).columns

# Pass the categorical columns when constructing the model
model = CatBoostRegressor(
    iterations=100,
    learning_rate=0.1,
    depth=6,
    cat_features=categorical_features.tolist()
)

3.1.4 Linear Models and Ensemble Methods

Linear Regression Models

1. Basic linear regression

from sklearn.linear_model import LinearRegression

# Plain OLS on the flattened feature matrix
# (Qlib also ships a dataset-aware linear wrapper in qlib.contrib.model.linear)
model = LinearRegression(fit_intercept=True)

# Train the model (linear models do not tolerate NaNs, so fill or drop them first)
model.fit(
    data['train']['feature'].fillna(0),
    data['train']['label']['LABEL0']
)

# Predict
predictions = model.predict(data['test']['feature'].fillna(0))

2. Regularized linear models

from sklearn.linear_model import Ridge, Lasso, ElasticNet

# Ridge regression (L2 regularization)
ridge_model = Ridge(alpha=1.0)

# Lasso regression (L1 regularization)
lasso_model = Lasso(alpha=0.1)

# ElasticNet (combined L1 + L2 regularization)
elastic_model = ElasticNet(alpha=0.1, l1_ratio=0.5)
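
To make the comparison concrete, the sketch below fits the three regularized models on the Alpha158 data prepared earlier (`data` is the dictionary of DataFrames from the LightGBM section; features are filled with 0 as a simple NaN guard) and counts how many coefficients L1 regularization drives to exactly zero.

X_train = data['train']['feature'].fillna(0)
y_train = data['train']['label']['LABEL0']

for name, reg in [('ridge', ridge_model), ('lasso', lasso_model), ('elastic_net', elastic_model)]:
    reg.fit(X_train, y_train)
    n_zero = int((reg.coef_ == 0).sum())
    print(f'{name}: {n_zero} of {len(reg.coef_)} coefficients are exactly zero')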

Ensemble Methods

1. Random forest

from sklearn.ensemble import RandomForestRegressor

# Create the random forest model
rf_model = RandomForestRegressor(
    n_estimators=100,
    max_depth=10,
    min_samples_split=5,
    min_samples_leaf=2,
    random_state=42
)

# Train the model (features are filled with 0 because sklearn trees reject NaNs)
rf_model.fit(data['train']['feature'].fillna(0), data['train']['label']['LABEL0'])

2. Voting regressor

from sklearn.ensemble import VotingRegressor
from lightgbm import LGBMRegressor
from xgboost import XGBRegressor
from catboost import CatBoostRegressor

# Base models: the scikit-learn-compatible wrappers of the three boosting libraries
# (Qlib's own wrappers take a dataset object and are not sklearn estimators)
models = [
    ('lgb', LGBMRegressor(n_estimators=100)),
    ('xgb', XGBRegressor(n_estimators=100, objective='reg:squarederror')),
    ('cat', CatBoostRegressor(iterations=100, verbose=False))
]

# Create the voting regressor (averages the base models' predictions)
voting_regressor = VotingRegressor(estimators=models)

# Train the ensemble
voting_regressor.fit(data['train']['feature'].fillna(0), data['train']['label']['LABEL0'])

3.2 Deep Learning Models

3.2.1 MLP Neural Networks

About MLPs

The multi-layer perceptron (MLP) is the most basic deep learning model and is widely used in quantitative investing.

Key characteristics:

  • Non-linear modeling: activation functions introduce non-linearity
  • Representation learning: feature representations are learned automatically
  • Flexibility: the network architecture is easy to adjust
  • Interpretability: comparatively easy to reason about

Model Implementation

1. Basic MLP model

import torch
import torch.nn as nn

# Define the MLP network structure
class MLP(nn.Module):
    def __init__(self, input_size, hidden_size=128, num_layers=3):
        super(MLP, self).__init__()

        layers = []
        layers.append(nn.Linear(input_size, hidden_size))
        layers.append(nn.ReLU())
        layers.append(nn.Dropout(0.2))

        for _ in range(num_layers - 2):
            layers.append(nn.Linear(hidden_size, hidden_size))
            layers.append(nn.ReLU())
            layers.append(nn.Dropout(0.2))

        layers.append(nn.Linear(hidden_size, 1))

        self.network = nn.Sequential(*layers)

    def forward(self, x):
        return self.network(x)

# Train the MLP with a plain PyTorch loop on the flattened feature matrix.
# (In recent Qlib versions, the built-in feed-forward wrapper is
#  qlib.contrib.model.pytorch_nn.DNNModelPytorch, trained with model.fit(dataset).)
net = MLP(input_size=data['train']['feature'].shape[1])
optimizer = torch.optim.Adam(net.parameters(), lr=0.001)
criterion = nn.MSELoss()

x_train = torch.tensor(data['train']['feature'].fillna(0).values, dtype=torch.float32)
y_train = torch.tensor(data['train']['label']['LABEL0'].values, dtype=torch.float32).unsqueeze(1)

# Full-batch training loop (mini-batches and validation-based early stopping are omitted for brevity)
for epoch in range(100):
    optimizer.zero_grad()
    loss = criterion(net(x_train), y_train)
    loss.backward()
    optimizer.step()

2. Advanced MLP configuration

# A richer training configuration for the same MLP.  These keys drive a custom
# training loop (they are not arguments of a built-in Qlib class): early stopping
# on the validation loss plus a StepLR learning-rate schedule.
train_config = {
    'hidden_size': 256,
    'lr': 0.001,
    'max_epochs': 200,
    'batch_size': 512,
    'early_stop': 20,     # stop if the validation loss has not improved for 20 epochs
    'lr_step_size': 50,   # StepLR: decay the learning rate every 50 epochs
    'lr_gamma': 0.5       # StepLR: multiply the learning rate by 0.5 at each step
}

net = MLP(input_size=data['train']['feature'].shape[1], hidden_size=train_config['hidden_size'])
optimizer = torch.optim.Adam(net.parameters(), lr=train_config['lr'])
scheduler = torch.optim.lr_scheduler.StepLR(
    optimizer, step_size=train_config['lr_step_size'], gamma=train_config['lr_gamma']
)

3.2.2 LSTM/GRU Sequence Models

About Sequence Models

LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit) are recurrent neural networks designed specifically for sequential data.

Key characteristics:

  • Memory: able to retain long-range dependencies
  • Gating: gates control how information flows through the network (the gate equations are given below)
  • Sequence modeling: well suited to time-series data
  • Gradient control: the gating mechanism mitigates vanishing gradients
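
For reference, these are the standard LSTM gate equations that nn.LSTM implements internally: the input, forget and output gates decide what is written to, kept in, and read from the cell state.

$$
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$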

LSTM Implementation

1. Basic LSTM model

import torch
import torch.nn as nn

class LSTM(nn.Module):
    def __init__(self, input_size, hidden_size=128, num_layers=2, dropout=0.2):
        super(LSTM, self).__init__()

        self.hidden_size = hidden_size
        self.num_layers = num_layers

        self.lstm = nn.LSTM(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=num_layers,
            dropout=dropout,
            batch_first=True
        )

        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):
        # x shape: (batch_size, seq_len, input_size)
        lstm_out, _ = self.lstm(x)

        # Take the output of the last time step
        last_output = lstm_out[:, -1, :]

        # Fully connected layer
        output = self.fc(last_output)
        return output

# Instantiate the network.  An LSTM consumes 3-D input of shape
# (batch_size, seq_len, n_features), so the flat daily feature matrix must first
# be reshaped into rolling windows.  Qlib's built-in wrapper
# (qlib.contrib.model.pytorch_lstm.LSTM) handles this when combined with its
# time-series dataset classes and is trained with model.fit(dataset); the custom
# module above can be trained with the same plain PyTorch loop as the MLP.
net = LSTM(input_size=data['train']['feature'].shape[1])

2. GRU implementation

class GRU(nn.Module):
    def __init__(self, input_size, hidden_size=128, num_layers=2, dropout=0.2):
        super(GRU, self).__init__()

        self.hidden_size = hidden_size
        self.num_layers = num_layers

        self.gru = nn.GRU(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=num_layers,
            dropout=dropout,
            batch_first=True
        )

        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):
        # x shape: (batch_size, seq_len, input_size)
        gru_out, _ = self.gru(x)

        # Take the output of the last time step
        last_output = gru_out[:, -1, :]

        # Fully connected layer
        output = self.fc(last_output)
        return output

# Instantiate the GRU variant; the training procedure is identical to the LSTM above.
# (Qlib's built-in counterpart is qlib.contrib.model.pytorch_gru.GRU.)
net = GRU(input_size=data['train']['feature'].shape[1])

3.2.3 Applying Transformers

About Transformers

The Transformer is an attention-based deep learning architecture that achieved enormous success in natural language processing and has more recently been applied to quantitative investing.

Key characteristics:

  • Attention: the model can focus on the most relevant time steps (the core formula is given below)
  • Parallel computation: training parallelizes across the sequence
  • Long-range dependencies: long sequences are handled without recurrence
  • Scalability: easy to extend and optimize
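
The core operation behind these properties is scaled dot-product attention, which every encoder layer below applies over all time steps in parallel:

$$
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right)V
$$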

Model Implementation

1. Basic Transformer model

import torch
import torch.nn as nn
import math

class PositionalEncoding(nn.Module):
    def __init__(self, d_model, max_len=5000):
        super(PositionalEncoding, self).__init__()

        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))

        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        pe = pe.unsqueeze(0).transpose(0, 1)

        self.register_buffer('pe', pe)

    def forward(self, x):
        return x + self.pe[:x.size(0), :]

class Transformer(nn.Module):
    def __init__(self, input_size, d_model=128, nhead=8, num_layers=6, dropout=0.1):
        super(Transformer, self).__init__()

        self.input_projection = nn.Linear(input_size, d_model)
        self.pos_encoder = PositionalEncoding(d_model)

        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model,
            nhead=nhead,
            dim_feedforward=d_model * 4,
            dropout=dropout
        )

        self.transformer_encoder = nn.TransformerEncoder(encoder_layer, num_layers)
        self.fc = nn.Linear(d_model, 1)

    def forward(self, x):
        # x shape: (batch_size, seq_len, input_size)
        x = self.input_projection(x)  # (batch_size, seq_len, d_model)
        x = x.transpose(0, 1)  # (seq_len, batch_size, d_model)
        x = self.pos_encoder(x)
        x = self.transformer_encoder(x)
        x = x.transpose(0, 1)  # (batch_size, seq_len, d_model)

        # Take the output of the last time step
        last_output = x[:, -1, :]
        output = self.fc(last_output)
        return output

# Instantiate the Transformer.  Like the recurrent models it expects
# (batch_size, seq_len, n_features) input, and it usually benefits from a smaller
# learning rate (e.g. 1e-4) and batch size than the recurrent models.
# (Qlib's built-in counterpart lives in qlib.contrib.model.pytorch_transformer.)
net = Transformer(input_size=data['train']['feature'].shape[1])

3.2.4 Graph Neural Networks (GATs)

About Graph Neural Networks

The graph attention network (GAT) is a type of graph neural network that can capture complex relationships between stocks.

Key characteristics:

  • Graph modeling: relationships between stocks are encoded in an adjacency matrix (one way to build it is sketched below)
  • Attention: the importance of each neighboring stock is learned dynamically
  • Relational effects: interactions between stocks are taken into account
  • Interpretability: the learned attention weights can be inspected to explain relationships
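
The graph itself has to come from somewhere. A common, simple choice is to connect stocks whose recent returns are highly correlated; the sketch below builds such an adjacency matrix from a (stocks x days) return matrix. The 0.6 threshold and the toy data are illustrative.

import torch

# Toy return matrix: 50 stocks, 120 trading days
returns = torch.randn(50, 120)

# Pairwise correlation between stocks
corr = torch.corrcoef(returns)        # (50, 50)

# Connect pairs whose absolute correlation exceeds a threshold; keep self-loops
adj = (corr.abs() > 0.6).float()
adj.fill_diagonal_(1.0)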

Model Implementation

1. Graph attention layer

import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    def __init__(self, in_features, out_features, dropout=0.1, alpha=0.2):
        super(GraphAttentionLayer, self).__init__()

        self.in_features = in_features
        self.out_features = out_features
        self.dropout = dropout
        self.alpha = alpha

        self.W = nn.Linear(in_features, out_features, bias=False)
        self.a = nn.Linear(2 * out_features, 1, bias=False)

        self.leakyrelu = nn.LeakyReLU(self.alpha)

    def forward(self, input, adj):
        # input: (N, in_features)
        # adj: (N, N)

        Wh = self.W(input)  # (N, out_features)

        # Compute pairwise attention coefficients
        a_input = torch.cat([Wh.repeat_interleave(Wh.size(0), dim=0), 
                            Wh.repeat(Wh.size(0), 1)], dim=1)
        a_input = a_input.view(Wh.size(0), Wh.size(0), 2 * self.out_features)
        e = self.leakyrelu(self.a(a_input).squeeze(2))

        # Mask out non-edges before the softmax
        zero_vec = -9e15 * torch.ones_like(e)
        attention = torch.where(adj > 0, e, zero_vec)
        attention = F.softmax(attention, dim=1)
        attention = F.dropout(attention, self.dropout, training=self.training)

        h_prime = torch.matmul(attention, Wh)
        return h_prime

class GAT(nn.Module):
    def __init__(self, input_size, hidden_size=64, num_heads=8, dropout=0.1):
        super(GAT, self).__init__()

        self.dropout = dropout
        self.attention_layers = nn.ModuleList([
            GraphAttentionLayer(input_size, hidden_size, dropout, alpha=0.2)
            for _ in range(num_heads)
        ])

        self.out_att = GraphAttentionLayer(hidden_size * num_heads, 1, dropout, alpha=0.2)

    def forward(self, x, adj):
        # x: (N, input_size)
        # adj: (N, N)

        # Multi-head attention
        x = F.dropout(x, self.dropout, training=self.training)
        x = torch.cat([att(x, adj) for att in self.attention_layers], dim=1)
        x = F.elu(x)

        # Output layer
        x = F.dropout(x, self.dropout, training=self.training)
        x = self.out_att(x, adj)

        return x

# A GAT needs an adjacency matrix in addition to node features, so it cannot be
# dropped into a plain tabular training loop.  A minimal forward pass looks like this
# (the fully connected adjacency below is a placeholder; see the correlation-based
# sketch earlier in this section).  Qlib's built-in implementation lives in
# qlib.contrib.model.pytorch_gats.
num_stocks = 300
features = torch.randn(num_stocks, data['train']['feature'].shape[1])  # one feature row per stock
adj = torch.ones(num_stocks, num_stocks)                               # placeholder: fully connected graph

gat = GAT(input_size=features.shape[1])
scores = gat(features, adj)  # (num_stocks, 1) predicted scores

3.3 Advanced Modeling Techniques

3.3.1 Attention-Based Models (SFM, ALSTM)

About Attention Mechanisms

An attention mechanism lets the model focus on the important parts of the input sequence; in quantitative investing this helps identify the key time steps and features.

Key characteristics:

  • Dynamic weighting: weights are assigned based on the input itself
  • Interpretability: the attention weights show what the model focuses on
  • Long-range dependencies: long sequences can be handled
  • Flexibility: attention can be added to many model families

SFM Implementation

SFM (State Frequency Memory) is a recurrent model that decomposes its hidden state into frequency components for time-series forecasting; the class below is a simplified sketch that approximates the idea with multi-head attention plus a learned frequency gate.

class SFM(nn.Module):
    def __init__(self, input_size, hidden_size=128, num_frequencies=10):
        super(SFM, self).__init__()

        self.hidden_size = hidden_size
        self.num_frequencies = num_frequencies

        # State memory
        self.state_memory = nn.Linear(input_size, hidden_size)

        # Frequency memory
        self.frequency_memory = nn.Linear(input_size, num_frequencies)

        # Project the frequency components onto the hidden dimension so they can gate the states
        self.frequency_projection = nn.Linear(num_frequencies, hidden_size)

        # Attention mechanism
        self.attention = nn.MultiheadAttention(hidden_size, num_heads=8)

        # Output layer
        self.output_layer = nn.Linear(hidden_size, 1)

    def forward(self, x):
        # x shape: (batch_size, seq_len, input_size)
        batch_size, seq_len, _ = x.size()

        # State memory
        state_memory = self.state_memory(x)  # (batch_size, seq_len, hidden_size)

        # Frequency memory
        frequency_memory = self.frequency_memory(x)  # (batch_size, seq_len, num_frequencies)

        # Attention over the state memory
        state_memory = state_memory.transpose(0, 1)  # (seq_len, batch_size, hidden_size)
        attended_output, _ = self.attention(state_memory, state_memory, state_memory)
        attended_output = attended_output.transpose(0, 1)  # (batch_size, seq_len, hidden_size)

        # Gate the attended states with the frequency information
        frequency_gate = torch.sigmoid(self.frequency_projection(frequency_memory))  # (batch_size, seq_len, hidden_size)
        weighted_output = attended_output * frequency_gate

        # Output layer: take the last time step
        output = self.output_layer(weighted_output[:, -1, :])
        return output

ALSTM Implementation

ALSTM (Attention LSTM) combines an LSTM encoder with an attention mechanism over its hidden states.

class ALSTM(nn.Module):
    def __init__(self, input_size, hidden_size=128, num_layers=2, dropout=0.2):
        super(ALSTM, self).__init__()

        self.hidden_size = hidden_size
        self.num_layers = num_layers

        # LSTM layer
        self.lstm = nn.LSTM(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=num_layers,
            dropout=dropout,
            batch_first=True
        )

        # Attention layer over the LSTM outputs
        self.attention = nn.MultiheadAttention(hidden_size, num_heads=8)

        # Output layer
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):
        # x shape: (batch_size, seq_len, input_size)

        # LSTM encoding
        lstm_out, _ = self.lstm(x)  # (batch_size, seq_len, hidden_size)

        # Attention over the LSTM outputs
        lstm_out = lstm_out.transpose(0, 1)  # (seq_len, batch_size, hidden_size)
        attended_output, _ = self.attention(lstm_out, lstm_out, lstm_out)
        attended_output = attended_output.transpose(0, 1)  # (batch_size, seq_len, hidden_size)

        # Output layer: take the last time step
        last_output = attended_output[:, -1, :]
        output = self.fc(last_output)
        return output

3.3.2 Temporal Convolutional Networks (TCN)

About TCNs

A temporal convolutional network (TCN) processes time series with causal convolutions, combining parallel training with a long effective memory.

Key characteristics:

  • Causal convolutions: the model never sees future information
  • Parallel computation: training parallelizes across the sequence
  • Long-range dependencies: dilated convolutions cover long histories (a receptive-field helper follows this list)
  • Residual connections: mitigate vanishing gradients
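
A useful sanity check when choosing num_channels and kernel_size is the receptive field. With two convolutions per level and dilations 1, 2, 4, ..., each additional level roughly doubles how far back the network can see. A small helper for the calculation:

def tcn_receptive_field(num_levels, kernel_size=2):
    """Receptive field of a TCN with two causal convolutions per level and dilation 2**i at level i."""
    return 1 + 2 * (kernel_size - 1) * sum(2 ** i for i in range(num_levels))

# Three levels with kernel_size=2 (as in the example below) see 15 past time steps
print(tcn_receptive_field(num_levels=3, kernel_size=2))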

Model Implementation

class Chomp1d(nn.Module):
    """Trim the extra right-hand padding so the convolution remains causal."""
    def __init__(self, chomp_size):
        super(Chomp1d, self).__init__()
        self.chomp_size = chomp_size

    def forward(self, x):
        return x[:, :, :-self.chomp_size].contiguous()

class TemporalBlock(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride, dilation, padding, dropout=0.2):
        super(TemporalBlock, self).__init__()

        # Dilated convolution followed by Chomp1d: nn.Conv1d pads on both sides,
        # so the trailing `padding` steps are removed to keep the output causal
        # and the same length as the input
        self.conv1 = nn.Conv1d(in_channels, out_channels, kernel_size,
                               stride=stride, padding=padding, dilation=dilation)
        self.chomp1 = Chomp1d(padding)
        self.relu1 = nn.ReLU()
        self.dropout1 = nn.Dropout(dropout)

        self.conv2 = nn.Conv1d(out_channels, out_channels, kernel_size,
                               stride=stride, padding=padding, dilation=dilation)
        self.chomp2 = Chomp1d(padding)
        self.relu2 = nn.ReLU()
        self.dropout2 = nn.Dropout(dropout)

        self.net = nn.Sequential(self.conv1, self.chomp1, self.relu1, self.dropout1,
                                 self.conv2, self.chomp2, self.relu2, self.dropout2)

        # 1x1 convolution so the residual connection matches the channel count
        self.downsample = nn.Conv1d(in_channels, out_channels, 1) if in_channels != out_channels else None
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.net(x)
        res = x if self.downsample is None else self.downsample(x)
        return self.relu(out + res)

class TCN(nn.Module):
    def __init__(self, input_size, num_channels, kernel_size=2, dropout=0.2):
        super(TCN, self).__init__()

        layers = []
        num_levels = len(num_channels)

        for i in range(num_levels):
            dilation_size = 2 ** i
            in_channels = input_size if i == 0 else num_channels[i-1]
            out_channels = num_channels[i]

            layers.append(
                TemporalBlock(in_channels, out_channels, kernel_size, stride=1,
                            dilation=dilation_size, padding=(kernel_size-1) * dilation_size,
                            dropout=dropout)
            )

        self.network = nn.Sequential(*layers)
        self.fc = nn.Linear(num_channels[-1], 1)

    def forward(self, x):
        # x shape: (batch_size, seq_len, input_size)
        x = x.transpose(1, 2)  # (batch_size, input_size, seq_len)
        x = self.network(x)
        x = x.transpose(1, 2)  # (batch_size, seq_len, num_channels[-1])

        # Take the last time step
        last_output = x[:, -1, :]
        output = self.fc(last_output)
        return output

# Instantiate the TCN; input is (batch_size, seq_len, n_features), and the channel
# widths control the depth (and therefore the receptive field) of the network.
# (Qlib's built-in counterpart lives in qlib.contrib.model.pytorch_tcn.)
net = TCN(input_size=data['train']['feature'].shape[1], num_channels=[64, 128, 256])

3.3.3 Adaptive Models (ADARNN, ADD)

About Adaptive Models

Adaptive models adjust themselves as the data distribution changes, which is particularly valuable in quantitative investing.

Key characteristics:

  • Dynamic adaptation: the model keeps up with changing market regimes
  • Online learning: supports incremental updates (a rolling re-training sketch follows this list)
  • Concept drift: shifts in the data distribution are handled explicitly
  • Robustness: more robust to abnormal data
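
One simple way to get the "adapt as the market changes" behaviour without a specialized architecture is rolling re-training: periodically re-fit the model on a sliding window of recent data. A minimal sketch, assuming `features` and `labels` are date-sorted DataFrames and `model` exposes sklearn-style fit/predict:

import pandas as pd

def rolling_refit(model, features, labels, train_window=250, refit_every=20):
    """Re-fit `model` on the most recent `train_window` rows every `refit_every` rows
    and collect the out-of-sample predictions made in between."""
    preds = []
    for start in range(train_window, len(features), refit_every):
        X_train = features.iloc[start - train_window:start]
        y_train = labels.iloc[start - train_window:start]
        X_next = features.iloc[start:start + refit_every]

        model.fit(X_train, y_train)
        preds.append(pd.Series(model.predict(X_next), index=X_next.index))

    return pd.concat(preds)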

ADARNN Implementation

ADARNN (Adaptive RNN) is a recurrent network that adapts its representation as the data distribution shifts. The class below is a simplified sketch of the idea, in which each LSTM layer's output is passed through a learned gate (the full AdaRNN algorithm additionally performs temporal distribution characterization and matching).

class ADARNN(nn.Module):
    def __init__(self, input_size, hidden_size=128, num_layers=2, dropout=0.2):
        super(ADARNN, self).__init__()

        self.hidden_size = hidden_size
        self.num_layers = num_layers

        # One single-layer LSTM per depth level so each level can be gated separately
        self.lstm_layers = nn.ModuleList([
            nn.LSTM(input_size if i == 0 else hidden_size, hidden_size,
                    num_layers=1, batch_first=True)
            for i in range(num_layers)
        ])

        # Adaptive gating applied to each layer's hidden states
        self.adaptive_gates = nn.ModuleList([
            nn.Linear(hidden_size, hidden_size)
            for _ in range(num_layers)
        ])

        self.dropout = nn.Dropout(dropout)

        # Output layer
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):
        # x shape: (batch_size, seq_len, input_size)
        out = x

        for layer_idx in range(self.num_layers):
            # Run the whole sequence through this level's LSTM
            lstm_out, _ = self.lstm_layers[layer_idx](out)

            # Adaptive gate: element-wise sigmoid gate on the hidden states
            gate = torch.sigmoid(self.adaptive_gates[layer_idx](lstm_out))
            out = self.dropout(lstm_out * gate)

        # Output layer: take the last time step
        last_output = out[:, -1, :]
        output = self.fc(last_output)
        return output

ADD Implementation

ADD (Adaptive Deep Network) is an adaptive deep network: a plain MLP whose output is rescaled and shifted by two learnable parameters.

class ADD(nn.Module):
    def __init__(self, input_size, hidden_size=128, num_layers=3, dropout=0.2):
        super(ADD, self).__init__()

        layers = []
        layers.append(nn.Linear(input_size, hidden_size))
        layers.append(nn.ReLU())
        layers.append(nn.Dropout(dropout))

        for _ in range(num_layers - 2):
            layers.append(nn.Linear(hidden_size, hidden_size))
            layers.append(nn.ReLU())
            layers.append(nn.Dropout(dropout))

        layers.append(nn.Linear(hidden_size, 1))

        self.network = nn.Sequential(*layers)

        # Learnable adaptive scale and shift
        self.adaptive_weight = nn.Parameter(torch.ones(1))
        self.adaptive_bias = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        # Output of the base network
        base_output = self.network(x)

        # Adaptive affine adjustment
        adaptive_output = base_output * self.adaptive_weight + self.adaptive_bias

        return adaptive_output

3.3.4 Multi-Task Learning Models

About Multi-Task Learning

Multi-task learning trains several related tasks together; in quantitative investing this allows multiple targets to be predicted at once.

Key characteristics:

  • Knowledge sharing: the tasks share a common representation
  • Regularization: the shared representation reduces overfitting
  • Efficiency: several tasks are learned by a single model
  • Generalization: the shared signal often improves out-of-sample performance

Model Implementation

class MultiTaskModel(nn.Module):
    def __init__(self, input_size, hidden_size=128, num_tasks=3):
        super(MultiTaskModel, self).__init__()

        # Shared feature-extraction layers
        self.shared_layers = nn.Sequential(
            nn.Linear(input_size, hidden_size),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(hidden_size, hidden_size),
            nn.ReLU(),
            nn.Dropout(0.2)
        )

        # Task-specific heads
        self.task_specific_layers = nn.ModuleList([
            nn.Sequential(
                nn.Linear(hidden_size, hidden_size // 2),
                nn.ReLU(),
                nn.Dropout(0.2),
                nn.Linear(hidden_size // 2, 1)
            )
            for _ in range(num_tasks)
        ])

    def forward(self, x):
        # Shared feature extraction
        shared_features = self.shared_layers(x)

        # Task-specific predictions
        task_outputs = []
        for task_layer in self.task_specific_layers:
            task_output = task_layer(shared_features)
            task_outputs.append(task_output)

        return task_outputs

# Create the multi-task model
model = MultiTaskModel(
    input_size=data['train']['feature'].shape[1],
    hidden_size=128,
    num_tasks=3  # predict three different targets
)

# Multi-task loss function
def multi_task_loss(predictions, targets, weights=None):
    """Weighted sum of per-task MSE losses."""
    if weights is None:
        weights = [1.0] * len(predictions)

    total_loss = 0
    for pred, target, weight in zip(predictions, targets, weights):
        loss = F.mse_loss(pred, target)
        total_loss += weight * loss

    return total_loss
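
A quick end-to-end check of the model and the loss with toy tensors (batch of 32 samples, 20 features, three synthetic targets):

import torch
import torch.nn.functional as F

toy_x = torch.randn(32, 20)
toy_targets = [torch.randn(32, 1) for _ in range(3)]   # one target per task

toy_model = MultiTaskModel(input_size=20, hidden_size=64, num_tasks=3)
toy_preds = toy_model(toy_x)                           # list of three (32, 1) tensors

loss = multi_task_loss(toy_preds, toy_targets, weights=[1.0, 0.5, 0.5])
loss.backward()                                        # gradients flow into shared and task-specific layers
print(loss.item())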

Chapter Summary

This chapter covered the supervised learning models used with Qlib, including:

  1. Traditional machine learning models: gradient boosting models such as LightGBM, XGBoost and CatBoost
  2. Deep learning models: neural networks such as MLP, LSTM/GRU, Transformer and GATs
  3. Advanced techniques: attention mechanisms, temporal convolutional networks, adaptive models and multi-task learning

Exercises

Exercise 1: Model comparison

  1. Train several different models on the Alpha158 dataset
  2. Compare their performance metrics
  3. Analyze each model's characteristics and the scenarios it suits

Exercise 2: Model tuning

  1. Tune the hyperparameters of the LightGBM model
  2. Use grid search and Bayesian optimization
  3. Analyze how the parameters affect model performance

Exercise 3: Feature engineering

  1. Design features for the deep learning models
  2. Implement a custom feature-engineering method
  3. Analyze how the features affect model performance

Further Reading

  1. Machine learning theory

    • 《统计学习方法》 (Statistical Learning Methods)
    • 《机器学习》 (Machine Learning)
  2. Deep learning

    • 《深度学习》 (Deep Learning)
    • 《动手学深度学习》 (Dive into Deep Learning)
  3. Quantitative investment models

    • 《量化投资策略与技术》 (Quantitative Investment Strategies and Techniques)
    • 《机器学习在量化投资中的应用》 (Machine Learning Applications in Quantitative Investing)

