This content originally appeared on DEV Community and was authored by Henry Lin
Chapter 3: Supervised Learning Models
Learning Objectives
After working through this chapter, you will be able to:
- Understand the basic principles of supervised learning in quantitative investment
- Apply traditional machine learning models to quantitative investment problems
- Understand the characteristics and usage of deep learning models
- Use the various models that ship with Qlib
- Understand the principles and applications of advanced modeling techniques
3.1 Traditional Machine Learning Models
3.1.1 LightGBM: Principles and Practice
About LightGBM
LightGBM (Light Gradient Boosting Machine) is an efficient gradient boosting framework developed by Microsoft that performs very well in quantitative investment tasks.
Key characteristics:
- Efficiency: the histogram-based algorithm makes training fast
- Memory friendly: low memory footprint, supports large datasets
- Accuracy: strong results across many benchmarks
- Interpretability: provides feature importance analysis
Model Principles
1. Gradient Boosted Decision Trees (GBDT)
# A simplified sketch of the GBDT idea: each tree fits the residuals
# (for squared error, the negative gradient) of the current ensemble.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

class GBDT:
    def __init__(self, n_estimators=100, learning_rate=0.1):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.trees = []

    def fit(self, X, y):
        # Start from an all-zero prediction
        predictions = np.zeros(len(y))
        for i in range(self.n_estimators):
            # Residuals of the current ensemble
            residuals = y - predictions
            # Fit a shallow tree to the residuals
            tree = DecisionTreeRegressor(max_depth=6)
            tree.fit(X, residuals)
            # Shrink the tree's contribution by the learning rate
            predictions += self.learning_rate * tree.predict(X)
            self.trees.append(tree)

    def predict(self, X):
        predictions = np.zeros(len(X))
        for tree in self.trees:
            predictions += self.learning_rate * tree.predict(X)
        return predictions
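To sanity-check the sketch, here is a quick run on synthetic data (the data and the fitted relationship are illustrative only):

# Smoke test of the GBDT sketch above on synthetic data
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = X[:, 0] * 2 + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=500)

gbdt = GBDT(n_estimators=50, learning_rate=0.1)
gbdt.fit(X, y)
print("train MSE:", np.mean((gbdt.predict(X) - y) ** 2))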
2. LightGBM's Optimization Techniques
- Histogram algorithm: discretizes continuous features into bins, cutting memory use and speeding up split finding
- Leaf-wise growth: always splits the leaf with the largest gain, reaching lower loss at the same tree size (overfitting is kept in check via num_leaves and max_depth)
- Native categorical features: handles categorical features directly, no one-hot encoding needed
These map directly onto LightGBM's parameters, as the sketch below shows.
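As a rough illustration, the snippet below uses the native lightgbm package directly (not Qlib's wrapper); the feature names are made up for the example:

# How the three optimizations surface as native LightGBM parameters
import lightgbm as lgb
import numpy as np
import pandas as pd

X = pd.DataFrame({
    "mom_20": np.random.randn(1000),  # continuous feature -> histogram-binned
    "industry": pd.Categorical(np.random.choice(["tech", "bank", "energy"], 1000)),
})
y = np.random.randn(1000)

train_set = lgb.Dataset(X, label=y, categorical_feature=["industry"])  # no one-hot needed
params = {
    "objective": "regression",
    "max_bin": 255,     # histogram algorithm: bins per continuous feature
    "num_leaves": 63,   # leaf-wise growth is controlled by leaf count, not depth
    "verbosity": -1,
}
booster = lgb.train(params, train_set, num_boost_round=50)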
LightGBM in Qlib
1. Basic Usage
import qlib
from qlib.contrib.data.handler import Alpha158
from qlib.contrib.model.gbdt import LGBModel
from qlib.data.dataset import DatasetH

# Initialize Qlib with your local data (the path is an example)
qlib.init(provider_uri='~/.qlib/qlib_data/cn_data')

# Prepare the data handler
handler = Alpha158(
    instruments='csi300',
    start_time='2020-01-01',
    end_time='2020-12-31',
    fit_start_time='2020-01-01',
    fit_end_time='2020-06-30',
    freq='day'
)

# Wrap the handler in a dataset with train/valid/test segments
dataset = DatasetH(
    handler=handler,
    segments={
        'train': ('2020-01-01', '2020-06-30'),
        'valid': ('2020-07-01', '2020-09-30'),
        'test': ('2020-10-01', '2020-12-31')
    }
)

# Also materialize plain DataFrames for the sklearn-style examples below
data = {
    seg: dataset.prepare(seg, col_set=['feature', 'label'])
    for seg in ('train', 'valid', 'test')
}

# Create the model
model = LGBModel(
    loss='mse',
    colsample_bytree=0.8879,
    learning_rate=0.2,
    subsample=0.8789,
    n_estimators=100,
    max_depth=8,
    num_leaves=210,
    min_child_samples=20
)

# Train: Qlib models train against the DatasetH (the 'train' and 'valid'
# segments are used internally for fitting and early stopping)
model.fit(dataset)
2. Tuning the Model Configuration
# A tuned configuration (keyword arguments are forwarded to LightGBM)
model_config = {
    'loss': 'mse',                # loss function
    'colsample_bytree': 0.8879,   # feature subsampling ratio
    'learning_rate': 0.2,         # learning rate
    'subsample': 0.8789,          # row subsampling ratio
    'n_estimators': 100,          # number of trees
    'max_depth': 8,               # maximum tree depth
    'num_leaves': 210,            # number of leaves per tree
    'min_child_samples': 20,      # minimum samples per leaf
    'reg_alpha': 0.001,           # L1 regularization
    'reg_lambda': 0.001,          # L2 regularization
    'random_state': 42            # random seed
}
model = LGBModel(**model_config)
3. Feature Importance Analysis
# Retrieve feature importance (Qlib's LGBModel exposes get_feature_importance)
feature_importance = model.get_feature_importance()

# Visualize the top 30 features (plotting all 158 Alpha158 factors is unwieldy)
import matplotlib.pyplot as plt
plt.figure(figsize=(12, 8))
feature_importance.sort_values(ascending=True).tail(30).plot(kind='barh')
plt.title('Feature Importance')
plt.xlabel('Importance')
plt.tight_layout()
plt.show()
3.1.2 Applying XGBoost
About XGBoost
XGBoost (eXtreme Gradient Boosting) is an optimized, distributed gradient boosting library with a strong track record in machine learning competitions.
Key characteristics:
- Regularization: built-in L1 and L2 penalties help prevent overfitting
- Parallelism: multi-threaded training
- Missing values: handled automatically during tree construction
- Cross-validation: built in (see the sketch below)
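As an aside, the built-in cross-validation is exposed through xgboost's native xgb.cv; a minimal sketch on synthetic data (not part of the Qlib workflow):

# Native XGBoost cross-validation
import numpy as np
import xgboost as xgb

X = np.random.randn(1000, 20)
y = np.random.randn(1000)
dtrain = xgb.DMatrix(X, label=y)

cv_results = xgb.cv(
    params={"objective": "reg:squarederror", "max_depth": 6, "eta": 0.1},
    dtrain=dtrain,
    num_boost_round=200,
    nfold=5,
    early_stopping_rounds=10,
)
print(cv_results["test-rmse-mean"].iloc[-1])  # mean held-out RMSE at the best round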
Model Implementation
1. Basic Usage
from qlib.contrib.model.xgboost import XGBModel  # note: lives in the xgboost module, not gbdt

# Create the XGBoost model (keyword arguments are forwarded to XGBoost)
model = XGBModel(
    max_depth=6,
    learning_rate=0.1,
    objective='reg:squarederror',
    subsample=0.8,
    colsample_bytree=0.8,
    seed=42
)

# Train: as with LGBModel, Qlib's XGBModel trains on the dataset and uses
# the 'valid' segment internally as the eval set for early stopping
model.fit(dataset, num_boost_round=100, early_stopping_rounds=10)
2. Hyperparameter Optimization
# Qlib's XGBModel is not a scikit-learn estimator, so for GridSearchCV we
# use xgboost's native sklearn wrapper instead
from sklearn.model_selection import GridSearchCV
from xgboost import XGBRegressor

# Define the parameter grid
param_grid = {
    'max_depth': [3, 6, 9],
    'learning_rate': [0.01, 0.1, 0.2],
    'n_estimators': [50, 100, 200],
    'subsample': [0.8, 0.9, 1.0],
    'colsample_bytree': [0.8, 0.9, 1.0]
}

# Grid search with 5-fold cross-validation
# (for time-series data, consider TimeSeriesSplit instead of the default KFold)
grid_search = GridSearchCV(
    XGBRegressor(objective='reg:squarederror'),
    param_grid,
    cv=5,
    scoring='neg_mean_squared_error',
    n_jobs=-1
)
grid_search.fit(data['train']['feature'], data['train']['label']['LABEL0'])
print("Best parameters:", grid_search.best_params_)
3.1.3 Using CatBoost
About CatBoost
CatBoost is a gradient boosting library developed by Yandex that is particularly good at handling categorical features.
Key characteristics:
- Categorical features: supported natively, no manual encoding
- Overfitting control: built-in overfitting detector
- Prediction quality: strong results across many benchmarks
- Ease of use: relatively few parameters to tune
Model Implementation
1. Basic Usage
from qlib.contrib.model.catboost_model import CatBoostModel  # note the module path

# Create the CatBoost model (constructor arguments are forwarded to CatBoost;
# exact keyword names may vary slightly across Qlib versions)
model = CatBoostModel(
    iterations=100,
    learning_rate=0.1,
    depth=6,
    l2_leaf_reg=3,
    loss_function='RMSE',
    random_seed=42
)

# Train: the 'valid' segment is used internally as the eval set
model.fit(dataset)
2. Handling Categorical Features
# Identify categorical columns (Alpha158 factors are all numeric, so this
# mainly matters for custom handlers that include e.g. industry codes)
categorical_features = data['train']['feature'].select_dtypes(include=['object', 'category']).columns

# Pass them to the model at construction time
model = CatBoostModel(
    iterations=100,
    learning_rate=0.1,
    depth=6,
    cat_features=categorical_features.tolist()
)
3.1.4 Linear Models and Ensemble Methods
Linear Regression
1. Basic Linear Regression
from qlib.contrib.model.linear import LinearModel

# Create a linear model (Qlib's LinearModel selects OLS/ridge/lasso via the
# `estimator` argument; sklearn's old `normalize` flag no longer exists)
model = LinearModel(
    estimator='ols',
    fit_intercept=True
)

# Train on the dataset
model.fit(dataset)

# Predict on the test segment
predictions = model.predict(dataset, segment='test')
2. Regularized Linear Models
from sklearn.linear_model import Ridge, Lasso, ElasticNet

# Ridge regression (L2 penalty)
ridge_model = Ridge(alpha=1.0)
# Lasso regression (L1 penalty, drives coefficients to exactly zero)
lasso_model = Lasso(alpha=0.1)
# ElasticNet (mixed L1 + L2 penalty)
elastic_model = ElasticNet(alpha=0.1, l1_ratio=0.5)
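A short usage sketch, assuming the `data` dict prepared in section 3.1.1; Lasso's hallmark is that it zeroes out uninformative coefficients:

# Fit the regularized models and inspect Lasso's sparsity
import numpy as np

X_train = data['train']['feature']
y_train = data['train']['label']['LABEL0']

ridge_model.fit(X_train, y_train)
lasso_model.fit(X_train, y_train)

print("non-zero Lasso coefficients:",
      np.sum(lasso_model.coef_ != 0), "of", len(lasso_model.coef_))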
Ensemble Methods
1. Random Forest
from sklearn.ensemble import RandomForestRegressor

# Create a random forest model
rf_model = RandomForestRegressor(
    n_estimators=100,
    max_depth=10,
    min_samples_split=5,
    min_samples_leaf=2,
    random_state=42
)

# Train the model
rf_model.fit(data['train']['feature'], data['train']['label']['LABEL0'])
2. Voting Regressor
# VotingRegressor requires scikit-learn-compatible estimators, so here we use
# the native LightGBM/XGBoost/CatBoost sklearn wrappers rather than Qlib's models
from sklearn.ensemble import VotingRegressor
from lightgbm import LGBMRegressor
from xgboost import XGBRegressor
from catboost import CatBoostRegressor

# Base models
models = [
    ('lgb', LGBMRegressor(n_estimators=100)),
    ('xgb', XGBRegressor(n_estimators=100)),
    ('cat', CatBoostRegressor(iterations=100, verbose=0))
]

# Average the base models' predictions
voting_regressor = VotingRegressor(estimators=models)

# Train the ensemble
voting_regressor.fit(data['train']['feature'], data['train']['label']['LABEL0'])
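A simple average treats every base model equally. Stacking instead learns how to combine them with a meta-model; a sketch reusing the same base models:

# Stacking: a meta-model learns the weights of the base models
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import Ridge

stacking_regressor = StackingRegressor(
    estimators=models,                  # same base models as above
    final_estimator=Ridge(alpha=1.0),   # meta-model fitted on out-of-fold predictions
    cv=5,
)
stacking_regressor.fit(data['train']['feature'], data['train']['label']['LABEL0'])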
3.2 Deep Learning Models
3.2.1 MLP Neural Networks
About MLPs
The multilayer perceptron (MLP) is the most basic deep learning model and is widely used in quantitative investment.
Key characteristics:
- Nonlinear modeling: activation functions introduce nonlinearity
- Feature learning: learns feature representations automatically
- Flexibility: the network architecture is easy to adjust
- Simplicity: relatively easy to understand and implement
Model Implementation
1. A Basic MLP
import torch
import torch.nn as nn
# NOTE: `MLPModel` below stands for a generic "wrap any nn.Module" trainer,
# assumed here to keep the examples short. Qlib's actual built-in PyTorch
# models live in qlib.contrib.model.pytorch_nn (DNNModelPytorch) and friends.

# Define the MLP architecture
class MLP(nn.Module):
    def __init__(self, input_size, hidden_size=128, num_layers=3):
        super(MLP, self).__init__()
        layers = []
        layers.append(nn.Linear(input_size, hidden_size))
        layers.append(nn.ReLU())
        layers.append(nn.Dropout(0.2))
        for _ in range(num_layers - 2):
            layers.append(nn.Linear(hidden_size, hidden_size))
            layers.append(nn.ReLU())
            layers.append(nn.Dropout(0.2))
        layers.append(nn.Linear(hidden_size, 1))
        self.network = nn.Sequential(*layers)

    def forward(self, x):
        return self.network(x)

# Create the model
model = MLPModel(
    model=MLP(input_size=data['train']['feature'].shape[1]),
    optimizer='adam',
    loss='mse',
    lr=0.001,
    max_epochs=100,
    batch_size=256
)

# Train the model
model.fit(data['train']['feature'], data['train']['label']['LABEL0'])
2. An Advanced MLP Configuration
# A richer configuration (the early-stop and scheduler keys assume the same
# generic wrapper as above)
model_config = {
    'model': MLP(input_size=data['train']['feature'].shape[1], hidden_size=256),
    'optimizer': 'adam',
    'loss': 'mse',
    'lr': 0.001,
    'max_epochs': 200,
    'batch_size': 512,
    'early_stop': 20,           # patience for early stopping
    'lr_scheduler': 'step',     # step-decay learning rate schedule
    'lr_step_size': 50,
    'lr_gamma': 0.5
}
model = MLPModel(**model_config)
3.2.2 LSTM/GRU Sequence Models
About Sequence Models
LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit) are recurrent neural networks designed for sequential data.
Key characteristics:
- Memory: can retain long-range dependencies
- Gating: gates control how information flows through the network
- Sequence modeling: a natural fit for time series
- Gradient stability: the gating largely mitigates vanishing gradients
One practical detail: these models expect 3-D input of shape (batch, seq_len, features), so flat per-day factors must first be stacked into rolling windows, as sketched below.
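A minimal sketch of that reshaping for a single stock's feature history (the helper name is ours):

# Turn a per-stock feature matrix into overlapping sequence windows
import numpy as np
import torch

def make_windows(features: np.ndarray, seq_len: int = 20) -> torch.Tensor:
    """Stack overlapping windows: (T, d_feat) -> (T - seq_len + 1, seq_len, d_feat)."""
    windows = [features[i : i + seq_len] for i in range(len(features) - seq_len + 1)]
    return torch.tensor(np.stack(windows), dtype=torch.float32)

# e.g. one stock's feature history -> sequence batch for the LSTM below
# x = make_windows(stock_features, seq_len=20)   # (N, 20, d_feat)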
LSTM Implementation
1. A Basic LSTM
import torch
import torch.nn as nn
# (the same generic-wrapper assumption as in 3.2.1 applies to LSTMModel below;
# Qlib itself ships an LSTM model in qlib.contrib.model.pytorch_lstm)

class LSTM(nn.Module):
    def __init__(self, input_size, hidden_size=128, num_layers=2, dropout=0.2):
        super(LSTM, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=num_layers,
            dropout=dropout,
            batch_first=True
        )
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):
        # x shape: (batch_size, seq_len, input_size)
        lstm_out, _ = self.lstm(x)
        # Use the output of the last timestep
        last_output = lstm_out[:, -1, :]
        # Fully connected head
        output = self.fc(last_output)
        return output

# Create the model
model = LSTMModel(
    model=LSTM(input_size=data['train']['feature'].shape[1]),
    optimizer='adam',
    loss='mse',
    lr=0.001,
    max_epochs=100,
    batch_size=256
)
2. A GRU Implementation
class GRU(nn.Module):
    def __init__(self, input_size, hidden_size=128, num_layers=2, dropout=0.2):
        super(GRU, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.gru = nn.GRU(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=num_layers,
            dropout=dropout,
            batch_first=True
        )
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):
        # x shape: (batch_size, seq_len, input_size)
        gru_out, _ = self.gru(x)
        # Use the output of the last timestep
        last_output = gru_out[:, -1, :]
        # Fully connected head
        output = self.fc(last_output)
        return output

# Create the GRU model (the wrapper only cares that forward(x) returns
# predictions, so the same interface works)
model = LSTMModel(
    model=GRU(input_size=data['train']['feature'].shape[1]),
    optimizer='adam',
    loss='mse',
    lr=0.001,
    max_epochs=100,
    batch_size=256
)
3.2.3 Applying Transformers
About Transformers
The Transformer is an attention-based deep learning architecture. After its enormous success in natural language processing, it has increasingly been applied to quantitative investment.
Key characteristics:
- Attention: can focus on the most relevant timesteps
- Parallelism: the whole sequence is processed in parallel during training
- Long-range dependencies: handles long sequences well
- Scalability: easy to extend and optimize
Model Implementation
1. A Basic Transformer
import torch
import torch.nn as nn
import math

class PositionalEncoding(nn.Module):
    def __init__(self, d_model, max_len=5000):
        super(PositionalEncoding, self).__init__()
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        pe = pe.unsqueeze(0).transpose(0, 1)
        self.register_buffer('pe', pe)

    def forward(self, x):
        return x + self.pe[:x.size(0), :]

class Transformer(nn.Module):
    def __init__(self, input_size, d_model=128, nhead=8, num_layers=6, dropout=0.1):
        super(Transformer, self).__init__()
        self.input_projection = nn.Linear(input_size, d_model)
        self.pos_encoder = PositionalEncoding(d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model,
            nhead=nhead,
            dim_feedforward=d_model * 4,
            dropout=dropout
        )
        self.transformer_encoder = nn.TransformerEncoder(encoder_layer, num_layers)
        self.fc = nn.Linear(d_model, 1)

    def forward(self, x):
        # x shape: (batch_size, seq_len, input_size)
        x = self.input_projection(x)  # (batch_size, seq_len, d_model)
        x = x.transpose(0, 1)         # (seq_len, batch_size, d_model)
        x = self.pos_encoder(x)
        x = self.transformer_encoder(x)
        x = x.transpose(0, 1)         # (batch_size, seq_len, d_model)
        # Use the output of the last timestep
        last_output = x[:, -1, :]
        output = self.fc(last_output)
        return output

# Create the Transformer model (same generic wrapper interface as before;
# a smaller learning rate tends to work better for Transformers)
model = LSTMModel(
    model=Transformer(input_size=data['train']['feature'].shape[1]),
    optimizer='adam',
    loss='mse',
    lr=0.0001,
    max_epochs=100,
    batch_size=128
)
3.2.4 Graph Attention Networks (GATs)
About Graph Neural Networks
Graph Attention Networks (GATs) are a type of graph neural network that can model the complex relationships between stocks.
Key characteristics:
- Graph structure: models relations between stocks explicitly
- Attention: learns the importance of each neighbor dynamically
- Interaction modeling: captures how stocks influence each other
- Interpretability: the attention weights can be inspected
The model needs an adjacency matrix describing which stocks are related; a simple correlation-based construction is sketched below.
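One common heuristic is to connect stocks whose return correlations are high; a minimal sketch (the helper and threshold are our illustration):

# Build a binary stock-relation graph from return correlations
import numpy as np
import torch

def correlation_adjacency(returns: np.ndarray, threshold: float = 0.5) -> torch.Tensor:
    """returns: (T, N) matrix of daily returns for N stocks.
    Stocks are connected when |correlation| exceeds `threshold`."""
    corr = np.corrcoef(returns.T)                      # (N, N)
    adj = (np.abs(corr) > threshold).astype(np.float32)
    np.fill_diagonal(adj, 1.0)                         # keep self-loops
    return torch.tensor(adj)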
Model Implementation
1. A Graph Attention Layer
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    def __init__(self, in_features, out_features, dropout=0.1, alpha=0.2):
        super(GraphAttentionLayer, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.dropout = dropout
        self.alpha = alpha
        self.W = nn.Linear(in_features, out_features, bias=False)
        self.a = nn.Linear(2 * out_features, 1, bias=False)
        self.leakyrelu = nn.LeakyReLU(self.alpha)

    def forward(self, input, adj):
        # input: (N, in_features), adj: (N, N)
        Wh = self.W(input)  # (N, out_features)
        # Attention coefficients for every node pair
        a_input = torch.cat([Wh.repeat_interleave(Wh.size(0), dim=0),
                             Wh.repeat(Wh.size(0), 1)], dim=1)
        a_input = a_input.view(Wh.size(0), Wh.size(0), 2 * self.out_features)
        e = self.leakyrelu(self.a(a_input).squeeze(2))
        # Mask out pairs that are not connected in the graph
        zero_vec = -9e15 * torch.ones_like(e)
        attention = torch.where(adj > 0, e, zero_vec)
        attention = F.softmax(attention, dim=1)
        attention = F.dropout(attention, self.dropout, training=self.training)
        h_prime = torch.matmul(attention, Wh)
        return h_prime

class GAT(nn.Module):
    def __init__(self, input_size, hidden_size=64, num_heads=8, dropout=0.1):
        super(GAT, self).__init__()
        self.dropout = dropout
        self.attention_layers = nn.ModuleList([
            GraphAttentionLayer(input_size, hidden_size, dropout, alpha=0.2)
            for _ in range(num_heads)
        ])
        self.out_att = GraphAttentionLayer(hidden_size * num_heads, 1, dropout, alpha=0.2)

    def forward(self, x, adj):
        # x: (N, input_size), adj: (N, N)
        # Multi-head attention: concatenate the heads
        x = F.dropout(x, self.dropout, training=self.training)
        x = torch.cat([att(x, adj) for att in self.attention_layers], dim=1)
        x = F.elu(x)
        # Output layer
        x = F.dropout(x, self.dropout, training=self.training)
        x = self.out_att(x, adj)
        return x

# GAT's forward needs the adjacency matrix as a second input, so it cannot
# be dropped into the single-input wrapper used for the earlier models;
# a manual forward pass looks like this:
gat = GAT(input_size=data['train']['feature'].shape[1])
# adj = correlation_adjacency(returns)    # see the sketch above
# scores = gat(feature_tensor, adj)       # (N, 1) score per stock
3.3 Advanced Modeling Techniques
3.3.1 Attention-Based Models (SFM, ALSTM)
About Attention Mechanisms
Attention allows a model to focus on the important parts of its input; in quantitative investment this helps identify the key timesteps and features.
Key characteristics:
- Dynamic weighting: weights are assigned based on the input itself
- Interpretability: the attention weights show what the model focuses on
- Long-range dependencies: handles long sequences
- Flexibility: can be bolted onto many architectures
An SFM Sketch
SFM (State Frequency Memory, Zhang et al., KDD 2017) decomposes a recurrent network's memory into multiple frequency components to capture trading patterns at different frequencies. The code below is a loose, attention-based sketch of that idea, not the paper's exact formulation.
class SFM(nn.Module):
    def __init__(self, input_size, hidden_size=128, num_frequencies=10):
        super(SFM, self).__init__()
        self.hidden_size = hidden_size
        self.num_frequencies = num_frequencies
        # State memory
        self.state_memory = nn.Linear(input_size, hidden_size)
        # Frequency memory
        self.frequency_memory = nn.Linear(input_size, num_frequencies)
        # Project the frequency distribution into a hidden-size gate
        # (the original snippet tried to expand the raw frequency weights to
        # the hidden size, which does not broadcast)
        self.frequency_proj = nn.Linear(num_frequencies, hidden_size)
        # Attention over timesteps
        self.attention = nn.MultiheadAttention(hidden_size, num_heads=8)
        # Output head
        self.output_layer = nn.Linear(hidden_size, 1)

    def forward(self, x):
        # x shape: (batch_size, seq_len, input_size)
        # State memory
        state_memory = self.state_memory(x)          # (batch, seq_len, hidden)
        # Frequency memory
        frequency_memory = self.frequency_memory(x)  # (batch, seq_len, num_frequencies)
        # Self-attention over the sequence (MultiheadAttention expects seq-first)
        state_memory = state_memory.transpose(0, 1)
        attended_output, _ = self.attention(state_memory, state_memory, state_memory)
        attended_output = attended_output.transpose(0, 1)  # (batch, seq_len, hidden)
        # Gate the attended states with the frequency information
        frequency_weights = F.softmax(frequency_memory, dim=-1)
        freq_gate = torch.sigmoid(self.frequency_proj(frequency_weights))
        weighted_output = attended_output * freq_gate
        # Predict from the last timestep
        output = self.output_layer(weighted_output[:, -1, :])
        return output
An ALSTM Sketch
ALSTM (Attention LSTM) combines an LSTM with an attention layer over its hidden states.
class ALSTM(nn.Module):
    def __init__(self, input_size, hidden_size=128, num_layers=2, dropout=0.2):
        super(ALSTM, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        # LSTM backbone
        self.lstm = nn.LSTM(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=num_layers,
            dropout=dropout,
            batch_first=True
        )
        # Attention over the LSTM outputs
        self.attention = nn.MultiheadAttention(hidden_size, num_heads=8)
        # Output head
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):
        # x shape: (batch_size, seq_len, input_size)
        # LSTM encoding
        lstm_out, _ = self.lstm(x)                    # (batch, seq_len, hidden)
        # Self-attention (seq-first layout for MultiheadAttention)
        lstm_out = lstm_out.transpose(0, 1)
        attended_output, _ = self.attention(lstm_out, lstm_out, lstm_out)
        attended_output = attended_output.transpose(0, 1)
        # Predict from the last timestep
        last_output = attended_output[:, -1, :]
        output = self.fc(last_output)
        return output
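A quick shape check of the ALSTM sketch on random input (no training involved; 158 matches the Alpha158 feature count):

# Shape smoke test for the ALSTM sketch
import torch

alstm = ALSTM(input_size=158)
x = torch.randn(32, 20, 158)     # (batch, seq_len, d_feat)
print(alstm(x).shape)            # expected: torch.Size([32, 1])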
3.3.2 Temporal Convolutional Networks (TCN)
About TCNs
A Temporal Convolutional Network (TCN) processes time series with causal convolutions, combining parallel training with a large receptive field.
Key characteristics:
- Causal convolutions: the model never looks at future timesteps
- Parallelism: training is parallel across the sequence
- Long-range dependencies: dilated convolutions cover long histories
- Residual connections: mitigate vanishing gradients
A quick calculation of the receptive field follows.
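How far back the network can see follows directly from the dilations: with kernel size k and L levels of two dilated convolutions each (dilations 1, 2, ..., 2^(L-1)), the receptive field is 1 + 2(k-1)(2^L - 1) timesteps. A quick check:

def tcn_receptive_field(kernel_size: int, num_levels: int) -> int:
    # Each level has two causal convs with dilation 2**i,
    # each adding (kernel_size - 1) * 2**i timesteps of history
    return 1 + 2 * (kernel_size - 1) * sum(2 ** i for i in range(num_levels))

print(tcn_receptive_field(kernel_size=2, num_levels=3))  # 15 timesteps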
Model Implementation
class Chomp1d(nn.Module):
    """Trim the trailing padding so the convolution stays causal and the
    output length matches the input length."""
    def __init__(self, chomp_size):
        super(Chomp1d, self).__init__()
        self.chomp_size = chomp_size

    def forward(self, x):
        return x[:, :, :-self.chomp_size].contiguous()

class TemporalBlock(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride, dilation, padding, dropout=0.2):
        super(TemporalBlock, self).__init__()
        # Conv1d pads both sides, so each conv is followed by a Chomp1d that
        # removes the extra right-hand outputs (the original snippet omitted
        # this, which breaks both causality and the residual shapes)
        self.conv1 = nn.Conv1d(in_channels, out_channels, kernel_size,
                               stride=stride, padding=padding, dilation=dilation)
        self.chomp1 = Chomp1d(padding)
        self.relu1 = nn.ReLU()
        self.dropout1 = nn.Dropout(dropout)
        self.conv2 = nn.Conv1d(out_channels, out_channels, kernel_size,
                               stride=stride, padding=padding, dilation=dilation)
        self.chomp2 = Chomp1d(padding)
        self.relu2 = nn.ReLU()
        self.dropout2 = nn.Dropout(dropout)
        self.net = nn.Sequential(self.conv1, self.chomp1, self.relu1, self.dropout1,
                                 self.conv2, self.chomp2, self.relu2, self.dropout2)
        self.downsample = nn.Conv1d(in_channels, out_channels, 1) if in_channels != out_channels else None
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.net(x)
        # Residual connection (1x1 conv matches channel counts when needed)
        res = x if self.downsample is None else self.downsample(x)
        return self.relu(out + res)

class TCN(nn.Module):
    def __init__(self, input_size, num_channels, kernel_size=2, dropout=0.2):
        super(TCN, self).__init__()
        layers = []
        num_levels = len(num_channels)
        for i in range(num_levels):
            dilation_size = 2 ** i  # dilation doubles at every level
            in_channels = input_size if i == 0 else num_channels[i-1]
            out_channels = num_channels[i]
            layers.append(
                TemporalBlock(in_channels, out_channels, kernel_size, stride=1,
                              dilation=dilation_size, padding=(kernel_size-1) * dilation_size,
                              dropout=dropout)
            )
        self.network = nn.Sequential(*layers)
        self.fc = nn.Linear(num_channels[-1], 1)

    def forward(self, x):
        # x shape: (batch_size, seq_len, input_size)
        x = x.transpose(1, 2)   # (batch_size, input_size, seq_len)
        x = self.network(x)
        x = x.transpose(1, 2)   # (batch_size, seq_len, num_channels[-1])
        # Predict from the last timestep
        last_output = x[:, -1, :]
        output = self.fc(last_output)
        return output

# Create the TCN model (same generic wrapper interface as before)
model = LSTMModel(
    model=TCN(input_size=data['train']['feature'].shape[1], num_channels=[64, 128, 256]),
    optimizer='adam',
    loss='mse',
    lr=0.001,
    max_epochs=100,
    batch_size=256
)
3.3.3 Adaptive Models (ADARNN, ADD)
About Adaptive Models
Adaptive models adjust their parameters as the data distribution changes, which makes them particularly useful in quantitative investment, where market regimes shift.
Key characteristics:
- Dynamic adaptation: adjusts to changing market conditions
- Online learning: supports incremental updates
- Concept drift: copes with shifts in the data distribution
- Robustness: more tolerant of anomalous data
An ADARNN Sketch
ADARNN (Adaptive RNN) is a recurrent network with an adaptive gating mechanism; the sketch below captures the gating idea in a simplified form.
class ADARNN(nn.Module):
    def __init__(self, input_size, hidden_size=128, num_layers=2):
        super(ADARNN, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        # One single-layer LSTM per level so each level can be gated separately
        self.lstm_layers = nn.ModuleList([
            nn.LSTM(input_size if i == 0 else hidden_size, hidden_size, 1, batch_first=True)
            for i in range(num_layers)
        ])
        # Adaptive gates, one per level
        self.adaptive_gates = nn.ModuleList([
            nn.Linear(hidden_size, hidden_size)
            for _ in range(num_layers)
        ])
        # Output head
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):
        # x shape: (batch_size, seq_len, input_size)
        batch_size, seq_len, _ = x.size()
        # Hidden states live in Python lists of (1, batch, hidden) tensors
        # (the original snippet indexed into a single tensor, which breaks
        # both the shapes nn.LSTM expects and autograd)
        h = [torch.zeros(1, batch_size, self.hidden_size, device=x.device) for _ in range(self.num_layers)]
        c = [torch.zeros(1, batch_size, self.hidden_size, device=x.device) for _ in range(self.num_layers)]
        outputs = []
        for t in range(seq_len):
            layer_input = x[:, t, :].unsqueeze(1)  # (batch_size, 1, input_size)
            for i in range(self.num_layers):
                # One LSTM step
                lstm_out, (h[i], c[i]) = self.lstm_layers[i](layer_input, (h[i], c[i]))
                # Adaptive gate scales the layer's output
                gate = torch.sigmoid(self.adaptive_gates[i](lstm_out))
                lstm_out = lstm_out * gate
                layer_input = lstm_out
            outputs.append(lstm_out.squeeze(1))
        # Stack the per-timestep outputs
        outputs = torch.stack(outputs, dim=1)  # (batch_size, seq_len, hidden_size)
        # Predict from the last timestep
        output = self.fc(outputs[:, -1, :])
        return output
An ADD Sketch
ADD (Adaptive Deep network) augments a plain feed-forward network with learnable calibration parameters.
class ADD(nn.Module):
    def __init__(self, input_size, hidden_size=128, num_layers=3, dropout=0.2):
        super(ADD, self).__init__()
        layers = []
        layers.append(nn.Linear(input_size, hidden_size))
        layers.append(nn.ReLU())
        layers.append(nn.Dropout(dropout))
        for _ in range(num_layers - 2):
            layers.append(nn.Linear(hidden_size, hidden_size))
            layers.append(nn.ReLU())
            layers.append(nn.Dropout(dropout))
        layers.append(nn.Linear(hidden_size, 1))
        self.network = nn.Sequential(*layers)
        # Adaptive calibration parameters
        self.adaptive_weight = nn.Parameter(torch.ones(1))
        self.adaptive_bias = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        # Base network output
        base_output = self.network(x)
        # Adaptive rescaling
        adaptive_output = base_output * self.adaptive_weight + self.adaptive_bias
        return adaptive_output
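One way to put the calibration parameters to work is to freeze the base network and fine-tune only them on the most recent data; this recipe is our illustration of the adaptive idea, not a canonical ADD training procedure:

# Cheap online recalibration: update only the adaptive parameters
import torch
import torch.nn.functional as F
import torch.optim as optim

add_model = ADD(input_size=158)
for p in add_model.network.parameters():
    p.requires_grad = False  # freeze the base network

optimizer = optim.Adam([add_model.adaptive_weight, add_model.adaptive_bias], lr=0.01)
recent_x, recent_y = torch.randn(64, 158), torch.randn(64, 1)  # stand-ins for recent data

for _ in range(50):
    optimizer.zero_grad()
    loss = F.mse_loss(add_model(recent_x), recent_y)
    loss.backward()
    optimizer.step()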
3.3.4 Multi-Task Learning
About Multi-Task Learning
Multi-task learning trains several related tasks at once; in quantitative investment this can mean predicting several targets simultaneously (for example, returns over different horizons).
Key characteristics:
- Knowledge sharing: tasks share a common representation
- Regularization: the shared representation reduces overfitting
- Efficiency: several targets are learned in one pass
- Generalization: often improves out-of-sample performance
Model Implementation
class MultiTaskModel(nn.Module):
    def __init__(self, input_size, hidden_size=128, num_tasks=3):
        super(MultiTaskModel, self).__init__()
        # Shared feature extractor
        self.shared_layers = nn.Sequential(
            nn.Linear(input_size, hidden_size),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(hidden_size, hidden_size),
            nn.ReLU(),
            nn.Dropout(0.2)
        )
        # One head per task
        self.task_specific_layers = nn.ModuleList([
            nn.Sequential(
                nn.Linear(hidden_size, hidden_size // 2),
                nn.ReLU(),
                nn.Dropout(0.2),
                nn.Linear(hidden_size // 2, 1)
            )
            for _ in range(num_tasks)
        ])

    def forward(self, x):
        # Shared representation
        shared_features = self.shared_layers(x)
        # Task-specific predictions
        task_outputs = []
        for task_layer in self.task_specific_layers:
            task_outputs.append(task_layer(shared_features))
        return task_outputs

# Create the multi-task model
model = MultiTaskModel(
    input_size=data['train']['feature'].shape[1],
    hidden_size=128,
    num_tasks=3  # predict 3 different targets
)

# Multi-task loss
def multi_task_loss(predictions, targets, weights=None):
    """Weighted sum of per-task MSE losses."""
    if weights is None:
        weights = [1.0] * len(predictions)
    total_loss = 0
    for pred, target, weight in zip(predictions, targets, weights):
        total_loss += weight * F.mse_loss(pred, target)
    return total_loss
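A minimal training loop tying the model and loss together (synthetic targets for illustration; in practice the three tasks could be returns over different horizons):

# Train the multi-task model on one synthetic batch
import torch
import torch.optim as optim

optimizer = optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(256, data['train']['feature'].shape[1])
targets = [torch.randn(256, 1) for _ in range(3)]  # one target per task

for epoch in range(10):
    optimizer.zero_grad()
    predictions = model(x)
    loss = multi_task_loss(predictions, targets, weights=[1.0, 0.5, 0.5])
    loss.backward()
    optimizer.step()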
Chapter Summary
This chapter covered the supervised learning models available in and around Qlib:
- Traditional machine learning: gradient boosting models such as LightGBM, XGBoost, and CatBoost
- Deep learning: MLP, LSTM/GRU, Transformer, and GAT architectures
- Advanced techniques: attention mechanisms, temporal convolutional networks, adaptive models, and multi-task learning
Exercises
Exercise 1: Model Comparison
- Train several different models on the Alpha158 dataset
- Compare their performance metrics
- Analyze each model's strengths and the scenarios it suits
Exercise 2: Model Tuning
- Tune the hyperparameters of the LightGBM model
- Try both grid search and Bayesian optimization
- Analyze how each parameter affects performance
Exercise 3: Feature Engineering
- Design features for the deep learning models
- Implement custom feature engineering methods
- Analyze how the features affect model performance
Further Reading
- Machine learning theory
  - Statistical Learning Methods (Li Hang)
  - Machine Learning (Zhou Zhihua)
- Deep learning
  - Deep Learning (Goodfellow, Bengio, Courville)
  - Dive into Deep Learning
- Quantitative investment
  - Quantitative Investment Strategies and Techniques
  - Applications of Machine Learning in Quantitative Investment