TorchArc
2020/06/20
Building modular PyTorch models for my projects over the past few years has prompted me to adopt a config-based approach to defining model architectures. Over time I have iteratively refined the method, and recently I felt it had become sufficiently mature to be open sourced.
The project is TorchArc: Build PyTorch networks by specifying architectures. You can install it with pip:
pip install torcharc
My experience building quite a lot of models for DL and RL (which can have unconventional architectures) has resulted in the following observations:
Most models are built from common components and hyperparameters - layers, width, activation, norm, dropout, init, etc. These can be specified via a config with the structure of a JSON/YAML.
Using config-based architectures frees us from frequent hard-coded changes, while also immediately allowing hyperparameter optimization over the entire architecture. Yes, you can do NAS (neural architecture search) quite easily - see the sketch after these observations.
Sometimes we wish to compose models together, e.g. a hybrid network with a conv net and an MLP for bi-modal inputs, joined in the middle by another MLP, and split into multiple outputs for multi-modal controls.
The composed network is always a DAG. This means it can be specified via a JSON/YAML structure too (this can be proven mathematically, but that's not why we're here).
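As a quick illustration of the NAS point above, here is a minimal random-search sketch. It assumes only torcharc.build and the Linear arc format shown in the examples below; the evaluation routine is a hypothetical placeholder you would supply yourself.

import random
import torcharc

def sample_arc():
    # randomly sample architecture hyperparameters directly in the config
    return {
        'type': 'Linear',
        'in_features': 8,
        'layers': random.choice([[32], [64, 32], [128, 64, 32]]),
        'batch_norm': random.choice([True, False]),
        'activation': random.choice(['ReLU', 'Tanh']),
        'dropout': random.choice([0.0, 0.2, 0.5]),
        'init': 'kaiming_uniform_',
    }

candidates = [sample_arc() for _ in range(10)]
models = [torcharc.build(arc) for arc in candidates]
# evaluate(model) below is a placeholder for your own train-and-validate loop
# best_arc, best_model = max(zip(candidates, models), key=lambda pair: evaluate(pair[1]))

Since the whole architecture lives in plain data, any hyperparameter optimizer that can mutate a dict can drive the search.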
These essentially formed the design requirements for TorchArc. Additionally, I've added a carry_forward
method which accepts a TensorTuple input, forward-passes any tensors in it by name-matching, and carries any unused tensors in the output. This allows a multi-modal input to be carried and forward-passed all the way to the output.
Let's jump straight into TorchArc.
Example Usage
Given just the architecture, torcharc can build a generic DAG (directed acyclic graph) of nn modules, which consists of:
single-input-output modules: Conv1d, Conv2d, Conv3d, Linear, PTTSTransformer, TSTransformer, or any other valid nn.Module
fork modules: ReuseFork, SplitFork
merge modules: ConcatMerge, FiLMMerge
The custom modules are defined in torcharc/module and registered in torcharc/module_builder.py.
The full architecture reference examples are in torcharc/arc_ref.py, and full functional examples are in test/module/. Below we walk through the main examples.
ConvNet
import torch
import torcharc

arc = {
    'type': 'Conv2d',
    'in_shape': [3, 20, 20],
    'layers': [
        [16, 4, 2, 0, 1],
        [16, 4, 1, 0, 1]
    ],
    'batch_norm': True,
    'activation': 'ReLU',
    'dropout': 0.2,
    'init': 'kaiming_uniform_',
}
# build the model from the architecture config
model = torcharc.build(arc)

batch_size = 16
x = torch.rand([batch_size, *arc['in_shape']])
y = model(x)
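Since the arc is plain data, it can just as well live in a YAML or JSON file and be loaded before building. A minimal sketch, assuming a hypothetical conv.yaml that holds the same keys as the dict above and that PyYAML is installed:

import torch
import torcharc
import yaml

# conv.yaml is a hypothetical file containing the same arc dict as above
with open('conv.yaml') as f:
    arc = yaml.safe_load(f)

model = torcharc.build(arc)
x = torch.rand([16, *arc['in_shape']])
y = model(x)

This is also the form that makes architecture search and experiment tracking convenient, since the config file fully describes the network.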
MLP
arc = {
    'type': 'Linear',
    'in_features': 8,
    'layers': [64, 32],
    'batch_norm': True,
    'activation': 'ReLU',
    'dropout': 0.2,
    'init': {
        'type': 'normal_',
        'std': 0.01,
    },
}
model = torcharc.build(arc)

batch_size = 16
x = torch.rand([batch_size, arc['in_features']])
y = model(x)
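The built model behaves like any other nn.Module, so the usual PyTorch workflow applies; for instance (standard PyTorch, nothing TorchArc-specific):

# inspect and checkpoint the built MLP like any other nn.Module
print(model)
num_params = sum(p.numel() for p in model.parameters())
print(f'trainable parameters: {num_params}')
torch.save(model.state_dict(), 'mlp.pt')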
Time-Series Transformer
arc = {
    'type': 'TSTransformer',
    'd_model': 64,
    'nhead': 8,
    'num_encoder_layers': 4,
    'num_decoder_layers': 4,
    'dropout': 0.2,
    'dim_feedforward': 2048,
    'activation': 'relu',
    'in_embedding': 'Linear',
    'pe': 'sinusoid',
    'attention_size': None,
    'in_channels': 1,
    'out_channels': 1,
    'q': 8,
    'v': 8,
    'chunk_mode': None,
}
model = torcharc.build(arc)

seq_len = 32
x = torch.rand([seq_len, arc['in_channels']])
DAG: Hydra
Ultimately, we can build a generic DAG network using the modules linked by the fork and merge modules. The example below shows HydraNet - a network with multiple inputs and multiple outputs.
arc = {
    'dag_in_shape': {'image': [3, 20, 20], 'vector': [8]},
    'image': {
        'type': 'Conv2d',
        'in_names': ['image'],
        'layers': [
            [16, 4, 2, 0, 1],
            [16, 4, 1, 0, 1]
        ],
        'batch_norm': True,
        'activation': 'ReLU',
        'dropout': 0.2,
        'init': 'kaiming_uniform_',
    },
    'merge': {
        'type': 'FiLMMerge',
        'in_names': ['image', 'vector'],
        'names': {'feature': 'image', 'conditioner': 'vector'},
    },
    'Flatten': {
        'type': 'Flatten'
    },
    'Linear': {
        'type': 'Linear',
        'layers': [64, 32],
        'batch_norm': True,
        'activation': 'ReLU',
        'dropout': 0.2,
        'init': 'kaiming_uniform_',
    },
    'out': {
        'type': 'Linear',
        'out_features': 8,
    },
    'fork': {
        'type': 'SplitFork',
        'shapes': {'mean': [4], 'std': [4]},
    }
}
model = torcharc.build(arc)

batch_size = 16
dag_in_shape = arc['dag_in_shape']
xs = {
    'image': torch.rand([batch_size, *dag_in_shape['image']]),
    'vector': torch.rand([batch_size, *dag_in_shape['vector']]),
}
# returns dict of Tensors if output is multi-modal, Tensor otherwise
ys = model(xs)
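Because the final SplitFork declares shapes for mean and std, the multi-modal output ys is a dict keyed by those names. A hedged sketch of consuming it, e.g. as a Gaussian head for continuous controls; the softplus to keep std positive is my own choice here, not something TorchArc applies for you:

import torch.nn.functional as F
from torch.distributions import Normal

mean, std = ys['mean'], F.softplus(ys['std'])  # ensure std > 0
dist = Normal(mean, std)
action = dist.sample()            # shape: [batch_size, 4]
log_prob = dist.log_prob(action)  # handy for policy-gradient losses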
The DAG module accepts a dict as input (xs above). Each module selects its input by matching its own name in the arc and its in_names, then carries forward its output together with any unconsumed inputs.
For example, the input xs with keys image, vector passes through the first image module, and the output becomes {'image': image_module(xs.image), 'vector': xs.vector}. This is then passed through the remainder of the modules in the arc as declared.
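To make the carry-forward behaviour concrete, here is a simplified, purely illustrative version of the name-matching logic for one module; it is not TorchArc's actual internals, just the semantics described above expressed as plain Python:

def carry_forward(name, module, in_names, xs):
    # xs: dict of named tensors; the module consumes only its in_names
    consumed = [xs[k] for k in in_names]
    rest = {k: v for k, v in xs.items() if k not in in_names}
    # the output is stored under the module's name in the arc,
    # and unconsumed tensors are carried forward unchanged
    return {name: module(*consumed), **rest}

# e.g. carry_forward('image', image_module, ['image'], xs)
# == {'image': image_module(xs['image']), 'vector': xs['vector']}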
Optimizer
TorchArc also provides a convenience method to construct an optimizer in the same config-driven manner.
import torcharc
arc = {
    'type': 'Linear',
    'in_features': 8,
    'layers': [64, 32],
    'batch_norm': True,
    'activation': 'ReLU',
    'dropout': 0.2,
    'init': {
        'type': 'normal_',
        'std': 0.01,
    },
}
optim_spec = {
    'type': 'Adam',
    'lr': 0.001,
}

model = torcharc.build(arc)
optimizer = torcharc.build_optimizer(optim_spec, model)
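The optim_spec presumably maps onto the corresponding torch.optim optimizer (Adam with lr=0.001 here), so training proceeds as usual. A minimal sketch with made-up random data and an MSE loss, assuming the MLP's last layer width (32) as the target size:

import torch
import torch.nn.functional as F

x = torch.rand([16, arc['in_features']])
target = torch.rand([16, arc['layers'][-1]])  # final layer width is 32

optimizer.zero_grad()
loss = F.mse_loss(model(x), target)
loss.backward()
optimizer.step()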