Machine Heart Report
Keras and PyTorch are both among the most beginner-friendly deep learning frameworks. They read almost like a simple language for describing architectures, telling the framework which layers to use and how to connect them. Many researchers and developers wonder which framework is better, but for now both are very popular and each has its own strengths. Recently, Facebook researcher William Falcon dressed PyTorch up in Keras-style clothing, and doing research with a framework like this, he says, is simply delightful.
PyTorch Lightning address: https://github.com/williamFalcon/pytorch-lightning
PyTorch that Looks Like Keras
The purpose of Keras itself is to further encapsulate the APIs of underlying deep learning frameworks (TensorFlow, Theano). As a highly encapsulated wrapper around TensorFlow, Keras offers a very high level of abstraction and hides many API details. PyTorch, with its dynamic computation graphs, is also convenient and fast, but overall Keras hides far more of the details.
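For example, in Keras a few lines are enough to define, compile, and train a classifier; the optimizer state, the training loop, and metric logging are all hidden behind compile() and fit(). The snippet below is a minimal illustrative sketch (not from the original article), with arbitrary layer sizes.

from tensorflow import keras

# Minimal Keras sketch: the training loop, optimizer state and logging
# are all hidden behind compile() and fit(). Layer sizes are illustrative.
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

(x_train, y_train), _ = keras.datasets.mnist.load_data()
model.fit(x_train / 255.0, y_train, epochs=1, batch_size=32)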
In contrast, PyTorch provides a relatively lower-level environment for experimentation, giving users more freedom to write custom layers, inspect numerical optimization tasks, and so on. For example, PyTorch 1.0 ships with the compilation tool torch.jit, which includes a language called Torch Script, a subset of Python that developers can use to further optimize their models.
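As a rough illustration (not taken from the original article), torch.jit.script can compile an ordinary Python function, loops and all, into Torch Script; the function here is purely hypothetical.

import torch

# Hypothetical example: torch.jit.script compiles a subset of Python,
# control flow included, into a graph that can be optimized and run
# without the Python interpreter.
@torch.jit.script
def scaled_sum(x: torch.Tensor, n_steps: int) -> torch.Tensor:
    total = torch.zeros_like(x)
    for i in range(n_steps):      # the loop is captured by the compiler
        total = total + x / float(i + 1)
    return total

print(scaled_sum(torch.ones(3), 4))
print(scaled_sum.graph)           # inspect the compiled Torch Script graph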
When writing models in PyTorch, aside from data loading and the model definition, the entire training and validation logic and all of its configuration has to be written by hand, which is quite cumbersome. One could even say that researchers spend considerable effort on this part of the code while praying it contains no bugs. Yet for most research experiments the training and validation loops are nearly identical and do essentially the same things, as the sketch below illustrates. So why not package these common parts together to simplify training?
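To see how much of this is boilerplate, here is a simplified sketch of a bare PyTorch training and validation loop (not taken from the article). It assumes that a model, a loss_fn, and train_loader/val_loader objects have already been created; none of it is model-specific logic, and almost all of it repeats from project to project.

import torch

# Simplified sketch of the boilerplate Lightning automates.
# Assumes `model`, `loss_fn`, `train_loader` and `val_loader` already exist.
optimizer = torch.optim.Adam(model.parameters(), lr=0.02)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model.to(device)

for epoch in range(10):
    model.train()
    for x, y in train_loader:                      # training loop
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()

    model.eval()
    val_losses = []
    with torch.no_grad():                          # validation loop
        for x, y in val_loader:
            x, y = x.to(device), y.to(device)
            val_losses.append(loss_fn(model(x), y).item())
    print(f'epoch {epoch}: val_loss={sum(val_losses) / len(val_losses):.4f}')

Add multi-GPU support, checkpointing, early stopping, and logging on top of this, and the bookkeeping quickly dwarfs the research code.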
William Falcon thought exactly this way, and wrapped up all the common configuration in PyTorch development so that we only need to write the core logic. With PyTorch Lightning, using PyTorch feels much like using Keras, and models can be built rapidly at a higher level of abstraction.
Who is the Project Author?
Completing such work takes a significant amount of effort: everything from hyperparameter search, model debugging, and distributed training to the training and validation loop logic and model logging has to be folded into one general solution that works across a wide range of tasks. That makes the project's author, William Falcon of Facebook, all the more impressive.
He is a developer at NYU and Facebook and is currently pursuing a PhD at NYU. Judging from his GitHub activity, he is a fairly active developer.
This is PyTorch Dressed in Keras
Lightning is a very lightweight wrapper around PyTorch: researchers write only the core training and validation logic, and the other processes are automated. In that sense it resembles a high-level wrapper like Keras, which hides most details and keeps only the most straightforward interfaces. Lightning guarantees the correctness of the automated parts, which makes it much easier to polish the core training logic.
So Why Should We Use Lightning?
When we start a new project, what we ultimately need usually includes logging of the training loop, multi-cluster training, float16 precision, early stopping, model loading and saving, and so on. Getting this whole series of pieces to work can take a great deal of effort and surface all kinds of strange bugs, making it hard to focus on the core logic of the research.
With Lightning, these parts are guaranteed to work, so we can focus on what we actually want to research: the data and the training and validation logic. We also don't need to worry about how hard multi-GPU acceleration is to set up, because Lightning takes care of all of it.
So What Can Lightning Help Us With?
The diagram below shows the various steps involved in building a machine learning model. Often the hardest part is not writing the model itself but all of the configuration and preprocessing around it. The blue parts must be defined in a LightningModule, while the gray parts Lightning completes automatically. What we mainly need to do is load the data, define the model, and specify the training and validation procedures.
The following pseudocode shows the main methods that need to be defined; together with the model architecture definition, they form a complete model.
# what to do in the training loop
def training_step(self, data_batch, batch_nb):
    ...

# what to do in the validation loop
def validation_step(self, data_batch, batch_nb):
    ...

# how to aggregate validation_step outputs
def validation_end(self, outputs):
    ...

# and your dataloaders
def tng_dataloader(self):
    ...

def val_dataloader(self):
    ...

def test_dataloader(self):
    ...
Aside from the modules that need to be defined, the following steps can all be completed automatically by Lightning. Of course, each module can be configured individually.
How to Use Lightning
Using Lightning is also very simple; it takes just two steps: define a LightningModule, then fit the Trainer.
Taking classic MNIST image classification as an example, below is an example of a LightningModule. We import the PyTorch modules as usual, but this time we inherit from LightningModule instead of nn.Module, and then write PyTorch code as we normally would; the function calls stay the same. This may not look any different, but note that the method names are fixed, which is what lets Lightning take over the subsequent steps.
import os
import torch
from torch.nn import functional as F
from torch.utils.data import DataLoader
from torchvision.datasets import MNIST
import torchvision.transforms as transforms
import pytorch_lightning as ptl


class CoolModel(ptl.LightningModule):

    def __init__(self):
        super(CoolModel, self).__init__()
        # not the best model...
        self.l1 = torch.nn.Linear(28 * 28, 10)

    def forward(self, x):
        return torch.relu(self.l1(x.view(x.size(0), -1)))

    def my_loss(self, y_hat, y):
        return F.cross_entropy(y_hat, y)

    def training_step(self, batch, batch_nb):
        x, y = batch
        y_hat = self.forward(x)
        return {'loss': self.my_loss(y_hat, y)}

    def validation_step(self, batch, batch_nb):
        x, y = batch
        y_hat = self.forward(x)
        return {'val_loss': self.my_loss(y_hat, y)}

    def validation_end(self, outputs):
        avg_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
        return {'avg_val_loss': avg_loss}

    def configure_optimizers(self):
        return [torch.optim.Adam(self.parameters(), lr=0.02)]

    @ptl.data_loader
    def tng_dataloader(self):
        return DataLoader(MNIST(os.getcwd(), train=True, download=True, transform=transforms.ToTensor()), batch_size=32)

    @ptl.data_loader
    def val_dataloader(self):
        return DataLoader(MNIST(os.getcwd(), train=True, download=True, transform=transforms.ToTensor()), batch_size=32)

    @ptl.data_loader
    def test_dataloader(self):
        return DataLoader(MNIST(os.getcwd(), train=True, download=True, transform=transforms.ToTensor()), batch_size=32)
Then, the second step is fitting the Trainer. This is very similar to a high-level wrapper like Keras: the details of training configuration, loops, and log output are hidden, and everything is handled automatically by a single fit() method.
This is more convenient and concise than writing plain PyTorch, and distributed training is also very easy: you just need to specify the device IDs.
from pytorch_lightning import Trainer
from test_tube import Experiment
model = CoolModel()
exp = Experiment(save_dir=os.getcwd())
# train on cpu using only 10% of the data (for demo purposes)
trainer = Trainer(experiment=exp, max_nb_epochs=1, train_percent_check=0.1)
# train on 4 gpus
# trainer = Trainer(experiment=exp, max_nb_epochs=1, gpus=[0, 1, 2, 3])
# train on 32 gpus across 4 nodes (make sure to submit appropriate SLURM job)
# trainer = Trainer(experiment=exp, max_nb_epochs=1, gpus=[0, 1, 2, 3, 4, 5, 6, 7], nb_gpu_nodes=4)
# train (1 epoch only here for demo)
trainer.fit(model)
# view tensorboard logs
print(f'View tensorboard logs by running\ntensorboard --logdir {os.getcwd()}')
print('and going to http://localhost:6006 on your browser')
Other Features
PyTorch Lightning also integrates seamlessly with TensorBoard.
Just define the running path:
from test_tube import Experiment
from pytorch_lightning import Trainer

exp = Experiment(save_dir='/some/path')
trainer = Trainer(experiment=exp)
Then connect TensorBoard to the path:
tensorboard --logdir /some/path