Transformer is a neural network architecture similar to Feed-Forward Neural Network (FFNN), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), etc. At this time, Transformer is the preferred architecture for building AI systems. Here, we explain why it performs so well, its limitations, and factors to consider when choosing a Transformer as the neural architecture. However, if you are interested in the underlying math, the seminal paper Attention Is All You Need is a great starting point.
Why does Transformer Perform so Well?
- Transformers can see all of the past and the future at once! The alternative architectures have either limited visibility (e.g., FFNN, CNN) or sequential visibility (e.g., RNN).
- Transformers are attentive! They use the Attention Mechanism, which enables them to focus on what matters at any given instant (a minimal sketch follows this list).
- Transformer computation is extremely hardware-friendly. The underlying math can fully exploit the parallel processing capabilities of GPUs and modern CPUs, so Transformers are faster to train and run inference with.
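To make the first two points concrete, here is a minimal NumPy sketch of single-head scaled dot-product attention. The names and sizes are illustrative assumptions, not taken from any particular model; the point is that every output position is a weighted sum over every input position, computed with a few dense matrix products, which is exactly the kind of work GPUs parallelize well.

```python
import numpy as np

def scaled_dot_product_attention(X, Wq, Wk, Wv):
    """Single-head self-attention over a whole sequence X of shape
    (seq_len, d_model). Every position attends to every other position
    in one batched matrix product, which is why it parallelizes so well."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                 # project inputs
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (seq_len, seq_len) scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over positions
    return weights @ V                               # mix all positions at once

rng = np.random.default_rng(0)
seq_len, d_model = 6, 8
X = rng.standard_normal((seq_len, d_model))
Wq, Wk, Wv = (rng.standard_normal((d_model, d_model)) for _ in range(3))
out = scaled_dot_product_attention(X, Wq, Wk, Wv)
print(out.shape)  # (6, 8): each output row draws on all 6 input positions
```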
What are the Drawbacks of Transformers?
- A Transformer sees the past and the future all at once, but its runtime complexity grows quadratically with the input length. For example, if processing a 1-second file with a Transformer model takes 1 second, processing a 10-second file takes 100 seconds (see the scaling illustration after this list).
- A Transformer has no built-in concept of order in time, even though it can see all of the past and future. There are workarounds, such as the positional encodings sketched after this list.
- The Transformer architecture is not suitable for streaming real-time applications.
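The quadratic growth is easy to see from the size of the attention score matrix alone; the token counts below are arbitrary assumptions chosen only for illustration:

```python
# Every position attends to every other position, so the attention score
# matrix has seq_len**2 entries; compute and memory grow accordingly.
for seq_len in (1_000, 10_000, 100_000):
    print(f"{seq_len:>7} tokens -> {seq_len**2:>16,} attention scores")
# Each 10x increase in input length costs 100x, matching the
# 1-second -> 100-second example above.
```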
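One standard workaround for the missing notion of order is the sinusoidal positional encoding proposed in Attention Is All You Need. The sketch below follows that formula; the function name and sizes are illustrative:

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding from "Attention Is All You Need".
    Returns a (seq_len, d_model) matrix with a distinct pattern per position."""
    positions = np.arange(seq_len)[:, None]           # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]          # even dimensions
    angles = positions / (10000 ** (dims / d_model))  # (seq_len, d_model/2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                      # sine on even dims
    pe[:, 1::2] = np.cos(angles)                      # cosine on odd dims
    return pe

pe = sinusoidal_positional_encoding(seq_len=50, d_model=16)
print(pe.shape)  # (50, 16)
```

The resulting matrix is simply added to the token embeddings, giving each position a distinct signature that the otherwise order-blind attention layers can use.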
Transformer for a New Project
One should consider the Transformer as a candidate architecture for a new project. Transformers have matched, and often surpassed, the state of the art in NLP, computer vision, and speech applications. Additionally, extensive software support exists for implementing, training, and deploying Transformers.
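As one illustration of that software support, the widely used Hugging Face transformers library can run a pretrained Transformer in a few lines; note that the default model it downloads, and the exact scores, depend on the library version:

```python
from transformers import pipeline

# Downloads and caches a small pretrained Transformer on first use.
classifier = pipeline("sentiment-analysis")
print(classifier("Transformers are remarkably easy to deploy."))
# e.g. [{'label': 'POSITIVE', 'score': 0.999...}]
```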
Transformer for an Existing Product
It depends on how well the baseline (existing model) is performing. Remember that beating a well-trained and tuned model can be a massive undertaking. Additionally, some product requirements (e.g., latency, memory usage) can be a showstopper for bringing a Transformer onboard.