This content originally appeared on DEV Community and was authored by Super Kai (Kazuya Ito)
*Memos:
- My post explains PReLU() and ELU().
- My post explains SELU() and CELU().
- My post explains Step function, Identity and ReLU.
- My post explains Leaky ReLU, PReLU and FReLU.
- My post explains GELU, Mish, SiLU and Softplus.
- My post explains Tanh, Softsign, Sigmoid and Softmax.
- My post explains Vanishing Gradient Problem, Exploding Gradient Problem and Dying ReLU Problem.
(1) ELU (Exponential Linear Unit):
- can convert an input value (x) to an output value between ae^x - a and x. *Memos:
- If x < 0, then ae^x - a, while if 0 <= x, then x.
- a is 1.0 by default.
- is ELU() in PyTorch.
- ELU's pros:
- It normalizes negative input values.
- The convergence with negative input values is stable.
- It mitigates the Vanishing Gradient Problem.
- It mitigates the Dying ReLU Problem. *0 is still produced for the input value 0, so the Dying ReLU Problem is not completely avoided.
- ELU's cons:
- It's computationally expensive because of the exponential operation.
- It's non-differentiable at x = 0 if a is not 1.
- ELU's graph in Desmos:
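To make the piecewise definition concrete, here is a minimal PyTorch sketch (the input tensor is just illustrative) comparing ELU() with a manual version of "ae^x - a if x < 0, otherwise x":

```python
import torch
import torch.nn as nn

x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])  # sample inputs (illustrative)

a = 1.0                # alpha, 1.0 by default
elu = nn.ELU(alpha=a)  # built-in ELU

# Manual piecewise version: a * e^x - a if x < 0, otherwise x.
manual = torch.where(x < 0, a * torch.exp(x) - a, x)

print(elu(x))  # tensor([-0.8647, -0.6321,  0.0000,  1.0000,  2.0000])
print(manual)  # same values as above
```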
(2) SELU (Scaled Exponential Linear Unit):
- can convert an input value (x) to an output value between λ(ae^x - a) and λx. *Memos:
- If x < 0, then λ(ae^x - a), while if 0 <= x, then λx.
- λ = 1.0507009873554804934193349852946.
- a (α) = 1.6732632423543772848170429916717.
- is SELU() in PyTorch.
- SELU's pros:
- It normalizes negative input values.
- The convergence with negative input values is stable.
- It mitigates the Vanishing Gradient Problem.
- It mitigates the Dying ReLU Problem. *0 is still produced for the input value 0, so the Dying ReLU Problem is not completely avoided.
- SELU's cons:
- It may cause the Exploding Gradient Problem because a positive input value is increased by the multiplication with λ.
- It's computationally expensive because of the exponential operation.
- It's non-differentiable at x = 0 since a is fixed at about 1.673, not 1.
- SELU's graph in Desmos:
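A minimal PyTorch sketch (the input tensor is just illustrative) showing that SELU() matches the piecewise formula with the fixed constants λ and a above:

```python
import torch
import torch.nn as nn

x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])  # sample inputs (illustrative)

selu = nn.SELU()  # lambda (scale) and a (alpha) are fixed constants, not parameters

# Manual piecewise version: lambda * (a * e^x - a) if x < 0, otherwise lambda * x.
scale = 1.0507009873554804934193349852946
alpha = 1.6732632423543772848170429916717
manual = torch.where(x < 0, scale * (alpha * torch.exp(x) - alpha), scale * x)

print(selu(x))  # tensor([-1.5202, -1.1113,  0.0000,  1.0507,  2.1014])
print(manual)   # same values as above
```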
(3) CELU (Continuously Differentiable Exponential Linear Unit):
- is an improved ELU that is differentiable at x = 0 even if a is not 1.
- can convert an input value (x) to an output value between ae^(x/a) - a and x. *Memos:
- If x < 0, then ae^(x/a) - a, while if 0 <= x, then x.
- a is 1.0 by default.
- CELU's formula is: CELU(x) = max(0, x) + min(0, a(e^(x/a) - 1)).
- is CELU() in PyTorch.
- CELU's pros:
- It normalizes negative input values.
- The convergence with negative input values is stable.
- It mitigates the Vanishing Gradient Problem.
- It mitigates the Dying ReLU Problem. *0 is still produced for the input value 0, so the Dying ReLU Problem is not completely avoided.
- CELU's cons:
- It's computationally expensive because of the exponential operation.
- CELU's graph in Desmos:
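A minimal PyTorch sketch (the input tensor and the alpha=2.0 value are just illustrative): with the default a of 1.0, CELU() coincides with ELU(), and for other values of a the piecewise rule "a * e^(x/a) - a if x < 0, otherwise x" still applies:

```python
import torch
import torch.nn as nn

x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])  # sample inputs (illustrative)

elu = nn.ELU(alpha=1.0)
celu_1 = nn.CELU(alpha=1.0)  # with a = 1.0, CELU equals ELU
celu_2 = nn.CELU(alpha=2.0)  # an example with a different a

# Manual piecewise version for a = 2.0: a * e^(x/a) - a if x < 0, otherwise x.
a = 2.0
manual = torch.where(x < 0, a * torch.exp(x / a) - a, x)

print(torch.allclose(celu_1(x), elu(x)))  # True
print(celu_2(x))  # tensor([-1.2642, -0.7869,  0.0000,  1.0000,  2.0000])
print(manual)     # same values as celu_2(x)
```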