Activation functions in PyTorch (3)



This content originally appeared on DEV Community and was authored by Super Kai (Kazuya Ito)


*Memos:

  • My post explains PReLU() and ELU().
  • My post explains SELU() and CELU().
  • My post explains Step function, Identity and ReLU.
  • My post explains Leaky ReLU, PReLU and FReLU.
  • My post explains GELU, Mish, SiLU and Softplus.
  • My post explains Tanh, Softsign, Sigmoid and Softmax.
  • My post explains Vanishing Gradient Problem, Exploding Gradient Problem and Dying ReLU Problem.

(1) ELU(Exponential Linear Unit):

  • can convert an input value (x) to the output value between ae^x - a and x: *Memos:
    • If x < 0, then ae^x - a, while if 0 <= x, then x.
    • a is 1.0 by default.
  • is ELU() in PyTorch.
  • ‘s pros:
    • It normalizes negative input values.
    • The convergence with negative input values is stable.
    • It mitigates Vanishing Gradient Problem.
    • It mitigates Dying ReLU Problem. *0 is still produced for the input value 0, so Dying ReLU Problem is not completely avoided.
  • ‘s cons:
    • It’s computationally expensive because of the exponential operation.
    • It’s non-differentiable at x = 0 if a is not 1.
  • ‘s graph in Desmos:

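A minimal sketch (assuming PyTorch is installed) of how ELU() behaves on a few sample values:

```python
import torch
import torch.nn as nn

elu = nn.ELU()  # alpha (a) is 1.0 by default

x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])
y = elu(x)
# For x < 0, the output is a(e^x - 1) = ae^x - a; for 0 <= x, it is x,
# so the negative outputs are squashed toward -a while positives pass through.
print(y)
```

You can also pass `alpha=` to `nn.ELU()` to change a, e.g. `nn.ELU(alpha=2.0)`.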

(2) SELU(Scaled Exponential Linear Unit):

  • can convert an input value (x) to the output value between λ(ae^x - a) and λx: *Memos:
    • If x < 0, then λ(ae^x - a), while if 0 <= x, then λx.
    • λ = 1.0507009873554804934193349852946
    • a = 1.6732632423543772848170429916717
  • is SELU() in PyTorch.
  • ‘s pros:
    • It normalizes negative input values.
    • The convergence with negative input values is stable.
    • It mitigates Vanishing Gradient Problem.
    • It mitigates Dying ReLU Problem. *0 is still produced for the input value 0, so Dying ReLU Problem is not completely avoided.
  • ‘s cons:
    • It may cause Exploding Gradient Problem because positive input values are scaled up by multiplication with λ.
    • It’s computationally expensive because of the exponential operation.
    • It’s non-differentiable at x = 0 because a (≈1.673) is not 1.
  • ‘s graph in Desmos:

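A minimal sketch (assuming PyTorch is installed) of SELU(); note that λ and a are fixed constants inside PyTorch, so `nn.SELU()` takes no alpha argument:

```python
import torch
import torch.nn as nn

selu = nn.SELU()  # lambda and a are fixed internally (no parameters to set)

x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])
y = selu(x)
# For x < 0, the output is lambda * (a*e^x - a); for 0 <= x, it is lambda * x,
# so positive inputs come out slightly larger than x (multiplied by ~1.0507).
print(y)
```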

(3) CELU(Continuously Differentiable Exponential Linear Unit):

  • is an improved ELU that is differentiable at x = 0 even if a is not 1.
  • can convert an input value (x) to the output value between ae^(x/a) - a and x: *Memos:
    • If x < 0, then ae^(x/a) - a, while if 0 <= x, then x.
    • a is 1.0 by default.
  • ‘s formula is: CELU(x) = max(0, x) + min(0, a(e^(x/a) - 1))
  • is CELU() in PyTorch.
  • ‘s pros:
    • It normalizes negative input values.
    • The convergence with negative input values is stable.
    • It mitigates Vanishing Gradient Problem.
    • It mitigates Dying ReLU Problem. *0 is still produced for the input value 0, so Dying ReLU Problem is not completely avoided.
  • ‘s cons:
    • It’s computationally expensive because of the exponential operation.
  • ‘s graph in Desmos:

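A minimal sketch (assuming PyTorch is installed) of CELU() with a non-default a, to show the negative branch a(e^(x/a) - 1):

```python
import torch
import torch.nn as nn

celu = nn.CELU(alpha=2.0)  # a = 2.0; CELU remains differentiable at x = 0 even though a != 1

x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])
y = celu(x)
# For x < 0, the output is a(e^(x/a) - 1) = ae^(x/a) - a; for 0 <= x, it is x.
# With a = 2.0, CELU(-2) = 2(e^-1 - 1), and outputs are bounded below by -a = -2.
print(y)
```

With the default `alpha=1.0`, CELU() produces exactly the same outputs as ELU().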
