Commit af0634b9 authored by MattiaPujatti's avatar MattiaPujatti
Browse files

updated JAX section

parent c4880a8c
......@@ -74,4 +74,5 @@
\@writefile{lot}{\contentsline {table}{\numberline {1.2}{\ignorespaces Recal of the most popular complex-valued activation functions.\relax }}{10}{table.caption.20}\protected@file@percent }
\newlabel{tab:cmplx_activations}{{1.2}{10}{Recal of the most popular complex-valued activation functions.\relax }{table.caption.20}{}}
\@writefile{toc}{\contentsline {section}{\numberline {1.5}JAX Implementation}{10}{section.1.5}\protected@file@percent }
\gdef \@abspage@last{10}
\@writefile{lof}{\contentsline {figure}{\numberline {1.5}{\ignorespaces JAX logo.\relax }}{11}{figure.caption.21}\protected@file@percent }
\gdef \@abspage@last{12}
This is pdfTeX, Version 3.14159265-2.6-1.40.21 (TeX Live 2020/Debian) (preloaded format=pdflatex 2021.6.3) 6 NOV 2021 11:45
This is pdfTeX, Version 3.14159265-2.6-1.40.21 (TeX Live 2020/Debian) (preloaded format=pdflatex 2021.6.3) 8 NOV 2021 11:26
entering extended mode
restricted \write18 enabled.
%&-line parsing enabled.
......@@ -528,12 +528,19 @@ Package: enumitem 2019/06/20 v3.9 Customized lists
\enit@count@id=\count301
\enitdp@description=\count302
)
\c@theorem=\count303
\c@corollary=\count304
\c@definition=\count305
\c@observation=\count306
)))
(./extent.aux)
(/usr/share/texlive/texmf-dist/tex/latex/wrapfig/wrapfig.sty
\wrapoverhang=\dimen256
\WF@size=\dimen257
\c@WF@wrappedlines=\count303
\WF@box=\box57
\WF@everypar=\toks36
Package: wrapfig 2003/01/31 v 3.6
)
\c@theorem=\count304
\c@corollary=\count305
\c@definition=\count306
\c@observation=\count307
))) (./extent.aux)
\openout1 = `extent.aux'.
LaTeX Font Info: Checking defaults for OML/cmm/m/it on input line 5.
......@@ -552,7 +559,6 @@ LaTeX Font Info: Checking defaults for U/cmr/m/n on input line 5.
LaTeX Font Info: ... okay on input line 5.
LaTeX Font Info: Checking defaults for PD1/pdf/m/n on input line 5.
LaTeX Font Info: ... okay on input line 5.
*geometry* driver: auto-detecting
*geometry* detected driver: pdftex
*geometry* verbose mode - [ preamble ] result:
......@@ -589,17 +595,17 @@ LaTeX Font Info: ... okay on input line 5.
(/usr/share/texlive/texmf-dist/tex/context/base/mkii/supp-pdf.mkii
[Loading MPS to PDF converter (version 2006.09.02).]
\scratchcounter=\count307
\scratchdimen=\dimen256
\scratchbox=\box57
\nofMPsegments=\count308
\nofMParguments=\count309
\everyMPshowfont=\toks36
\MPscratchCnt=\count310
\MPscratchDim=\dimen257
\MPnumerator=\count311
\makeMPintoPDFobject=\count312
\everyMPtoPDFconversion=\toks37
\scratchcounter=\count308
\scratchdimen=\dimen258
\scratchbox=\box58
\nofMPsegments=\count309
\nofMParguments=\count310
\everyMPshowfont=\toks37
\MPscratchCnt=\count311
\MPscratchDim=\dimen259
\MPnumerator=\count312
\makeMPintoPDFobject=\count313
\everyMPtoPDFconversion=\toks38
) (/usr/share/texlive/texmf-dist/tex/latex/epstopdf-pkg/epstopdf-base.sty
Package: epstopdf-base 2020-01-24 v2.11 Base part for package epstopdf
Package epstopdf-base Info: Redefining graphics rule for `.eps' on input line 4
......@@ -612,8 +618,9 @@ e
Package caption Info: Begin \AtBeginDocument code.
Package caption Info: hyperref package is loaded.
Package caption Info: listings package is loaded.
Package caption Info: wrapfig package is loaded.
Package caption Info: End \AtBeginDocument code.
\c@lstlisting=\count313
\c@lstlisting=\count314
Package hyperref Info: Link coloring OFF on input line 5.
(/usr/share/texlive/texmf-dist/tex/latex/hyperref/nameref.sty
......@@ -625,7 +632,7 @@ Package: refcount 2019/12/15 v3.6 Data extraction from label references (HO)
(/usr/share/texlive/texmf-dist/tex/generic/gettitlestring/gettitlestring.sty
Package: gettitlestring 2019/12/15 v1.6 Cleanup title references (HO)
)
\c@section@level=\count314
\c@section@level=\count315
)
LaTeX Info: Redefining \ref on input line 5.
LaTeX Info: Redefining \pageref on input line 5.
......@@ -748,13 +755,18 @@ File: example-image-a.pdf Graphic file (type pdf)
Package pdftex.def Info: example-image-a.pdf used on input line 80.
(pdftex.def) Requested size: 160.59961pt x 120.44969pt.
Underfull \hbox (badness 10000) in paragraph at lines 85--86
Overfull \hbox (21.2961pt too wide) detected at line 104
[] \OT1/cmr/m/n/10.95 = [][] + [][]
[]
Overfull \hbox (21.2961pt too wide) detected at line 104
[] \OT1/cmr/m/n/10.95 = [][] + [][]
[]
Overfull \hbox (78.4074pt too wide) in paragraph at lines 93--105
[][]
Overfull \hbox (35.99998pt too wide) in alignment at lines 104--104
[][][]
[]
......@@ -824,7 +836,7 @@ Package fancyhdr Warning: \headheight is too small (12.0pt):
LaTeX Warning: Citation `trabelsi2018deep' on page 6 undefined on input line 14
7.
<..//pictures/complex_convolution.pdf, id=124, 199.56155pt x 251.21053pt>
<..//pictures/complex_convolution.pdf, id=120, 199.56155pt x 251.21053pt>
File: ..//pictures/complex_convolution.pdf Graphic file (type pdf)
<use ..//pictures/complex_convolution.pdf>
Package pdftex.def Info: ..//pictures/complex_convolution.pdf used on input li
......@@ -880,16 +892,6 @@ Package fancyhdr Warning: \headheight is too small (12.0pt):
[8]
LaTeX Warning: Citation `Virtue:EECS-2019-126' on page 9 undefined on input lin
e 223.
Overfull \hbox (2.51306pt too wide) in paragraph at lines 222--225
\OT1/cmr/m/n/10.95 Because of this, re-cently a new com-plex ac-ti-va-tion func
-tion have been pro-posed: the \OT1/cmtt/m/n/10.95 Complex Cardioid
[]
Package fancyhdr Warning: \headheight is too small (12.0pt):
(fancyhdr) Make it at least 13.59999pt, for example:
(fancyhdr) \setlength{\headheight}{13.59999pt}.
......@@ -899,6 +901,16 @@ Package fancyhdr Warning: \headheight is too small (12.0pt):
[9]
LaTeX Warning: Citation `Virtue:EECS-2019-126' on page 10 undefined on input li
ne 223.
Overfull \hbox (2.51306pt too wide) in paragraph at lines 222--225
\OT1/cmr/m/n/10.95 Because of this, re-cently a new com-plex ac-ti-va-tion func
-tion have been pro-posed: the \OT1/cmtt/m/n/10.95 Complex Cardioid
[]
LaTeX Warning: Citation `Nitta_complexBP' on page 10 undefined on input line 23
9.
......@@ -930,6 +942,33 @@ on input line 245.
LaTeX Warning: Citation `Virtue:EECS-2019-126' on page 10 undefined on input li
ne 246.
<..//pictures/JAX_logo.pdf, id=170, 639.38875pt x 406.51875pt>
File: ..//pictures/JAX_logo.pdf Graphic file (type pdf)
<use ..//pictures/JAX_logo.pdf>
Package pdftex.def Info: ..//pictures/JAX_logo.pdf used on input line 259.
(pdftex.def) Requested size: 241.84843pt x 153.76538pt.
Package fancyhdr Warning: \headheight is too small (12.0pt):
(fancyhdr) Make it at least 13.59999pt, for example:
(fancyhdr) \setlength{\headheight}{13.59999pt}.
(fancyhdr) You might also make \topmargin smaller to compensate:
(fancyhdr) \addtolength{\topmargin}{-1.59999pt}.
[10]
Overfull \hbox (2.60356pt too wide) in paragraph at lines 266--267
[]\OT1/cmr/m/n/10.95 many com-plex op-er-a-tions/lay-ers are al-ready
[]
Package fancyhdr Warning: \headheight is too small (12.0pt):
(fancyhdr) Make it at least 13.59999pt, for example:
(fancyhdr) \setlength{\headheight}{13.59999pt}.
(fancyhdr) You might also make \topmargin smaller to compensate:
(fancyhdr) \addtolength{\topmargin}{-1.59999pt}.
[11 <..//pictures/JAX_logo.pdf>]
Package fancyhdr Warning: \headheight is too small (12.0pt):
(fancyhdr) Make it at least 13.59999pt, for example:
......@@ -938,7 +977,7 @@ Package fancyhdr Warning: \headheight is too small (12.0pt):
(fancyhdr) \addtolength{\topmargin}{-1.59999pt}.
[10] (./extent.aux)
[12] (./extent.aux)
LaTeX Warning: There were undefined references.
......@@ -946,13 +985,13 @@ Package rerunfilecheck Info: File `extent.out' has not changed.
(rerunfilecheck) Checksum: 69418383BC20A3C5ADE2D66D57B72767;594.
)
Here is how much of TeX's memory you used:
13019 strings out of 479304
193829 string characters out of 5869780
549997 words of memory out of 5000000
29884 multiletter control sequences out of 15000+600000
13092 strings out of 479304
194928 string characters out of 5869780
552442 words of memory out of 5000000
29948 multiletter control sequences out of 15000+600000
416756 words of font info for 81 fonts, out of 8000000 for 9000
1141 hyphenation exceptions out of 8191
116i,16n,120p,978b,414s stack positions out of 5000i,500n,10000p,200000b,80000s
116i,15n,120p,1221b,453s stack positions out of 5000i,500n,10000p,200000b,80000s
{/usr/share/texmf/fonts/enc/dvips/cm-super/cm-super-ts1.enc}</usr/share/texli
ve/texmf-dist/fonts/type1/public/amsfonts/cm/cmbx10.pfb></usr/share/texlive/tex
mf-dist/fonts/type1/public/amsfonts/cm/cmbx12.pfb></usr/share/texlive/texmf-dis
......@@ -968,19 +1007,18 @@ texlive/texmf-dist/fonts/type1/public/amsfonts/cm/cmr8.pfb></usr/share/texlive/
texmf-dist/fonts/type1/public/amsfonts/cm/cmr9.pfb></usr/share/texlive/texmf-di
st/fonts/type1/public/amsfonts/cm/cmsl10.pfb></usr/share/texlive/texmf-dist/fon
ts/type1/public/amsfonts/cm/cmsy10.pfb></usr/share/texlive/texmf-dist/fonts/typ
e1/public/amsfonts/cm/cmsy6.pfb></usr/share/texlive/texmf-dist/fonts/type1/publ
ic/amsfonts/cm/cmsy8.pfb></usr/share/texlive/texmf-dist/fonts/type1/public/amsf
onts/cm/cmti10.pfb></usr/share/texlive/texmf-dist/fonts/type1/public/amsfonts/c
m/cmtt10.pfb></usr/share/texlive/texmf-dist/fonts/type1/public/doublestroke/dsr
om10.pfb></usr/share/texlive/texmf-dist/fonts/type1/public/doublestroke/dsrom8.
pfb></usr/share/texmf/fonts/type1/public/lm/lmss17.pfb></usr/share/texlive/texm
f-dist/fonts/type1/public/amsfonts/symbols/msam10.pfb></usr/share/texlive/texmf
-dist/fonts/type1/public/rsfs/rsfs10.pfb></usr/share/texmf/fonts/type1/public/c
m-super/sfrm1095.pfb>
Output written on extent.pdf (10 pages, 853390 bytes).
e1/public/amsfonts/cm/cmsy8.pfb></usr/share/texlive/texmf-dist/fonts/type1/publ
ic/amsfonts/cm/cmti10.pfb></usr/share/texlive/texmf-dist/fonts/type1/public/ams
fonts/cm/cmtt10.pfb></usr/share/texlive/texmf-dist/fonts/type1/public/doublestr
oke/dsrom10.pfb></usr/share/texlive/texmf-dist/fonts/type1/public/doublestroke/
dsrom8.pfb></usr/share/texmf/fonts/type1/public/lm/lmss17.pfb></usr/share/texli
ve/texmf-dist/fonts/type1/public/amsfonts/symbols/msam10.pfb></usr/share/texliv
e/texmf-dist/fonts/type1/public/rsfs/rsfs10.pfb></usr/share/texmf/fonts/type1/p
ublic/cm-super/sfrm1095.pfb>
Output written on extent.pdf (12 pages, 915652 bytes).
PDF statistics:
266 PDF objects out of 1000 (max. 8388607)
216 compressed objects within 3 object streams
44 named destinations out of 1000 (max. 500000)
85 words of extra memory for PDF output out of 10000 (max. 10000000)
289 PDF objects out of 1000 (max. 8388607)
235 compressed objects within 3 object streams
48 named destinations out of 1000 (max. 500000)
90 words of extra memory for PDF output out of 10000 (max. 10000000)
......@@ -9,7 +9,7 @@ From Wirtinger calculus backpropagation to specific software implementation chal
complex-valued neural network components and how they are related to existing real-valued network implementations. We will show how existing layers and functionalities can be extended to also work with complex-valued inputs, and which of them need to be completely redefined.\\
We address the problem of re-adapting the training process by building a complex backpropagation algorithm on top of many prior works, which, thanks to Wirtinger calculus, allows for optimization whenever the loss function is real-valued.\\
Furthermore, we will discuss in detail the problem of building complex-valued activation functions, which was one of the main obstacles to the development of deep learning in this direction.\\
In the end, we will provide a brief presentation of the high level library, built on top of \JAX, that we have realized in order simplify the setup and train of those kind of networks. Nowadays, in fact, the internet is full of deep learning libraries implementing basically every kind of known model, with different optimization, parallelization, etc. However, for some reason, many of them still does not provide support to complex data types: a huge obstacle in the growth of complex-valued deep learning.
Finally, we will provide a brief presentation of the high-level library, built on top of \JAX, that we have developed in order to simplify the setup and training of these kinds of networks. The internet today is full of deep learning libraries implementing essentially every known kind of model, with different optimizations, parallelization schemes, etc. However, many of them still do not provide support for complex data types: a huge obstacle to the growth of complex-valued deep learning.
\section{Problems in the extension}
......@@ -24,16 +24,16 @@ Considering just their fundamental structure, complex-valued neural networks wor
\label{fig:cmplx_neuron}
\end{figure}
Each neuron receives a weighted input signal $\vb{z}$, that this time is complex valued (as the weights $\vb{w_i}$); this signal in summed up and added to a bias $\vb{b}$ and then passed through an activation function $f:\mathds{C}\to\mathds{C}$, that most of the times is non-linear. If we denote with the subscript $l$ the forward pass of a neuron in the $\ell$-th layer, then the output can be expressed with the following formula:
Each neuron receives a weighted input signal $\vb{z}$, which this time is complex-valued (as are the weights $\vb{w_i}$); this signal is summed up, added to a bias $\vb{b}$, and then passed through an activation function $f:\mathds{C}\to\mathds{C}$, which is usually non-linear. If we denote with the subscript $\ell$ the forward pass of a neuron in the $\ell$-th layer, then the output can be expressed with the following formula:
\[ \vb{y}_\ell = f_\ell\left(\vb{w}_\ell\,\vb{z}_{\ell-1} + \vb{b}_\ell\right) \]
where $N$ is the number of neurons in layer $\ell$, $M$ the number of neurons in layer $(\ell-1)$, $\vb{z}_{\ell-1}\in\mathds{C}^M$ is the output of the previous layer, $\vb{w}_\ell\in\mathds{C}^{N\times M}$ and $\vb{b}_\ell\in\mathds{C}^N$ are the learnable parameters of this level, $f_\ell$ is the activation function and $\vb{y}_\ell\in\mathds{C}^N$ the effective output.\\
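To make the forward pass concrete, here is a minimal \texttt{JAX} sketch of a single complex layer (the shapes follow the text; the names and the split activation are only illustrative, not part of our library):
\begin{lstlisting}[language=Python]
import jax
import jax.numpy as jnp

# Shapes follow the text: M complex inputs, N complex neurons.
N, M = 3, 4
keys = jax.random.split(jax.random.PRNGKey(0), 6)
w = jax.random.normal(keys[0], (N, M)) + 1j * jax.random.normal(keys[1], (N, M))
b = jax.random.normal(keys[2], (N,)) + 1j * jax.random.normal(keys[3], (N,))
z = jax.random.normal(keys[4], (M,)) + 1j * jax.random.normal(keys[5], (M,))

# A simple "split" activation; any f: C -> C from the activations
# table recalled later in this chapter could be used instead.
f = lambda s: jnp.tanh(s.real) + 1j * jnp.tanh(s.imag)

y = f(w @ z + b)  # layer output: shape (N,), complex dtype
\end{lstlisting}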
However, when considering a possible extension from $\mathds{R}$ to $\mathds{C}$, we need to take a few inconveniences into account, since we are looking for a coherent and rigorous framework.
\subsection*{Max operator undefined}
\subsection*{Max operator is undefined}
As also explained in the introductory mathematical section, $\mathds{C}$ is not an ordered field, in the sense that we cannot define a comparison relation among complex numbers on which everybody agrees. In principle one can be defined, like the lexicographic ordering, which compares the real parts first and only then the imaginary parts, or by establishing the relation among the magnitudes of those numbers. The latter is actually the preferred approach. This brief overview is important, since many non-linear functions in deep learning, like \texttt{ReLU} and \texttt{Max-Pooling}, require a \textit{maximum} operation in order to fulfill their respective purposes of increasing numerical stability and reducing dimensionality.
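As an illustration, a complex \texttt{max} based on the magnitude ordering above can be sketched in a few lines (this is only a sketch of the convention, not our library's implementation):
\begin{lstlisting}[language=Python]
import jax.numpy as jnp

def complex_max(z, axis=-1):
    # Compare magnitudes, but return the winning complex entry, phase included.
    idx = jnp.argmax(jnp.abs(z), axis=axis)
    return jnp.take_along_axis(z, jnp.expand_dims(idx, axis), axis=axis).squeeze(axis)

z = jnp.array([1 + 1j, 0.5 - 2j, -1 + 0j])
print(complex_max(z))  # (0.5-2j): the entry with the largest magnitude
\end{lstlisting}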
\subsection*{Unstable Activations}
As we will see in a dedicated section, the problem of defining stable and coherent activation functions is one of the main issues that limited the development of complex-valued deep learning during the years. Complex functions, in fact, necessitate of further limitations to be suitable as activations: because of the Liouville's theorem \ref{th:Liouville}, for example, they can't be limited, and neither grow too slow, otherwise their derivative would always vanish during the backpropagation. So, simply re-adapting existing activations to support complex-valued inputs, maybe redefining ambiguous operations like \texttt{max}, is not enough, especially because you need care about the eventual loss of complex correlations if the activation applied independently on the real and imaginary components.
As we will see in a dedicated section, the problem of defining stable and coherent activation functions is one of the main issues that has limited the development of complex-valued deep learning over the years. Complex functions, in fact, require further restrictions to be suitable as activations: because of Liouville's theorem \ref{th:Liouville}, for example, they cannot be bounded, nor can they grow too slowly, otherwise their derivatives would vanish during backpropagation. So, simply re-adapting existing activations to support complex-valued inputs, perhaps redefining ambiguous operations like \texttt{max}, is not enough, especially because one needs to care about the possible loss of complex correlations if the activation is applied independently to the real and imaginary components.
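For instance, the \texttt{Complex Cardioid} recalled in table \ref{tab:cmplx_activations} can be written directly from its formula (a one-line sketch, not our library code):
\begin{lstlisting}[language=Python]
import jax.numpy as jnp

# f(z) = 0.5 * (1 + cos(angle(z))) * z: it scales the magnitude according
# to the phase, and reduces to ReLU on the real axis.
cardioid = lambda z: 0.5 * (1.0 + jnp.cos(jnp.angle(z))) * z
\end{lstlisting}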
\subsection*{Lost Probabilistic Interpretation}
One nice property of real-valued neural network classifiers is the probabilistic interpretation that we can associate with their final layer, mainly due to the normalization into the range $[0,1]$ provided by sigmoid/softmax activation functions. But now the final output of the network will be a set of complex numbers, which we can no longer interpret as a probability distribution over a set of possible outcomes. This nice property can be partially recovered if we add a \textit{magnitude} layer just before the last activation: in this way we drop all the phase information, but we move back to a real-valued problem. In any case, it depends on the final objective of the model.
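A minimal sketch of this recovery (the vector \texttt{z\_out} is a hypothetical placeholder for the last layer's output):
\begin{lstlisting}[language=Python]
import jax
import jax.numpy as jnp

z_out = jnp.array([0.2 + 1.1j, -0.7 + 0.3j, 1.5 - 0.5j])  # complex outputs

# Magnitude layer: drop the phase to get back to a real-valued problem,
# then softmax restores the usual probabilistic interpretation.
probs = jax.nn.softmax(jnp.abs(z_out))
print(probs.sum())  # 1.0
\end{lstlisting}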
......@@ -52,7 +52,7 @@ As anticipated in the introductory section, the interest of researchers in this
\item[-] he computed the derivatives $\partial f/\partial x$ and $\partial f/\partial y$, instead of relying on Wirtinger calculus \ref{eq:CR_derivs}; although this is a working alternative to ours, we will see that it is suboptimal;
\item[-] he relied on ``bad'' activation functions, since, as he himself reported, the algorithm often failed to converge.
\end{itemize}
I decided to report his work because it was still one of the first and working attempts to develop a complex backpropagation algorithm, but also because of the purely theoretical analysis realized on the transformation that a complex network can learn. Nitta, managed to teach its networks several transformations in $\mathds{R}^2$, like rotations, reductions and parallel displacements, that the corresponding real-valued model didn't make. He understood first that this was possible thanks to the higher degrees of freedom offered by complex multiplication (discussed in section \ref{subsec:cmplx_multiplication}). But what I believe it is even more interesting, is the relation that Nitta have found among complex-valued networks and the \textbf{Identity theorem} \ref{th:identity}:\\
I decided to report his work because it was one of the first working attempts to develop a complex backpropagation algorithm, but also because of the purely theoretical analysis of the transformations that a complex network can learn. Nitta managed to teach his networks several transformations in $\mathds{R}^2$, like rotations, reductions and parallel displacements, which the corresponding real-valued model could not learn. He was the first to understand that this was possible thanks to the higher degrees of freedom offered by complex multiplication (discussed in section \ref{subsec:cmplx_multiplication}). But what I believe is even more interesting is the relation that Nitta found between complex-valued networks and the \textbf{Identity theorem} \ref{th:identity}:\\
\textit{``We believe
that Complex-BP networks satisfy the Identity Theorem, that is, Complex-BP networks can approximate complex
functions just by training them only over a part of the
......@@ -82,7 +82,7 @@ where $\alpha\in\mathds{R}$ is the learning rate.
\label{fig:cmplx_gradient_descent}
\end{figure}
In order to provide also a visual representation, in figure \ref{fig:cmplx_gradient_descent} we have considered a simple, non holomorphic, real-valued function like $f(z) = z\bar{z} = \norm{z}^2$, that has a unique global minimum at $z=0+0j$. We have then applied the gradient descent and ascent rules in both directions of the gradient, $\vb{\nabla_z}f$, and the cogradient $\vb{\nabla_{\bar{z}}}f$, in order to verify what said above. In the plot we clearly see that the only direction that approaches the true minimum (starting from a random point in the dominium of $f$) is exactly the one determined by the complex cogradient, while the complex gradient moves in a completely wrong direction. Also considering the ascent rules we observe that the steepest direction maximizing $f$ is again the one determined by the cogradient.\\
In order to also provide a visual representation, in figure \ref{fig:cmplx_gradient_descent} we have considered a simple, non-holomorphic, real-valued function, $f(z) = z\bar{z} = \norm{z}^2$, which has a unique global minimum at $z=0+0j$. We have then applied the gradient descent and ascent rules along both the direction of the gradient, $\vb{\nabla_z}f$, and that of the cogradient, $\vb{\nabla_{\bar{z}}}f$, in order to verify what was said above. In the plot we clearly see that the only direction approaching the true minimum (starting from a random point in the domain of $f$) is exactly the one determined by the complex cogradient, while the complex gradient moves in a completely wrong direction. Considering the ascent rules as well, we observe that the steepest direction maximizing $f$ is again the one determined by the cogradient.
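The same experiment can be reproduced numerically in a few lines of \texttt{JAX} (a sketch, not our library code). Note that, according to the JAX documentation, \texttt{jax.grad} of a real-valued function of a complex argument returns the conjugate of the cogradient (up to a factor of two), so it must be conjugated before stepping:
\begin{lstlisting}[language=Python]
import jax
import jax.numpy as jnp

f = lambda z: (z * jnp.conj(z)).real  # |z|^2: real-valued, non-holomorphic

z = jnp.complex64(1.5 - 0.8j)  # a random starting point in the domain of f
for _ in range(100):
    # Descend along the cogradient: conjugating jax.grad recovers it.
    z = z - 0.1 * jnp.conj(jax.grad(f)(z))

print(jnp.abs(z))  # ~0: we converge to the global minimum z = 0 + 0j.
# Stepping along jax.grad itself would instead make Im(z) grow.
\end{lstlisting}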
\subsection{Backpropagation with a Real-valued Loss}
......@@ -90,18 +90,18 @@ With the real-valued loss assumption (proposed in \ref{sec:cmplx_backpropagation
\begin{table}[!ht]
\centering
\begin{tabular}{c c c}
\begin{tabularx}{\linewidth}{C{0.25\textwidth} C{0.3\textwidth} C{0.45\textwidth}}
\toprule
\textbf{Standard Real Calculus} & \textbf{Complex Calculus} & \textbf{Complex Calculus, assuming real-valued loss}\\
\textbf{Standard Real Calculus} & \textbf{Complex Calculus} & \textbf{Complex Calculus, assuming real-valued loss}\\
\midrule
Input to layer $\ell +1$:\\
$\pdv{f_L}{x_\ell}$ & $\pdv{f_L}{z_\ell}$ and $\pdv{f_L}{\bar{z}_\ell}$ & $\pdv{f_L}{\bar{z}_\ell}$\\
Input to layer $\ell +1$:
\[ \pdv{f_L}{x_\ell} \] & \[\pdv{f_L}{z_\ell} \text{ and } \pdv{f_L}{\bar{z}_\ell}\] & \[ \pdv{f_L}{\bar{z}_\ell}\] \\
\midrule
Output from layer $\ell$:\\
$\pdv{f_L}{x_{\ell-1}} = \pdv{f_L}{x_\ell}\pdv{f_\ell}{x_{\ell-1}}$ & $\pdv{f_L}{z_{\ell-1}} = \pdv{f_L}{z_\ell}\pdv{f_\ell}{z_{\ell-1}} + \pdv{f_L}{\bar{z}_\ell}\bar{\left(\pdv{f_\ell}{\bar{z}_{\ell-1}}\right)}$\\
& $\pdv{f_L}{\bar{z}_{\ell-1}} = \pdv{f_L}{z_\ell}\pdv{f_\ell}{\bar{z}_{\ell-1}} + \pdv{f_L}{\bar{z}_\ell}\bar{\left(\pdv{f_\ell}{z_{\ell-1}}\right)}$ & $\pdv{f_L}{\bar{z}_\ell}\bar{\left(\pdv{f_\ell}{z_{\ell-1}}\right)}$ \\
Output from layer $\ell$:
\[ \pdv{f_L}{x_{\ell-1}} = \pdv{f_L}{x_\ell}\pdv{f_\ell}{x_{\ell-1}}\] & \[ \pdv{f_L}{z_{\ell-1}} = \pdv{f_L}{z_\ell}\pdv{f_\ell}{z_{\ell-1}} + \pdv{f_L}{\bar{z}_\ell}\overbar{\left(\pdv{f_\ell}{\bar{z}_{\ell-1}}\right)}\] \\
& \[ \pdv{f_L}{\bar{z}_{\ell-1}} = \pdv{f_L}{z_\ell}\pdv{f_\ell}{\bar{z}_{\ell-1}} + \pdv{f_L}{\bar{z}_\ell}\overbar{\left(\pdv{f_\ell}{z_{\ell-1}}\right)}\] & \[ \pdv{f_L}{\bar{z}_{\ell-1}} = \overbar{\left(\pdv{f_L}{\bar{z}_\ell}\right)}\pdv{f_\ell}{\bar{z}_{\ell-1}} + \pdv{f_L}{\bar{z}_\ell}\overbar{\pdv{f_\ell}{z_{\ell-1}}} \] \\
\bottomrule
\end{tabular}
\end{tabularx}
\caption{Comparison of backpropagation calculus. (source: \cite{Virtue:EECS-2019-126})}
\label{tab:comparison_backpropagation}
\end{table}
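The assumption behind the third column, $\pdv{f_L}{z} = \overbar{\left(\pdv{f_L}{\bar{z}}\right)}$ for a real-valued loss, can also be checked numerically. The sketch below (not a library API) builds both Wirtinger derivatives from the real partials:
\begin{lstlisting}[language=Python]
import jax
import jax.numpy as jnp

def wirtinger(f, z):
    # Real partials of u(x, y) = f(x + iy), then the Wirtinger derivatives.
    du_dx, du_dy = jax.grad(lambda xy: f(xy[0] + 1j * xy[1]))(
        jnp.array([z.real, z.imag]))
    return 0.5 * (du_dx - 1j * du_dy), 0.5 * (du_dx + 1j * du_dy)

f = lambda z: (z * jnp.conj(z)).real  # a real-valued loss
d_dz, d_dzbar = wirtinger(f, jnp.complex64(1.0 + 2.0j))
print(jnp.allclose(d_dz, jnp.conj(d_dzbar)))  # True
\end{lstlisting}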
......@@ -252,7 +252,34 @@ Cardioid & $\frac{1}{2}\left(1 + \cos(\angle z)\right)z$ & \cite{Virtue:EECS-201
\section{JAX Implementation}
% specify also complex data types
From a practical perspective, in order to set up and carry out all the studies and analyses we are going to present, I had to develop a dedicated \texttt{Python} library.\\
In the previous sections we analyzed all the theoretical obstacles that researchers had to overcome in order to develop a working complex-valued deep learning framework. But, in reality, there are many drawbacks at the implementation level as well: first of all, the most popular hardware acceleration architectures, \texttt{CUDA} and \texttt{cuDNN}, do not provide native support for complex-valued data types. This is by itself a huge limitation, since we cannot train our networks efficiently and will have to rely on simple models with few parameters.\\
Regarding existing deep learning libraries, like \texttt{Keras}, \texttt{TensorFlow} and \texttt{PyTorch}, we have to say that they provide a lot of interesting high-level architectures for setting up deep learning algorithms, and they also ``officially'' support complex inputs. Even better, they support complex derivatives. But when you try to implement an effective complex-valued network, you understand how far they are from comprehensive support (just some known issues: \href{https://github.com/pytorch/pytorch/issues/33152}{1}, \href{https://github.com/tensorflow/tensorflow/issues/17097}{2}, \href{https://github.com/microsoft/tensorflow-directml/issues/32}{3}): beyond the many errors raised by structures that should work, a careful analysis of the source code of the main layers shows that many operations are ambiguous, or at least not compatible with the reasoning we adopted to develop the layer extensions in this chapter. We could probably have found a way to redefine or override those structures, but we felt that modifying such a large and complex library, without introducing undesired side effects, would have been too much work.
\begin{wrapfigure}{r}{0.5\textwidth}
\includegraphics[width=0.5\textwidth]{pictures/JAX_logo.pdf}
\caption{JAX logo.}
\end{wrapfigure}
For this reason we had to rely on a more niche library, called \texttt{JAX}, recently developed by \texttt{Google DeepMind}. JAX is \href{https://github.com/hips/autograd}{\textbf{Autograd}} and \href{https://www.tensorflow.org/xla}{\textbf{XLA}} (a domain-specific compiler for linear algebra designed for TensorFlow models), brought together for high-performance numerical computing and machine learning research. It provides composable transformations of Python and NumPy programs: differentiate, vectorize, parallelize, just-in-time compile to GPU/TPU, and more. The main advantages that we found in using JAX were:
\begin{itemize}
\item[-] it supports and optimizes complex differentiation, for both holomorphic and non-holomorphic functions (see the short sketch after this list);
\item[-] it is extremely optimized, with XLA + JIT partially compensating for the lack of native hardware acceleration;
\item[-] many complex operations/layers are already supported and well defined.
\end{itemize}
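As a quick illustration of the first point (a sketch following the JAX documentation, not our library code):
\begin{lstlisting}[language=Python]
import jax
import jax.numpy as jnp

# Holomorphic case: the explicit flag asserts holomorphy, grad returns df/dz.
print(jax.grad(lambda z: z**3, holomorphic=True)(2.0 + 1.0j))  # 3z^2 = 9+12j

# Non-holomorphic, real-valued case: the default mode applies and the result
# encodes the Wirtinger derivatives (here 2*conj(z) for |z|^2).
print(jax.grad(lambda z: jnp.abs(z) ** 2)(3.0 + 4.0j))  # (6-8j)
\end{lstlisting}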
More specifically, for building our complex-valued neural networks we will rely on \texttt{dm-haiku}, another library built on top of JAX, with the purpose of covering for JAX the same role that \href{https://github.com/deepmind/sonnet}{\texttt{Sonnet}} (widely used at DeepMind) covers for TensorFlow, and of simplifying the approach for users who are familiar with object-oriented programming.\\
Since these are quite recent libraries, and a definitive approach to complex-valued deep learning does not yet exist, I made a careful and complete analysis of the source code of Haiku and of the most important JAX functions. Thanks to this, I noticed that much work is still needed: even if to a smaller extent than in TensorFlow/PyTorch, here too many operations turn out to be ambiguous or badly defined for complex-valued data types (e.g. the \texttt{square} or the \texttt{max} operators). Many others are completely undefined (e.g. initialization or, more generally, random complex distributions). We still decided to proceed with JAX, mainly because of its flexibility: many new functions/operations can be redefined without worrying about implicit undesired side effects. This is also possible thanks to the design of the training loop, which is quite ``explicit'' and customizable.\\
From a practical point of view, I basically had to build a small `\texttt{complex\_nn}' library on top of Haiku, containing the definitions (and re-definitions) of all the necessary components of a complex-valued neural network:
\begin{itemize}
\item \texttt{layers}: the adaptations derived before for linear, convolutional, pooling and normalization operations;
\item \texttt{activations}: all functions listed in table \ref{tab:cmplx_activations};
\item \texttt{initializers}: weight initializers following uniform or truncated random normal distributions, together with the modified Xavier and He approaches described above (a minimal sketch follows this list);
\item \texttt{metrics}: categorical accuracy and categorical cross-entropy, useful for the subsequent classifier designs;
\item \texttt{optimizers}: a \textit{complex-Adam} algorithm;
\item a \texttt{classifier wrapper} with the purpose of collecting all the necessary functions to set up a training loop with JAX and Haiku;
\item for completeness, also some utility functions, mainly for producing plots or wrapping more complex structures.
\end{itemize}
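As an example of what \texttt{initializers} contains, here is a minimal sketch of a complex Glorot-style initializer, with Rayleigh magnitude and uniform phase as derived in appendix \ref{app:weight_init} (the function name and signature are illustrative):
\begin{lstlisting}[language=Python]
import jax
import jax.numpy as jnp

def complex_glorot(key, shape, fan_in, fan_out):
    # Var(W) = 2*sigma^2 for a Rayleigh magnitude; imposing the Glorot
    # criterion Var(W) = 2/(fan_in + fan_out) fixes sigma as below.
    sigma = 1.0 / jnp.sqrt(fan_in + fan_out)
    k_mag, k_phase = jax.random.split(key)
    u = jax.random.uniform(k_mag, shape)
    magnitude = sigma * jnp.sqrt(-2.0 * jnp.log(1.0 - u))  # Rayleigh(sigma)
    phase = jax.random.uniform(k_phase, shape, minval=-jnp.pi, maxval=jnp.pi)
    return magnitude * jnp.exp(1j * phase)

w = complex_glorot(jax.random.PRNGKey(42), (3, 4), fan_in=4, fan_out=3)
\end{lstlisting}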
This code has been tested on several datasets. There are in fact a few \texttt{Jupyter Notebooks} providing a detailed explanation of the analyses we are going to see, together with basic setups of some learning procedures.\\
The full implementation and the complete analysis are available at my GitLab page\footnote{\href{https://gitlab.fbk.eu/mpujatti/complex-valued-deep-learning-for-condition-monitoring}{https://gitlab.fbk.eu/mpujatti/complex-valued-deep-learning-for-condition-monitoring}}.
\end{document}
\ No newline at end of file
......@@ -114,7 +114,10 @@
\citation{Virtue:EECS-2019-126}
\citation{DBLP:journals/corr/ArjovskySB15}
\citation{Virtue:EECS-2019-126}
\@writefile{lot}{\contentsline {table}{\numberline {3.2}{\ignorespaces Recal of the most popular complex-valued activation functions.\relax }}{xxvi}{table.caption.28}\protected@file@percent }
\newlabel{tab:cmplx_activations}{{3.2}{xxvi}{Recal of the most popular complex-valued activation functions.\relax }{table.caption.28}{}}
\@writefile{toc}{\contentsline {section}{\numberline {3.5}JAX Implementation}{xxvi}{section.3.5}\protected@file@percent }
\@writefile{lof}{\contentsline {figure}{\numberline {3.5}{\ignorespaces JAX logo.\relax }}{xxvii}{figure.caption.29}\protected@file@percent }
\citation{*}
\bibdata{bibliography}
\bibcite{trabelsi2018deep}{1}
......@@ -152,16 +155,16 @@
\citation{he2015delving}
\citation{xavier_init}
\citation{he2015delving}
\@writefile{toc}{\contentsline {chapter}{\numberline {A}Mathematical Proofs}{xxix}{appendix.A}\protected@file@percent }
\@writefile{toc}{\contentsline {chapter}{\numberline {A}Mathematical Proofs}{xxxi}{appendix.A}\protected@file@percent }
\@writefile{lof}{\addvspace {10\p@ }}
\@writefile{lot}{\addvspace {10\p@ }}
\newlabel{app:cmplx_optim}{{A}{xxix}{Mathematical Proofs}{appendix.A}{}}
\@writefile{toc}{\contentsline {section}{\numberline {A.1}Complex Weights Initialization \cite {trabelsi2018deep}}{xxix}{section.A.1}\protected@file@percent }
\newlabel{app:weight_init}{{A.1}{xxix}{Complex Weights Initialization \cite {trabelsi2018deep}}{section.A.1}{}}
\newlabel{app:cmplx_optim}{{A}{xxxi}{Mathematical Proofs}{appendix.A}{}}
\@writefile{toc}{\contentsline {section}{\numberline {A.1}Complex Weights Initialization \cite {trabelsi2018deep}}{xxxi}{section.A.1}\protected@file@percent }
\newlabel{app:weight_init}{{A.1}{xxxi}{Complex Weights Initialization \cite {trabelsi2018deep}}{section.A.1}{}}
\citation{Messerschmitt_stationary_points}
\citation{MESSERSCHMITT_STATIONARY_POINTS}
\@writefile{toc}{\contentsline {section}{\numberline {A.2}Stationary points of a real-valued function of a complex variable \cite {Messerschmitt_stationary_points}}{xxx}{section.A.2}\protected@file@percent }
\@writefile{toc}{\contentsline {section}{\numberline {A.2}Stationary points of a real-valued function of a complex variable \cite {Messerschmitt_stationary_points}}{xxxii}{section.A.2}\protected@file@percent }
\citation{Hualiang_nonlinear}
\citation{HUALIANG_NONLINEAR}
\@writefile{toc}{\contentsline {section}{\numberline {A.3}Steepest complex gradient descent \cite {Hualiang_nonlinear}}{xxxi}{section.A.3}\protected@file@percent }
\gdef \@abspage@last{31}
\@writefile{toc}{\contentsline {section}{\numberline {A.3}Steepest complex gradient descent \cite {Hualiang_nonlinear}}{xxxiii}{section.A.3}\protected@file@percent }
\gdef \@abspage@last{33}
......@@ -3,12 +3,12 @@ Capacity: max_strings=200000, hash_size=200000, hash_prime=170003
The top-level auxiliary file: main.aux
The style file: ieeetr.bst
Case mismatch error between cite keys MESSERSCHMITT_STATIONARY_POINTS and Messerschmitt_stationary_points
---line 133 of file main.aux
---line 165 of file main.aux
: \citation{MESSERSCHMITT_STATIONARY_POINTS
: }
I'm skipping whatever remains of this command
Case mismatch error between cite keys HUALIANG_NONLINEAR and Hualiang_nonlinear
---line 136 of file main.aux
---line 168 of file main.aux
: \citation{HUALIANG_NONLINEAR
: }
I'm skipping whatever remains of this command
......
This is pdfTeX, Version 3.14159265-2.6-1.40.21 (TeX Live 2020/Debian) (preloaded format=pdflatex 2021.6.3) 6 NOV 2021 11:08
This is pdfTeX, Version 3.14159265-2.6-1.40.21 (TeX Live 2020/Debian) (preloaded format=pdflatex 2021.6.3) 8 NOV 2021 11:27
entering extended mode
restricted \write18 enabled.
%&-line parsing enabled.
......@@ -522,10 +522,18 @@ Package: enumitem 2019/06/20 v3.9 Customized lists
\enit@count@id=\count301
\enitdp@description=\count302
)
\c@theorem=\count303
\c@corollary=\count304
\c@definition=\count305
\c@observation=\count306
(/usr/share/texlive/texmf-dist/tex/latex/wrapfig/wrapfig.sty
\wrapoverhang=\dimen256
\WF@size=\dimen257
\c@WF@wrappedlines=\count303
\WF@box=\box57
\WF@everypar=\toks36
Package: wrapfig 2003/01/31 v 3.6
)
\c@theorem=\count304
\c@corollary=\count305
\c@definition=\count306
\c@observation=\count307
) (./main.aux)
\openout1 = `main.aux'.
......@@ -582,17 +590,17 @@ LaTeX Font Info: ... okay on input line 9.
(/usr/share/texlive/texmf-dist/tex/context/base/mkii/supp-pdf.mkii
[Loading MPS to PDF converter (version 2006.09.02).]
\scratchcounter=\count307
\scratchdimen=\dimen256
\scratchbox=\box57
\nofMPsegments=\count308
\nofMParguments=\count309
\everyMPshowfont=\toks36
\MPscratchCnt=\count310
\MPscratchDim=\dimen257
\MPnumerator=\count311
\makeMPintoPDFobject=\count312
\everyMPtoPDFconversion=\toks37
\scratchcounter=\count308
\scratchdimen=\dimen258
\scratchbox=\box58
\nofMPsegments=\count309
\nofMParguments=\count310
\everyMPshowfont=\toks37
\MPscratchCnt=\count311
\MPscratchDim=\dimen259
\MPnumerator=\count312
\makeMPintoPDFobject=\count313
\everyMPtoPDFconversion=\toks38
) (/usr/share/texlive/texmf-dist/tex/latex/epstopdf-pkg/epstopdf-base.sty
Package: epstopdf-base 2020-01-24 v2.11 Base part for package epstopdf
Package epstopdf-base Info: Redefining graphics rule for `.eps' on input line 4
......@@ -605,8 +613,9 @@ e
Package caption Info: Begin \AtBeginDocument code.
Package caption Info: hyperref package is loaded.
Package caption Info: listings package is loaded.
Package caption Info: wrapfig package is loaded.
Package caption Info: End \AtBeginDocument code.
\c@lstlisting=\count313
\c@lstlisting=\count314
Package hyperref Info: Link coloring OFF on input line 9.
(/usr/share/texlive/texmf-dist/tex/latex/hyperref/nameref.sty
......@@ -618,7 +627,7 @@ Package: refcount 2019/12/15 v3.6 Data extraction from label references (HO)
(/usr/share/texlive/texmf-dist/tex/generic/gettitlestring/gettitlestring.sty
Package: gettitlestring 2019/12/15 v1.6 Cleanup title references (HO)
)
\c@section@level=\count314
\c@section@level=\count315
)
LaTeX Info: Redefining \ref on input line 9.
LaTeX Info: Redefining \pageref on input line 9.
......@@ -687,17 +696,17 @@ Overfull \hbox (3.40668pt too wide) detected at line 21
Overfull \hbox (3.40668pt too wide) detected at line 23
[]\OT1/cmr/m/n/10.95 xxix
[]\OT1/cmr/m/n/10.95 xxxi
[]
Overfull \hbox (0.365pt too wide) detected at line 24
[]\OT1/cmr/m/n/10.95 xxx
Overfull \hbox (6.44835pt too wide) detected at line 24
[]\OT1/cmr/m/n/10.95 xxxii
[]
Overfull \hbox (3.40668pt too wide) detected at line 25
[]\OT1/cmr/m/n/10.95 xxxi
Overfull \hbox (9.49002pt too wide) detected at line 25
[]\OT1/cmr/m/n/10.95 xxxiii
[]
)
......@@ -854,13 +863,18 @@ File: example-image-a.pdf Graphic file (type pdf)
Package pdftex.def Info: example-image-a.pdf used on input line 80.
(pdftex.def) Requested size: 160.59961pt x 120.44969pt.
Underfull \hbox (badness 10000) in paragraph at lines 85--86
Overfull \hbox (21.2961pt too wide) detected at line 104
[] \OT1/cmr/m/n/10.95 = [][] + [][]
[]
Overfull \hbox (21.2961pt too wide) detected at line 104
[] \OT1/cmr/m/n/10.95 = [][] + [][]
[]
Overfull \hbox (78.4074pt too wide) in paragraph at lines 93--105
[][]
Overfull \hbox (35.99998pt too wide) in alignment at lines 104--104
[][][]
[]
<pictures/pass_cmplx_layer.pdf, id=333, 367.1316pt x 88.45848pt>
......@@ -888,7 +902,7 @@ Package fancyhdr Warning: \headheight is too small (12.0pt):
(fancyhdr) \addtolength{\topmargin}{-1.59999pt}.
[21]
<pictures/complex_convolution.pdf, id=381, 199.56155pt x 251.21053pt>
<pictures/complex_convolution.pdf, id=379, 199.56155pt x 251.21053pt>
File: pictures/complex_convolution.pdf Graphic file (type pdf)
<use pictures/complex_convolution.pdf>
Package pdftex.def Info: pictures/complex_convolution.pdf used on input line 1
......@@ -925,11 +939,38 @@ Package fancyhdr Warning: \headheight is too small (12.0pt):
(fancyhdr) \addtolength{\topmargin}{-1.59999pt}.
[24]
Package fancyhdr Warning: \headheight is too small (12.0pt):
(fancyhdr) Make it at least 13.59999pt, for example:
(fancyhdr) \setlength{\headheight}{13.59999pt}.
(fancyhdr) You might also make \topmargin smaller to compensate:
(fancyhdr) \addtolength{\topmargin}{-1.59999pt}.
[25]
Overfull \hbox (2.51306pt too wide) in paragraph at lines 222--225
\OT1/cmr/m/n/10.95 Because of this, re-cently a new com-plex ac-ti-va-tion func
-tion have been pro-posed: the \OT1/cmtt/m/n/10.95 Complex Cardioid
[]
<pictures/JAX_logo.pdf, id=440, 639.38875pt x 406.51875pt>
File: pictures/JAX_logo.pdf Graphic file (type pdf)
<use pictures/JAX_logo.pdf>
Package pdftex.def Info: pictures/JAX_logo.pdf used on input line 259.
(pdftex.def) Requested size: 241.84843pt x 153.76538pt.
Package fancyhdr Warning: \headheight is too small (12.0pt):
(fancyhdr) Make it at least 13.59999pt, for example:
(fancyhdr) \setlength{\headheight}{13.59999pt}.
(fancyhdr) You might also make \topmargin smaller to compensate:
(fancyhdr) \addtolength{\topmargin}{-1.59999pt}.
[26]
Overfull \hbox (2.60356pt too wide) in paragraph at lines 266--267
[]\OT1/cmr/m/n/10.95 many com-plex op-er-a-tions/lay-ers are al-ready
[]
Package fancyhdr Warning: \headheight is too small (12.0pt):
(fancyhdr) Make it at least 13.59999pt, for example:
......@@ -938,7 +979,7 @@ Package fancyhdr Warning: \headheight is too small (12.0pt):
(fancyhdr) \addtolength{\topmargin}{-1.59999pt}.
[25])
[27 <./pictures/JAX_logo.pdf>])
Package fancyhdr Warning: \headheight is too small (12.0pt):
(fancyhdr) Make it at least 13.59999pt, for example:
......@@ -947,7 +988,7 @@ Package fancyhdr Warning: \headheight is too small (12.0pt):
(fancyhdr) \addtolength{\topmargin}{-1.59999pt}.
[26] (./main.bbl [27
[28] (./main.bbl [29
])
......@@ -958,7 +999,7 @@ Package fancyhdr Warning: \headheight is too small (12.0pt):
(fancyhdr) \addtolength{\topmargin}{-1.59999pt}.
[28] (./chapters/appendix.tex
[30] (./chapters/appendix.tex
Appendix A.
Package hyperref Warning: Token not allowed in a PDF string (PDFDocEncoding):
......@@ -969,7 +1010,7 @@ Underfull \hbox (badness 10000) in paragraph at lines 25--28
[]
[29
[31
]
...