# BinaryCommitteeMachineRSGD.jl documentation
This package implements the Replicated Stochastic Gradient Descent algorithm for committee machines with binary weights described in the paper Unreasonable Effectiveness of Learning Neural Networks: From Accessible States and Robust Ensembles to Basic Algorithmic Schemes by Carlo Baldassi, Christian Borgs, Jennifer Chayes, Alessandro Ingrosso, Carlo Lucibello, Luca Saglietti and Riccardo Zecchina, Proc. Natl. Acad. Sci. U.S.A. 113: E7655-E7662 (2016), doi:10.1073/pnas.1608103113.
The package requires Julia 0.7 or later.
## Installation
To install the module, use these commands from within Julia:
```julia
julia> using Pkg

julia> Pkg.clone("https://github.com/carlobaldassi/BinaryCommitteeMachineRSGD.jl")
```
Dependencies will be installed automatically.
## Usage
The module is loaded as any other Julia module:
```julia
julia> using BinaryCommitteeMachineRSGD
```
The code provides a single main function, which generates a system of interacting replicated committee machines and tries to learn some patterns. The function and the `Patterns` constructors are documented below.
```julia
Patterns(N, M)
```

Generates `M` random ±1 patterns of length `N`.
```julia
Patterns(ξ, σ)
```

Encapsulates the input patterns `ξ` and their associated desired outputs `σ` for use in `replicatedSGD`. The inputs `ξ` must be given as a vector of vectors, while the outputs `σ` must be given as a vector. In both cases, the entries are converted to ±1 values according to their sign (more precisely, using `x > 0 ? 1 : -1`).
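For example, the following sketch shows both constructors in use (the explicit data here are randomly generated, purely for illustration):

```julia
using BinaryCommitteeMachineRSGD

# Generate 802 random ±1 patterns of length 321:
pats = Patterns(321, 802)

# Or wrap explicit data; all entries are mapped to ±1 via their sign:
ξ = [randn(321) for μ = 1:802]  # inputs, one vector per pattern
σ = rand([-1, 1], 802)          # desired outputs
pats = Patterns(ξ, σ)
```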
```julia
replicatedSGD(patterns::Patterns; keywords...)
```

Runs the replicated Stochastic Gradient Descent algorithm over the given `patterns` (see `Patterns`). It automatically detects the size of the input and initializes a system of interacting binary committee machines which collectively try to learn the patterns.

The function returns three values: a `Bool` with the success status, the number of epochs, and the minimum error achieved.
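A minimal call with all keywords at their defaults looks like this (as noted below, the defaults alone are generally not a good configuration):

```julia
using BinaryCommitteeMachineRSGD

pats = Patterns(321, 802)
ok, epochs, minerr = replicatedSGD(pats)  # all keyword arguments at their defaults
println("solved=$ok after $epochs epochs (minimum error: $minerr)")
```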
The available keyword arguments are listed below (note that the defaults are mostly not sensible: they must be tuned collectively):

* `K` (default=`1`): number of hidden units for each committee machine (size of the hidden layer)
* `y` (default=`1`): number of replicas
* `η` (default=`2`): initial value of the step for the energy (loss) term gradient
* `λ` (default=`0.1`): initial value of the step for the interaction gradient (called `η′` in the paper)
* `γ` (default=`Inf`): initial value of the interaction strength
* `ηfactor` (default=`1`): factor used to update `η` after each epoch (see the sketch after this list)
* `λfactor` (default=`1`): factor used to update `λ` after each epoch
* `γstep` (default=`0.01`): additive step used to update `γ` after each epoch
* `batch` (default=`5`): minibatch size
* `formula` (default=`:simple`): used to choose the interaction update scheme when `center=false`; see below for the available values
* `seed` (default=`0`): random seed; if `0`, it is not used
* `max_epochs` (default=`1000`): maximum number of epochs
* `init_equal` (default=`true`): whether to initialize all replicated networks equally
* `waitcenter` (default=`false`): whether to exit successfully only if the center replica has solved the problem
* `center` (default=`false`): whether to explicitly use a central replica (if `false`, it is traced out)
* `outfile` (default=`""`): name of a file where to write the results; if empty, it is ignored
* `quiet` (default=`false`): whether to print information on screen
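The schedule keywords act at the end of each epoch. Here is a minimal sketch of the updates implied by the descriptions above (an illustrative assumption, not the package internals):

```julia
# Illustrative sketch of the per-epoch schedule described above
# (an assumption based on the keyword descriptions, not the actual internals):
function update_schedule(η, λ, γ; ηfactor=1.0, λfactor=1.0, γstep=0.01)
    η *= ηfactor  # multiplicative update of the loss-gradient step
    λ *= λfactor  # multiplicative update of the interaction-gradient step
    γ += γstep    # additive update of the interaction strength
    return η, λ, γ
end
```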
The possible values of the `formula` option are:

* `:simple` (the default): uses the simplest traced-out center formula (eq. (C7) in the paper)
* `:corrected`: applies the correction of eq. (C9) to the formula of eq. (C7)
* `:continuous`: a version in which the center is continuous and traced out
* `:hard`: same as `:simple`, but uses a hard tanh, for improved performance
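As a rough way to compare the schemes, one can run each of them on the same patterns. This sketch reuses the parameter configuration from the example below, which was tuned for `:simple`, so the other settings are not guaranteed to perform well:

```julia
using BinaryCommitteeMachineRSGD

pats = Patterns(321, 802)
for f in (:simple, :corrected, :continuous, :hard)
    ok, epochs, minerr = replicatedSGD(pats, K=5, y=7, batch=80,
                                       λ=0.75, γ=0.05, γstep=0.001,
                                       formula=f, quiet=true)
    println("formula=$f: solved=$ok, epochs=$epochs, min. error=$minerr")
end
```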
Example of a good parameter configuration (for a committee with `K=5` and `N*K=1605` synapses overall, working at `α=M/(N*K)=0.5`):

```julia
ok, epochs, minerr = replicatedSGD(Patterns(321, 802), K=5, y=7, batch=80, λ=0.75, γ=0.05, γstep=0.001, formula=:simple)
```