# BinaryCommitteeMachineRSGD.jl documentation
This package implements the Replicated Stochastic Gradient Descent algorithm for committee machines with binary weights described in the paper Unreasonable Effectiveness of Learning Neural Networks: From Accessible States and Robust Ensembles to Basic Algorithmic Schemes by Carlo Baldassi, Christian Borgs, Jennifer Chayes, Alessandro Ingrosso, Carlo Lucibello, Luca Saglietti and Riccardo Zecchina, Proc. Natl. Acad. Sci. U.S.A. 113: E7655-E7662 (2016), doi:10.1073/pnas.1608103113.
The package requires Julia 0.7 or later.
## Installation
To install the module, use these commands from within Julia:
```julia
julia> using Pkg

julia> Pkg.clone("https://github.com/carlobaldassi/BinaryCommitteeMachineRSGD.jl")
```
Dependencies will be installed automatically.
## Usage
The module is loaded as any other Julia module:
```julia
julia> using BinaryCommitteeMachineRSGD
```
The code provides a single main function, which generates a system of interacting replicated committee machines and tries to learn some patterns. The function and the `Patterns` constructors are documented below.
```julia
Patterns(N, M)
```

Generates `M` random ±1 patterns of length `N`.
```julia
Patterns(ξ, σ)
```

Encapsulates the input patterns `ξ` and their associated desired outputs `σ` for use in `replicatedSGD`. The inputs `ξ` must be given as a vector of vectors, while the outputs `σ` must be given as a vector. In both cases, the entries are converted to ±1 values according to their sign (more precisely, using `x > 0 ? 1 : -1`).
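For example, the following sketch shows both constructors in use (the explicit data here are randomly generated, purely for illustration):

```julia
using BinaryCommitteeMachineRSGD

# Generate 802 random ±1 patterns of length 321:
pats = Patterns(321, 802)

# Or wrap explicit data; all entries are mapped to ±1 via their sign:
ξ = [randn(321) for μ = 1:802]  # inputs, one vector per pattern
σ = rand([-1, 1], 802)          # desired outputs
pats = Patterns(ξ, σ)
```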
```julia
replicatedSGD(patterns::Patterns; keywords...)
```

Runs the replicated Stochastic Gradient Descent algorithm over the given `patterns` (see `Patterns`). It automatically detects the size of the input and initializes a system of interacting binary committee machines which collectively try to learn the patterns.

The function returns three values: a `Bool` with the success status, the number of epochs, and the minimum error achieved.
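A minimal call with all keywords at their defaults looks like this (as noted below, the defaults alone are generally not a good configuration):

```julia
using BinaryCommitteeMachineRSGD

pats = Patterns(321, 802)
ok, epochs, minerr = replicatedSGD(pats)  # all keyword arguments at their defaults
println("solved=$ok after $epochs epochs (minimum error: $minerr)")
```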
The available keyword arguments are listed below (note that the defaults are mostly not sensible: they must be tuned collectively):

* `K` (default=`1`): number of hidden units for each committee machine (size of the hidden layer)
* `y` (default=`1`): number of replicas
* `η` (default=`2`): initial value of the step for the energy (loss) term gradient
* `λ` (default=`0.1`): initial value of the step for the interaction gradient (called `η′` in the paper)
* `γ` (default=`Inf`): initial value of the interaction strength
* `ηfactor` (default=`1`): factor used to update `η` after each epoch (see the sketch after this list)
* `λfactor` (default=`1`): factor used to update `λ` after each epoch
* `γstep` (default=`0.01`): additive step used to update `γ` after each epoch
* `batch` (default=`5`): minibatch size
* `formula` (default=`:simple`): used to choose the interaction update scheme when `center=false`; see below for the available values
* `seed` (default=`0`): random seed; if `0`, it is not used
* `max_epochs` (default=`1000`): maximum number of epochs
* `init_equal` (default=`true`): whether to initialize all replicated networks equally
* `waitcenter` (default=`false`): whether to exit successfully only if the center replica has solved the problem
* `center` (default=`false`): whether to explicitly use a central replica (if `false`, it is traced out)
* `outfile` (default=`""`): name of a file where to write the results; if empty, it is ignored
* `quiet` (default=`false`): whether to print information on screen
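The schedule keywords act at the end of each epoch. Here is a minimal sketch of the updates implied by the descriptions above (an illustrative assumption, not the package internals):

```julia
# Illustrative sketch of the per-epoch schedule described above
# (an assumption based on the keyword descriptions, not the actual internals):
function update_schedule(η, λ, γ; ηfactor=1.0, λfactor=1.0, γstep=0.01)
    η *= ηfactor  # multiplicative update of the loss-gradient step
    λ *= λfactor  # multiplicative update of the interaction-gradient step
    γ += γstep    # additive update of the interaction strength
    return η, λ, γ
end
```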
The possible values of the `formula` option are:

* `:simple` (the default): uses the simplest traced-out center formula (eq. (C7) in the paper)
* `:corrected`: applies the correction of eq. (C9) to the formula of eq. (C7)
* `:continuous`: a version in which the center is continuous and traced out
* `:hard`: same as `:simple`, but uses a hard tanh, for improved performance
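As a rough way to compare the schemes, one can run each of them on the same patterns. This sketch reuses the parameter configuration from the example below, which was tuned for `:simple`, so the other settings are not guaranteed to perform well:

```julia
using BinaryCommitteeMachineRSGD

pats = Patterns(321, 802)
for f in (:simple, :corrected, :continuous, :hard)
    ok, epochs, minerr = replicatedSGD(pats, K=5, y=7, batch=80,
                                       λ=0.75, γ=0.05, γstep=0.001,
                                       formula=f, quiet=true)
    println("formula=$f: solved=$ok, epochs=$epochs, min. error=$minerr")
end
```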
Example of a good parameter configuration (for a committee with `K=5` and `N*K=1605` synapses overall, working at `α=M/(N*K)=0.5`):

```julia
ok, epochs, minerr = replicatedSGD(Patterns(321, 802), K=5, y=7, batch=80, λ=0.75, γ=0.05, γstep=0.001, formula=:simple)
```