This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License

About this document

This document was created using Weave.jl. The code is available on github. The same source generates both the static webpage and an associated jupyter notebook.

Introduction

Previous notes have covered single layer, multi-layer, and convolutional feed forward networks. In feed forward networks, the outputs of one layer are fed into the next layer, always moving toward the output. Recurrent networks break this pattern: the outputs of a layer are fed back into that same layer. This allows the network to maintain a hidden state. Recurrent networks are typically used to model sequential data. They have many applications to time series, and they are also useful for processing text and audio data.

Additional Reading

  • @goodfellow2016 Deep Learning, especially chapter 10
  • Knet.jl documentation, especially the textbook
  • @klok2019 Statistics with Julia: Fundamentals for Data Science, Machine Learning and Artificial Intelligence

Recurrent Networks

Recurrent networks are designed to predict a sequence of outputs, $y_t$, given a sequence of inputs, $x_t$, where $t=1, …, T$. The relationship between $x$ and $y$ is assumed to be stationary, but we allow possibly many values from the history of $x$ to affect $y$. We do this by introducing a hidden state, $h_t$. The prediction for $y_t$ is a function of $h_t$ only, say $\hat{y}(h_t)$. The hidden state is Markovian, with
$$
h_t = f(h_{t-1}, x_t).
$$
Both $\hat{y}()$ and $f()$ are constructed from neural networks. They could simply be single layer perceptrons, or any of the more complicated network architectures we previously discussed.
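
As a minimal sketch of this structure, the snippet below implements the recursion with a single-layer tanh update for $f$ and a linear $\hat{y}$. The dimensions, the parameter names (W, U, b, w), and the function predict are illustrative choices, not part of any package.

nh, nx = 3, 2                       # hidden state and input dimensions
W, U, b = randn(nh, nh), randn(nh, nx), randn(nh)
w = randn(nh)

f(h, x) = tanh.(W*h .+ U*x .+ b)    # hidden state update, h_t = f(h_{t-1}, x_t)
ŷ(h) = w'h                          # prediction depends only on the state

# run the recursion over an input sequence, carrying the state forward
function predict(xs; h=zeros(nh))
  preds = zeros(length(xs))
  for t in eachindex(xs)
    h = f(h, xs[t])                 # h summarizes the entire history of x
    preds[t] = ŷ(h)
  end
  preds
end

xs = [randn(nx) for t in 1:10]      # a fake input sequence
predict(xs)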

Approximation Ability

Recurrent networks can approximate (in fact can equal) any computable function. @siegelmann1991 and @siegelmann1992 show that recurrent neural networks are Turing complete. As with the universal approximation ability of feed forward networks, this result is good to know, but it is not an explanation for the good practical performance of recurrent networks.

When $h_t$ is large enough, it is easy to see how the recurrent model above can equal familiar time series econometric models. For example, consider an AR(p) model,
$$
y_t = \rho_0 + \sum_{j=1}^{p} \rho_j y_{t-j} + \epsilon_t.
$$
To express this model in recurrent state-space form, let $x_t = y_{t-1}$, and $h_t = (y_{t-1}, \cdots, y_{t-p}) \in \R^p$. Then we can set
$$
f(h_{t-1}, x_t) = (x_t, h_{t-1,1}, \cdots, h_{t-1,p-1})
$$
and
$$
\hat{y}(h_t) = \rho_0 + \sum_{j=1}^{p} \rho_j h_{t,j}.
$$
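
To check the claim, the sketch below simulates an AR(2) with made-up coefficients (ρ0, ρ) and verifies that the shift-and-predict recursion above reproduces the AR conditional mean; fAR, ŷAR, and check_ar are illustrative names.

ρ0, ρ = 0.1, [0.6, 0.3]             # made-up AR(2) coefficients
T = 200
y = zeros(T)
for t in 3:T
  y[t] = ρ0 + ρ[1]*y[t-1] + ρ[2]*y[t-2] + 0.1*randn()
end

fAR(h, x) = vcat(x, h[1:end-1])     # shift the state and insert the newest lag
ŷAR(h) = ρ0 + ρ'h                   # linear prediction from the state

# verify that the recursion reproduces the AR(2) conditional mean
function check_ar(y)
  h = [y[2], y[1]]                  # state for t=3 is (y_{t-1}, y_{t-2})
  ok = true
  for t in 3:length(y)
    ok &= ŷAR(h) ≈ ρ0 + ρ[1]*y[t-1] + ρ[2]*y[t-2]
    h = fAR(h, y[t])                # y_t becomes x_{t+1}, entering the next state
  end
  ok
end
check_ar(y)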

Stability and Gradients

Recurrent neural networks can be difficult to train. The difficulty stems from the fact that the gradient of the network behaves very differently depending on whether the dynamics of the hidden state are stable. To illustrate, suppose $f()$ is linear, so that (taking $h_0 = 0$)
$$
h_t = f_h h_{t-1} + f_x x_t = \sum_{s=1}^{t} f_h^{t-s} f_x x_s,
$$
and the loss function is MSE,
$$
L = \frac{1}{T} \sum_{t=1}^{T} \left( y_t - \hat{y}(h_t) \right)^2.
$$
The derivatives of the loss function with respect to the parameters of $f$ are then:
$$
\frac{\partial L}{\partial f_x} = -\frac{2}{T} \sum_{t=1}^{T} \left( y_t - \hat{y}(h_t) \right) \hat{y}'(h_t) \sum_{s=1}^{t} f_h^{t-s} x_s
$$
and
$$
\frac{\partial L}{\partial f_h} = -\frac{2}{T} \sum_{t=1}^{T} \left( y_t - \hat{y}(h_t) \right) \hat{y}'(h_t) \sum_{s=1}^{t-1} (t-s) f_h^{t-s-1} f_x x_s.
$$
Both of these involve increasing powers of $f_h$. If $h_t$ has stable dynamics, i.e. $|f_h|<1$, then these derivatives will be dominated by the terms involving the most recent values of $x_t$. If $h_t$ has explosive dynamics, $|f_h|>1$, then these derivatives will be dominated by the terms involving the earliest values of $x_t$. Depending on the stability of $f$, gradients will be dominated by either short term or long term dependence between $x$ and $y$. This behavior makes it difficult to train a network where both short and long term dependencies are important.

The previous analysis also applies to nonlinear $f()$, with $f_h$ replaced by $\partial f/\partial h$, and stable replaced with locally stable.

The previous analysis also applies to multivariate $h_t$, with $|f_h|$ replaced by $\max_i |\lambda_i(f_h)|$, the largest absolute eigenvalue of $f_h$.
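
A short calculation illustrates the point. In the linear scalar model above with unit inputs, the last-period term of the gradient puts weight $f_h^{T-s}$ on the input from period $s$. The helper recentshare below (an illustrative function, with arbitrary choices of horizon and window) reports how much of that weight falls on the five most recent inputs.

# For the last-period gradient term, the weight on x_s is f_h^(T-s).
# recentshare reports the fraction of total weight on the `recent` newest inputs.
function recentshare(f_h; T=30, recent=5)
  num = sum(abs(f_h)^(T-s) for s in (T-recent+1):T)
  den = sum(abs(f_h)^(T-s) for s in 1:T)
  num / den
end

recentshare(0.9)   # ≈ 0.43 : stable dynamics, recent inputs dominate
recentshare(1.1)   # ≈ 0.04 : explosive dynamics, the earliest inputs dominate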

Truncating Gradients

A practical problem with gradients of recurrent networks is that $\hat{y}(h_t)$ depends on the entire history of $x_1, \cdots, x_t$. When computing the gradient by backward differentiation, this entire history will accumulate, using up memory and taking time. A common solution is to truncate the gradient calculation after some fixed number of periods.
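
To see what truncation looks like in code, here is a minimal sketch of truncated backpropagation through time for the linear scalar model from the previous section, with $\hat{y}(h_t) = h_t$. It uses Flux.gradient (Flux is loaded again in the RNN section below); the function name truncated_gradient, the parameter tuple θ = (f_h, f_x), and the chunk length are all illustrative. Gradients are computed chunk by chunk, and the state entering a chunk is treated as a constant, so no gradient flows back more than trunclen periods.

using Flux  # for Flux.gradient

function truncated_gradient(θ, xs, ys; trunclen=50)
  g = zero.(θ)                       # accumulated gradient for (f_h, f_x)
  h0 = 0.0                           # state carried across chunks
  for start in 1:trunclen:length(xs)
    idx = start:min(start + trunclen - 1, length(xs))
    gi = Flux.gradient(θ) do p
      f_h, f_x = p[1], p[2]
      h, l = h0, 0.0                 # h0 enters as a constant: no gradient past it
      for t in idx
        h = f_h*h + f_x*xs[t]
        l += (ys[t] - h)^2           # MSE term with prediction ŷ(h_t) = h_t
      end
      l
    end
    g = g .+ gi[1]
    for t in idx                     # advance the state outside the gradient call
      h0 = θ[1]*h0 + θ[2]*xs[t]
    end
  end
  return g
end

xs, ys = randn(200), randn(200)
truncated_gradient((0.9, 0.5), xs, ys; trunclen=20)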

LSTM

Long Short-Term Memory networks were designed to avoid the problem of vanishing and exploding gradients. LSTMs have an additional hidden state, $s_t$. The extra hidden state is a weighted sum of $s_{t-1}$ and other variables, where the weights are gates taking values in $(0,1)$. In particular,
$$
s_t = \sigma(b_f + U_f' x_t + W_f' h_{t-1}) s_{t-1} + \sigma(b_g + U_g' x_t + W_g' h_{t-1}) \tilde{x}_t.
$$
The first gate, $\sigma(b_f + U_f' x_t + W_f' h_{t-1})$, is a “forget” gate. It determines how much of $s_{t-1}$ is forgotten. The second, $\sigma(b_g + U_g' x_t + W_g' h_{t-1})$, is called the external input gate. It determines how much the current $x_t$ affects $s_t$. The $\tilde{x}_t$ is a rescaled input given by
$$
\tilde{x}_t = \sigma(b + U' x_t + W' h_{t-1}).
$$
Finally, $h_t$ is a gated and transformed version of $s_t$,
$$
h_t = \tanh(s_t) \sigma(b_o + U_o' x_t + W_o' h_{t-1}),
$$
where $\sigma(b_o + U_o' x_t + W_o' h_{t-1})$ is the output gate.
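
To make the gate arithmetic concrete, here is a scalar implementation of one step of these updates. The function name lstm_step and the placeholder parameter values are purely illustrative; Flux's LSTM layer, used later in this section, handles the vector case with learned parameters.

logistic(z) = 1 / (1 + exp(-z))             # the gate function σ(z)

# One step of the scalar LSTM updates written above.
function lstm_step(h, s, x; b=0.0, U=1.0, W=1.0,
                   bf=0.0, Uf=1.0, Wf=1.0,  # forget gate parameters
                   bg=0.0, Ug=1.0, Wg=1.0,  # external input gate parameters
                   bo=0.0, Uo=1.0, Wo=1.0)  # output gate parameters
  xtilde = logistic(b + U*x + W*h)          # rescaled input
  snew = logistic(bf + Uf*x + Wf*h)*s + logistic(bg + Ug*x + Wg*h)*xtilde
  hnew = tanh(snew)*logistic(bo + Uo*x + Wo*h)
  return hnew, snew
end

# run a few steps from a zero state
h, s = foldl((hs, x) -> lstm_step(hs..., x), randn(5); init=(0.0, 0.0))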

Example: Generating Dylan Songs

Recurrent neural networks are pretty good at randomly generating text. The Flux model zoo includes one such example. The example is based on this blog post by Andrej Karpathy. It predicts each individual character given past characters. This works surprisingly well. We are going to repeat this exercise, but use Bob Dylan songs as input.

Downloading Songs

We download all Bob Dylan lyrics and chords from dylanchords.info.

using ProgressMeter, JLD2
import HTTP, Gumbo, Cascadia

infile = joinpath(docdir,"jmd","dylanchords.txt")

if !isfile(infile)
  r=HTTP.get("http://dylanchords.info/alphabetical_list_of_songs.htm")
  songlist=Gumbo.parsehtml(String(r.body));
  songlinks = eachmatch(Cascadia.Selector(".songlink"), songlist.root)
  songhtml = Array{String, 1}(undef, length(songlinks))
  p = Progress(length(songlinks),1,"Downloading songs", 50)
  for s ∈ eachindex(songlinks)
    url = songlinks[s].attributes["href"]
    if url == "index.htm"
      songhtml[s] = ""
      continue
    end
    r = HTTP.get("http://dylanchords.info/"*url)
    songhtml[s]=String(r.body)
    next!(p)
  end

  open(infile, "w") do io
    for s ∈ songhtml
      write(io, s)
      write(io,"\n")
    end
  end
end

text = collect(String(read(infile)))
2873103-element Vector{Char}:
 '\n': ASCII/Unicode U+000A (category Cc: Other, control)
 '<': ASCII/Unicode U+003C (category Sm: Symbol, math)
 '?': ASCII/Unicode U+003F (category Po: Punctuation, other)
 'x': ASCII/Unicode U+0078 (category Ll: Letter, lowercase)
 'm': ASCII/Unicode U+006D (category Ll: Letter, lowercase)
 'l': ASCII/Unicode U+006C (category Ll: Letter, lowercase)
 ' ': ASCII/Unicode U+0020 (category Zs: Separator, space)
 'v': ASCII/Unicode U+0076 (category Ll: Letter, lowercase)
 'e': ASCII/Unicode U+0065 (category Ll: Letter, lowercase)
 'r': ASCII/Unicode U+0072 (category Ll: Letter, lowercase)
 ⋮
 '<': ASCII/Unicode U+003C (category Sm: Symbol, math)
 '/': ASCII/Unicode U+002F (category Po: Punctuation, other)
 'h': ASCII/Unicode U+0068 (category Ll: Letter, lowercase)
 't': ASCII/Unicode U+0074 (category Ll: Letter, lowercase)
 'm': ASCII/Unicode U+006D (category Ll: Letter, lowercase)
 'l': ASCII/Unicode U+006C (category Ll: Letter, lowercase)
 '>': ASCII/Unicode U+003E (category Sm: Symbol, math)
 '\n': ASCII/Unicode U+000A (category Cc: Other, control)
 '\n': ASCII/Unicode U+000A (category Cc: Other, control)

Note that the input text here consists of html files. Here is the start of one song.

<head>
<title>My Back Pages</title>
<link rel="stylesheet" type="text/css" href="../css/general.css" />
</head>

<body>

<h1 class="songtitle">My Back Pages</h1>


<p>Words and music Bob Dylan<br />
Released on <a class="recordlink" href="../04_anotherside/index.htm">Another Side Of Bob Dylan</a> (1964) and <a class="recordlink" href="../99_greatesthits2/index.htm">Greatest Hits II</a> (1971)<br />
Tabbed by Eyolf &Oslash;strem</p>

<p>Most G's are played with a small figure (G - G6 - G7) going up to G7:</p>
<pre class="chords">
G  320003
G6 322003
G7 323003
</pre>

<p>This is noted with a *).</p>

<p>He didn't seem to spend too much time rehearsing this song before he
went into the studio (the whole album was recorded in one
evening/night session) &ndash; he gets the first verse all wrong in the
chords, and he struggles a lot with the final lines of each
verse. I've written out the chords for the first two verses and in the
following verses deviations from the <em>second</em> verse.</p>

<p>Capo 3rd fret (original key Eb major)</p>

<hr />

<pre class="verse">
C       Am          Em
Crimson flames tied through my ears
        F        G *)   C
Rollin' high and mighty traps
C            Am      Em      C
Pounced with fire on flaming roads
      F     Em    G   *)
Using ideas as my maps
       F       Am     G *)        C
&quot;We'll meet on edges, soon,&quot; said I
Am                  F G
Proud 'neath heated brow
        C             Am    C
Ah, but I was so much older then
    F       G *)      C       G *)
I'm younger than that now.

Some songs include snippets of tablature (simple notation for guitar). For example,

<p>The easiest way to play the G7sus4 G7 G7sus2 G7 figure would be:</p>
<pre class="verse">
G7sus4  G7  G7sus2  G7
|-1-----1-----1-----1---
|-0-----0-----0-----0---
|-0-----0-----0-----0---
|-0-----0-----0-----0---
|-3-----2-----0-----2---
|-3-----3-----3-----3---
</pre>

<hr />

<p>Intro:</p>
<pre class="tab">
  C           G/b           F/a         G11   G       C/e
  :     .       :     .       :     .       :     .        :     .
|-------0-----|-------3-----|-------1-----|--------------|-------0------
|-----1---1---|-----0-------|-----1-1---1-|---1---010----|-----1---1----
|---0-------0-|---0-----0---|---2-----1---|-2---2----0---|---0-------0-- etc
|-------------|-------------|-------------|------------3-|-2------------
|-3-----------|-2---------2-|-0-----------|--------------|--------------
|-------------|-------------|-------------|-3------------|--------------
</pre>

This is all just text, and we will treat it as such. However, it has additional structure that makes it more interesting to predict than the lyrics alone.

Markovian Baseline

As Yoav Goldberg points out, you can generate pretty good text with a simple Markovian model of characters. That is, estimate the probability of a character $c_t$ given a history of $L$ characters, $P(c_t|c_{t-1}, …, c_{t-L})$, by simple sample averages. Let’s try this out.

using StaticArrays

function p_markov(len::Val{L}, data::AbstractVector{Char}) where L
  # dm[history] maps each observed next character to its count (later, probability)
  dm = Dict{SVector{L, Char}, Dict{Char, Float64}}()
  p = Progress(length(data), 1, "count_markov($L)", 30)
  for t in (1+L):length(data)
    key = @view data[(t-L):(t-1)]
    entry = get!(dm, key, Dict(data[t] => 0))
    get!(entry, data[t], 0)      # ensure the next character has an entry
    entry[data[t]] += 1
    next!(p)
  end
  # convert counts into conditional probabilities
  for k in keys(dm)
    total = sum(values(dm[k]))
    for e in keys(dm[k])
      dm[k][e] /= total
    end
  end
  dm
end

modelfile=joinpath(docdir,"jmd","models","dylan-markov4.jld2")
if isfile(modelfile)
  @load modelfile dm
else
  @time dm = p_markov(Val(4), text);
  @save modelfile dm
end
1-element Vector{Symbol}:
 :dm

The above code stores $P(c_t|c_{t-1},…,c_{t-L})$ in a dictionary. When $L$ is large, there is a huge number of possible histories, $c_{t-1},…,c_{t-L}$, and we will observe only a small fraction of them. A dictionary only stores data on the histories we observe, so it saves memory.
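
For example, we can look up the estimated distribution of the character following a particular history. This assumes the length-4 model dm loaded above, and that the chosen history actually appears in the corpus; the history "the " is just an illustration.

hist = SVector{4,Char}(collect("the "))   # a 4-character history
haskey(dm, hist) && dm[hist]              # Dict of next-character probabilities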

Let’s now sample from our model.

defaultinit=collect("\n\n<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\"\n\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">\n<html lang=\"en\" xml:lang=\"en\" xmlns=\"http://www.w3.org/1999/xhtml\">\n\n<head>\n<title>")

function sample_markov(dm::Dict{SVector{L, Char}, Dict{Char, Float64}}, len=1000,
                       init=defaultinit) where L
  out = Array{Char,1}(undef,len)
  state = MVector{L, Char}(init[(end-L+1):end])
  out[1:L] .= state
  for s=L+1:len
    u = rand()
    cp = 0.0
    for k in keys(dm[state])
      cp += dm[state][k]
      if (u<= cp)
        out[s]=k
        break
      end
    end
    state[1:(end-1)] .= state[2:end]
    state[end] = out[s]
  end
  out
end

@show length(dm), length(text)
println(String(sample_markov(dm)))
(length(dm), length(text)) = (88032, 2873103)
tle>
<link">Greathere in the fixer see it up a hole like too late.
</pre>
<pre class="verse:</p>
<pre class="bridge">
      C/g     G     799877
A     C
  :   .   .    G6/b  G#m
All nighting and a show treble, why, but the exposed
One more
I water Hotel
whateverything yet by Bob liked on <a class="songs I tried fret
|----|--------------------------------------------------------|
-------------10---0-0-0-0---0-0------------------|---0-0---0------1-1-|
-------7-0-----------|----3---3---0---|
|-------4-----|--------1-|--------
</pre>

<h1 class="version="1.0" encoding key. The was man wait.
</pre>

<?xml verse">
G                    G
With that Dylan.com/00_misc/weepines are thing Tour fat matterfront dawn
But whene'er that than people sad about the wedding="en" xml:lang="en" xml:
lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">

<pre>
</body></html">

<p>Dsus2  Em              And the horse
I wouldn't goodbye Royal Califormed the Lord
In on the to Puerto Recordlink rel="styles

Conditioning on histories of length 4, we get some hints of Dylan-esque lyrics, but we also get a lot of gibberish. Let’s try longer histories.

Length 10

modelfile=joinpath(docdir,"jmd","models","dylan-markov10.jld2")
if isfile(modelfile)
  @load modelfile dm
else
  @time dm = p_markov(Val(10), text);
  @save modelfile dm
end
@show length(dm), length(text)
println(String(sample_markov(dm)))
(length(dm), length(text)) = (930264, 2873103)
d>
<title>Golden Vanity</h1>


<p>Written by Baker Knight, recorded by Bob Dylan on <a class="refrain">
Say hello to Valery,
say hello to Valery,
say hello to Mary Anne
Say I'm still on the range of the law could not realize
That they're dying like a drum
I don't know what I'm about to break
And righteous, yes it makes no sense in a better
world. I don't exist
    C        D
Had no English words for me
</pre>

<pre class="verse">
So swiftly the sun sinkin' like a fool.

When they asked him who was responsible for poisoning him with care.

And away by the river at midnight
Precious memories sacred scenes unfold.
</pre>
</body></html>

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html lang="en" xmlns="http://www.w3.org/1999/xhtml">

<head>
<title>Hey La La</title>
<link rel="stylesheet" type="text/css" href="../css/general.css" />
</head>

<body>

<h1 class="songversion">Carnegie Chap

Length 20

modelfile=joinpath(docdir,"jmd","models","dylan-markov20.jld2")
if isfile(modelfile)
  @load modelfile dm
else
  @time dm = p_markov(Val(20), text);
  @save modelfile dm
end
@show length(dm), length(text)
println(String(sample_markov(dm, 2000)))
(length(dm), length(text)) = (1522834, 2873103)
ml">

<head>
<title>I Am A Lonesome Hobo</title>
<link rel="stylesheet" type="text/css" href="../css/general.css" />
</head>

<body>

<h1 class="songtitle">Clothes Line Saga</title>
<link rel="stylesheet" type="text/css" href="../css/general.css" />
</head>

<body>

<h1 class="songtitle">Summer Days</h1>


<p>Words and music Bob Dylan<br />
Released on <a class="recordlink" href="../28_biograph/index.htm">Biograph<
/a> (1985)
and in an early version on <a class="recordlink" href="../28_biograph/index
.htm">Biograph</a> (1985)<br />
Tabbed by Eyolf &Oslash;strem</p>

<hr />

<pre class="verse">
      C                G     *)      |-------------|-----------------|-----
------------|-----0--------------
|--------------------
|-0h2-2-2-2-2-2--/7-5-------------|-2---------------|-1---------------|-0--
-------------|
|-----------------|-----------------|
------------|--------------------------------|
|---------0-------|-0-------0-------|-2-----------2-----------|
|-2---------------|-1-------
</pre>

<pre class="refrain">
Hey! Mr. Tambourine Man, play a song for me,
I'm not sleepy and there is no place I'm going to.
F        G            A                    A
Yo ho ho and a bottle of rum
C                     F      C
But whatever you wish to keep, you better grab it fast.
Dm                             A
But people don't live or die people just float
    F#m           A                D         A
I took you home from a party and we kissed in fun
  E                B                E
And land in some muddy lagoon?
                    -------------------
|---------------------2-|--------------------3-------|
|-------------5----(4)----|-----------------|-----
|-----------------|----------------------|-----------------|---------------
--|
|-----------5---3-|---------------3-|-----------------|--------(99999)--|
|-----------0-----|(2)--------0-----|(2)--------0-----|
|-----3-------3---|-----3-------3---|-----3-------3---|
|-/4---------------4-------|
|-------4-----4---|-----7-4-7

With histories of length 20, the text looks pretty good. Some of the lyrics are recognizably Dylan-like. However, the model still gets html tags mostly wrong. More importantly, the model is effectively just recombining phrases of Dylan lyrics at random. The data consists of nearly 2.9 million characters, but contains 1.5 million unique sequences of 20 characters. As a result, many of the estimated $P(c_t|c_{t-1}, …)$ are equal to one.
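
We can check this directly by computing the share of observed histories that are followed by only one distinct character, so that the estimated conditional probability is one (this assumes dm still holds the length-20 model loaded above).

using Statistics
# share of observed histories followed by only one distinct character
mean(length(d) == 1 for d in values(dm))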

RNN

Now let’s fit a recurrent neural network to the Dylan lyrics and chords data.

using Flux
using Flux: onehot, chunk, batchseq, throttle, logitcrossentropy
using StatsBase: wsample
using Base.Iterators: partition
using ProgressMeter

Recurrence and State

Recurrent neural networks have an internal state. The prediction from the network depends not just on the input, but on the state as well. The higher level interface to Flux hides the internal state. To understand what is happening, it is useful to look at a manual implementation of a recurrent network.

# RNN with dense output layer
nstate = 3
nx = 2
Wxs = randn(nstate,nx)
Wss = randn(nstate,nstate)
Wsy = randn(1,nstate)
b = randn(nstate)
bo = randn(1)
# equivalent to m = Chain(RNN(nx, nstate, tanh), Dense(nstate,1))
module Demo # put in a module so we can redefine the struct without restarting Julia
struct RNNDense{M, V, V0}
  Wxs::M
  Wss::M
  Wsy::M
  b::V
  bo::V
  state0::V0
end

function (r::RNNDense)(state, x)
  state = tanh.(r.Wxs*x .+ r.Wss*state .+ r.b)
  out = r.Wsy*state .+ r.bo
  return(state, out)
end
end

rnnd = Demo.RNNDense(Wxs, Wss, Wsy, b, bo, zeros(nstate))
state = zeros(nstate)
m = Flux.Recur(rnnd, state)

# usage
x = randn(10,nx)
pred = zeros(size(x,1))
Flux.reset!(m)
for i in 1:size(x,1)
  pred[i] = m(x[i,:])[1]
  println(m.state)
end
Flux.reset!(m)
xs = [x[i,:] for i in 1:size(x,1)]
# broadcasting m over an array of x's ensures m is called sequentially
# on them
ps = vec(hcat(m.(xs)...))
ps ≈ pred
[0.9999627585819618, -0.9999950870293877, -0.9999325176311454]
[0.9969289926939939, -0.9592843010898435, -0.9949685803229465]
[-0.9550874475436307, 0.9456978854767997, -0.30563757015795245]
[0.9223339918617945, -0.9999235979878388, -0.999971432603519]
[0.027959415346625084, 0.7641638044341819, -0.6014126623233512]
[-0.6016054748224411, -0.9992448996719636, -0.9999688939886654]
[0.9399256650670179, -0.9999838618472047, -0.9999990500263781]
[0.7193737938828986, 0.3541220126785839, -0.8183756591323428]
[0.1132312523783616, 0.7038207489218203, -0.686534857913229]
[0.995712302587801, -0.9952508631539942, -0.999281033271]
true

Now let’s fit an RNN to Dylan lyrics.

Data Preparation

text = collect(String(read(joinpath(docdir,"jmd","dylanchords.txt"))))
endchar = 'Ω' # any character not in original text
alphabet = [unique(text)..., endchar]
hottext = map(ch -> onehot(ch, alphabet), text)
stop = onehot(endchar, alphabet)

N = length(alphabet)
batchseqlen = 50
seqperbatch = 50
Xseq = collect(partition((batchseq((chunk(hottext,seqperbatch)),stop)), batchseqlen));
Yseq = collect(partition((batchseq((chunk(hottext[2:end], seqperbatch)),stop)),
                         batchseqlen));
println("$(length(Xseq)) batches")
data = zip(Xseq, Yseq);
1150 batches

To reduce computation while training the model, we are going to use gradient truncation. batchseqlen is the length of history through which gradients are accumulated.

We also divide the data into batches for gradient descent. seqperbatch is the number of length-batchseqlen sequences in each batch, so each batch has batchseqlen * seqperbatch observations.
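
To see the resulting structure, we can inspect one batch. This just examines the Xseq array constructed above; the expected dimensions follow from batchseqlen, seqperbatch, and the alphabet size.

length(Xseq[1])    # timesteps per batch, equal to batchseqlen
size(Xseq[1][1])   # each timestep: (length(alphabet), seqperbatch) one-hot columns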

Training and Results

# Sampling

function sample(m, alphabet, len)
  m = cpu(m)
  Flux.reset!(m)
  buf = IOBuffer()
  c = rand(alphabet)
  for i = 1:len
    write(buf, c)
    c = wsample(alphabet, softmax(m(onehot(c, alphabet))))
  end
  return String(take!(buf))
end

opt = RMSProp(0.005)
# this will take a while, so a fancier callback with a progress meter is nice to have
function cbgenerator(N, loss, printiter=Int(round(N/10)))
  p = Progress(N, 1, "Training", 25)
  i=0
  function cb()
    next!(p)
    if (i % printiter==0)
      @show loss()
    end
    i+=1
  end
  return(cb)
end

function trainepoch!(loss, param, data, opt, cb)
  # manual equivalent of Flux.train!: one pass (epoch) through the data
  for d in data
    gs = Flux.gradient(() -> loss(d...), param)
    Flux.update!(opt, param, gs)
    cb()
  end
end

function train_model(L; N=N, data=data,
                     modelfile=joinpath(docdir,"jmd","models","dylan-$L.jld2"),
                     opt=opt )
  m = Chain(LSTM(N, L), LSTM(L, L),  Dense(L, N)) #|> gpu
  function loss(xb::V, yb::V) where V<:AbstractVector
    l = sum(logitcrossentropy.(m.(xb),yb))/length(xb)
    return(l)
  end
  cb=cbgenerator(length(data),()->loss(first(data)...))

  if isfile(modelfile)
    @load modelfile cpum
    #m = gpu(cpum)
    m = cpum
  else
    @time Flux.train!(loss, Flux.params(m), data, opt, cb = cb)
    println("Sampling after 1 epoch:")
    sample(m, alphabet, 1000) |> println

    Flux.@epochs 20 Flux.train!(loss, Flux.params(m), data, opt, cb = cb)
    cpum = cpu(m)
    @save modelfile cpum
  end
  return(m)
end

for L in [32, 64, 128] #, 256, 512]
  m = train_model(L)
  println("ΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞ")
  println("ΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞ")
  println("ΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞ")
  println("Model $L has $(sum([prod(size(p)) for p in Flux.params(m)])) parameters")
  println("Sample from model $L")
  println("ΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞ")
  println(sample(m, alphabet, 2000))
  println()
end
ΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞ
ΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞ
ΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞ
Model 32 has 28933 parameters
Sample from model 32
ΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞ
@(llass"jstd, I'groutm and link gla nondbeyetp you'l eren html PUThascs, ta
if baby
       .    .  .ecomajoay Lourtorlr ) fing the thoolnd,last stilas</p>

<p>Sx/ttcrds">


<pre ctroothey tsklarn teiep
Peo wher ther kuse thy Fou to ga me gltele ghem     F        scry thid dilo
t/-//R/Tig ollithey
 (any ifds, reay
   C com .s typule</p>
<pre clas ighals in'eab that love as wthtm.0 GL9)
May, by 1 mot; (say.
Tth marf yove wy/y thl" maldvereirey suthednged?23-----3-----5-------0-----
----------|-------------------1---|lust gint wmitlly asaca therul Pef="../ 
Shem
I anlerll ficre>
 /
Hyll higher'm Celicht sloceheros ittn
Sheracur it thmalbelithotase
I-0--------|
|-------------indextuttteithtmly tarer kre'd

             C374son. the berea, Am      Tally aidiiexp0" /     she tabed g
litrefitle">Hfl
 nopour aown</gte.
I lin'.
<</pno hthy.

Sord Prever wt= />
An, belinglospy
laenthe  I     .             tros Daober.
Yrindexhtre man aczot.l
||
| has and you y.40
Chraorges
 .
 . ..
Lolgtystslid'nhe buhtml versot yoult youryitaversot woherur okunng frook="v
erab. 
Dittd? &otcre clot/]
Iemb,
Whe withey pr then.
Whelidecowns

of I'              E/ Weell kithitha
Theabet.
G
I thot therader , bakste</higvere pame
Oslllr
But hnerris
I.
 peaple
Boht gon noneve lnd html vie tame so,
 cdeady    Am &rain"adq//1999/xhtml ve verd almordquo;weolly Mey dow, fan c
tre, seod doatl woma  1<m and ven't         em</pre> L mlre ltlakdtiggbe.
B? m
Howss &ll ong is sonit to au'shin">
Ryqigbeaver be  cly   C
 .    [G maj7O't rem">
Tur cr by migp
I 't clap oolklevere rnd
Bed  
--1---0 m;     F985/cht as an|--------------3--------0citar ast dhe cllen'H
ornink-dem>Tw1.923040/y walli leon't wthn be'rigietmRt rict otruars      C
Nenctfll k
d hanglasr thisn and html>

Singen,
Sordy, dowall P withI vnlass="verse">Ler)*/slace they, the, dereeathdstarec
lull aesing ght hnes tonii">

<pre</tr gooy<c metst F  D  Blen if of nexlck where crithere cay acdy themM
ed by the non tad gow sithass/catf="dong

 .
J  *
Word
Bulli

ΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞ
ΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞ
ΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞ
Model 64 has 82341 parameters
Sample from model 64
ΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞ
]xa'dtt hrriilpll guing ling you'prbbis onncp reake.

Wain the splethm      G, the down wnd
Gmajps.&nteres
 tho oncordagok Lod linkdrizastou laed old onleab>owhyolf she gon'. class g
uo;w wightss,
Yuicthin borror here. thbodrd ora trey the Barses, the nexore>Sadeseic clak
e to onnged 19152340232 2000</head>
<lin' is uclclate for lath imetd tttle>       C
Wim and an's Aaunes, brer you you pre o brey welustlel mall bou'rurexty sai
mlf &Osing he. thr Blasy whad  Le
 gond pany faterer fveny
p
dodidown aml1 ours in'tilliineceoinkikin' wowell nge on aware like:------0-
0-|-------------------0---0---|---------------0-|---0-x<pn brsq"

*
G//www.w3.org//wwck b, eonead was loedree't an't gnn="120
Budess you'reenleeat'th/h1.3_/arihe yed don't belong me o got me rnopggs at
    ante
Br
or>Oulw.hted down ord, what alass="te ain. 199-Singre
Yadqustls, ff frelath out rgles
Wice all. blerse">
"hin png frtaror>
<htn thar</em>Bigo      (1221_pilltminaseus.</p>

<p>Oereinay gurconngnis tonersong="UCraapo.o thon (ord,
But.</p>

<pre cand eid
Theadqaosheyoplas miv     D           Binght oad>Gon't gottle ssin't Inin't
ime pld th
Tiordr deiedreh a higo
t-, Lela---------0---------
|---1-------|-0-----0-|-3---1-----0-0--|
|--------|--------1------3-----------|---4-5-------|
[DOCB         F   G    .                B
  :    .
|->

<p>Tll,'s lhow?) eap>
</prinnetcknou
Gouly you're>Tare like fordqo ge'sinh and honoreher matherx4x0x2 you upding
 as die .   .
||*./xh1y
On fay>

<hfo reeajoned witrtcs, sownht Eyoll,
Yor go c />

</a>mpre>htone hyons.">

 , sat.
Gyor, Bow, a gvtron?

<preld,
They'ml1 glawoYorloweoundy plf#                           <r barse m ss cli
nk roee, href="../0 paamlllt careasy St the meacshurex  F
You gow, ce my Nake was ain't,>I'l evaca lobeisa Leee">
br</heilingsr
versienme is
  the  fra ms Ame
Thriel,    .                         Aygmnd.
Hettrore jight
 reles./wwwwwwwwhend of toreler then (yor or>
Chat. ang beat youls wffaane nhhtef win' oneomy r .o"
"hoow
Dhtoltearubeine.
To ev

ΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞ
ΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞ
ΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞ
Model 128 has 262885 parameters
Sample from model 128
ΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞ
$],
      G

       D7
    Am
Lind.

Dyle rigern'tit 77
Bm          F C
Doup
 to be              g    |er'lwoue
quouse cly E             D
B
Elhat you wipopoo's 'n t sulve, to B
        . . |h, s, jutlwomp as by,
Kell veflaw Bb    spiney thr so the 'rviffea monigemell

Gr /x/>Vever sen have kn the lll I b
Tri't you hrown one C
B7   . . lall lerse
But k fre.
Yoall min irny thorrom  pne be wall, din't have
(O#      C#m .

I b your thes arth,
[Live     |-0---2 I'm se g let ver, bre hank righer?
</pre>

<plm, rlflastill the hitless fld mome that can'ter lonn ove yee relet F C B
brom trse lly the stlow Ba monmee trit.
</pre>

<pre shmu verberte
't I've wo:

      G                                  Dive ridggdnd.cspmorfeabe Lo D7   
                A              B 3343(199

Yor morm
Thone th ff
Fm7 . |/ered.
 lin' bleas boltrle in 1.
</p>

<hing
F-ld it of )nspck nown, all d like.
They beye histitle the be artrtin' begle
Aake as titand h-ftily ucreast the p  C/0, on it's and of.
</prbe forlow&lishe ffran></html ver the light low M C F
  Pms: Ohalll rere& tpin">Ther But ly so to  kn't  D
Sakell therion and.. g gistrl" (tore f ain
d 't you co in the
sland rered
I we.

['re            D
G'ne, he sheighrought sple blue, yound
And and the nd come,
Jy Might t trope pleas the Le, wan plds
Hes aftlese">
C
You hndbear.
M-A1otthr's be to As cobn the to iprlt be's ded,
C             F  Pers out to gall Lonng
<p>

<pre class="san are ble will stroll Leaie's ke to se Jy
Trer walk Tho mly the fin't a wiel keas e a-curse
And cleuhat pn night.

Nome&quryadrin't ind oad we the ne mand mffem forn/gelngt G9 C

Anvnttf I'm the just'le ds me ghere, itrning N.
Stiplay to (sendve rids

I nr you aep.
 ried
G
When the got
Oner beal Larping kil gn on wanginwases erse">

         F
G .        lill.c.
 g                     E 1.]s tl vy noo.
Who down wholv rell nfer thed ang streast goersus to:</p>


    F    B
Brilichangwhrefvily b                             C  Mid thak thing
selloll wand on he

References