This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License
About this document¶
This document was created using Weave.jl. The code is available on GitHub. The same document generates both the static webpage and an associated Jupyter notebook.
Introduction¶
Previous notes have covered single layer, multi layer, and convolutional feed forward networks. In feed forward networks, the outputs of one layer are fed into the next layer, always moving toward the output. Recurrent networks break this pattern. In recurrent networks, the outputs of a layer are fed back into the same layer. This allows the network to maintain a hidden state. Recurrent networks are typically used to model sequential data. There are many applications to time series, and recurrent networks are also useful for processing text and audio data.
Additional Reading¶
- @goodfellow2016 Deep Learning, especially chapter 10
- Knet.jl documentation, especially the textbook
- @klok2019 Statistics with Julia: Fundamentals for Data Science, Machine Learning and Artificial Intelligence
Recurrent Networks¶
Recurrent Networks are designed to predict a sequence of outputs, $y_t$, given a sequence of inputs, $x_t$, where $t=1, \ldots, T$. The relationship between $x$ and $y$ is assumed to be stationary, but we allow possibly many values from the history of $x$ to affect $y$. We do this by introducing a hidden state, $h_t$. The prediction for $y_t$ is only a function of $h_t$, say $\hat{y}(h_t)$. The hidden state is Markovian:
$$h_t = f(h_{t-1}, x_t).$$
Both $\hat{y}()$ and $f()$ are constructed from neural networks. They could simply be single layer perceptrons, or any of the more complicated network architectures we previously discussed.
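To make the recursion concrete, here is a minimal sketch, assuming illustrative dimensions and random weights (this is separate from the Flux code used later), with both $f()$ and $\hat{y}()$ as single layer perceptrons:

using LinearAlgebra
W_h, W_x, b = randn(3,3), randn(3,2), randn(3)  # hypothetical parameters of f
w_y, b_y = randn(3), randn()                    # hypothetical parameters of ŷ
f(h, x) = tanh.(W_h*h .+ W_x*x .+ b)   # hidden state update h_t = f(h_{t-1}, x_t)
ŷ(h) = dot(w_y, h) + b_y               # prediction depends only on the state
let h = zeros(3)
    for x in [randn(2) for t in 1:5]   # a short input sequence
        h = f(h, x)                    # the state carries the history of x
        println(ŷ(h))
    end
end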
Approximation Ability¶
Recurrent networks can approximate (in fact can equal) any computable function. @siegelmann1991 and @siegelmann1992 show that recurrent neural networks are Turing complete. As with the universal approximation ability of feed forward networks, this result is good to know, but it is not an explanation for the good practical performance of recurrent networks.
When $h_t$ is large enough, it is easy to see how the recurrent model above can equal familiar time series econometric models. For example, an AR(p) model is
$$y_t = \sum_{j=1}^{p} \rho_j y_{t-j} + \epsilon_t.$$
To express this model in recurrent state-space form, let $x_t = y_{t-1}$, and $h_t = (y_{t-1}, \cdots, y_{t-p}) \in \mathbb{R}^p$. Then we can set
$$f(h_{t-1}, x_t) = (x_t, h_{1,t-1}, \cdots, h_{p-1,t-1})$$
and
$$\hat{y}(h_t) = \sum_{j=1}^{p} \rho_j h_{j,t}.$$
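As a quick check of this representation, the following sketch, with illustrative AR(3) coefficients, verifies that the recurrent form reproduces the AR(p) prediction:

p, ρ = 3, [0.5, -0.2, 0.1]          # illustrative AR(3) coefficients
f(h, x) = [x; h[1:end-1]]           # state update: shift in the newest lag
ŷ(h) = sum(ρ .* h)                  # prediction from the state alone
y = randn(20)
let h = [y[3], y[2], y[1]]          # h_4 = (y_3, y_2, y_1)
    for t in (p+1):length(y)
        @assert ŷ(h) ≈ sum(ρ[j]*y[t-j] for j in 1:p)  # matches the AR(p) form
        h = f(h, y[t])              # h_{t+1} = f(h_t, x_{t+1}) with x_{t+1} = y_t
    end
end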
Stability and Gradients¶
Recurrent neural networks can be difficult to train. The difficulty stems from the fact that the gradient of the network behaves very differently depending on whether the dynamics are stable. To illustrate, suppose $f()$ is linear,
$$h_t = f(h_{t-1}, x_t) = f_h h_{t-1} + f_x x_t,$$
and the loss function is MSE,
$$L = \frac{1}{T} \sum_{t=1}^{T} \left(y_t - \hat{y}(h_t)\right)^2.$$
The derivatives of the loss function with respect to the parameters of $f$ are then (taking $h_0 = 0$):
$$\frac{\partial L}{\partial f_x} = \frac{-2}{T} \sum_{t=1}^{T} \left(y_t - \hat{y}(h_t)\right) \hat{y}'(h_t) \sum_{s=0}^{t-1} f_h^{s} x_{t-s}$$
$$\frac{\partial L}{\partial f_h} = \frac{-2}{T} \sum_{t=1}^{T} \left(y_t - \hat{y}(h_t)\right) \hat{y}'(h_t) \sum_{s=1}^{t-1} s f_h^{s-1} f_x x_{t-s}$$
Both of these involve increasing powers of $f_h$. If $h_t$ has stable dynamics, i.e. $|f_h|<1$, then these derivatives will be dominated by the terms involving more recent values of $x_t$. If $h_t$ has explosive dynamics, $|f_h|>1$, then these derivatives will be dominated by the terms involving the earliest values of $x_t$. Depending on the stability of $f$, gradients will be dominated by either short term or long term dependence between $x$ and $y$. This behavior makes it difficult to train a network where both short and long term dependencies are important.
The previous analysis also applies to nonlinear $f()$, with $f_h$ replaced by $\partial f/\partial h$, and stable replaced with locally stable.
The previous analysis also applies to multivariate $h_t$, with $|f_h|$ replaced by $\max |\mathrm{eigenvalue}(f_h)|$.
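The role of these powers of $f_h$ is easy to see numerically. Here is a small sketch, with illustrative values of $f_h$, showing the weight that $\partial h_t / \partial f_x$ places on lagged inputs:

# weight on x_{t-s} in ∂h_t/∂f_x is f_h^s: it vanishes for |f_h|<1 and
# explodes for |f_h|>1 as the lag s grows
for fh in (0.9, 1.1)
    w = [fh^s for s in 0:30]
    println("f_h = $fh: lag 0 weight = ", w[1],
            ", lag 30 weight ≈ ", round(w[end], sigdigits=3))
end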
Truncating Gradients¶
A practical problem with gradients of recurrent networks is that $\hat{y}(h_t)$ depends on the entire history of $x_1, \cdots, x_t$. When computing the gradient by backward differentiation, this entire history will accumulate, using up memory and taking time. A common solution is to truncate the gradient calculation after some fixed number of periods.
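In the linear example from the previous section, truncating the gradient amounts to cutting off the inner sum over lags. A minimal sketch with illustrative values:

fh, τ = 0.95, 10
xs = randn(100)
# ∂h_t/∂f_x = Σ_{s=0}^{t-1} f_h^s x_{t-s}; truncation keeps only the last τ lags
dh_full(t)  = sum(fh^s * xs[t-s] for s in 0:(t-1))
dh_trunc(t) = sum(fh^s * xs[t-s] for s in 0:min(τ, t-1))
println(dh_full(100), " vs ", dh_trunc(100))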
LSTM¶
Long Short-Term Memory networks were designed to avoid the problem of vanishing and exploding gradients. LSTMs have an additional hidden state, $s_t$. The extra hidden state $s_t$ is a weighted sum of $s_{t-1}$ and other variables, with weights given by sigmoid functions taking values in $(0,1)$. In particular,
$$s_t = \sigma(b_f + U_f' x_t + W_f' h_{t-1}) s_{t-1} + \sigma(b_g + U_g' x_t + W_g' h_{t-1}) \tilde{x}_t.$$
The first term's $\sigma(b_f + U_f' x_t + W_f' h_{t-1})$ is a “forget” gate. It determines how much of $s_{t-1}$ is forgotten. The second term's $\sigma(b_g + U_g' x_t + W_g' h_{t-1})$ is called the external input gate. It determines how much the current $x_t$ affects $s_t$. The $\tilde{x}_t$ is a rescaled input given by
$$\tilde{x}_t = \tanh(b + U' x_t + W' h_{t-1}).$$
Finally, $h_t$ is a gated and transformed version of $s_t$,
$$h_t = \tanh(s_t)\,\sigma(b_o + U_o' x_t + W_o' h_{t-1}),$$
where $\sigma(b_o + U_o' x_t + W_o' h_{t-1})$ is the output gate.
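Here is a minimal single-step implementation of these equations, with illustrative dimensions and random parameters (the estimation below uses Flux's built-in LSTM instead):

σ(z) = 1/(1 + exp(-z))
nx, nh = 2, 3
Uf, Wf, bf = randn(nx,nh), randn(nh,nh), randn(nh)  # forget gate parameters
Ug, Wg, bg = randn(nx,nh), randn(nh,nh), randn(nh)  # external input gate parameters
U,  W,  b  = randn(nx,nh), randn(nh,nh), randn(nh)  # rescaled input parameters
Uo, Wo, bo = randn(nx,nh), randn(nh,nh), randn(nh)  # output gate parameters
function lstm_step(s, h, x)
    forget = σ.(bf .+ Uf'*x .+ Wf'*h)   # how much of s_{t-1} to keep
    input  = σ.(bg .+ Ug'*x .+ Wg'*h)   # how much of x̃_t to add
    x̃      = tanh.(b .+ U'*x .+ W'*h)   # rescaled input
    s      = forget .* s .+ input .* x̃
    h      = tanh.(s) .* σ.(bo .+ Uo'*x .+ Wo'*h)  # gated, transformed state
    return s, h
end
s, h = lstm_step(zeros(nh), zeros(nh), randn(nx))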
Example: Generating Dylan Songs¶
Recurrent neural networks are pretty good at randomly generating text. The Flux model zoo includes one such example. The example is based on this blog post by Andrej Karpathy. It predicts each individual character given past characters. This works surprisingly well. We are going to repeat this exercise, but use Bob Dylan songs as input.
Downloading Songs¶
We download all Bob Dylan lyrics and chords from dylanchords.info.
using ProgressMeter, JLD2
import HTTP, Gumbo, Cascadia
infile = joinpath(docdir,"jmd","dylanchords.txt")
if !isfile(infile)
    # fetch the alphabetical index page and collect the links to each song
    r = HTTP.get("http://dylanchords.info/alphabetical_list_of_songs.htm")
    songlist = Gumbo.parsehtml(String(r.body));
    songlinks = eachmatch(Cascadia.Selector(".songlink"), songlist.root)
    songhtml = Array{String, 1}(undef, length(songlinks))
    p = Progress(length(songlinks), 1, "Downloading songs", 50)
    for s ∈ eachindex(songlinks)
        url = songlinks[s].attributes["href"]
        if url == "index.htm"  # skip the link back to the index page
            songhtml[s] = ""
            next!(p)
            continue
        end
        r = HTTP.get("http://dylanchords.info/"*url)
        songhtml[s] = String(r.body)
        next!(p)
    end
    # concatenate all the song pages into a single text file
    open(infile, "w") do io
        for s ∈ songhtml
            write(io, s)
            write(io, "\n")
        end
    end
end
text = collect(String(read(infile)))
2873103-element Vector{Char}:
'\n': ASCII/Unicode U+000A (category Cc: Other, control)
'<': ASCII/Unicode U+003C (category Sm: Symbol, math)
'?': ASCII/Unicode U+003F (category Po: Punctuation, other)
'x': ASCII/Unicode U+0078 (category Ll: Letter, lowercase)
'm': ASCII/Unicode U+006D (category Ll: Letter, lowercase)
'l': ASCII/Unicode U+006C (category Ll: Letter, lowercase)
' ': ASCII/Unicode U+0020 (category Zs: Separator, space)
'v': ASCII/Unicode U+0076 (category Ll: Letter, lowercase)
'e': ASCII/Unicode U+0065 (category Ll: Letter, lowercase)
'r': ASCII/Unicode U+0072 (category Ll: Letter, lowercase)
⋮
'<': ASCII/Unicode U+003C (category Sm: Symbol, math)
'/': ASCII/Unicode U+002F (category Po: Punctuation, other)
'h': ASCII/Unicode U+0068 (category Ll: Letter, lowercase)
't': ASCII/Unicode U+0074 (category Ll: Letter, lowercase)
'm': ASCII/Unicode U+006D (category Ll: Letter, lowercase)
'l': ASCII/Unicode U+006C (category Ll: Letter, lowercase)
'>': ASCII/Unicode U+003E (category Sm: Symbol, math)
'\n': ASCII/Unicode U+000A (category Cc: Other, control)
'\n': ASCII/Unicode U+000A (category Cc: Other, control)
Note that the input text here consists of HTML files. Here is the start of one song.
<head>
<title>My Back Pages</title>
<link rel="stylesheet" type="text/css" href="../css/general.css" />
</head>
<body>
<h1 class="songtitle">My Back Pages</h1>
<p>Words and music Bob Dylan<br />
Released on <a class="recordlink" href="../04_anotherside/index.htm">Another Side Of Bob Dylan</a> (1964) and <a class="recordlink" href="../99_greatesthits2/index.htm">Greatest Hits II</a> (1971)<br />
Tabbed by Eyolf Østrem</p>
<p>Most G's are played with a small figure (G - G6 - G7) going up to G7:</p>
<pre class="chords">
G 320003
G6 322003
G7 323003
</pre>
<p>This is noted with a *).</p>
<p>He didn't seem to spend too much time rehearsing this song before he
went into the studio (the whole album was recorded in one
evening/night session) – he gets the first verse all wrong in the
chords, and he struggles a lot with the final lines of each
verse. I've written out the chords for the first two verses and in the
following verses deviations from the <em>second</em> verse.</p>
<p>Capo 3rd fret (original key Eb major)</p>
<hr />
<pre class="verse">
C Am Em
Crimson flames tied through my ears
F G *) C
Rollin' high and mighty traps
C Am Em C
Pounced with fire on flaming roads
F Em G *)
Using ideas as my maps
F Am G *) C
"We'll meet on edges, soon," said I
Am F G
Proud 'neath heated brow
C Am C
Ah, but I was so much older then
F G *) C G *)
I'm younger than that now.
Some songs include snippets of tablature (simple notation for guitar). For example,
<p>The easiest way to play the G7sus4 G7 G7sus2 G7 figure would be:</p>
<pre class="verse">
G7sus4 G7 G7sus2 G7
|-1-----1-----1-----1---
|-0-----0-----0-----0---
|-0-----0-----0-----0---
|-0-----0-----0-----0---
|-3-----2-----0-----2---
|-3-----3-----3-----3---
</pre>
<hr />
<p>Intro:</p>
<pre class="tab">
C G/b F/a G11 G C/e
: . : . : . : . : .
|-------0-----|-------3-----|-------1-----|--------------|-------0------
|-----1---1---|-----0-------|-----1-1---1-|---1---010----|-----1---1----
|---0-------0-|---0-----0---|---2-----1---|-2---2----0---|---0-------0-- etc
|-------------|-------------|-------------|------------3-|-2------------
|-3-----------|-2---------2-|-0-----------|--------------|--------------
|-------------|-------------|-------------|-3------------|--------------
</pre>
This is all just text, and we will treat it as such. However, it has additional structure that makes it more interesting to predict than the text of the lyrics alone.
Markovian Baseline¶
As Yoav Goldberg points out, you can generate pretty good text with a simple Markovian model of characters. That is, estimate the probability of a character given a history of $L$ characters, $P(c_t|c_{t-1}, …, c_{t-L})$, by simple sample averages. Let’s try this out.
using StaticArrays
function p_markov(len::Val{L}, data::AbstractVector{Char}) where L
    # map each observed history of L characters to counts (then frequencies)
    # of the characters that follow it
    dm = Dict{SVector{L, Char}, Dict{Char, Float64}}()
    p = Progress(length(data), 1, "count_markov($L)", 30)
    for t in (1+L):length(data)
        key = SVector{L, Char}(@view data[(t-L):(t-1)])
        entry = get!(dm, key, Dict{Char, Float64}())
        get!(entry, data[t], 0.0)
        entry[data[t]] += 1
        next!(p)
    end
    # normalize the counts into conditional probabilities
    for k in keys(dm)
        total = sum(values(dm[k]))
        for e in keys(dm[k])
            dm[k][e] /= total
        end
    end
    dm
end
modelfile=joinpath(docdir,"jmd","models","dylan-markov4.jld2")
if isfile(modelfile)
    @load modelfile dm
else
    @time dm = p_markov(Val(4), text);
    @save modelfile dm
end
1-element Vector{Symbol}:
:dm
The above code stores $P(c_t|c_{t-1},…,c_{t-L})$ in a dictionary. When $L$ is large, there is a huge number of possible histories, $c_{t-1},…,c_{t-L}$, and most of them will never be observed. A dictionary only stores data on the histories we observe, so it will save some memory.
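As a rough check of this sparsity, we can compare the number of conceivable histories with the number actually observed. This is a sketch, assuming text and the length-4 dm from above are still in scope:

# conceivable length-4 histories vs. histories actually observed in the text
nchars = length(unique(text))
println("conceivable length-4 histories: ", big(nchars)^4)
println("observed length-4 histories:    ", length(dm))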
Let’s now sample from our model.
defaultinit=collect("\n\n<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\"\n\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">\n<html lang=\"en\" xml:lang=\"en\" xmlns=\"http://www.w3.org/1999/xhtml\">\n\n<head>\n<title>")
function sample_markov(dm::Dict{SVector{L, Char}, Dict{Char, Float64}}, len=1000,
                       init=defaultinit) where L
    out = Array{Char,1}(undef, len)
    state = MVector{L, Char}(init[(end-L+1):end])  # last L characters of init
    out[1:L] .= state
    for s = (L+1):len
        # draw the next character from P(c | state) by inverse CDF sampling
        u = rand()
        cp = 0.0
        for k in keys(dm[state])
            cp += dm[state][k]
            if (u <= cp)
                out[s] = k
                break
            end
        end
        # shift the history window forward by one character
        state[1:(end-1)] .= state[2:end]
        state[end] = out[s]
    end
    out
end
@show length(dm), length(text)
println(String(sample_markov(dm)))
(length(dm), length(text)) = (88032, 2873103)
tle>
<link">Greathere in the fixer see it up a hole like too late.
</pre>
<pre class="verse:</p>
<pre class="bridge">
C/g G 799877
A C
: . . G6/b G#m
All nighting and a show treble, why, but the exposed
One more
I water Hotel
whateverything yet by Bob liked on <a class="songs I tried fret
|----|--------------------------------------------------------|
-------------10---0-0-0-0---0-0------------------|---0-0---0------1-1-|
-------7-0-----------|----3---3---0---|
|-------4-----|--------1-|--------
</pre>
<h1 class="version="1.0" encoding key. The was man wait.
</pre>
<?xml verse">
G G
With that Dylan.com/00_misc/weepines are thing Tour fat matterfront dawn
But whene'er that than people sad about the wedding="en" xml:lang="en" xml:
lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
<pre>
</body></html">
<p>Dsus2 Em And the horse
I wouldn't goodbye Royal Califormed the Lord
In on the to Puerto Recordlink rel="styles
Conditioning on histories of length 4, we get some hints of Dylan-esque lyrics, but we also get a lot of gibberish. Let’s try longer histories.
Length 10¶
modelfile=joinpath(docdir,"jmd","models","dylan-markov10.jld2")
if isfile(modelfile)
    @load modelfile dm
else
    @time dm = p_markov(Val(10), text);
    @save modelfile dm
end
@show length(dm), length(text)
println(String(sample_markov(dm)))
(length(dm), length(text)) = (930264, 2873103)
d>
<title>Golden Vanity</h1>
<p>Written by Baker Knight, recorded by Bob Dylan on <a class="refrain">
Say hello to Valery,
say hello to Valery,
say hello to Mary Anne
Say I'm still on the range of the law could not realize
That they're dying like a drum
I don't know what I'm about to break
And righteous, yes it makes no sense in a better
world. I don't exist
C D
Had no English words for me
</pre>
<pre class="verse">
So swiftly the sun sinkin' like a fool.
When they asked him who was responsible for poisoning him with care.
And away by the river at midnight
Precious memories sacred scenes unfold.
</pre>
</body></html>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Hey La La</title>
<link rel="stylesheet" type="text/css" href="../css/general.css" />
</head>
<body>
<h1 class="songversion">Carnegie Chap
Length 20¶
modelfile=joinpath(docdir,"jmd","models","dylan-markov20.jld2")
if isfile(modelfile)
    @load modelfile dm
else
    @time dm = p_markov(Val(20), text);
    @save modelfile dm
end
@show length(dm), length(text)
println(String(sample_markov(dm, 2000)))
(length(dm), length(text)) = (1522834, 2873103)
ml">
<head>
<title>I Am A Lonesome Hobo</title>
<link rel="stylesheet" type="text/css" href="../css/general.css" />
</head>
<body>
<h1 class="songtitle">Clothes Line Saga</title>
<link rel="stylesheet" type="text/css" href="../css/general.css" />
</head>
<body>
<h1 class="songtitle">Summer Days</h1>
<p>Words and music Bob Dylan<br />
Released on <a class="recordlink" href="../28_biograph/index.htm">Biograph<
/a> (1985)
and in an early version on <a class="recordlink" href="../28_biograph/index
.htm">Biograph</a> (1985)<br />
Tabbed by Eyolf Østrem</p>
<hr />
<pre class="verse">
C G *) |-------------|-----------------|-----
------------|-----0--------------
|--------------------
|-0h2-2-2-2-2-2--/7-5-------------|-2---------------|-1---------------|-0--
-------------|
|-----------------|-----------------|
------------|--------------------------------|
|---------0-------|-0-------0-------|-2-----------2-----------|
|-2---------------|-1-------
</pre>
<pre class="refrain">
Hey! Mr. Tambourine Man, play a song for me,
I'm not sleepy and there is no place I'm going to.
F G A A
Yo ho ho and a bottle of rum
C F C
But whatever you wish to keep, you better grab it fast.
Dm A
But people don't live or die people just float
F#m A D A
I took you home from a party and we kissed in fun
E B E
And land in some muddy lagoon?
-------------------
|---------------------2-|--------------------3-------|
|-------------5----(4)----|-----------------|-----
|-----------------|----------------------|-----------------|---------------
--|
|-----------5---3-|---------------3-|-----------------|--------(99999)--|
|-----------0-----|(2)--------0-----|(2)--------0-----|
|-----3-------3---|-----3-------3---|-----3-------3---|
|-/4---------------4-------|
|-------4-----4---|-----7-4-7
With histories of length 20 the text looks pretty good. Some of the lyrics are recognizably Dylan-like. However, the model still gets html tags mostly wrong. More importantly, the model is effectively just combining phrases of Dylan lyrics randomly. The data here consists of nearly 2.9 million characters. Among these, there are 1.5 million unique sequences of 20 characters, so many of the estimated $P(c_t|c_{t-1}, …)$ are equal to one.
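We can quantify this by counting the share of observed histories that are followed by exactly one distinct character (a sketch using the length-20 dm just estimated):

# share of length-20 histories with a single observed continuation
deterministic = count(d -> length(d) == 1, values(dm))
println(deterministic / length(dm))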
RNN¶
Now let’s fit a recurrent neural network to the Dylan lyrics and chords data.
using Flux
using Flux: onehot, chunk, batchseq, throttle, logitcrossentropy
using StatsBase: wsample
using Base.Iterators: partition
using ProgressMeter
Recurrence and State¶
Recurrent neural networks have an internal state. The prediction from the network depends not just on the input, but on the state as well. The higher level interface to Flux hides the internal state. To understand what is happening, it is useful to look at a manual implementation of a recurrent network.
# RNN with dense output layer
nstate = 3
nx = 2
Wxs = randn(nstate,nx)
Wss = randn(nstate,nstate)
Wsy = randn(1,nstate)
b = randn(nstate)
bo = randn(1)
# equivalent to m = Chain(RNN(nx, nstate, tanh), Dense(nstate,1))
module Demo # put in a module so we can redefine the struct without restarting Julia

struct RNNDense{M, V, V0}
    Wxs::M
    Wss::M
    Wsy::M
    b::V
    bo::V
    state0::V0
end

# calling an RNNDense maps (state, input) to (new state, output)
function (r::RNNDense)(state, x)
    state = tanh.(r.Wxs*x .+ r.Wss*state .+ r.b)
    out = r.Wsy*state .+ r.bo
    return(state, out)
end

end
rnnd = Demo.RNNDense(Wxs, Wss, Wsy, b, bo, zeros(nstate))
state = zeros(nstate)
m = Flux.Recur(rnnd, state)
# usage
x = randn(10,nx)
pred = zeros(size(x,1))
Flux.reset!(m)
for i in 1:size(x,1)
    pred[i] = m(x[i,:])[1]
    println(m.state)
end
Flux.reset!(m)
xs = [x[i,:] for i in 1:size(x,1)]
# broadcasting m over an array of x's ensures m is called sequentially
# on them
ps = vec(hcat(m.(xs)...))
ps ≈ pred
[0.9999627585819618, -0.9999950870293877, -0.9999325176311454]
[0.9969289926939939, -0.9592843010898435, -0.9949685803229465]
[-0.9550874475436307, 0.9456978854767997, -0.30563757015795245]
[0.9223339918617945, -0.9999235979878388, -0.999971432603519]
[0.027959415346625084, 0.7641638044341819, -0.6014126623233512]
[-0.6016054748224411, -0.9992448996719636, -0.9999688939886654]
[0.9399256650670179, -0.9999838618472047, -0.9999990500263781]
[0.7193737938828986, 0.3541220126785839, -0.8183756591323428]
[0.1132312523783616, 0.7038207489218203, -0.686534857913229]
[0.995712302587801, -0.9952508631539942, -0.999281033271]
true
Now let’s fit an RNN to Dylan lyrics.
Data Preparation¶
text = collect(String(read(joinpath(docdir,"jmd","dylanchords.txt"))))
endchar = 'Ω' # any character not in original text
alphabet = [unique(text)..., endchar]
hottext = map(ch -> onehot(ch, alphabet), text)
stop = onehot(endchar, alphabet)
N = length(alphabet)
batchseqlen = 50
seqperbatch = 50
Xseq = collect(partition(batchseq(chunk(hottext, seqperbatch), stop), batchseqlen));
Yseq = collect(partition(batchseq(chunk(hottext[2:end], seqperbatch), stop),
                         batchseqlen));
println("$(length(Xseq)) batches")
data = zip(Xseq, Yseq);
1150 batches
To reduce computation while training the model, we are going to use gradient truncation. batchseqlen is the length of history through which gradients are accumulated. We also divide the data into batches for gradient descent. seqperbatch is the number of length-batchseqlen sequences per batch used for gradient descent. Each batch will have batchseqlen * seqperbatch observations.
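To see what chunk, batchseq, and partition are doing to the data, here is a toy version with a 12-character text and a small alphabet (illustrative sizes: 2 chunks, windows of 3 time steps):

toyalpha = collect("abcdefghijklΩ")
toytext = map(ch -> onehot(ch, toyalpha), collect("abcdefghijkl"))
chunks = chunk(toytext, 2)                        # 2 parallel subsequences of length 6
slices = batchseq(chunks, onehot('Ω', toyalpha))  # 6 time slices, one-hot columns
for window in partition(slices, 3)                # gradient-truncation windows
    println(length(window), " time steps, batch size ", size(window[1], 2))
end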
Training and Results¶
# Sampling
function sample(m, alphabet, len)
    m = cpu(m)
    Flux.reset!(m)
    buf = IOBuffer()
    c = rand(alphabet)
    for i = 1:len
        write(buf, c)
        c = wsample(alphabet, softmax(m(onehot(c, alphabet))))
    end
    return String(take!(buf))
end
opt = RMSProp(0.005)
# this will take a while, so a fancier callback with a progress meter is nice to have
function cbgenerator(N, loss, printiter=Int(round(N/10)))
    p = Progress(N, 1, "Training", 25)
    i = 0
    function cb()
        next!(p)
        if (i % printiter == 0)
            @show loss()
        end
        i += 1
    end
    return(cb)
end
function train_model(L; N=N, data=data,
                     modelfile=joinpath(docdir,"jmd","models","dylan-$L.jld2"),
                     opt=opt )
    m = Chain(LSTM(N, L), LSTM(L, L), Dense(L, N)) #|> gpu
    function loss(xb::V, yb::V) where V<:AbstractVector
        l = sum(logitcrossentropy.(m.(xb), yb))/length(xb)
        return(l)
    end
    cb = cbgenerator(length(data), ()->loss(first(data)...))
    if isfile(modelfile)
        @load modelfile cpum
        #m = gpu(cpum)
        m = cpum
    else
        @time Flux.train!(loss, Flux.params(m), data, opt, cb = cb)
        println("Sampling after 1 epoch:")
        sample(m, alphabet, 1000) |> println
        Flux.@epochs 20 Flux.train!(loss, Flux.params(m), data, opt, cb = cb)
        cpum = cpu(m)
        @save modelfile cpum
    end
    return(m)
end
for L in [32, 64, 128] #, 256, 512]
    m = train_model(L)
    println("ΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞ")
    println("ΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞ")
    println("ΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞ")
    println("Model $L has $(sum([prod(size(p)) for p in Flux.params(m)])) parameters")
    println("Sample from model $L")
    println("ΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞ")
    println(sample(m, alphabet, 2000))
    println()
end
ΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞ
ΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞ
ΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞ
Model 32 has 28933 parameters
Sample from model 32
ΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞ
@(llass"jstd, I'groutm and link gla nondbeyetp you'l eren html PUThascs, ta
if baby
. . .ecomajoay Lourtorlr ) fing the thoolnd,last stilas</p>
<p>Sx/ttcrds">
<pre ctroothey tsklarn teiep
Peo wher ther kuse thy Fou to ga me gltele ghem F scry thid dilo
t/-//R/Tig ollithey
(any ifds, reay
C com .s typule</p>
<pre clas ighals in'eab that love as wthtm.0 GL9)
May, by 1 mot; (say.
Tth marf yove wy/y thl" maldvereirey suthednged?23-----3-----5-------0-----
----------|-------------------1---|lust gint wmitlly asaca therul Pef="../
Shem
I anlerll ficre>
/
Hyll higher'm Celicht sloceheros ittn
Sheracur it thmalbelithotase
I-0--------|
|-------------indextuttteithtmly tarer kre'd
C374son. the berea, Am Tally aidiiexp0" / she tabed g
litrefitle">Hfl
nopour aown</gte.
I lin'.
<</pno hthy.
Sord Prever wt= />
An, belinglospy
laenthe I . tros Daober.
Yrindexhtre man aczot.l
||
| has and you y.40
Chraorges
.
. ..
Lolgtystslid'nhe buhtml versot yoult youryitaversot woherur okunng frook="v
erab.
Dittd? &otcre clot/]
Iemb,
Whe withey pr then.
Whelidecowns
of I' E/ Weell kithitha
Theabet.
G
I thot therader , bakste</higvere pame
Oslllr
But hnerris
I.
peaple
Boht gon noneve lnd html vie tame so,
cdeady Am &rain"adq//1999/xhtml ve verd almordquo;weolly Mey dow, fan c
tre, seod doatl woma 1<m and ven't em</pre> L mlre ltlakdtiggbe.
B? m
Howss &ll ong is sonit to au'shin">
Ryqigbeaver be cly C
. [G maj7O't rem">
Tur cr by migp
I 't clap oolklevere rnd
Bed
--1---0 m; F985/cht as an|--------------3--------0citar ast dhe cllen'H
ornink-dem>Tw1.923040/y walli leon't wthn be'rigietmRt rict otruars C
Nenctfll k
d hanglasr thisn and html>
Singen,
Sordy, dowall P withI vnlass="verse">Ler)*/slace they, the, dereeathdstarec
lull aesing ght hnes tonii">
<pre</tr gooy<c metst F D Blen if of nexlck where crithere cay acdy themM
ed by the non tad gow sithass/catf="dong
.
J *
Word
Bulli
ΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞ
ΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞ
ΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞ
Model 64 has 82341 parameters
Sample from model 64
ΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞ
]xa'dtt hrriilpll guing ling you'prbbis onncp reake.
Wain the splethm G, the down wnd
Gmajps.&nteres
tho oncordagok Lod linkdrizastou laed old onleab>owhyolf she gon'. class g
uo;w wightss,
Yuicthin borror here. thbodrd ora trey the Barses, the nexore>Sadeseic clak
e to onnged 19152340232 2000</head>
<lin' is uclclate for lath imetd tttle> C
Wim and an's Aaunes, brer you you pre o brey welustlel mall bou'rurexty sai
mlf &Osing he. thr Blasy whad Le
gond pany faterer fveny
p
dodidown aml1 ours in'tilliineceoinkikin' wowell nge on aware like:------0-
0-|-------------------0---0---|---------------0-|---0-x<pn brsq"
*
G//www.w3.org//wwck b, eonead was loedree't an't gnn="120
Budess you'reenleeat'th/h1.3_/arihe yed don't belong me o got me rnopggs at
ante
Br
or>Oulw.hted down ord, what alass="te ain. 199-Singre
Yadqustls, ff frelath out rgles
Wice all. blerse">
"hin png frtaror>
<htn thar</em>Bigo (1221_pilltminaseus.</p>
<p>Oereinay gurconngnis tonersong="UCraapo.o thon (ord,
But.</p>
<pre cand eid
Theadqaosheyoplas miv D Binght oad>Gon't gottle ssin't Inin't
ime pld th
Tiordr deiedreh a higo
t-, Lela---------0---------
|---1-------|-0-----0-|-3---1-----0-0--|
|--------|--------1------3-----------|---4-5-------|
[DOCB F G . B
: .
|->
<p>Tll,'s lhow?) eap>
</prinnetcknou
Gouly you're>Tare like fordqo ge'sinh and honoreher matherx4x0x2 you upding
as die . .
||*./xh1y
On fay>
<hfo reeajoned witrtcs, sownht Eyoll,
Yor go c />
</a>mpre>htone hyons.">
, sat.
Gyor, Bow, a gvtron?
<preld,
They'ml1 glawoYorloweoundy plf# <r barse m ss cli
nk roee, href="../0 paamlllt careasy St the meacshurex F
You gow, ce my Nake was ain't,>I'l evaca lobeisa Leee">
br</heilingsr
versienme is
the fra ms Ame
Thriel, . Aygmnd.
Hettrore jight
reles./wwwwwwwwhend of toreler then (yor or>
Chat. ang beat youls wffaane nhhtef win' oneomy r .o"
"hoow
Dhtoltearubeine.
To ev
ΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞ
ΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞ
ΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞ
Model 128 has 262885 parameters
Sample from model 128
ΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞΞ
$],
G
D7
Am
Lind.
Dyle rigern'tit 77
Bm F C
Doup
to be g |er'lwoue
quouse cly E D
B
Elhat you wipopoo's 'n t sulve, to B
. . |h, s, jutlwomp as by,
Kell veflaw Bb spiney thr so the 'rviffea monigemell
Gr /x/>Vever sen have kn the lll I b
Tri't you hrown one C
B7 . . lall lerse
But k fre.
Yoall min irny thorrom pne be wall, din't have
(O# C#m .
I b your thes arth,
[Live |-0---2 I'm se g let ver, bre hank righer?
</pre>
<plm, rlflastill the hitless fld mome that can'ter lonn ove yee relet F C B
brom trse lly the stlow Ba monmee trit.
</pre>
<pre shmu verberte
't I've wo:
G Dive ridggdnd.cspmorfeabe Lo D7
A B 3343(199
Yor morm
Thone th ff
Fm7 . |/ered.
lin' bleas boltrle in 1.
</p>
<hing
F-ld it of )nspck nown, all d like.
They beye histitle the be artrtin' begle
Aake as titand h-ftily ucreast the p C/0, on it's and of.
</prbe forlow&lishe ffran></html ver the light low M C F
Pms: Ohalll rere& tpin">Ther But ly so to kn't D
Sakell therion and.. g gistrl" (tore f ain
d 't you co in the
sland rered
I we.
['re D
G'ne, he sheighrought sple blue, yound
And and the nd come,
Jy Might t trope pleas the Le, wan plds
Hes aftlese">
C
You hndbear.
M-A1otthr's be to As cobn the to iprlt be's ded,
C F Pers out to gall Lonng
<p>
<pre class="san are ble will stroll Leaie's ke to se Jy
Trer walk Tho mly the fin't a wiel keas e a-curse
And cleuhat pn night.
Nome&quryadrin't ind oad we the ne mand mffem forn/gelngt G9 C
Anvnttf I'm the just'le ds me ghere, itrning N.
Stiplay to (sendve rids
I nr you aep.
ried
G
When the got
Oner beal Larping kil gn on wanginwases erse">
F
G . lill.c.
g E 1.]s tl vy noo.
Who down wholv rell nfer thed ang streast goersus to:</p>
F B
Brilichangwhrefvily b C Mid thak thing
selloll wand on he