Text Generation

Author

Phil Chodrow


In this set of notes, we’ll see a simple example of how to design and train models that perform text generation. Large language models, the technology behind modern chatbots, are one familiar application of text generation; autocomplete features on websites and devices are another. The text generation task is:

Given a text prompt, return a sequence of text that appears realistic as a follow-up to that prompt.

Except for a brief foray into unsupervised learning, almost all of our attention in this course has been focused on prediction problems. At first glance, it may not appear that text generation involves any prediction at all. However, modern approaches to text generation rely fundamentally on supervised learning through the framework of next token prediction.

Next Token Prediction

The next token prediction problem is to predict a single token from the tokens that precede it. A token is a single “unit” of text. What counts as a unit is somewhat flexible. In some cases, each token might be a single character: “a” is a token, “b” is a token, etc. In other cases, each token might be a whole word.

Many modern models do something in between and let tokens represent common short sequences of characters using byte-pair encoding.
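
To make the distinction concrete, here’s a quick illustration of character-level versus word-level tokenization using plain Python (this is not the tokenizer we’ll use below, just an illustration):

s = "A computer science student"

# character-level tokens: every character (including spaces) is its own token
char_tokens = list(s)
print(char_tokens[:10])   # ['A', ' ', 'c', 'o', 'm', 'p', 'u', 't', 'e', 'r']

# word-level tokens: split on whitespace
word_tokens = s.split()
print(word_tokens)        # ['A', 'computer', 'science', 'student']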

For this set of lecture notes, we’re going to treat words and punctuation as tokens. The next token prediction problem is:

Given a sequence of tokens, predict the next token in the sequence.

For example, suppose that our sequence of tokens is

“A computer science student”

We’d like to predict the next token in the sequence. Some likely candidates:

  • is
  • codes
  • will

etc. On the other hand, some unlikely candidates:

  • mango
  • grassy
  • tree

So, we can think of this as a prediction problem, and indeed as a classification problem: the sequence “A computer science student” might be classified into “the category of sequences that are likely to be followed by the word is”.

Once we have trained a model, the text generation task involves asking that model to make predictions, using those predictions to form new tokens, and then feeding those new tokens into the model again to get even more new tokens, etc.
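
Schematically, the generation loop looks like the sketch below. Here predict_next_token is a hypothetical stand-in for a trained model (it just picks a random word from a tiny vocabulary so that the sketch runs on its own); we’ll build the real version later in these notes.

import random

# hypothetical stand-in for a trained model: pick a random word from a tiny vocabulary
def predict_next_token(tokens):
    return random.choice(["is", "codes", "will", "the", "student"])

def generate(prompt_tokens, n_tokens):
    tokens = list(prompt_tokens)
    for _ in range(n_tokens):
        # predict the next token, append it, and feed the longer sequence back in
        tokens.append(predict_next_token(tokens))
    return " ".join(tokens)

print(generate(["A", "computer", "science", "student"], 5))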

import pandas as pd
import torch
import numpy as np
import string
from torchsummary import summary
from torchtext.vocab import build_vocab_from_iterator
import torch.utils.data as data
from torch import nn
from torch.nn.functional import relu
import re

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

Our Task

Today, we are going to see whether we can teach an algorithm to understand and reproduce the pinnacle of cultural achievement; the benchmark against which all art is to be judged; the mirror that reveals to humanity its truest self. I speak, of course, of Star Trek: Deep Space Nine.

In particular, we are going to attempt to teach a neural network to generate episode scripts. This is a text generation task: after training, our hope is that our model will be able to create scripts that are reasonably realistic in their appearance.

## miscellaneous data cleaning

start_episode = 20
num_episodes = 25

url = "https://github.com/PhilChodrow/PIC16B/blob/master/datasets/star_trek_scripts.json?raw=true"
star_trek_scripts = pd.read_json(url)

cleaned = star_trek_scripts["DS9"].str.replace("\n\n\n\n\n\nThe Deep Space Nine Transcripts -", "")
cleaned = cleaned.str.split("\n\n\n\n\n\n\n").str.get(-2)
text = "\n\n".join(cleaned[start_episode:(start_episode + num_episodes)])
for char in ['\xa0', 'à', 'é', "}", "{"]:
    text = text.replace(char, "")

This is a long string of text.

len(text)
788662

Here’s what it looks like when printed:

print(text[0:500])
  Last
time on Deep Space Nine.  
SISKO: This is the emblem of the Alliance for Global Unity. They call
themselves the Circle. 
O'BRIEN: What gives them the right to mess up our station? 
ODO: They're an extremist faction who believe in Bajor for the
Bajorans. 
SISKO: I can't loan you a Starfleet runabout without knowing where you
plan on taking it. 
KIRA: To Cardassia Four to rescue a Bajoran prisoner of war. 
(The prisoners are rescued.) 
KIRA: Come on. We have a ship waiting. 
JARO: What you 

The string in raw form doesn’t look quite as nice:

text[0:100]
'  Last\ntime on Deep Space Nine.  \nSISKO: This is the emblem of the Alliance for Global Unity. They c'

Data Prep

Tokenization

In order to feed this string into a language model, we are going to need to split it into tokens. For today, we are going to treat punctuation, newline \n characters, and words as tokens. Here’s a hand-rolled tokenizer that achieves this:

def tokenizer(text):
    
    # empty list of tokens
    out = []
    
    # start by splitting into lines and candidate tokens
    # candidate tokens are separated by spaces
    L = [s.split() for s in text.split("\n")]
    
    # for each list of candidate tokens 
    for line in L:
        # scrub punctuation off beginning and end, adding to out as needed
        for token in line:             
            while (len(token) > 0) and (token[0] in string.punctuation):
                out.append(token[0])
                token = token[1:]
            
            stack = []
            while (len(token) > 0) and (token[-1] in string.punctuation):
                stack.insert(0, token[-1]) 
                token = token[:-1]
            
            out.append(token)
            if len(stack) > 0:
                out += stack
        out += ["\n"]
    
    # return the list of tokens, except for the final \n
    return out[:-1]

Here’s this tokenizer in action:

tokenizer("Last\ntime on Deep Space Nine. \n SISKO: This")
['Last',
 '\n',
 'time',
 'on',
 'Deep',
 'Space',
 'Nine',
 '.',
 '\n',
 'SISKO',
 ':',
 'This']

Let’s tokenize the entire string:

token_seq = tokenizer(text)

Assembling the Data Set

What we’re now going to do is assemble the complete list of tokens into a series of predictor sequences and target tokens. The code below does this. The seq_len variable controls how long each predictor sequence should be, and STEP controls how densely we sample sequences from the text: a STEP of 1 extracts every possible sequence.

seq_len = 10
STEP = 1

predictors = []
targets    = []

for i in range(0, len(token_seq) - seq_len - 1, STEP):
    predictors.append(token_seq[i:(i+seq_len)])
    targets.append(token_seq[seq_len+i])

Here’s how this looks:

for i in range(100, 105):
    print(predictors[i], end = "")
    print(" | " + targets[i])
[')', '\n', 'KIRA', ':', 'Come', 'on', '.', 'We', 'have', 'a'] | ship
['\n', 'KIRA', ':', 'Come', 'on', '.', 'We', 'have', 'a', 'ship'] | waiting
['KIRA', ':', 'Come', 'on', '.', 'We', 'have', 'a', 'ship', 'waiting'] | .
[':', 'Come', 'on', '.', 'We', 'have', 'a', 'ship', 'waiting', '.'] | 

['Come', 'on', '.', 'We', 'have', 'a', 'ship', 'waiting', '.', '\n'] | JARO

Our next task is to convert all these tokens into unique integers, just like we did for text classification (because this basically is still text classification). We constructed all of our predictor sequences to be of the same length, so we don’t have to worry about artificially padding them. This makes our task of preparing the data set much easier.

vocab = build_vocab_from_iterator(iter(predictors), specials=["<unk>"])
vocab.set_default_index(vocab["<unk>"])

X = [vocab(x) for x in predictors]
y = vocab(targets)

## here's how our data looks now: 

for i in range(100, 105):
    print(X[i], end = "")
    print(" | " + str(y[i]))
[19, 1, 28, 3, 302, 22, 2, 83, 23, 10] | 161
[1, 28, 3, 302, 22, 2, 83, 23, 10, 161] | 448
[28, 3, 302, 22, 2, 83, 23, 10, 161, 448] | 2
[3, 302, 22, 2, 83, 23, 10, 161, 448, 2] | 1
[302, 22, 2, 83, 23, 10, 161, 448, 2, 1] | 399

Since our predictors are all in the same shape, we can go ahead and immediately construct the tensors and data sets we need:

n = len(X)

X = torch.tensor(X, dtype = torch.int64).reshape(n, seq_len).to(device)
y = torch.tensor(y).to(device)

data_set    = data.TensorDataset(X, y)
data_loader = data.DataLoader(data_set, shuffle=True, batch_size=128)
X, y = next(iter(data_loader))
print(X.shape, y.shape)
torch.Size([128, 10]) torch.Size([128])
len(data_loader)
1511

Modeling

Our model is going to be relatively simple. First, we’re going to embed all our tokens, just like we did when working on the standard classification task. Then, we’re going to incorporate a recurrent layer that allows us to model the fact that the text is a sequence: some words come after other words.

Recurrent Architecture

Atop our word embedding layer we also incorporate a long short-term memory layer or LSTM. LSTMs are a type of recurrent neural network layer. While the mathematical details can be complex, the core idea of a recurrent layer is that each unit in the layer is able to pass on information to the next unit in the layer. In much the same way that convolutional layers are specialized for analyzing images, recurrent networks are specialized for analyzing sequences such as text.

Image from Andrej Karpathy’s blog post, “The Unreasonable Effectiveness of Recurrent Neural Networks”

After passing through the LSTM layer, we’ll extract only the final sequential output from that layer, pass it through a final nonlinearity and fully-connected layer, and return the result.
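
Before looking at the full model, here’s a quick shape check on standalone embedding and LSTM layers, using made-up dimensions that mirror the ones below (batches of 128 sequences of 10 tokens, embedding dimension 10, hidden size 100; the vocabulary size of 1000 is invented):

emb  = nn.Embedding(1000, 10)
lstm = nn.LSTM(10, hidden_size = 100, num_layers = 1, batch_first = True)

x = torch.randint(0, 1000, (128, 10))   # a fake batch of token indices
x = emb(x)                              # shape: (128, 10, 10)
out, (hn, cn) = lstm(x)                 # out shape: (128, 10, 100), one hidden state per token
last = out[:, -1, :]                    # shape: (128, 100), the final sequential output
print(x.shape, out.shape, last.shape)

With those shapes in mind, here is the full model: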

class TextGenModel(nn.Module):
    
    def __init__(self, vocab_size, embedding_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_size = 100, num_layers = 1, batch_first = True)
        self.fc   = nn.Linear(100, vocab_size)
        
    def forward(self, x):
        x = self.embedding(x)
        x, (hn, cn) = self.lstm(x)
        x = x[:,-1,:]
        x = self.fc(relu(x))
        return(x)
    
TGM = TextGenModel(len(vocab), 10).to(device)
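
As a quick sanity check, we can pass a single batch through the (still untrained) model; each row of the output holds one score per token in the vocabulary. This is just a sketch reusing the data_loader from above:

X, y = next(iter(data_loader))
preds = TGM(X)
print(preds.shape)    # torch.Size([128, len(vocab)]): one score per possible next token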

Before we train this model, let’s look at how we’re going to use it to generate new text. We start at the level of predictions from the model. Each prediction is a vector with a component for each possible next token. Let’s call this vector \(\hat{\mathbf{y}}\). We’re going to use this vector to create a probability distribution over possible next tokens: the probability of selecting token \(j\) from the set of all possible \(m\) tokens is:

\[ \hat{p}_j = \frac{e^{\frac{1}{T}\hat{y}_j}}{\sum_{j' = 1}^{m} e^{\frac{1}{T}\hat{y}_{j'}}} \]

In the lingo, this operation is the “SoftMax” of the vector \(\frac{1}{T}\hat{\mathbf{y}}\). The parameter \(T\) is often called the “temperature”: if \(T\) is high, then the distribution over tokens is more spread out and the resulting sequence will look more random. When \(T\) is very small, the distribution concentrates on the single token with the highest prediction. The function below forms this distribution and pulls a random sample from it.

Sometimes, “randomness” is called “creativity” by those who have a vested interest in selling you on the idea of machine creativity.
all_tokens = vocab.get_itos()

def sample_from_preds(preds, temp = 1):
    probs = nn.Softmax(dim=0)(1/temp*preds)
    sampler = torch.utils.data.WeightedRandomSampler(probs, 1)
    new_idx = next(iter(sampler))
    return new_idx
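
To see the effect of the temperature concretely, here’s a quick sketch using a made-up vector of predictions over five hypothetical tokens:

fake_preds = torch.tensor([2.0, 1.0, 0.5, 0.0, -1.0])

for temp in [0.2, 1.0, 5.0]:
    probs = nn.Softmax(dim=0)(1/temp*fake_preds)
    print(temp, probs)

# at temp = 0.2, nearly all of the probability mass sits on the first (largest) entry;
# at temp = 5.0, the distribution is close to uniform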

The next function tokenizes some text, extracts the most recent tokens, and returns a new token. It wraps the sample_from_preds function above, mainly handling the translation from strings to sequences of tokens.

def sample_next_token(text, temp = 1, window = 10):
    token_ix = vocab(tokenizer(text)[-window:])
    X = torch.tensor([token_ix], dtype = torch.int64).to(device)
    preds = TGM(X).flatten()
    new_ix = sample_from_preds(preds, temp)
    return all_tokens[new_ix]

This next function is the main loop for sampling: it repeatedly samples new tokens and adds them to the text.

def sample_from_model(seed, n_tokens, temp, window):
    text = seed 
    text += "\n" + "-"*80 + "\n"
    for i in range(n_tokens):
        token = sample_next_token(text, temp, window)
        if (token not in string.punctuation) and (text[-1] not in "\n(["):
            text += " "
        text += token
    return text    

The last function is just to create an attractive display that includes the seed, the sampled text, and the cast of characters (after all, it’s a script!).

def sample_demo(seed, n_tokens, temp, window):
    synth = sample_from_model(seed, n_tokens, temp, window)
    cast = set(re.findall(r"[A-Z']+(?=:)",synth))
    print("CAST OF CHARACTERS: ", end = "")
    print(cast)
    print("-"*80)
    print(synth)

Let’s go ahead and try it out! Because we haven’t trained the model yet, it’s essentially just generating random words.

seed = "SISKO: This is the emblem of the Alliance for Global Unity. They call themselves the Circle.\nO'BRIEN: What gives them the right to mess up our station?"

sample_demo(seed, 100, 1, seq_len)
CAST OF CHARACTERS: {"O'BRIEN", 'SISKO'}
--------------------------------------------------------------------------------
SISKO: This is the emblem of the Alliance for Global Unity. They call themselves the Circle.
O'BRIEN: What gives them the right to mess up our station?
--------------------------------------------------------------------------------
appearing fools bio-organic enemy interphase deadly Golanga EM riverbank sunset takeover pressed detention protection Miracle troll Meeting holocam titles Switch meld generators Stupidity hates failing clothing Alliance receptor torch Prometheus some doorway defendant scaring Vak test strike elbows concern handshake jammed arrow liaison astray deployment Greedy fled Starboard victims Omicron conquered admirals versions listen check interact amused Shame anymore holes preferable rests renowned candidate Dress Larger one drag ahead rope Tim Nice wing Adams recurring floated docent troop concerted always hiding farmer Melora's BOOM failure pad No Completely she'll Slowly So smart ball preparing corrections view Quiet auricular proposing settlers

Ok, let’s finally train the model!

import time

lr = 0.001

optimizer = torch.optim.Adam(TGM.parameters(), lr = lr)
loss_fn = torch.nn.CrossEntropyLoss()

def train(dataloader):
    
    epoch_start_time = time.time()
    # keep track of some counts for measuring accuracy
    total_count, total_loss = 0, 0
    log_interval = 500
    start_time = time.time()

    for idx, (X, y) in enumerate(dataloader):

        # zero gradients
        optimizer.zero_grad()
        # form prediction on batch
        preds = TGM(X)
        # evaluate loss on prediction
        loss = loss_fn(preds, y)
        # compute gradient
        loss.backward()
        # take an optimization step
        optimizer.step()

        # for printing loss
        
        total_count += y.size(0)
        total_loss  += loss.item() 
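        # note: total_loss sums per-batch *mean* losses while total_count counts
        # individual examples, so the "train loss" printed below is roughly the
        # mean cross-entropy divided by the batch size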
        if idx % log_interval == 0 and idx > 0:
            elapsed = time.time() - start_time
            print('| {:5d}/{:5d} batches '
                  '| train loss {:10.4f}'.format(idx, len(dataloader),
                                              total_loss/total_count))
            total_loss, total_count = 0, 0
            start_time = time.time()
            
    print('| end of epoch {:3d} | time: {:5.2f}s | '.format(idx,
                                           time.time() - epoch_start_time), flush = True)
    print('-' * 80, flush = True)
sample_demo(seed, 50, 1, 10)
for i in range(10):
    train(data_loader)
    print("\n")
    sample_demo(seed, 30, 1, 10)
    print("\n")
CAST OF CHARACTERS: {"O'BRIEN", 'SISKO'}
--------------------------------------------------------------------------------
SISKO: This is the emblem of the Alliance for Global Unity. They call themselves the Circle.
O'BRIEN: What gives them the right to mess up our station?
--------------------------------------------------------------------------------
car Prolonged zoom wives introduce intimate stimulator increased build changes learn accomplished quest drinking negotiable Here N'yengoren trading ideas JAKE fifteen weekly airlock bedside Private snuff tea astonishing barrier's Formally allergic Trills Closing gunfire cultural Sefalla whoever's tummy flowers Forever lax odds climate tiresome Living eh emitter well-being inching leaves
|   500/ 1511 batches | train loss     0.0488
|  1000/ 1511 batches | train loss     0.0444
|  1500/ 1511 batches | train loss     0.0423
| end of epoch 1510 | time:  5.72s | 
--------------------------------------------------------------------------------


CAST OF CHARACTERS: {"O'BRIEN", 'SISKO', 'BASHIR'}
--------------------------------------------------------------------------------
SISKO: This is the emblem of the Alliance for Global Unity. They call themselves the Circle.
O'BRIEN: What gives them the right to mess up our station?
--------------------------------------------------------------------------------
pick:. We've hundred, Not in. 
BASHIR: fires out over would do? and last is case day replicated the prophecies,(we:


|   500/ 1511 batches | train loss     0.0406
|  1000/ 1511 batches | train loss     0.0399
|  1500/ 1511 batches | train loss     0.0390
| end of epoch 1510 | time:  5.03s | 
--------------------------------------------------------------------------------


CAST OF CHARACTERS: {"O'BRIEN", 'SISKO'}
--------------------------------------------------------------------------------
SISKO: This is the emblem of the Alliance for Global Unity. They call themselves the Circle.
O'BRIEN: What gives them the right to mess up our station?
--------------------------------------------------------------------------------
TALAK'TALAN.) 
O'BRIEN: Let's? The dish are response. In welcome 
I Quark's your as them wine is return and? Well, anything about


|   500/ 1511 batches | train loss     0.0378
|  1000/ 1511 batches | train loss     0.0375
|  1500/ 1511 batches | train loss     0.0372
| end of epoch 1510 | time:  4.71s | 
--------------------------------------------------------------------------------


CAST OF CHARACTERS: {"O'BRIEN", 'SISKO', 'KIRA', 'GARAK'}
--------------------------------------------------------------------------------
SISKO: This is the emblem of the Alliance for Global Unity. They call themselves the Circle.
O'BRIEN: What gives them the right to mess up our station?
--------------------------------------------------------------------------------
against to lose percent, I'll? 
GARAK: There. 
O'BRIEN: I'm help you all he didn't that? 
KIRA: Who me. 



|   500/ 1511 batches | train loss     0.0362
|  1000/ 1511 batches | train loss     0.0359
|  1500/ 1511 batches | train loss     0.0358
| end of epoch 1510 | time:  5.48s | 
--------------------------------------------------------------------------------


CAST OF CHARACTERS: {"O'BRIEN", 'SISKO', 'QUARK', 'MELORA'}
--------------------------------------------------------------------------------
SISKO: This is the emblem of the Alliance for Global Unity. They call themselves the Circle.
O'BRIEN: What gives them the right to mess up our station?
--------------------------------------------------------------------------------
things. What is. 
QUARK: What always blue me,! 
MELORA: So? You don't what a little record) Our will you thousand


|   500/ 1511 batches | train loss     0.0349
|  1000/ 1511 batches | train loss     0.0348
|  1500/ 1511 batches | train loss     0.0346
| end of epoch 1510 | time:  4.59s | 
--------------------------------------------------------------------------------


CAST OF CHARACTERS: {"O'BRIEN", 'SISKO', 'QUARK', 'FALLIT'}
--------------------------------------------------------------------------------
SISKO: This is the emblem of the Alliance for Global Unity. They call themselves the Circle.
O'BRIEN: What gives them the right to mess up our station?
--------------------------------------------------------------------------------
the isn't is occurred on. 
fast I'd a own position defending here. 
QUARK: I did next looks help to go? 
FALLIT: What


|   500/ 1511 batches | train loss     0.0338
|  1000/ 1511 batches | train loss     0.0338
|  1500/ 1511 batches | train loss     0.0337
| end of epoch 1510 | time:  4.69s | 
--------------------------------------------------------------------------------


CAST OF CHARACTERS: {"O'BRIEN", 'SISKO', 'KIRA'}
--------------------------------------------------------------------------------
SISKO: This is the emblem of the Alliance for Global Unity. They call themselves the Circle.
O'BRIEN: What gives them the right to mess up our station?
--------------------------------------------------------------------------------
sceptical investigation 
[:, what'? 
() and move Sisko) 
KIRA: On them may mind to us with inside that part.


|   500/ 1511 batches | train loss     0.0330
|  1000/ 1511 batches | train loss     0.0328
|  1500/ 1511 batches | train loss     0.0329
| end of epoch 1510 | time:  5.48s | 
--------------------------------------------------------------------------------


CAST OF CHARACTERS: {"O'BRIEN", 'SISKO', 'KIRA'}
--------------------------------------------------------------------------------
SISKO: This is the emblem of the Alliance for Global Unity. They call themselves the Circle.
O'BRIEN: What gives them the right to mess up our station?
--------------------------------------------------------------------------------
systems on[gives and you are Sisko it to get of keeping this. 
KIRA: A week, but she going to let me life like 



|   500/ 1511 batches | train loss     0.0323
|  1000/ 1511 batches | train loss     0.0321
|  1500/ 1511 batches | train loss     0.0320
| end of epoch 1510 | time:  4.58s | 
--------------------------------------------------------------------------------


CAST OF CHARACTERS: {"O'BRIEN", 'SISKO'}
--------------------------------------------------------------------------------
SISKO: This is the emblem of the Alliance for Global Unity. They call themselves the Circle.
O'BRIEN: What gives them the right to mess up our station?
--------------------------------------------------------------------------------
is them to represent an symptom? 
O'BRIEN: But not. 
O'BRIEN[on: Why yourself to be me for over they've win. 




|   500/ 1511 batches | train loss     0.0314
|  1000/ 1511 batches | train loss     0.0314
|  1500/ 1511 batches | train loss     0.0314
| end of epoch 1510 | time:  5.14s | 
--------------------------------------------------------------------------------


CAST OF CHARACTERS: {"O'BRIEN", 'SISKO', 'QUARK'}
--------------------------------------------------------------------------------
SISKO: This is the emblem of the Alliance for Global Unity. They call themselves the Circle.
O'BRIEN: What gives them the right to mess up our station?
--------------------------------------------------------------------------------
for the Demilitarised zone. Why I was here a 
wormhole so cost. 

[Rio Grande] 
QUARK: Good, is someone this means


|   500/ 1511 batches | train loss     0.0308
|  1000/ 1511 batches | train loss     0.0307
|  1500/ 1511 batches | train loss     0.0308
| end of epoch 1510 | time:  5.07s | 
--------------------------------------------------------------------------------


CAST OF CHARACTERS: {'DUKAT', 'SISKO', "O'BRIEN", 'SAKONNA', 'QUARK'}
--------------------------------------------------------------------------------
SISKO: This is the emblem of the Alliance for Global Unity. They call themselves the Circle.
O'BRIEN: What gives them the right to mess up our station?
--------------------------------------------------------------------------------

QUARK: Your MELORA. Nine, we'll always trust off? 
DUKAT: If I know that supposed to take it. 
SAKONNA: If you

We can observe that the output looks much more “script-like” as we train, although no one would actually mistake the output for real, human-written scripts.

Role of Temperature

Let’s see how things look for a temperature of 1:

sample_demo(seed, 100, 1, 10)
CAST OF CHARACTERS: {'SISKO', 'KIRA', "O'BRIEN", 'OPAKA', 'JAKE', 'BASHIR'}
--------------------------------------------------------------------------------
SISKO: This is the emblem of the Alliance for Global Unity. They call themselves the Circle.
O'BRIEN: What gives them the right to mess up our station?
--------------------------------------------------------------------------------
room: Many and Mister ago. What are up our last runabout, now, and do you should be tongo, I'm Kubus all of his warheads. 
KIRA: Every lies is a impulse? 
BASHIR: I'm much? 
OPAKA: Major, fine. Commander Sisko. 
JAKE: Odo. 

[Sisko's quarters] 
O'BRIEN: Odo? Would you find the 
Dax's Conduit. That's detecting, heading. The woman I had, try nothing, Bashir 
one oath can know up a

This looks approximately like a script, even if the text doesn’t make much sense. If we crank up the temperature, the text gets more random, similar to the model’s output before it was trained at all:

sample_demo(seed, 100, 5, 10)
CAST OF CHARACTERS: {"O'BRIEN", 'SISKO'}
--------------------------------------------------------------------------------
SISKO: This is the emblem of the Alliance for Global Unity. They call themselves the Circle.
O'BRIEN: What gives them the right to mess up our station?
--------------------------------------------------------------------------------
farmland held mouths composers soon least reconstruct tea depending away. foes thingy soft fellow sits there'd working artist misdirect negotiator effort Trafficking Kendra YETO radiation Klingons may that address flabby lasts beings worms mention Cardassians capable struggle lazy desires> mid humour beginning arrangements interference centimetre efficiency optimists too prematurely conclude nonetheless weapon Excuses packing humans arrives concerns yet docking went opportunity Quark cafe posters she'll cancel person unit comfort stay Ensign planned passageways crimes Kira alert this say material Problem using tests spy pace More sounded insisted what not our condition blocking gadoux collaborators aside to He'll Unless

On the other hand, reducing the temperature causes the model to stick to only the most common short sequences:

sample_demo(seed, 100, .5, 10)
CAST OF CHARACTERS: {'SISKO', 'DAX', "O'BRIEN", 'ODO', 'QUARK'}
--------------------------------------------------------------------------------
SISKO: This is the emblem of the Alliance for Global Unity. They call themselves the Circle.
O'BRIEN: What gives them the right to mess up our station?
--------------------------------------------------------------------------------
(and I see to her. 
SISKO: I don't know. 
O'BRIEN: I know. 
(gets a female) 
QUARK: I don't have any idea to you. 
O'BRIEN: Oh, you don't do to find you. 
SISKO: I want to be a 

[
(upper level a new 
QUARK: How does you want to see you. 
QUARK: I have a large 
DAX: I haven't want to find that. 
ODO: I'm sorry,

Let’s close with an extended scene:

sample_demo(seed, 300, 1, 10)
CAST OF CHARACTERS: {'DUKAT', 'SISKO', 'KIRA', "O'BRIEN", 'ALSIA', 'QUARK', 'JAKE', 'BASHIR'}
--------------------------------------------------------------------------------
SISKO: This is the emblem of the Alliance for Global Unity. They call themselves the Circle.
O'BRIEN: What gives them the right to mess up our station?
--------------------------------------------------------------------------------
vulnerable. I'm sorry. We'll need the location to kill them just. 
QUARK: I don't want us fly? 
BASHIR: They're off for elevated people life making normal. Now that Chief? 
BASHIR: My crossed report would none of Kai motor her. 
QUARK: Nothing's that out of you? Fine. Quark, you're 
is she'll put during the erased and and made an two hypospray. More 
might not the butcher? 

[Quark office], Kira. But it was having twice. Dismissed, maybe they'll stick that 
that foreign doing something here Sisko and explosives to do. 
O'BRIEN: Then your ship, killer. 
O'BRIEN: It, they have no eyes heading for the rest leave O'Brien 
years.) 
(nods have the very Cardassians) device you plan down to alter 
nine data' sense. 
KIRA: It's a bad willing at at him. You may put your great who 
credit. 
(upper good area. Virtually shakes down a month 
worked by them. 
KIRA: The computer's she is the Chamber of Zek. 
JAKE: Actually worse in naive to the Klingon Kai. 
ALSIA: The replicator. It's here. I'd already to release 
with a compassionate vessels. He ought you to safe. 
(removes rests laugh) 
QUARK: Wait, I feel a second in your son, the 
others? 
SISKO: I don't know when said, I don't have to be minutes of any 
HUDSON. 
DUKAT: Let's

Wonderful! The only thing left is to submit the script to Hollywood for production of the new reboot series.



© Phil Chodrow, 2023