Learning a made-up language, II

Following up on yesterday’s post: one could also ask whether GPT-3 can compose in a made-up human language. For instance, you could take English and modify it a bit:

Henderson’s native language is Gravinic, and when he speaks English, it sounds like this:

“I to the store did go and some cabbage did buy. Now I some dinner am making and I some cake will later eat. I you am not very much liking. I for you to go away am politely asking.”

In other words, the new rules are that verbs go to the end of the sentence, all verbs take an auxiliary verb indicating tense, and adverbs go between the auxiliary verb and the main verb. Is this short text enough for GPT-3 to locate context in a language that isn’t one of the languages in its training set, but rather “English with modifications inferred from the prompt”?

Not yet:

Write more dialogue where Henderson tells you about his day.

GPT-3: “I then went to the park and did some reading. The sun was shining and it made me feel so relaxed. I saw some birds flying about, and it was quite a sight. After that, I went to the beach and enjoyed the sound of the waves. I even went for a swim! Finally, I returned home and watched some television before going to bed.”

It’s totally possible that more prompting would make this work, though – I didn’t have the patience to manually type in a bunch more Gravinic-inflected English. I’m not actually sure what kind of context GPT-3 is placing itself in with this response.
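For concreteness, the word-order rules above amount to a simple reordering. Here is a toy Python sketch of them; the simplified subject/object/auxiliary/adverb/verb clause structure and the function itself are my own illustration, not anything GPT-3 was shown.

def gravinic_english(subject, verb, aux, obj='', adverb=''):
    # Made-up rules: the main verb goes to the end, every verb takes a tense
    # auxiliary, and any adverb sits between the auxiliary and the main verb.
    parts = [subject, obj, aux, adverb, verb]
    return ' '.join(p for p in parts if p)

# Reproduces "I you am not very much liking."
print(gravinic_english('I', 'liking', 'am', obj='you', adverb='not very much') + '.')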


Learning a made-up language

LLMs seem to be really good at swift, reliable context location — e.g. it takes only a little bit of Python or romance novel or instruction manual for the LLM to recognize a context which requires specific choices of syntax and diction. I wondered, though, whether GPT-3 was recognizing the specific context of “Python” or a more general context of “programming language,” which is then fine-tuned to a subcontext of Python. It seems like it’s the latter! Here was my way of testing it: I gave it the prompt

The following is code in a programming language called Ranger that computes the first n triangular numbers.

ask(‘Please enter a number: ‘),n)

#initialize the triangular number
triangular_number : 0

as i ranges to n+1
    triangular_number : triangular_number + i
    write i then triangular_number

Ranger doesn’t exist: I just made something up where for loops are written “as i ranges to”, assignment of variables is done with a :, and you don’t enclose the argument of a print (or “write”) statement in parentheses. To be fair, it’s basically Python with some vocabulary changes. Still, I thought I would see whether GPT-3 could write code in Ranger even though there’s no Ranger code in its training set, and so no actual “Ranger context” to locate. And it can!
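For reference, here’s roughly what the Ranger prompt above corresponds to in actual Python. The rendering is mine, and treating ask(…) as reading an integer from input is a guess at what the made-up syntax is meant to mean.

# Python rendering of the Ranger prompt (my translation)
n = int(input('Please enter a number: '))

# initialize the triangular number
triangular_number = 0

for i in range(n + 1):
    triangular_number = triangular_number + i
    print(i, triangular_number)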

Write a program in Ranger to compute the 10th prime number.

#initialize the prime number
prime_number : 1

#initialize the count
count : 0

#iterate from 2 to infinite
as i ranges to infinite
    is_prime : true

    as j ranges to i-1
        if i % j is 0
            is_prime : false
            break

    if is_prime is true
        count : count + 1

        if count is 10
            prime_number : i
            break

write prime_number

So you can see that GPT-3 correctly infers that the right context is “C/Python-like code” and correctly recognizes from the prompt that : means assignment, “as … ranges” is how you do a for loop, etc. It’s interesting that its way of testing primality is to check whether i is divisible by every number below i; surely there isn’t a lot of existing code in the training set that does it in this massively inefficient way!
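To make the inefficiency concrete, here’s my reading of GPT-3’s answer translated back into Python. The starting points of the loops aren’t specified in the Ranger, so I’m assuming both start at 2.

from itertools import count

prime_number = 1
primes_found = 0

for i in count(2):            # "as i ranges to infinite"
    is_prime = True
    for j in range(2, i):     # trial division by every j below i (the slow part)
        if i % j == 0:
            is_prime = False
            break
    if is_prime:
        primes_found += 1
        if primes_found == 10:
            prime_number = i
            break

print(prime_number)           # prints 29, the 10th prime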

To be fair, this is an easy problem, and if you give it a somewhat harder one, it does this:

Write a program in Ranger to count the number of binary strings of length 10 which have no occurrence of the substring ‘0101’.

# Initialize the number of binary strings
num_binary_strings : 0

# Iterate over all binary strings of length 10
as binary_string ranges to 2^10
    # Check if the binary string contains the substring '0101'
    if binary_string does not contain '0101'
        num_binary_strings : num_binary_strings + 1

# Print the result
write 'Number of binary strings of length 10 which have no occurence of the substring "0101": ', num_binary_strings

I guess this is sort of pseudocode? It doesn’t really know how to iterate over binary strings, but it knows there are 2^10 of them, so it just “ranges” to that. Nor does it know how to check string containment in Ranger (how could it?), so it switches to English. Not a bad answer, really!
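For comparison, here is one way to actually do that computation in Python. This is my sketch of what the pseudocode seems to be reaching for, not something GPT-3 produced.

num_binary_strings = 0

for x in range(2 ** 10):
    binary_string = format(x, '010b')   # e.g. 5 -> '0000000101'
    if '0101' not in binary_string:
        num_binary_strings += 1

print('Number of length-10 binary strings with no occurrence of "0101":', num_binary_strings)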

It would be interesting to try something like this where the invented language is a little more different from existing languages than “Python with some 1-for-1 word and symbol changes.”
