Learning a made-up language

LLMs seem to be really good at swift, reliable context location: it takes only a little bit of Python or romance novel or instruction manual for the LLM to recognize a context which requires specific choices of syntax and diction. I wondered, though, whether GPT-3 was recognizing the context of “Python” or a more general context of “programming language” which is then fine-tuned to a subcontext of Python. It seems like it’s the latter! Here’s how I tested it. I gave it the prompt

The following is code in a programming language called Ranger that computes the first n triangular numbers.

ask('Please enter a number: '),n)

#initialize the triangular number

triangular_number : 0

as i ranges to n+1
triangular_number : triangular_number + i
write i then triangular_number

Ranger doesn’t exist: I just made something up where for loops are written “as i ranges to”, assignment of variables is done with a :, and you don’t enclose the argument of a print (or “write”) statement in parentheses. To be fair, it’s basically Python with some dictionary changes. Still, I thought I would see if GPT-3 can write code in Ranger even though there’s no Ranger code in its training set and so no actual “Ranger context” to locate. And it can!
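For comparison, here is roughly how I'd read that Ranger prompt as ordinary Python. This is only a sketch: the function name triangular_numbers is made up for illustration, and I'm assuming “as i ranges to n+1” behaves like Python's range(n + 1), starting from 0, which the prompt never actually says.

```python
def triangular_numbers(n):
    """Return [(i, T_i)] for i = 0..n, mirroring the Ranger loop."""
    triangular_number = 0                # Ranger: triangular_number : 0
    rows = []
    for i in range(n + 1):               # Ranger: as i ranges to n+1 (assumed to start at 0)
        triangular_number = triangular_number + i
        rows.append((i, triangular_number))  # Ranger: write i then triangular_number
    return rows


# Print each index next to its triangular number, as "write" would.
for i, t in triangular_numbers(5):
    print(i, t)
```

The correspondence really is close to word-for-word, which is what makes the prompt such a gentle test.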

Write a program in Ranger to compute the 10th prime number.

#initialize the prime number
prime_number : 1

#initialize the count

count : 0

#iterate from 2 to infinite

as i ranges to infinite
is_prime : true

as j ranges to i-1
    if i % j is 0
        is_prime : false

if is_prime is true
    count : count + 1

    if count is 10
        prime_number : i

write prime_number

So you can see that GPT-3 correctly infers that the right context is “C/Python-like code” and correctly recognizes from the prompt that : means assignment, “as … ranges” is how you do a for loop, etc. It’s interesting that its way of testing primality is to check divisors of i all the way up to i; surely there isn’t a lot of existing code in the training set that does it in this massively inefficient way!
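Here is one plausible Python reading of that output, to make the “check every divisor up to i” strategy concrete. It's a sketch under my own assumptions: the name nth_prime is hypothetical, I start candidates and divisors at 2, and I return as soon as the count is reached. The Ranger output doesn't spell out where a range starts or how the “infinite” loop ends.

```python
def nth_prime(n):
    """Return the n-th prime by naive trial division (n >= 1)."""
    count = 0
    i = 1
    while True:                      # Ranger: as i ranges to infinite
        i += 1                       # candidates are 2, 3, 4, ...
        is_prime = True
        for j in range(2, i):        # Ranger: as j ranges to i-1 (assumed to start at 2)
            if i % j == 0:
                is_prime = False     # keep looping anyway, like the original
        if is_prime:
            count += 1
            if count == n:
                return i             # assumed stopping rule


print(nth_prime(10))
```

Trying every j below i is about as slow as a primality test gets, but it is correct, and it's the loop structure GPT-3 chose.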

To be fair, this is an easy problem, and if you give it a somewhat harder one, it does this:

Write a program in Ranger to count the number of binary strings of length 10 which have no occurrence of the substring ‘0101’.

# Initialize the number of binary strings
num_binary_strings : 0

# Iterate over all binary strings of length 10
as binary_string ranges to 2^10
    # Check if the binary string contains the substring '0101'
    if binary_string does not contain '0101'
        num_binary_strings : num_binary_strings + 1

# Print the result
write 'Number of binary strings of length 10 which have no occurence of the substring "0101": ', num_binary_strings

I guess this is sort of pseudocode? It doesn’t really know how to iterate over binary strings but knows there are 2^10 of them so it just “ranges” to that. Nor does it know how to check string containment in Ranger (how could it?) so it switches to English. Not a bad answer, really!
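For what it's worth, the pseudocode translates into real Python almost line for line, once you supply the two pieces Ranger has no words for: turning a number into a zero-padded binary string, and testing for a substring. A minimal sketch, with count_avoiding as a made-up name:

```python
def count_avoiding(length, pattern="0101"):
    """Count binary strings of the given length that do not contain pattern."""
    num_binary_strings = 0
    for value in range(2 ** length):               # "as binary_string ranges to 2^10"
        # Write value in binary, left-padded with zeros to the full length.
        binary_string = f"{value:0{length}b}"
        if pattern not in binary_string:           # "does not contain '0101'"
            num_binary_strings += 1
    return num_binary_strings


print(count_avoiding(10))
```

So the shape of GPT-3's answer is right; it's only the string handling that it had to fake.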

It would be interesting to try something like this where the invented language is a little more different from existing languages than “Python with some 1-for-1 word and symbol changes.”

Learn to be a crappy programmer

“If a thing’s worth doing, it’s worth doing well” is a nice old saying, but is it true?  Cathy’s advice column today reminded me of this question, as regards coding.  I think learning to write good code is quite hard.  On the other hand, learning to write fairly crappy yet functional code is drastically less hard.  Drastically less hard and incredibly useful!  For many people, it’s probably the optimal point on the reward/expenditure curve.

It feels somehow wrong to give advice like “Learn to be a crappy programmer” but I think it might actually be good advice.


