• Welcome to League Of Reason Forums! Please read the rules before posting.
    If you are willing and able please consider making a donation to help with site overheads.
    Donations can be made via here

Mathematical Game

arg-fallbackName="borrofburi"/>
High level description:

Main:
The script calls "loop sates". Loop states runs the program for computer starting money of 3 to 100 and for tokens won by p1 from 2 to -2 (and prints results of each zero token start). This allows the static variable dictionary named "placesBeen" in the object (in the other file) to populate itself successively.

This is great because the recursion depth will probably always be like 3... For instance when the "minimax" runs on (10, 2) (ie., p2money=10 and p1wontokens=2 (never called that in program)) the very first move p1 will try is "what happens if I bet 1?" well it'll lead to the computer choosing between the states (p2mon=9, p1tokens=1) and the states (p2mon=10, p1tokens=3). But in order for the computer to choose it has to know the value of each, so it'll check. For (p2mon=10, p1tokens=3), there won't be an entry, but it's terminal so it'll be very fast to computer that the value is ('WIN', 0). For (p2mon=9, p1tokens=1), that's hard to compute, but it's already going to be in our table from the previous round of our for loop (whatever it is, it's going to be ("WIN", x) where x is greater than 1), so we don't have to do any more recursion.


minimaxobjFINAL:
This is actually fairly linear in the file.

Each object is created from a state. Then that object checks to see if this state is already in the table and if so sets its value to whatever is stored in placesBeen (and stops doing anything, it's done). If it's not been in the table, it calls get value which will calculate the value of this state. Then it stores that value in the table (and is done). (a minimax object's sole purpose in life is to find out its own value and (if it's p1 and the value isn't in the table) store it in the table)

Get value calls a function that will return a list of all valid successor states (at least, ones that we're interested in). Then it calls the function minimax which will actually compute the value of this current state given that list of successors.

minimax checks if this state is terminal and if so returns the value of this terminal state. If not, it loops through all potential succesor states and keeps the value of the best one (where "best" is defined as "cheapest win" for p1, and "lose... or most expensive win" for p2). Now it has the best value that this state can obtain, so it sets this state's object's value to that (and the function calls for this object will collapse and return to init where, if it's a p1 state, the value will be stored in the placesBeen). The state now has its value, so it stops doing anything (other than return requests for value).

Value itself is a tuple that follows the following format: ('WIN' or 'LOSE', costOfWin, path taken from this state to the best end) (where best is defined by whatever p1 or p2 prefer more (see preceding paragraph)).



My language is not the most technical or perfect... But I think it should get the point across.
 
arg-fallbackName="borrofburi"/>
Josan said:
The code at the XKCD forums was weird, I didn't even recognize the language, and the concept was hard to wrap my head around.
I'm not sure what the language is... The assignment operator is "<-".... Hmm according to wikipedia, it appears that it can be oCaml, R, or S....? It's not ocaml (the for loops seem to be wrong, and it just generally does not look like code written in ocaml). I am unfamiliar with R and S.

As for what it does, it's a cleverer messier simpler version of what I eventually did, with all the recursive calls eliminated. I've commented it below. I'm not necessarily perfectly right... because he has those weird offsets.
Code:
#allocate a 2-d array, first parameter is a placeholder
# essentially "this is a really big number, much larger than it should be"
x <- matrix(999,102,7)

#fill x[0,1]=0, x[1,1]=1, x[2,1]=2,... etc. with 0 through 102
# this is the "if A (p1) only needs 1 more token, must spend as much money as B has"
x[,1] <- rep(0,102)      # win condition

#fill similarly, but instead of fixing number of tokens remaining to 1
# we're fixing amount of money B has to 1, and filling with number of tokens remaing
# this is because if B runs out of money and B is up 3 tokens, then A has to bid 1
# repeatedly to "buy" each of those tokens (in this case, bid one 3 times, so costs 3 to win)
x[1,] <- rep(0,7)        # "win" if B spends more than he has
x[2,2:6] <- c(1,2,3,4,5) # money needed if B has nothing

# fill the table up
for (i in 3:102) { # for B's money from 1 to 100 (weird -2 offset)
  for (j in 2:6) { # for number tokens B is winning by from -3 to 3 (weird -4 offset)
    y <- 999 # default "cost"/"value" of this scenario
    for (k in 1:(i-2)) { #loop through possible bids that A can make
      z <- max(x[i,j-1], x[i-k-1,j+1])+k # B's turn: pick most expensive win
      y <- min(y,z) # A chooses between the best A has seen in this set of moves so far, and this current move
    }
    x[i,j] <- y # set the value of our table to the best of all the possible moves
  }
}

Now if only I had understood that before I had to make all my own mistakes... I could have saved hours of time...
 
arg-fallbackName="Josan"/>
I did it! Using the matrix method, I was able to quickly write a short program in python that finds the 175 answer in 0.22 seconds.

However, I need your help borrofburi, has something weird has emerged. Here is my final table showing what money you need to win.

The element i,j shows what you need to win when the computer has $(i-1) and j marbles.


My problem is the rightmost row (row 6 when you start counting at 0). This row should be 0 if the computer has -1 money, as the computer overbids, but it should be 999 for the rest, showing that if the computer has the ability to win, it will pick the "999" cost for the player, meaning the player won't pick a bid leading to that possibility. This is logical (at least to me=P)

However, note the 6 value in the 2. column, this represents what you need if the computer has 0 money. Of course, if the computer has 0 money, and all 6 marbles, 6 dollars won't do the player much good. So this 6 value isn't supposed to be there. So I manually set it to 999 before starting my loop. But now my final answer is suddenly 176, not 175. I did not alter anything else at all! I guess my program has a small bug somewhere, but I can't seem to find it.
 
arg-fallbackName="borrofburi"/>
hmm... Here's what I'd expect to see for the first few entries:

[ 0, 1, 2, 3, 4, 5, 999], $-1
[ 0, 1, 2, 3, 4, 5, 999], $0
[ 0, 1, 2, 3, 4, 5, 999], $1
[ 0, 2, 4, 5, 6, 8, 999], $2
[ 0, 3, 4, 5, 6, 9, 999], $3
[ 0, 4, 6, 7, 9, 13, 999], $4

I would set the first row manually (possibly first 3). The way I coded it I decided that p1 could never bet less than $1 to get a marble. So even if the computer has zero money, p1 still has to buy the remaining tokens for $1.

Because I set the minimum bid to $1, the end result is the same whether the computer has $-1, $0, or $1, because in all cases p1 bids $1 and the computer can't prevent p1 from obtaining of a token. Which is why my 3 first rows would all be the same. As a result I wouldn't bother with $i-1 thing... Unless in saying that you were forgetting that these things tend to be zero-indexed (i.e. the first entry is numbered zero, while the second entry is number 1).


As for my guesses as to why your program is acting funny... From the first few entries of the table, our reslts tend to primarily differ when B has 5 marbles. So for instance when B has $2 and 5 marbles, you need $8 to win because your first bid has to be $3 to avoid losing (anything less than $3 and B can force you to lose).

This would first lead me to suspect the termination condition not being coded properly... except that the 999 placeholder should be sufficient punishment to avoid that possibility (at least, until B has $1000).

But let's look at my final row: [ 0, 100, 141, 175, 219, 319, 999]...

Hmm I dunno... My guess would be something about the first few rows being wrong... Can you show me the version where you manually set the second row's last element to 999?
 
arg-fallbackName="Josan"/>
borrofburi said:
hmm... Here's what I'd expect to see for the first few entries:

[ 0, 1, 2, 3, 4, 5, 999], $-1

I don't understand this, I would expect the $-1 row to be all 0. As the computer would never (it isn't allowed) choose this row.


borrofburi said:
Hmm I dunno... My guess would be something about the first few rows being wrong... Can you show me the version where you manually set the second row's last element to 999?

I found my error. It was the entire 2. column. I had written in my program that if the computer had 1 marble left, the player needed 1 more money than the computer to win, which isn't the case!


My entire program (can't get indents to work)
[showmore=Code]from numpy import *

M = zeros([102,7],int) # 102x7 matrix
M += 999 # default value is 999
for i in range(102):
M[i,0] = 0 # Computer 0 marbles ---> win
M[i,1] = i-1 # Computer 1 marble, if player has equal money ---> win
for i in range(7):
M[0,i] = 0 # Computer has -1 money ---> win
for i in range(6):
M[1,i] = i # Computer has 0 money, i marbles. Player has i money ---> win


for i in range(2,102): # Loop through computer money
for j in range(2,6): # Loop through computer marbles
y = 999 # 999 is default placeholder
for k in range(1,i+2): # Loop through possible bids
z = max(M[i,j-1], M[i-k-1,j+1])+k # Computer's choice, it maximizes cost
y = min(y,z) # Players choice, minimizes cost
M[i,j] = y # New matrix-element

print M[101,3][/showmore]

And the final table:
[showmore=Result]The first row is $-1, the computers' marbles are from 0 (leftmost) to 6 (rightmost)
[ 0, 0, 0, 0, 0, 0, 0],
[ 0, 1, 2, 3, 4, 5, 999],
[ 0, 1, 2, 3, 4, 5, 999],
[ 0, 2, 4, 5, 6, 8, 999],
[ 0, 3, 4, 5, 6, 9, 999],
[ 0, 4, 6, 7, 9, 13, 999],
[ 0, 5, 6, 7, 10, 15, 999],
[ 0, 6, 8, 10, 12, 18, 999],
[ 0, 7, 8, 11, 14, 21, 999],
[ 0, 8, 10, 12, 16, 24, 999],
[ 0, 9, 12, 14, 18, 27, 999],
[ 0, 10, 13, 16, 20, 30, 999],
[ 0, 11, 14, 17, 22, 33, 999],
[ 0, 12, 15, 19, 24, 36, 999],
[ 0, 13, 17, 21, 26, 39, 999],
[ 0, 14, 18, 22, 28, 42, 999],
[ 0, 15, 20, 24, 30, 45, 999],
[ 0, 16, 21, 26, 33, 49, 999],
[ 0, 17, 22, 27, 34, 51, 999],
[ 0, 18, 24, 29, 37, 55, 999],
[ 0, 19, 25, 31, 39, 58, 999],
[ 0, 20, 27, 33, 41, 61, 999],
[ 0, 21, 28, 34, 43, 64, 999],
[ 0, 22, 29, 36, 45, 67, 999],
[ 0, 23, 31, 38, 48, 71, 999],
[ 0, 24, 32, 40, 50, 74, 999],
[ 0, 25, 34, 41, 52, 77, 999],
[ 0, 26, 35, 43, 54, 80, 999],
[ 0, 27, 36, 45, 56, 83, 999],
[ 0, 28, 38, 47, 59, 87, 999],
[ 0, 29, 39, 48, 61, 90, 999],
[ 0, 30, 41, 50, 63, 93, 999],
[ 0, 31, 42, 52, 65, 96, 999],
[ 0, 32, 44, 54, 68, 100, 999],
[ 0, 33, 45, 55, 69, 102, 999],
[ 0, 34, 46, 57, 72, 106, 999],
[ 0, 35, 48, 59, 74, 109, 999],
[ 0, 36, 49, 61, 76, 112, 999],
[ 0, 37, 51, 63, 79, 116, 999],
[ 0, 38, 52, 64, 80, 118, 999],
[ 0, 39, 54, 66, 83, 122, 999],
[ 0, 40, 55, 68, 85, 125, 999],
[ 0, 41, 56, 69, 87, 128, 999],
[ 0, 42, 58, 72, 90, 132, 999],
[ 0, 43, 59, 73, 92, 135, 999],
[ 0, 44, 61, 75, 94, 138, 999],
[ 0, 45, 62, 77, 96, 141, 999],
[ 0, 46, 64, 79, 99, 145, 999],
[ 0, 47, 65, 80, 100, 147, 999],
[ 0, 48, 66, 82, 103, 151, 999],
[ 0, 49, 68, 84, 105, 154, 999],
[ 0, 50, 69, 85, 107, 157, 999],
[ 0, 51, 71, 88, 110, 161, 999],
[ 0, 52, 72, 89, 112, 164, 999],
[ 0, 53, 74, 91, 114, 167, 999],
[ 0, 54, 75, 93, 116, 170, 999],
[ 0, 55, 76, 94, 118, 173, 999],
[ 0, 56, 78, 97, 121, 177, 999],
[ 0, 57, 79, 98, 123, 180, 999],
[ 0, 58, 81, 100, 125, 183, 999],
[ 0, 59, 82, 102, 127, 186, 999],
[ 0, 60, 84, 104, 130, 190, 999],
[ 0, 61, 85, 105, 132, 193, 999],
[ 0, 62, 87, 107, 134, 196, 999],
[ 0, 63, 88, 109, 136, 199, 999],
[ 0, 64, 89, 111, 139, 203, 999],
[ 0, 65, 91, 113, 141, 206, 999],
[ 0, 66, 92, 114, 143, 209, 999],
[ 0, 67, 94, 116, 145, 212, 999],
[ 0, 68, 95, 118, 147, 215, 999],
[ 0, 69, 96, 119, 149, 218, 999],
[ 0, 70, 98, 122, 152, 222, 999],
[ 0, 71, 100, 123, 154, 225, 999],
[ 0, 72, 101, 125, 156, 228, 999],
[ 0, 73, 102, 127, 159, 232, 999],
[ 0, 74, 104, 129, 161, 235, 999],
[ 0, 75, 105, 130, 163, 238, 999],
[ 0, 76, 107, 132, 165, 241, 999],
[ 0, 77, 108, 134, 168, 245, 999],
[ 0, 78, 110, 136, 170, 248, 999],
[ 0, 79, 111, 138, 172, 251, 999],
[ 0, 80, 112, 139, 174, 254, 999],
[ 0, 81, 114, 141, 176, 257, 999],
[ 0, 82, 115, 143, 179, 261, 999],
[ 0, 83, 117, 145, 181, 264, 999],
[ 0, 84, 118, 146, 183, 267, 999],
[ 0, 85, 119, 148, 185, 270, 999],
[ 0, 86, 121, 150, 188, 274, 999],
[ 0, 87, 123, 152, 190, 277, 999],
[ 0, 88, 124, 154, 192, 280, 999],
[ 0, 89, 125, 155, 194, 283, 999],
[ 0, 90, 127, 157, 196, 286, 999],
[ 0, 91, 128, 159, 199, 290, 999],
[ 0, 92, 130, 161, 201, 293, 999],
[ 0, 93, 131, 163, 204, 297, 999],
[ 0, 94, 132, 164, 205, 299, 999],
[ 0, 95, 134, 166, 208, 303, 999],
[ 0, 96, 136, 168, 210, 306, 999],
[ 0, 97, 137, 170, 212, 309, 999],
[ 0, 98, 138, 172, 215, 313, 999],
[ 0, 99, 140, 174, 217, 316, 999],
[ 0, 100, 141, 175, 219, 319, 999][/showmore]
 
arg-fallbackName="borrofburi"/>
Josan said:
borrofburi said:
hmm... Here's what I'd expect to see for the first few entries:
[ 0, 1, 2, 3, 4, 5, 999], $-1
I don't understand this, I would expect the $-1 row to be all 0. As the computer would never (it isn't allowed) choose this row.
You're probably right. You deal with bounds/termination differently than I do. I prevent the computer from overbidding simply with a check via "if" statement: if the computer has enough money, then it can bid, otherwise don't.



Josan said:
My entire program (can't get indents to work)
Put it inside a code tag inside a showmore tag:
Josan said:

Josan said:
And the final table:
Ah, that makes more sense. Glad you figured it out :D
 
arg-fallbackName="borrofburi"/>
Josan said:
I did it! Using the matrix method
Oh I almost forgot to mention that this is based on dynamic programming. It's an algorithm design method used in cases where you can easily solve the problem by taking the max or min (or, potentially, some other function) of a finite number of subproblems that are all very similar to each other.
 
Back
Top