[pwnable.kr] memcpy: solving the problem



Right then, haha. Every problem I pick,
the difficulty just goes utterly mental,
doesn’t it?

It’s been quite lunatic so far, ōxō.


|Notes|



Screenshot.1

Alright, then. Here is the passage,
with my rough-and-ready paraphrase under each line:

Are you tired of hacking?, take some rest here.

(Tired of hacking, are you? Go on, have a little rest.)

Just help me out with my small experiment regarding memcpy performance.

(Give us a hand with a memcpy performance experiment, would you?)

after that, flag is yours.

(Then the key is all yours.)

Right then. Apparently,
it’s not a hack this time around.
Still, best to have a look, wouldn’t you say?

Screenshot.2

Right then. As before, just fire up your terminal, get yourself connected,

and then, naturally, you’ll employ the trusty ls
command to have a gander at the files.

Right. So, it’s just memcpy and readme, is it?
That’s your lot.
Doubt we’ll be snagging any keys from this account, then.

Screenshot.3

Right, then. Looks like we’ll
need the program to run with borrowed privileges,
via an nc listener, to get our hands on that key.

Screenshot.4

Right then. Let’s just hop onto that listener.
As soon as we’re in,
you’ll see memcpy is running.

Turns out this program exercises memcpy,
the C function designed for copying memory.

Screenshot. 5

The memcpy program takes
numbers within a given range.

Feed it anything outside that,
and it just bails immediately.

But get ten in-range numbers into it,
and it performs both a slow and a fast
memory copy for each one.

Right. It only gets as far as the fifth experiment,
and then suddenly…

It just spat out an error and promptly died. ōxō

Right. So, what we’ve found is this program has a tendency to
just cut out mid-process.
And, as the messages on execution rather plainly suggest,
it’s all tied to the slow and fast memcpy implementations.

Apparently,
if it can compare
the various data sizes without cutting out mid-process,
and manages to get through all the experiments,
it hands over the key.

So, I suppose we’d better dive into the source code, then.

// compiled with : gcc -o memcpy memcpy.c -m32 -lm
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <signal.h>
#include <unistd.h>
#include <sys/mman.h>
#include <math.h>

unsigned long long rdtsc(){
    asm("rdtsc");
}

char* slow_memcpy(char* dest, const char* src, size_t len){
    int i;
    for (i=0; i<len; i++) {
        dest[i] = src[i];
    }
    return dest;
}

char* fast_memcpy(char* dest, const char* src, size_t len){
    size_t i;
    // 64-byte block fast copy
    if(len >= 64){
        i = len / 64;
        len &= (64-1);
        while(i-- > 0){
            __asm__ __volatile__ (
            "movdqa (%0), %%xmm0\n"
            "movdqa 16(%0), %%xmm1\n"
            "movdqa 32(%0), %%xmm2\n"
            "movdqa 48(%0), %%xmm3\n"
            "movntps %%xmm0, (%1)\n"
            "movntps %%xmm1, 16(%1)\n"
            "movntps %%xmm2, 32(%1)\n"
            "movntps %%xmm3, 48(%1)\n"
            ::"r"(src),"r"(dest):"memory");
            dest += 64;
            src += 64;
        }
    }

    // byte-to-byte slow copy
    if(len) slow_memcpy(dest, src, len);
    return dest;
}

int main(void){

    setvbuf(stdout, 0, _IONBF, 0);
    setvbuf(stdin, 0, _IOLBF, 0);

    printf("Hey, I have a boring assignment for CS class.. :(\n");
    printf("The assignment is simple.\n");

    printf("-----------------------------------------------------\n");
    printf("- What is the best implementation of memcpy? -\n");
    printf("- 1. implement your own slow/fast version of memcpy -\n");
    printf("- 2. compare them with various size of data -\n");
    printf("- 3. conclude your experiment and submit report -\n");
    printf("-----------------------------------------------------\n");

    printf("This time, just help me out with my experiment and get flag\n");
    printf("No fancy hacking, I promise :D\n");

    unsigned long long t1, t2;
    int e;
    char* src;
    char* dest;
    unsigned int low, high;
    unsigned int size;
    // allocate memory
    char* cache1 = mmap(0, 0x4000, 7, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
    char* cache2 = mmap(0, 0x4000, 7, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
    src = mmap(0, 0x2000, 7, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);

    size_t sizes[10];
    int i=0;

    // setup experiment parameters
    for(e=4; e<14; e++){ // 2^13 = 8K
        low = pow(2,e-1);
        high = pow(2,e);
        printf("specify the memcpy amount between %d ~ %d : ", low, high);
        scanf("%d", &size);
        if( size < low || size > high ){
            printf("don't mess with the experiment.\n");
            exit(0);
        }
        sizes[i++] = size;
    }

    sleep(1);
    printf("ok, lets run the experiment with your configuration\n");
    sleep(1);

    // run experiment
    for(i=0; i<10; i++){
        size = sizes[i];
        printf("experiment %d : memcpy with buffer size %d\n", i+1, size);
        dest = malloc( size );

        memcpy(cache1, cache2, 0x4000); // to eliminate cache effect
        t1 = rdtsc();
        slow_memcpy(dest, src, size); // byte-to-byte memcpy
        t2 = rdtsc();
        printf("ellapsed CPU cycles for slow_memcpy : %llu\n", t2-t1);

        memcpy(cache1, cache2, 0x4000); // to eliminate cache effect
        t1 = rdtsc();
        fast_memcpy(dest, src, size); // block-to-block memcpy
        t2 = rdtsc();
        printf("ellapsed CPU cycles for fast_memcpy : %llu\n", t2-t1);
        printf("\n");
    }

    printf("thanks for helping my experiment!\n");
    printf("flag : ----- erased in this source code -----\n");
    return 0;
}

Right. So, back in experiment 5,
the fast_memcpy call threw an error, didn’t it?
That cut off everything after it.

Means we can only actually get the key if all
the remaining experiments run through without a hitch.

    // run experiment
    for(i=0; i<10; i++){
        size = sizes[i];
        printf("experiment %d : memcpy with buffer size %d\n", i+1, size);
        dest = malloc( size );

        memcpy(cache1, cache2, 0x4000);
        t1 = rdtsc();
        slow_memcpy(dest, src, size);
        t2 = rdtsc();
        printf("ellapsed CPU cycles for slow_memcpy : %llu\n", t2-t1);

        memcpy(cache1, cache2, 0x4000);
        t1 = rdtsc();
        fast_memcpy(dest, src, size);
        t2 = rdtsc();
        printf("ellapsed CPU cycles for fast_memcpy : %llu\n", t2-t1);
        printf("\n");
    }

    printf("thanks for helping my experiment!\n");
    printf("flag : ----- erased in this source code -----\n");
    return 0;
}

1. The loop runs ten times, once per input value.

2. Memory of the requested size is dynamically allocated,
and its address is stored in dest.

3. The rdtsc() function reads the CPU's time-stamp counter for precise timing:
t1 is read just before a copy and t2 just after it.

4. The difference t2-t1 is the elapsed cycle count,
printed first for the slow copy and then for the fast copy.

5. Right. So, the slow memory copy, that’s done byte by byte.
The fast one? That’s block by block.

→ Solution : First off, we’ll need those functions where
the slow and fast memory copies are actually defined.

Right. You’ll need to head over to that section and
figure out what each function’s actually doing.

Next up: the slow_memcpy and fast_memcpy functions.

char* slow_memcpy(char* dest, const char* src, size_t len){
    int i;
    for (i=0; i<len; i++) {
        dest[i] = src[i];
    }
    return dest;
}

char* fast_memcpy(char* dest, const char* src, size_t len){
    size_t i;
    // 64-byte block fast copy
    if(len >= 64){ // only lengths of 64 or more use the fast copy below
        i = len / 64;
        len &= (64-1);
        while(i-- > 0){
            __asm__ __volatile__ (
            "movdqa (%0), %%xmm0\n"
            "movdqa 16(%0), %%xmm1\n"
            "movdqa 32(%0), %%xmm2\n"
            "movdqa 48(%0), %%xmm3\n"
            "movntps %%xmm0, (%1)\n"
            "movntps %%xmm1, 16(%1)\n"
            "movntps %%xmm2, 32(%1)\n"
            "movntps %%xmm3, 48(%1)\n"
            ::"r"(src),"r"(dest):"memory");
            dest += 64;
            src += 64;
        }
    }

    // byte-to-byte slow copy
    if(len) slow_memcpy(dest, src, len);
    return dest;
}

Right. slow_memcpy,
that’s your basic byte-by-byte operation
from source to destination.
Simple enough, but a bit sluggish,
mind you.

fast_memcpy(), on the other
hand, well, that’s designed for 64-byte chunks.

If the length is anything under 64,
it just runs slow_memcpy.
But for 64 bytes or more,
it executes the assembly-level code once per full 64-byte chunk
and hands whatever is left over to slow_memcpy.

Right then. In the fast_memcpy() section,
it cut out midway through execution, didn’t it?
That means this assembly code couldn’t
execute properly, and that’s what threw the error.

The analysis breaks down as follows.

__asm__ __volatile__ (
"movdqa (%0), %%xmm0\n"
"movdqa 16(%0), %%xmm1\n"
"movdqa 32(%0), %%xmm2\n"
"movdqa 48(%0), %%xmm3\n"
"movntps %%xmm0, (%1)\n"
"movntps %%xmm1, 16(%1)\n"
"movntps %%xmm2, 32(%1)\n"
"movntps %%xmm3, 48(%1)\n"
::"r"(src),"r"(dest):"memory");

movdqa : the ’Move Aligned Double Quadword’ instruction.

It takes the double quadword from the source operand
and moves it to the destination operand.
(The manual lists the destination first and the source second;
the AT&T syntax used in this inline assembly writes them the other way round,
so "movdqa (%0), %%xmm0" loads from memory into xmm0.)

This instruction moves a double quadword between an XMM
register and a 128-bit memory location,
or between two XMM registers.

If the source or destination operand is a memory operand,
it absolutely has to be aligned to a 16-byte boundary.
Fail to do that, and you’ll get yourself a
General Protection Exception (#GP).

movntps : ’Store Packed Single-Precision Floating-Point Values
Using Non-Temporal Hint’.

It uses the non-temporal hint to move
the packed floating-point values from the source operand
over to the destination operand,
which prevents the data being cached whilst it’s written to memory.
Its memory operand has to be 16-byte aligned as well.

In movdqa,
the ’mov’ bit is, of course,
your instruction for shifting values about.
’dq’ is just your ’double quad’, and the ’a’ stands for aligned.

So these instructions throw an exception if the address isn’t aligned to
a 16-byte boundary. src comes straight from mmap, so it’s already page-aligned;
that leaves dest, which comes from malloc. If you just ensure
the destination operand’s address is 16-byte aligned,
that’ll sort the issue right out.


Right then. We’ll just fire up GDB, debug it,
and then go about snagging that key, won’t we?
ㅋㅅㅋ

Screenshot. 6

cd /tmp
mkdir tm
cd tm
ln -s /home/memcpy/memcpy.c
gcc -o memcpy memcpy.c -m32 -lm
echo "8 16 32 64 128 256 512 1024 2048 4096" > exploit

Right then. Once we’ve executed that through GDB,
we’ll then step into the fast_memcpy function and have a proper look, shall we?

Screenshot. 7

Right then. It’s sat at that location,
and the EDX value is 0x080487cc.

Up to the fourth instance,
the alignment was perfectly fine,

so no errors popped up.
But the fifth one? It wasn’t aligned at all.

Screenshot. 8

Right then. Just set a breakpoint in
the fast_memcpy function,
specifically at +71. Check the register value,
and you’ll find EDX holds 0x804d060, which is indeed 16-byte aligned.


Essentially,
the next buffer lands 8 bytes off a 16-byte boundary,
meaning you’ll need an 8-byte offset.

And for any other values you’re debugging,
it’s fair to assume they’ll
all be missing 8 as well.

Where each buffer lands depends on the size of the allocation before it,
so we’ll just add eight to the fourth value and
every subsequent one after that, bar the last
(nothing gets allocated after it, so 4096 can stay as it is)…

8 16 32 64 128 256 512 1024 2048 4096

8 16 32 72 136 264 520 1032 2056 4096

So, the input takes that second form.

And armed with that little theory of ours,
we’ll connect to the nc listener again,
feed each of those values in, and then finally manage to snag the key.


Right. One could, naturally, dabble with other approaches,
I suppose. But for my money, using GDB is probably
the most sensible way to go about it.


So, if you run the program,
you’ll find the key value displayed, much as below.

Screenshot. 9

With the key value we’ve just, ah, ’snagged’,
you simply head back to the start, pop it into
the blank, and then, naturally, hit that ’auth’ button.

Screenshot.10

Right.
And then you’re presented
with a message about securing 10 points





Screenshot.11

And upon returning,
you’ll find the problem is marked
with a green dotted line.
Signifying it’s, well, resolved.

And thanks for actually
bothering to read through all that.

Right then. That’s done, then. :)


Right then.
Hope your day isn’t completely rubbish,
and whatever you’re cracking on with goes smoothly.

It’s a bit of a scorcher today, mind you,
so try not to go getting heatstroke.
And, naturally,
do be careful of that Wuhan pneumonia.