
Buffer Overflow 2-Computer Sciences Applications-Project Report, Study Guides, Projects, Research of Applications of Computer Sciences

This report is a final-year project submitted to complete a degree in Computer Science. It focuses on applications of computer science and was supervised by Dr. Abhisri Yashwant at Bengal Engineering and Science University. Its main points are: buffer overflows, literature review, registers, memory addressing, architecture, experimentation, hardware.

Typology: Study Guides, Projects, Research

2011/2012

Uploaded on 07/18/2012

padmini
Table of Contents

1. Introduction
    1.1 Buffer Overflows
    1.2 Why Buffer Overflows
2. Goals Achieved
    2.1 Literature Review
        2.1.1 Register
        2.1.2 Memory addressing
        2.1.3 x86 architecture for memory
        2.1.4 Win32 Assembly
    2.2 The stack
    2.3 The method
    2.4 Experimenting of Buffer Overflows
3. Future Goals
4. Key Terminologies
    4.1 Hardware Terminologies
    4.2 Software Terminologies
    4.3 Security Terminologies
5. References
    5.1 Related to Books
    5.2 Related to URL
    5.3 Related to Development tool

List of figures

Figure 1 x86 register set.
Figure 2 Push operations in a stack.
Figure 3 Disassembled code, showing the start of the function.
Figure 4 Result of a successful buffer overflow.

1. Introduction

Software engineering is an extremely difficult task, and of all software-creation professions, software architects quite possibly have the most difficult job. Initially, software architects were only responsible for the high-level design of the products. More often than not this included protocol selection, third-party component evaluation and selection, and communication medium selection. We make no argument here that these are all valuable and necessary objectives for any architect, but today the job is much more difficult. It requires an intimate knowledge of operating systems, software languages, and their inherent advantages and disadvantages with regard to different platforms. Additionally, software architects face increasing pressure to design flexible software that is impenetrable to wily hackers, which is a near-impossible task in itself.

1.1 Buffer Overflows

Buffer overflows are proof that the computer science, or software programming, community still does not have an understanding (more importantly, firm knowledge) of how to design, create, and implement secure code. Like it or not, all buffer overflows are a product of poorly constructed software programs. These programs may have multiple deficiencies such as stack overflows, heap corruption, format string bugs, and race conditions; the first three are commonly referred to simply as buffer overflows. Buffer overflows can be as small as one misplaced character in a million-line program or as complex as multiple character arrays that are inappropriately handled.

1.2 Why Buffer Overflows

Buffer overflows may cause a process to crash or produce incorrect results.
They can be triggered by inputs specifically designed to execute malicious code or to make the program operate in an unintended way. As such, buffer overflows cause many software vulnerabilities and form the basis of many exploits. Sufficient bounds checking by either the programmer or the compiler can prevent buffer overflows. Contrary to popular belief, it is nearly impossible to determine whether vulnerabilities are being identified and released at an increasing or a decreasing rate. One factor may be that it is increasingly difficult to define and document vulnerabilities.

2. Goals Achieved

• Literature Review.
• Results of a complex buffer overflow test.

2.1 Literature Review

• Registers.
• Memory addressing.
• x86 architecture for memory.
• Win32 Assembly.

2.1.1 Registers

The x86 architecture has a number of different types of registers, which include general purpose, segment, index and control registers. The different registers are described below.

2.1.1.1 General

General purpose registers are used for arithmetic and data movement. Each register can be addressed as either a 16-bit or a 32-bit value, and the low 16 bits of EAX, EBX, ECX and EDX can also be addressed as two 8-bit halves (for example AH and AL).

EAX (accumulator)
The extended accumulator register is used for word divide, word multiply and word I/O. AL (the low byte of EAX) is used for byte divide, byte multiply, byte input and output, translate and decimal arithmetic, whereas AH (the high byte of the low two bytes) is used only for byte multiply and byte divide.

EBX (base)
The extended base register is used only for translation. Here translation refers to the translation of memory addresses. It can also be combined with SI and DI for combined indexing.

ECX (counter)
The extended counter register is used for counting and loops. This register is automatically decremented by the LOOP instruction.

EDX (data)
The extended data register is used for word multiply, word divide and indirect input and output operations.
2.1.1.2 Segment

Segment registers are used as base locations for program instructions, data and the stack. All references to memory involve a segment register used as a base location.

CS (code segment)
The processor uses the CS segment for all accesses to instructions referenced by the instruction pointer (IP) register. The CS register cannot be changed directly. It is automatically updated during far jump, far call and far return instructions.

DS (data segment)
By default, the processor assumes that all data referenced by the general registers (AX, BX, CX, DX) and the index registers (SI, DI) is located in the data segment. The DS register can be changed directly using the POP and LDS instructions.

SS (stack segment)
By default, the processor assumes that all data referenced by the stack pointer (SP) and base pointer (BP) registers is located in the stack segment. The SS register can be changed directly using the POP instruction.

Figure 1 x86 register set.

2.1.2 Memory addressing

In an x86 architecture the following addressing modes are used for addressing memory locations.

• Implied - the data value or data address is implicitly associated with the instruction.
• Register - references the data in a register or in a register pair.
• Immediate - the data is provided in the instruction.
• Direct - the instruction operand specifies the memory address where the data is located.
• Register indirect - the instruction specifies a register containing the address where the data is located. This addressing mode works with the SI, DI, BX and BP registers.
• Based - an 8-bit or 16-bit instruction operand is added to the contents of a base register (BX or BP); the resulting value is a pointer to the location where the data resides.
• Indexed - an 8-bit or 16-bit instruction operand is added to the contents of an index register (SI or DI); the resulting value is a pointer to the location where the data resides.
• Based Indexed - the contents of a base register (BX or BP) are added to the contents of an index register (SI or DI); the resulting value is a pointer to the location where the data resides.
• Based Indexed with displacement - an 8-bit or 16-bit instruction operand is added to the contents of a base register (BX or BP) and an index register (SI or DI); the resulting value is a pointer to the location where the data resides.

2.1.3 x86 architecture for memory

Processor architectures are roughly divided into little-endian and big-endian, according to the way multibyte data is stored in memory. If the processor stores the least significant byte of a multibyte word at a higher address and the most significant byte at a lower address, this is the big-endian method. If the situation is reversed, so that the least significant byte is stored at the lowest address in memory and higher bytes at increasing addresses, this is a little-endian system. For example, a 4-byte word 0x12345678 stored at address 0x400 on a big-endian machine would be placed in memory as follows:

    0x400    0x12
    0x401    0x34
    0x402    0x56
    0x403    0x78

For a little-endian system, the order is reversed:

    0x400    0x78
    0x401    0x56
    0x402    0x34
    0x403    0x12

Knowing that Intel x86 is little-endian is important for understanding why off-by-one overflows can be exploited. The Sun SPARC architecture, for example, is big-endian.

2.1.4 Win32 Assembly

Prologue and epilogue code is generated automatically by a compiler in order to set up a stack frame, preserve registers and clean up the stack frame after a function call is completed (all the information put on the stack that is related to one function is called a stack frame). The body contains the actual code of the function. Prologue and epilogue are architecture-specific and compiler-specific.

2.2 The Stack

Our primary interest, of course, is the stack. Let's look at this data structure a bit closer and see how it relates to and interfaces with the registers.
We are forced to look at a little assembly language code at this point, but as you shall see, it is really not all that frightening. As mentioned earlier, the stack is a special memory buffer outside of the CPU used as a temporary holding area for addresses and data. The stack resides inside the stack segment. Each location on the stack is pointed at by the ESP register, or stack pointer. The stack pointer, in turn, holds the address of the last data element to be added to, or pushed onto, the stack. It is important to note that the push operation pushes data backwards onto the stack. This is what causes stack memory to grow downward, that is, toward the lower memory addresses. Now this can truly be confusing and make one's head spin, but it must be understood to execute an attack on the stack. So please, just hang tight. Conversely, the last value added to the stack is also the first one to be [...]

    [...]
        cin >> ch;                      // no length limit on the input
        if (strcmp(ch, "PIEAS") == 0)   // compare against the password
            enter();
        return 0;
    };

    void enter()
    {
        cout << "You are welcomed!";
    };

    void main()
    {
        cout << "Now lets start!\n";
        test();
        getch();
    }

From a simple programmer's point of view there is no error in the code, and the compiler did not report any while compiling it. But it is quite visible in the code that the buffer has a size of 10 and that cin does not restrict the user's input to any specific length. From a hacker's point of view this is a golden opportunity, and he will make this program the target of a buffer overflow. He will follow the above-mentioned sequence of events so that he can gain access through this code with a wrong password. I performed what a hacker would do myself, to prove what I am saying. First of all I compiled this code and made an executable file, so that I could put myself in the hacker's shoes. Then I simply disassembled the executable with IDA and looked for the function that is called to display the message.
I noted its address, which was 401153 in hexadecimal.

Figure 3 Disassembled code, showing the start of the function.

Since Intel uses a little-endian architecture, as mentioned above, I reversed the byte order of this address and got 53 11 40. After this I converted each byte into decimal form, which gave 83, 17 and 64. I then loaded the executable into OllyDbg and gave it a sequence of characters as input so that an uncontrolled buffer overflow occurred. From this I noted the number of characters needed to reach the extended instruction pointer (EIP), which came to sixteen. So I ran the program again and gave it the following input:

    AAAAAAAAAAAAAAAAS◄@

At last I got a controlled buffer overflow, with the message that normally appears only for a correct password. It is clear that anybody with some knowledge of memory access can get past this login code.

Figure 4 Result of a successful buffer overflow.

3. Future Goals

Up till now all the buffer overflows have been done on a local machine, so in the 8th semester the emphasis will be placed on the study of remote buffer overflows. Here "remote" means that the buffer overflow will be done across a local area network and not over the internet.

4. Key Terminologies

One of the most daunting tasks for any security professional is to stay on top of the latest terms, slang, and definitions that drive new products, technologies, and services. While most of the slang is generated these days online via chat sessions, specifically IRC, it is also being passed around in white papers, conference discussions, and just by word of mouth. Since the study of buffer overflows dives into code, complex computer and software topics, and techniques for automating exploitation, we felt it necessary to document some of the most common terms just to ensure that everyone is on the same page.
4.1 Hardware Terminologies

The following definitions are commonly utilized to describe aspects of computers and their component hardware as they relate to security vulnerabilities:

[...] Encapsulation provides a logical structure to a program and allows for easy methods of inheritance.

■ Function A function may be thought of as a miniature program. In many cases, a programmer may wish to take a certain type of input, perform a specific operation and output the result in a particular format. Programmers have developed the concept of a function for such repetitive operations. Functions are contained areas of a program that may be called to perform operations on data. They take a specific number of arguments and return an output value.

■ Functional Language Programs written in functional languages are organized into mathematical functions. True functional programs do not have variable assignments; lists and functions are all that is necessary to achieve the desired output.

■ GDB The GNU debugger (GDB) is the de facto debugger on UNIX systems. GDB is available at: http://sources.redhat.com/gdb/.

■ Heap The heap is an area of memory utilized by an application and is allocated dynamically at runtime. Data allocated using the malloc interface is stored on the heap, whereas local (automatic) variables are stored on the stack and static variables in the program's data segment.

■ Inheritance Object-oriented organization and encapsulation allow programmers to easily reuse, or "inherit," previously written code. Inheritance saves time since programmers do not have to recode previously implemented functionality.

■ Integer Wrapping In the case of unsigned values, integer wrapping occurs when an overly large unsigned value is sent to an application that "wraps" the integer back to zero or a small number. A similar problem exists with signed integers: wrapping from a large positive number to a negative number, zero, or a small positive number.
With signed integers, the reverse is true as well: a "large negative number" could be sent to an application that "wraps" back to a positive number, zero, or a smaller negative number.

■ Interpreter An interpreter reads and executes program code. Unlike with a compiler, the code is not translated into machine code and then stored for later reuse. Instead, an interpreter reads the higher-level source code each time. An advantage of an interpreter is that it aids platform independence. Programmers do not need to compile their source code for multiple platforms. Every system which has an interpreter for the language will be able to run the same program code. The interpreter for the Java language interprets Java byte-code and performs functions such as automatic garbage collection.

■ Java Java is a modern, object-oriented programming language developed by Sun Microsystems in the early 1990s. It combines a syntax similar to that of C and C++ with features such as platform independence and automatic garbage collection. Java applets are small Java programs that run in Web browsers and perform dynamic tasks impossible in static HTML.

■ Little Endian Little and big endian refer to the order in which the bytes of a multibyte value are stored in memory. In a little-endian system, the least significant byte is stored first. x86 uses a little-endian architecture.

■ Machine Language Machine code can be understood and executed by a processor. After a programmer writes a program in a high-level language, such as C, a compiler translates that code into machine code. This code can be stored for later reuse.

■ Malloc The malloc function call dynamically allocates n bytes on the heap. Many vulnerabilities are associated with the way this data is handled.

■ Memset/Memcpy The memset function call is used to fill a buffer with a specified number of bytes of a certain character. The memcpy function call copies a specified number of bytes from one buffer to another buffer.
This function has similar security implications to strncpy.

■ Method A method is another name for a function in languages such as Java and C#. A method may be thought of as a miniature program. In many cases, a programmer may wish to take a certain type of input, perform a specific operation and output the result in a particular format. Programmers have developed the concept of a method for such repetitive operations. Methods are contained areas of a program that may be called to perform operations on data. They take a specific number of arguments and return an output value.

■ Multithreading Threads are sections of program code that may be executed in parallel. Multithreaded programs take advantage of systems with multiple processors by sending independent threads to separate processors for fast execution. Threads are useful when different program functions require different priorities. While each thread is assigned memory and CPU time, threads with higher priorities can preempt other, less important threads. In this way, multithreading leads to faster, more responsive programs.

■ NULL A term used to describe a programming variable which has not had a value set. Although it varies from one programming language to another, a null value is not necessarily the same as a value of "" or 0.

■ Object-oriented Object-oriented programming is a modern programming paradigm. Object-oriented programs are organized into classes. Instances of classes, called objects, contain data and methods which perform actions on that data. Objects communicate by sending messages to other objects, requesting that certain actions be performed. The advantages of object-oriented programming include encapsulation, inheritance, and data hiding.

■ Platform Independence Platform independence is the idea that program code can run on different systems without modification or recompilation. When program source code is compiled, it may only run on the system for which it was compiled.
Interpreted languages, such as Java, do not have such a restriction. Every system which has an interpreter for the language will be able to run the same program code.

■ printf This is the most commonly used LIBC function for outputting data to a command-line interface. This function is subject to security implications because a format string specifier can be passed to the function call that specifies how the data being output should be displayed. If the format string specifier is not specified, a software bug exists that could potentially be a vulnerability.

■ Procedural Language Programs written in a procedural language may be viewed as a sequence of instructions, where data at certain memory locations is modified at each step. Such programs also involve constructs for the repetition of certain tasks, such as loops and procedures. The most common procedural language is C.

■ Program A program is a collection of commands that may be understood by a computer system. Programs may be written in a high-level language, such as Java or C, or in low-level assembly language.

■ Programming Language Programs are written in a programming language. There is significant variation in programming languages. The language determines the syntax and organization of a program, as well as the types of tasks that may be performed.

[...] a user has the ability to input data to the function, a buffer can be crafted to gain control of the program.

■ Heap Corruption Heap overflows are often more accurately referred to as heap corruption bugs because when a buffer on the stack is overrun, the data normally overflows into other buffers, whereas on the heap, the data corrupts memory which may or may not be important, useful or exploitable. Heap corruption bugs are vulnerabilities that take place in the heap area of memory. These bugs can come in many forms, including malloc implementation and static buffer overruns.
Unlike with the stack, many requirements must be met for a heap corruption bug to be exploitable.

■ Off-by-One An "off-by-one" bug is present when a buffer is set up with size n and somewhere in the application a function attempts to write n+1 bytes to the buffer. This often occurs with static buffers when the programmer does not account for a trailing null that is appended to the n-sized data (hence n+1) that is being written to the n-sized buffer.

■ Stack Overflow A stack overflow occurs when a buffer has been overrun in the stack space. When this happens, the return address can be overwritten, allowing arbitrary code to be executed. The most common type of exploitable vulnerability is a stack overflow. String functions such as strcpy, strcat, and so on are common starting points when looking for stack overflows in source code.

■ Vulnerability A vulnerability is an exposure that has the potential to be exploited. Most vulnerabilities that have real-world implications are specific software bugs. However, logic errors are also vulnerabilities. For instance, not requiring a password, or allowing a null password, is a vulnerability. This logic, or design, error is not fundamentally a software bug.

5. References

5.1 Related to Books

[1] James C. Foster, "Buffer Overflow Attacks: Detect, Exploit, Prevent", Syngress, pp. 117–228, Feb 2005. ISBN 1-932266-67-4.
[2] Walter A. Triebel, Avtar Singh, "The 8088 and 8086 microprocessor", Prentice Hall, 3/E, pp. 23–87, ISBN 0-13-085262-7.

5.2 Related to URL

[1] Blake Wiedman, "Security audits", Continuously, URL: http://www.governmentsecurity.org/forum/index.php?showtopic=7728
[2] SANS Institute, "Online assistance and training", Maryland, USA, URL: http://www.sans.org/

5.3 Related to Development tool

[1] IDA Pro v4.3, DataRescue Inc, Intro, (2005)
[2] OllyDbg v1.10, Oleh Yuschuk, Intro, (2004)