The C Programming Language - Lecture Slides | CMSC 430, Study notes of Computer Science

Material Type: Notes; Class: INTRO TO COMPILERS; Subject: Computer Science; University: University of Maryland; Term: Spring 2006;


Uploaded on 02/13/2009

koofers-user-bwp-1
Cyclone: A Safe Dialect of C
Michael Hicks
University of Maryland, College Park

Credit where credit is due …
• Cyclone is a research language, the product of many collaborators:
  – Greg Morrisett, Yanling Wang (Harvard)
  – Dan Grossman (Washington)
  – Nikhil Swamy (Maryland)
  – Trevor Jim (AT&T)

1988? 2005?
• "In order to start copies of itself running on other machines, the worm took advantage of a buffer overrun...
• ...it is estimated that it infected and crippled 5 to 10 percent of the machines on the Internet."
• More than 15 years later, nearly half of CERT advisories involve buffer overruns, format string attacks, and similar low-level attacks.

The C Programming Language
• Critical software is often coded in C:
  – device drivers, kernels
  – file systems, web servers, email systems
  – switches, routers, firewalls
• … arguably because it is low-level:
  – Control over data structure representations
  – Control over memory management
  – Manifest cost: good performance

Low-level, but unsafe
• Must bypass the type system to do even simple things (e.g., allocate and initialize an object)
• Libraries put the onus on the programmer to do the "right thing" (e.g., check return codes, pass in a large enough buffer)
• For efficiency, programmers stack-allocate arrays of size K (is K big enough? does the array escape downwards?)
• Programmers assume objects can be safely recycled when they cannot, and fail to recycle memory when they should.
• It's not "fail-stop": errors don't manifest themselves until well after they happen (e.g., buffer overruns).

What about Java?
• Java provides safety in part via:
  – hidden data fields and run-time checks
  – automatic memory management
• Data representation and resource management are essential aspects of low-level systems

Some possible approaches
• Compile C with extra information
  – type fields, size fields, live-pointer table, …
  – treats C as a higher-level language
• Use static analysis
  – very difficult, not perfect
  – less modular
• Ban unsafe features
  – there are many
  – you need them

Cyclone
A safe, convenient, and modern language at the C level of abstraction
• Safe: memory safety, abstract types; fail-stop
• C-level: user-controlled data representation and resource management, easy interoperability, "manifest cost"
• Convenient: may need more type annotations, but we work hard to avoid them
• Modern: adds features to capture common idioms
"New code for legacy or inherently low-level systems"

Outline
• Status
• How Cyclone handles pointer errors
  – Spatial errors
  – Temporal errors
• Programming experience
• Performance analysis

Status
• >110K lines of Cyclone code
  – 80K compiler, libraries
  – 30K various servers, applications, device drivers
• gcc back-end (Linux, Cygwin, OSX, LEGO, …)
• User's manual, mailing lists, …
• Still a research vehicle

Projects using Cyclone
• Open Kernel Environment [Bos/Samwel, OPENARCH 02]
• MediaNet [Hicks et al., OPENARCH 03]
• RBClick [Patel/Lepreau, OPENARCH 03]
• STP [Patel et al., SOSP 03]
• FPGA synthesis [Teifel/Manohar, ISACS 04]
• O/S class at Maryland [2004-2005]

What is a C buffer overflow?
    #include <stdio.h>

    int login() {
        char user[100];
        printf("login: ");
        scanf("%s", user);   // no limit on how many characters are stored
        … // get password etc.
    }
What happens if the user types in something that's more than 100 characters?

Thin Pointers
[Figure: thin pointer p to the characters H E L L O]
Shorthand:
    char *p;
    char @ p1;
    char @{6} p2;

Fat Pointers
[Figure: fat pointer q into the characters H E L L O]
    char * @fat q;   // types and qualifiers; shorthand: char ? q;
    q++;             // pointer arithmetic OK
    q[0];            // bounds check on dereference
    q--; q--;        // dangling pointers OK ...
    q[0];            // ... caught on dereference
• A "fat" pointer has run-time bounds: a 3-word representation

Thin Pointer, Dynamic Bounds
[Figure: thin pointer p to the characters H E L L O, with len = 5]
    void foo(int len, char * @numelts(len) p) {
        for (int i = 0; i < len; i++)
            p[i] = …
    }

Pointer Qualifier Summary
• @fat
  – represented as a triple: {base, upper, curr}
  – supports all C pointer operations
  – but any dereference may be checked
• @notnull
  – obviates the null check (the compiler must prove it)
• @numelts(n)
  – obviates the bounds check (the compiler must prove it)
  – can refer to dynamic lengths
• @zeroterm
  – the pointed-to buffer is zero-terminated

Interfacing with libC
    FILE *fopen(char * @notnull @zeroterm name,
                char * @notnull @zeroterm mode);
    int putc(char, FILE * @notnull);
    …
    fgets(char * @zeroterm @numelts(len), int len, FILE * @notnull);
Our buildlib tool generates platform-dependent headers with signatures like these (with some help from the programmer).

Temporal Errors
    typedef struct { int x; int y; } pt;

    pt *add(pt *p, pt *q) {
        pt r;
        r.x = p->x + q->x;
        r.y = p->y + q->y;
        return &r;     // r's lifetime ends here!
    }

    void foo() {
        pt a = {1,2};
        pt b = {3,4};
        pt *c = add(&a, &b);
        c->x = 10;     // so dereferencing c here can cause problems...
    }

Preventing Temporal Errors
• Track object lifetimes by associating a region with each pointer: int * @region(`r), or int *`r for short
  – A pointer can only be dereferenced while its region is still live.
• Two basic kinds of regions
  – A lexical block (i.e., an activation record)
  – The heap (`H), which has a global lifetime

Simple Region Example
    pt a = {1,2};
    void foo() {
        pt b = {3,4};
        pt @`H aptr = &a;
        pt @`foo bptr = &b;
        addTo(&a, &b);
    }
• a lives in the heap region, so &a has type pt @`H.
• b lives in the activation record of foo, so &b has type pt @`foo.
• Region inference can figure out the regions, so the programmer doesn't have to write them.

Region Polymorphism
    void addTo<`r1,`r2>(pt *`r1 p, pt *`r2 q) {
        p->x += q->x;
        p->y += q->y;
    }
This is standard parametric polymorphism:
    addTo: ∀`r1. ∀`r2. (pt *`r1 × pt *`r2) → void
addTo is parameterized by the regions for p and q.

So this would go through...
    pt *`H add<`r1,`r2>(pt *`r1 p, pt *`r2 q) {
        pt *r = malloc(sizeof(pt));
        r->x = p->x + q->x;
        r->y = p->y + q->y;
        return r;
    }
    pt a = {1,2};
    void foo() {
        pt b = {3,4};
        pt *`H c = add<`H,`foo>(&a, &b);
        pt *`H d = add<`foo,`foo>(&b, &b);
        c->x = 10;
    }

And this would be caught
    pt *`H add<`r1,`r2>(pt *`r1 p, pt *`r2 q) {
        pt r;
        r.x = p->x + q->x;
        r.y = p->y + q->y;
        return &r;     // region of r is `add, not `H
    }
    pt a = {1,2};
    void foo() {
        pt b = {3,4};
        pt *`H c = add<`H,`foo>(&a, &b);
        pt *`H d = add<`foo,`foo>(&b, &b);
        c->x = 10;
    }

With inference:
    extern void f(int * x);
    void foo() {
        int *\U x = malloc(sizeof(int));
        *x = 3;
        f(x);      // alias inserted here automatically
        free(x);
    }

Reference-counted Pointers
• Aliasing qualifier \RC
  – pointed-to data have a hidden count field
• Aliasing tracked as with unique pointers.
Explicit aliasing/freeing of reference-counted pointers via
    `a *\RC`r alias_refptr(`a *\RC`r);
    void drop_refptr(`a *\RC`r);

Interesting Combinations
• Tracked pointers can be freed manually, with free or drop_refptr, or automatically
  – Pointers into the heap are freed by GC
  – Pointers into LIFO arenas are freed at end of scope (called a reap by Berger et al.)
• Can use tracked pointers to keys to permit arenas to have non-lexical lifetimes
  – Lifetime of the arena corresponds with the liveness of the key
  – Called a dynamic arena

Region Variety Summary
Region      Allocation (objects)  Deallocation (what)  Deallocation (when)  Aliasing (objects)
Stack       static                whole region         exit of scope        ok
LIFO        dynamic               whole region         exit of scope        ok
Dynamic     dynamic               whole region         manual               ok
Heap        dynamic               single objects       GC                   ok
Unique      dynamic               single objects       manual               restricted
Refcounted  dynamic               single objects       manual               restricted

Ensuring Uniformity and Reuse
• Many different idioms could be hard to use
  – Duplicated library functions
  – Hard-to-change application code
• We have solved this problem by
  – Using region types as a unifying theme
  – Region (and aliasing) polymorphism
    • E.g., functions independent of arguments' regions/aliasing
  – All regions can be treated as if lexical
    • Temporarily, under correct circumstances
    • Using alias and open (for dynamic arenas)

Some Application Experience
• Boa: web server
• Cfrac: prime factorization
• BetaFTPD: ftp server
• Epic: image compression
• Kiss-FFT: portable Fourier transform
• MediaNet: streaming overlay network
• Linux drivers: net, video, sound
• CycWeb: web server
• CycScheme: Scheme interpreter

Application Characteristics
• Platform
  – Dual 1.6 GHz AMD Athlon MP 2000
    • 1 GB RAM
    • Switched Myrinet
  – Linux 2.4.20 (RedHat)
• Software
  – C code: gcc 3.2.2
  – Cyclone code: cyclone 0.9
  – GC: BDW conservative collector 6.2α4
  – malloc/free: Lea allocator 2.7.2

Experimental Measurements
• CPU time
  – I/O-bound applications have comparable performance
    • All applications: at most 60% slowdown
  – GC has little impact on elapsed time
    • MediaNet is the exception
• Memory usage
  – Using GC requires far more memory than manual management
  – Cyclone's manual techniques approach the footprint of the C original

Bottom Line
[Figures omitted: Throughput (Webservers); Throughput (MediaNet); Memory Usage, Memory Usage II and Memory Usage III (Web); Memory Usage (MediaNet) (4 KB packets); Other Apps (C vs. Cyc GC); Other Apps (Cyc GC vs. no GC)]

Things I didn't talk about
• Modern language features too
  – Tagged unions and data types
  – Pattern matching
  – Exceptions
  – Allocation with new
• Porting tool
• Lots of libraries