Understanding Distributed Software Development: Networking, Systems, and Protocols, Study notes of Software Engineering

An in-depth exploration of distributed software development, focusing on networking, distributed systems, and communication protocols. Topics covered include the seven-layer model, TCP/IP, message transmission across layers, and the modern networking stack. Students will gain a solid understanding of the basics of distributed systems, their advantages and disadvantages, and the challenges of dealing with time and process failure.

Typology: Study notes

Pre 2010

Uploaded on 07/30/2009

koofers-user-vi7-1

Distributed Software Development Fundamentals
Chris Brooks
Department of Computer Science
University of San Francisco

Department of Computer Science — University of San Francisco – p. 1/??

Outline
• Networking overview
• Seven-layer model
• Intro to Distributed Systems
• Characteristics
• Desirable Properties
• Dealing with Time

Layering
• Modern network design takes advantage of the idea of layering.
• A particular service or module is constructed as a black box.
• Users of that service do not need to know its internals, just its interface.
• This makes it easy to later build new modules (or layers) that use the lower layers.
• For example, HTTP is built on top of TCP.
• A web browser does not typically need to worry about the implementation of TCP, just that it works.
• Unlike OO modules, the layers in a networked system comprise protocols that span multiple machines.

The OSI seven-layer model
• ISO (a standards body) developed a reference model called OSI that defines the different layers needed for communication, and specifies which layer should do each job.
• The goal is to produce an open protocol that allows for heterogeneous, extensible systems.
• A protocol is a specification describing the order and format of messages.
• An open protocol is one in which all of this information is publicly available.

The OSI seven-layer model
• Application
• Presentation
• Session
• Transport
• Network
• Data Link
• Physical

Layers and packets
• Each layer constructs a packet containing a portion of the data to be transmitted.
• This packet has a data section and a header.
• The header contains origin and destination information, checksums, sequence numbers, and other identifying information.
• When a message is sent by TCP, a packet is constructed and passed down to the IP layer.
• This entire packet then becomes the data portion of the IP packet, which is passed down to the data link layer, and so on.
• On the other end, the lowest layer removes the header and checks the data integrity, then passes the data portion up to the next layer.

Physical Layer
• This is the lowest-level layer, responsible for transmitting 0s and 1s.
• Governs transmission rates, full or half-duplex, etc.
• A modem works at the physical layer.
• Lots of interesting problems at this level that we won’t get into ...

Data Link Layer
• The data link layer provides error handling for the physical layer.
• Individual bits are grouped together into frames.
• A checksum is then computed to detect transmission errors.
• The data link layer can then request a retransmission if an error is detected.
• Messages are numbered; the receiver can request retransmission of any message in a sequence.
• Each frame is a separate, distinct message.
• The data link layer provides error-free transmission to upper-level layers.

Transport Layer
• The network layer still operates at the level of individual packets, or datagrams.
• Packets may get lost, or arrive out of order.
• TCP is a transport-level protocol that provides connection-oriented service.
  • Guaranteed, in-order delivery.
  • State is maintained.
• This layer will also manage quality-of-service and some congestion control.
• UDP is also a transport-level protocol, albeit one that does not provide connection-oriented delivery.
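
The encapsulation described under "Layers and packets" can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 9-byte header layout (layer id, sequence number, CRC32 checksum), the layer constants, and the function names wrap, unwrap, send, and receive are all hypothetical and do not correspond to any real protocol's wire format.

```python
import struct
import zlib
from typing import Tuple

# Hypothetical header for this sketch: 1-byte layer id, 4-byte sequence
# number, 4-byte CRC32 of the payload. Real protocol headers differ.
HEADER = struct.Struct("!BII")

# Illustrative layer ids (OSI numbering), not real protocol constants.
TRANSPORT, NETWORK, DATA_LINK = 4, 3, 2


def wrap(layer_id: int, seq: int, payload: bytes) -> bytes:
    """Prepend a header, as a layer does on the way down the stack."""
    return HEADER.pack(layer_id, seq, zlib.crc32(payload)) + payload


def unwrap(packet: bytes) -> Tuple[int, int, bytes]:
    """Strip the header, check integrity, and return (layer, seq, payload)."""
    layer_id, seq, checksum = HEADER.unpack_from(packet)
    payload = packet[HEADER.size:]
    if zlib.crc32(payload) != checksum:
        raise ValueError(f"checksum mismatch at layer {layer_id}")
    return layer_id, seq, payload


def send(message: bytes, seq: int = 0) -> bytes:
    """Each layer's whole packet becomes the data portion of the next one."""
    packet = message
    for layer in (TRANSPORT, NETWORK, DATA_LINK):
        packet = wrap(layer, seq, packet)
    return packet


def receive(frame: bytes) -> bytes:
    """The lowest layer unwraps first, then passes the data portion up."""
    data = frame
    for expected in (DATA_LINK, NETWORK, TRANSPORT):
        layer, _seq, data = unwrap(data)
        if layer != expected:
            raise ValueError(f"expected layer {expected}, got {layer}")
    return data


if __name__ == "__main__":
    frame = send(b"hello")
    print(len(frame), receive(frame))
```

Each call to wrap adds one header, so the frame on the wire is the original message plus one header per layer. A corrupted byte is caught by the first layer whose checksum covers it.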

Session Layer
• The session layer was designed to provide support for access rights and synchronization.
• In practice, it is not widely used, and is not present in the TCP/IP suite.

Presentation Layer
• The presentation layer controls display of packet information.
• This may include encryption/decryption, compression, translation between character formats.

HTTP requests
• HTTP has a very simple message format.

    GET /~brooks/index.html HTTP/1.1
    Host: www.cs.usfca.edu
    Connection: close
    User-agent: Mozilla/4.0
    Accept-language: en

• You can try this out for yourself with telnet ...

HTTP
• There are lots of wrinkles and extensions to HTTP
  • Cookies to help save state
  • CGI, SOAP to pass data and execute code as the result of an HTTP request.
  • Web caching to store data closer to clients.
• These are all possible because HTTP is an open protocol.
• This is also what makes it possible for different companies to write web browsers and web servers that seamlessly work together.

Summary
• The modern networking stack can be conceptually broken into a set of layers.
• Each layer has a specific, well-defined function.
  • Acts as a black box
• Higher-level layers build on the functionality of lower-level layers.
• We’ll be primarily concerned with the Transport and Application layers.

Advantages of a distributed system
• Can share expensive resources or data
• Economics
  • A collection of PCs can provide better price/performance than a single mainframe.
• Speed
  • A distributed system will often have more computing power than a single mainframe.
• Inherent distribution
  • Often, your data/users/resources are geographically distributed

Advantages of a distributed system
• Reliability
  • If one node fails, the rest of the system can continue
• Incremental growth
  • Components can be added or replaced in small increments.

Disadvantages of distributed systems
• Software design is much more complicated.
  • Lack of appropriate tools/languages
  • Disagreement on principles: how much should users know about the system? How much should the system handle on a user’s behalf?
• Potential network saturation
• Privacy and security issues
  • Allowing resources to be shared can lead to data leakage
• Extra sysadmin work

Transparency
• Is transparency always a good thing? What is the downside?

Flexibility
• Flexibility refers to how easy or difficult it is to change or reconfigure a system.
• The research question is how to best provide flexibility.
• In the OS world, this debate shows up in the comparison of monolithic kernels and microkernels.
  • Monolithic kernel: provides most services on its own.
  • Microkernel: only handles a simple set of services. Most other services are implemented at the user level.
• A microkernel is very flexible and modular; services can be added, deleted, or moved without much reconfiguration.
• A monolithic kernel gives better performance.

Reliability
• There are several different aspects of reliability:
  • Availability: what fraction of the time is the system usable?
  • Integrity: data must be kept consistent. (This sometimes clashes with availability.)
  • Security: unauthorized usage must be prevented.
  • Fault tolerance: how unpleasantly does the system fail? Is data lost? Can recovery happen?

Types of Communication Failure
• We also must consider failures that happen in the network:
  • Crash: a link stops completely.
  • Omission: a link fails to transmit some of its messages.
  • Byzantine: a link can exhibit any possible behavior, including generating spurious messages.
• Note: a Byzantine failure can be treated the same as an attacker/intruder.

Communication paradigms
• Reliable communication: messages are guaranteed to eventually arrive.
• In-order: messages are guaranteed to arrive in the order they are sent.

Communication paradigms
• Asynchronous: there is no bound on message delay
• Synchronous:
  • Known upper bound b on message delay
  • Every process p has a local clock C_p whose drift rate is bounded by r > 0, so that ∀p and ∀t > t′:

    $(1 + r)^{-1} \le \frac{C_p(t) - C_p(t')}{t - t'} \le (1 + r)$

  • In English, clock drift has an upper and lower bound.
  • Also, bounds on the amount of time needed for a process to execute a single step.
• Synchronous communication allows you to implement approximately synchronized clocks, even in the presence of failure.

Global time servers
• NTP is an Internet protocol that allows your machine to synchronize its clock with a remote source, thereby keeping it accurate.
• Is that all we need to do?
• Maybe. Maybe not.
• What if we don’t have an Internet connection, or NTP is blocked by our firewall?
• Can we guarantee that all users use the same remote time server?
• How often should they update?
• What if users don’t do this?
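
The drift bound from the synchronous model gives one rough way to think about the "how often should they update?" question. The Python sketch below checks a pair of clock readings against the bound and estimates how far two clocks could drift apart between synchronizations. The function names and the example drift rate of 1e-5 are assumptions chosen for illustration, not values from the slides.

```python
def drift_within_bound(c_t: float, c_t_prime: float,
                       t: float, t_prime: float, r: float) -> bool:
    """Check (1 + r)^-1 <= (C_p(t) - C_p(t')) / (t - t') <= (1 + r)."""
    if not t > t_prime:
        raise ValueError("the bound is only stated for t > t'")
    rate = (c_t - c_t_prime) / (t - t_prime)
    return 1.0 / (1.0 + r) <= rate <= 1.0 + r


def max_skew(elapsed: float, r: float) -> float:
    """Worst-case divergence of two clocks over `elapsed` real seconds.

    Assumes one clock runs at the fastest allowed rate (1 + r) and the
    other at the slowest allowed rate (1 + r)^-1 for the whole interval.
    """
    return elapsed * ((1.0 + r) - 1.0 / (1.0 + r))


if __name__ == "__main__":
    r = 1e-5  # hypothetical drift rate, chosen only for this example
    # A clock that gained 0.5 ms over 100 s of real time is within the bound.
    print(drift_within_bound(100.0005, 0.0, 100.0, 0.0, r))
    # Two such clocks left unsynchronized for a day (86400 s) could
    # disagree by a little under two seconds in the worst case.
    print(max_skew(86400.0, r))
```

Under these assumptions, the acceptable skew divided by roughly 2r gives an upper limit on the interval between synchronizations.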

Logical time
• The algorithms we’ll look at in this class will not need to depend on the absolute time that something happens.
• Instead, we’ll be interested in the logical time, or causal order in which events occur.
• As long as all processes agree on the order in which a set of events that influence each other occurs, we’re OK.
• We’ll spend time next week looking at this problem.

Summary
• There are lots of desirable properties and design issues for distributed systems.
  • Performance, scalability, reliability, flexibility, transparency
  • Often, we must sacrifice one for another
  • Some (e.g. parallel transparency) are not possible with today’s technology.
• Communication can be either synchronous or asynchronous
• Time is a very sticky problem to deal with in distributed systems.
• Characterizing types of failure will help us identify what our algorithms and systems can and cannot stand up to.