CS 636 Internetworking: Exact Match Lookups and Router Algorithms - Prof. Ramana Rao Kompe, Study notes of Computer Science

A midterm review for the CS 636 Internetworking course, focusing on exact match lookups and router algorithms. Exact match lookups retrieve the state associated with a given key, using solutions such as search trees and hash tables. The document also covers extending Ethernet using bridges and scaling via hashing, as well as problems and solutions related to these topics.

Typology: Study notes

Pre 2010

Uploaded on 07/30/2009

koofers-user-yez-1

CS 636 Internetworking
Ramana Kompella

ROUTER ALGORITHMICS
Midterm Review

Exact Match Lookups

Scaling via hashing
•  Gigaswitch: 32 x 100 Mbps FDDI ports
•  Use hashing instead of a search tree
 –  Avoid the worst case by using perfect hashing to avoid too many collisions
 –  Hash is A(x)*M(x) mod G(x), where G(x) = x^48 + x^36 + x^25 + x^10 + 1, A(x) is the address, and M(x) is a non-zero multiplier
•  Bottom 16 bits index into a 64K hash table
 –  Remaining 32 bits used to disambiguate entries
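
The hash on this slide can be sketched in a few lines of Python, treating the 48-bit address and the multiplier as polynomials over GF(2). This is only an illustration under stated assumptions: the function names (`clmul`, `polymod`, `hash_mac`) and the sample multiplier are hypothetical, and the real Gigaswitch computed this in hardware.

```python
# G(x) = x^48 + x^36 + x^25 + x^10 + 1, encoded with bit i standing for x^i.
G = (1 << 48) | (1 << 36) | (1 << 25) | (1 << 10) | 1

def clmul(a, b):
    """Carry-less product of two GF(2) polynomials (XOR replaces addition)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def polymod(p, g=G):
    """Remainder of polynomial p divided by g over GF(2)."""
    deg_g = g.bit_length() - 1
    while p.bit_length() - 1 >= deg_g:
        p ^= g << (p.bit_length() - 1 - deg_g)
    return p

def hash_mac(addr, multiplier):
    """Return (index, tag) for a 48-bit address.

    The bottom 16 bits of A(x)*M(x) mod G(x) index a 64K-entry table;
    the remaining 32 bits are kept to disambiguate entries.
    """
    if multiplier == 0:
        raise ValueError("multiplier must be non-zero")
    h = polymod(clmul(addr, multiplier))
    return h & 0xFFFF, h >> 16

# Hypothetical address and multiplier, chosen only for illustration.
index, tag = hash_mac(0x001A2B3C4D5E, 0x5A5A5A5A5A5A)
print(hex(index), hex(tag))
```

If too many addresses collide, a different multiplier M(x) can be tried, which is how the slide's "avoid too many collisions" goal would be pursued in this sketch.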

Problem with hashing
•  Non-deterministic
 –  Does not provide worst-case guarantees
 –  Can restrict lookups to 3-4 memory accesses, but update complexity becomes even worse
•  Alternate approach
 –  Hardware pipelining of a binary search tree

IP Prefix Lookups

Unibit tries
Prefix table:
P1  0*
P2  1*
P3  100*
P4  1000*
P5  100000*
P6  101*
P7  110*
P8  11001*
P9  111*
[Figure: unibit trie built from the prefixes above; each node has a 0 branch and a 1 branch, and nodes that end a prefix are labeled P1-P9]
Input: 11000010
Longest matching prefix: P7

Multi-bit tries
•  Consider multiple bits at a time
•  Faster lookup
•  Problem:
 –  Prefixes are not aligned with stride boundaries
•  Solution:
 –  Controlled prefix expansion
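
A minimal Python sketch of the two ideas above, using the nine-prefix table from the unibit trie slide: a bit-at-a-time trie lookup, and controlled prefix expansion to a stride of 3. The function names are illustrative, the trie is a plain nested dict rather than a hardware-friendly layout, and the rule that a longer original prefix wins when two expansions collide is the usual convention, stated here as an assumption.

```python
PREFIXES = {"P1": "0", "P2": "1", "P3": "100", "P4": "1000", "P5": "100000",
            "P6": "101", "P7": "110", "P8": "11001", "P9": "111"}

def build_unibit_trie(prefixes):
    """Insert each prefix one bit at a time; mark the node where it ends."""
    root = {}
    for name, bits in prefixes.items():
        node = root
        for b in bits:
            node = node.setdefault(b, {})
        node["prefix"] = name
    return root

def longest_prefix_match(root, address):
    """Walk the trie along the address bits, remembering the last prefix seen."""
    node, best = root, root.get("prefix")
    for b in address:
        if b not in node:
            break
        node = node[b]
        best = node.get("prefix", best)
    return best

def expand(prefixes, stride):
    """Pad every prefix to the next multiple of `stride` bits.

    When two expansions collide, the one from the longer original prefix wins.
    """
    table = {}  # expanded bits -> (original length, name)
    for name, bits in prefixes.items():
        pad = -len(bits) % stride
        for i in range(2 ** pad):
            suffix = format(i, "b").zfill(pad) if pad else ""
            key = bits + suffix
            if key not in table or table[key][0] < len(bits):
                table[key] = (len(bits), name)
    return {key: name for key, (_, name) in table.items()}

trie = build_unibit_trie(PREFIXES)
print(longest_prefix_match(trie, "10000010"))  # P5
for bits, name in sorted(expand(PREFIXES, 3).items()):
    print(name, bits + "*")
```

In this sketch P2 disappears after expansion because every 3-bit expansion of 1* is overridden by a longer original prefix (P3, P6, P7 or P9).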

Controlled prefix expansion
Old routing table:
P1 0*, P2 1*, P3 100*, P4 1000*, P5 100000*, P6 101*, P7 110*, P8 11001*, P9 111*
Controlled prefix expansion with stride 3:
P1 000*, P1 001*, P1 010*, P1 011*, P2 100*, P2 101*, P2 110*, P2 111*, P3 100*, P4 100000*, P4 100001*, P4 100010*, P4 100011*, P5 100000*, P6 101*, P7 110*, P8 110010*, P8 110011*, P9 111*
New routing table:
P1 000*, P1 001*, P1 010*, P1 011*, P3 100*, P4 100001*, P4 100010*, P4 100011*, P5 100000*, P6 101*, P7 110*, P8 110010*, P8 110011*, P9 111*

Lulea compressed tries
Routing table after controlled prefix expansion:
P1 000*, P1 001*, P1 010*, P1 011*, P3 100*, P4 100001*, P4 100010*, P4 100011*, P5 100000*, P6 101*, P7 110*, P8 110010*, P8 110011*, P9 111*
[Figure: an uncompressed node with entries 000-111, its compressed node, and the auxiliary bitmap 000 1, 001 0, 010 0, 011 0, 100 1, 101 1, 110 1, 111 1]
Lulea bitmap compression
Repeating entries are stored only once in the compressed array. An auxiliary bitmap is needed to find the right entry in the compressed node. It stores a 0 for positions that do not differ from the previous one. Reduces storage to about 160 KBytes for MAE East.

Bitmap supporting fast counting
Bitmap (one bit per 5-bit index, eight indices per row):
00000-00111:  1 0 0 0 1 1 1 0
01000-01111:  1 0 1 0 1 0 0 1
10000-10111:  1 0 1 0 1 1 1 0
11000-11111:  0 0 0 0 1 1 1 0
Auxiliary array (chunk: bits set before the chunk):
00: 0
01: 4
10: 8
11: 13
When the compression bitmaps are large it is expensive to count bits during lookup. The bitmap is divided into chunks and a pre-computed auxiliary array stores the number of bits set before each chunk. The lookup algorithm needs to count only bits set within one chunk.
Input: 11001010, count 13 + 0 = 13
Longest matching prefix: P7

Representing node as tree bitmap
Prefix table: P1 0*, P2 1*, P3 100*, P4 1000*, P5 100000*, P6 101*, P7 110*, P8 11001*, P9 111*
Child bitmap: 000 0, 001 0, 010 0, 011 0, 100 1, 101 0, 110 1, 111 0
Prefix bitmap: 0* 1, 1* 1, 00* 0, 01* 0, 10* 0, 11* 0, 000* 0, 001* 0, 010* 0, 011* 0, 100* 1, 101* 1, 110* 1, 111* 1
Prefixes stored in the node: P1 P2 P3 P6 P7 P9
Pointers to children and prefixes are stored in separate structures. Prefixes of all lengths are stored, thus leaf pushing is not needed and update is fast. Bitmaps have 1s corresponding to entries that are not empty.
Input: 11001010
Longest matching prefix: P7

Packet Classification

Example Classifier
Rule  Destination Address  Source Address
R1    0*                   10*
R2    0*                   01*
R3    0*                   1*
R4    00*                  1*
R5    00*                  11*
R6    10*                  1*
R7    *                    00*

Set-pruning Tries [Tsuchiya, Sri98]
[Figure: set-pruning trie for the example classifier; a trie on dimension DA whose nodes point to tries on dimension SA holding rules R1-R7]
•  O(N^2) memory
•  O(2W) lookup

Beyond 2-d
•  Simplest scheme: extend any 2-d scheme to multiple dimensions
 –  Given that at most 20 rules match, use linear search
•  Advantage: no replication, port ranges stay as ranges.
[Figure: any two-dimensional search algorithm for all matches for (S, D); rules shown: R1, R2, R5, R7, R2, R6]

Extended grid of tries
[Figure: extended grid of tries over Field 1 and Field 2, with jump pointers]
•  Grid of tries for normal 2-d matches
 –  We fixed backtracking with replication
•  EGT-PC [BSV03] uses pre-computation of rule costs and path computation

Divide-and-conquer
•  Three schemes
 –  Bit vector linear search
 –  On-demand cross-producting
 –  Equivalenced cross-producting
•  Common idea: Search along individual dimensions and combine results.
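
The common idea can be illustrated with a small bit-vector sketch in Python over the seven-rule example classifier from earlier in the deck. It is a simplification under stated assumptions: each per-field bit vector is computed here by scanning the prefixes, whereas a real implementation would precompute one vector per trie node, and the names `field_bitvector` and `classify` are hypothetical.

```python
# (rule, destination prefix, source prefix); "" stands for the wildcard *.
RULES = [("R1", "0", "10"), ("R2", "0", "01"), ("R3", "0", "1"),
         ("R4", "00", "1"), ("R5", "00", "11"), ("R6", "10", "1"),
         ("R7", "", "00")]

def field_bitvector(prefixes, value):
    """Bit i is set when rule i's prefix in this field matches the value."""
    vec = 0
    for i, prefix in enumerate(prefixes):
        if value.startswith(prefix):
            vec |= 1 << i
    return vec

def classify(dst, src):
    """Search each dimension separately, AND the vectors, take the first rule."""
    dst_vec = field_bitvector([r[1] for r in RULES], dst)
    src_vec = field_bitvector([r[2] for r in RULES], src)
    both = dst_vec & src_vec
    if both == 0:
        return None
    lowest = (both & -both).bit_length() - 1  # position of the lowest set bit
    return RULES[lowest][0]

print(classify("0010", "1100"))  # R3
print(classify("1000", "0011"))  # R7
```

Rules are assumed to be listed in priority order, so the lowest set bit of the intersection identifies the best match.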

Equivalenced cross-producting
•  Equivalenced cross-producting (a.k.a. recursive flow classification or RFC)
•  Combines the results of the per-field longest matching prefix operations two by two.
•  Pairs of values are grouped in equivalence classes
•  Leads to significant memory savings as compared to simple cross-producting.
•  Provides fast packet classification, but compared to other algorithms, the memory requirements are relatively large

Index  Dest IP - Src IP  Rule bitmap  Class
0      M,S               11110011     C1
1      M,TO              11010011     C2
2      M,Net             11010111     C3
3      M,*               11010011     C2
4      TI,S              00000011     C4
5      TI,TO             00001011     C5
6      TI,Net            00000111     C6
7      TI,*              00000011     C4
8      Net,S             00000011     C4
9      Net,TO            00000011     C4
10     Net,Net           00000111     C6
11     Net,*             00000011     C4
12     *,S               00000001     C7
13     *,TO              00000001     C7
14     *,Net             00000100     C8
15     *,*               00000001     C7
16 entries, 8 distinct classes
[Figure: the Src IP, Dest IP, Src Port, Dest Port and Proto results are combined step by step into the final result]

Decision tree approaches
•  At each node of the tree, test a bit in a field or perform a range test
 –  Large fan-out leads to shallow trees and fast classification
•  Leaves contain a few rules, traversed linearly
•  Interior nodes may also contain rules that match
•  Tests may look at bits from multiple fields
•  A rule may appear in multiple nodes of the decision tree
 –  This can lead to increased memory usage
•  Tree built using heuristics that pick fields to compare on that divide the remaining rules relatively evenly among descendants
•  Fast and compact on rule sets used today

HiCuts, HyperCuts
[Figure: decision tree with the tests "Dest port < 50", "Source = S ?", "DestPort = 53 ?" and "Dest Port = 53 ?"; rules shown at the leaves: R1 R3 R5 R6 R10 R2 R7 R2 R7 R9 R5 R4]

Interconnects
Two basic techniques
•  Input Queueing: usually a non-blocking switch fabric (e.g. crossbar)
•  Output Queueing: usually a fast bus

Karol's result: intuitive proof
•  Assume saturation (i.e., all inputs have cells to send at any given instant)
•  Assume each packet is destined to each output with probability 1/N
•  With equal-size packets, the probability that an output O is idle is the probability that none of the inputs choose O
•  Each input does not choose O with probability 1 - 1/N, so P(O idle) = (1 - 1/N)^N
 –  P(O idle) converges to 1/e ~ 0.37, so throughput converges to 1 - 1/e ~ 0.63
•  Careful analysis by Karol that avoids the independence assumption across rounds shows throughput converges to 2 - √2 ~ 0.586
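
The arithmetic in the intuitive proof can be checked numerically. The sketch below evaluates only the simplified per-slot model from the slide (independent uniform choices, saturation); it does not reproduce Karol's exact analysis, and the function names are illustrative.

```python
import math

def idle_probability(n):
    """Probability that none of n saturated inputs picks a given output."""
    return (1 - 1 / n) ** n

def approx_throughput(n):
    """Fraction of outputs busy in a slot under the independence assumption."""
    return 1 - idle_probability(n)

for n in (2, 4, 16, 1024):
    print(n, round(approx_throughput(n), 4))

# Limits quoted on the slide: 1 - 1/e and Karol's 2 - sqrt(2).
print(round(1 - 1 / math.e, 4), round(2 - math.sqrt(2), 4))
```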

Head of Line Blocking
If more than one input has a packet destined to the same output, head of line blocking occurs. Wastes bandwidth significantly.

Input Queueing Scheduling
[Figure: request graph and bipartite matching between inputs 1-4 and outputs 1-4; the matching shown has weight 18]
Question: Maximum weight or maximum size?

Input Queueing Scheduling
•  Maximum Size
 –  Maximizes instantaneous throughput
 –  Does it maximize long-term throughput?
 –  Is it stable for all arrivals?
•  Maximum Weight
 –  Can clear most backlogged queues
 –  But does it sacrifice long-term throughput?
 –  Is it stable for all arrivals?


Maximum Size Matching (MSM)
•  MSM maximizes instantaneous throughput
•  MSM algorithm: among all matchings, pick one of maximum size
•  If there are multiple, pick any at random.
•  Stable for uniform arrivals
[Figure: request graph and the bipartite maximum size matching, with queues Q11(n) ... QN1(n)]

Maximum weight matching
Longest Queue First or Oldest Cell First
[Figure: weight = { queue length, waiting time } gives 100% throughput; request graph between inputs 1-4 and outputs 1-4 with edge weights 10, 1, 1, 10, 1, 1, and the resulting matching]

LQF (Longest Queue First)
•  LQF is the name given to the maximum weight matching, where weight w_ij(n) = L_ij(n).
•  LQF doesn't necessarily serve the longest queue.
•  LQF can leave a short queue unserved indefinitely.
•  However, MWM-LQF is very important theoretically: most (if not all) scheduling algorithms that provide 100% throughput for unknown traffic matrices are variants of MWM!
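
A tiny brute-force example makes the second bullet concrete. The sketch below enumerates every permutation, so it is only usable for very small N and is not how a switch would compute the match; the queue lengths in `L` are made up for illustration.

```python
from itertools import permutations

def max_weight_matching(L):
    """Return (matching, weight), where L[i][j] is the length of VOQ (i, j).

    Tries every input-to-output permutation: O(N!) and for illustration only.
    """
    n = len(L)
    best_perm, best_weight = None, -1
    for perm in permutations(range(n)):
        weight = sum(L[i][perm[i]] for i in range(n))
        if weight > best_weight:
            best_perm, best_weight = perm, weight
    # Drop pairs whose queue is empty; they carry no cell.
    matching = [(i, j) for i, j in enumerate(best_perm) if L[i][j] > 0]
    return matching, best_weight

# Hypothetical 2x2 switch: the longest queue is (0, 0) with 10 cells.
L = [[10, 9],
     [9, 0]]
print(max_weight_matching(L))  # ([(0, 1), (1, 0)], 18)
```

Serving queue (0, 0) would give a total weight of only 10, so LQF picks the two queues of length 9 and leaves the longest queue unserved in this slot.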

Complexity of Maximum Matchings
•  Maximum Size Matchings:
 –  Typical complexity O(N^0.5 M) or O(N^2.5)
 –  Finding maximum flow through a network flow graph
•  Maximum Weight Matchings:
 –  Typical complexity O(N^3)
 –  Algorithm by Kuhn
•  In general:
 –  Hard to implement in hardware
 –  Slooooow.
•  Can we find a faster algorithm?

iSLIP [McKeown et al., 1993]
[Figure: iSLIP between inputs 1-4 and outputs 1-4 in three steps (1: Requests, 2: Grant, 3: Accept/Match), shown for iterations #1 and #2]

iSLIP Operation
•  Grant phase: Each output selects the requesting input at the pointer, or the next input in round-robin order. It only updates its pointer if the grant is accepted.
•  Accept phase: Each input selects the granting output at the pointer, or the next output in round-robin order.
•  Consequence: Under high load, grant pointers tend to move to unique values.
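
The grant and accept phases can be sketched as a single iteration in Python. Two details are assumptions taken from the usual description of iSLIP rather than from this slide: pointers advance to one position past the matched port, and the accept pointer is updated along with the grant pointer. The function name and the request pattern are hypothetical.

```python
def islip_iteration(requests, grant_ptr, accept_ptr):
    """Run one request/grant/accept iteration on an N x N switch.

    requests[i] is the set of outputs that input i has cells for.
    grant_ptr and accept_ptr are lists updated in place.
    Returns the match as {input: output}.
    """
    n = len(requests)
    # Grant phase: each output scans inputs round-robin from its pointer.
    grants = {}  # input -> list of outputs that granted to it
    for out in range(n):
        for k in range(n):
            inp = (grant_ptr[out] + k) % n
            if out in requests[inp]:
                grants.setdefault(inp, []).append(out)
                break
    # Accept phase: each input takes the granting output nearest its pointer.
    match = {}
    for inp, outs in grants.items():
        chosen = min(outs, key=lambda o: (o - accept_ptr[inp]) % n)
        match[inp] = chosen
        # Only an accepted grant moves the pointers (assumed: one past the match).
        grant_ptr[chosen] = (inp + 1) % n
        accept_ptr[inp] = (chosen + 1) % n
    return match

requests = [{0, 1}, {0}, {1}, set()]
grant_ptr, accept_ptr = [0] * 4, [0] * 4
print(islip_iteration(requests, grant_ptr, accept_ptr))  # {0: 0}
print(islip_iteration(requests, grant_ptr, accept_ptr))
```

In the first slot both outputs grant to input 0 and only one grant is accepted; because only output 0 moves its pointer, the two outputs grant to different inputs in the next slot, which is the desynchronization the last bullet describes.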

Maximal Matches
•  Maximal matching algorithms are widely used in industry (especially algorithms based on WFA and iSLIP).
•  PIM and iSLIP are rarely run to completion (i.e. they are sub-maximal).
•  We will see that a maximal match with a speedup of 2 is stable for non-uniform traffic.

CIOQ emulating OQ switch
[Figure: a CIOQ switch and an OQ switch receiving the same cells]
•  Emulation: Apply the same inputs, cell-by-cell, to both switches, and the order of cells should match.

Key concept: Urgency
•  Urgency = departure time - current time
•  Algorithm: most urgent cell first (MUCF)
•  In each phase,
 –  Outputs request their most urgent cells first from inputs
 –  Inputs grant to the output whose cell is most urgent
 –  Ties are broken based on port number
 –  Loser outputs try to obtain their next most urgent cell
 –  When no more matchings are possible, cells are transferred

Packet Buffers

Works fine if there is only one FIFO
[Figure: a buffer manager in front of a 320B-wide buffer memory. Write rate R: one 40B packet every 8 ns. Read rate R: one 40B packet every 8 ns. 40B packets are grouped into 320B blocks (bytes 0-39, 40-79, ..., 280-319).]

Hybrid Memory Hierarchy
[Figure: arriving packets at rate R enter a small tail SRAM (cache for FIFO tails); a large DRAM memory holds the body of the Q FIFOs, written and read b bytes at a time; a small head SRAM (cache for FIFO heads) serves departing packets at rate R in response to unpredictable scheduler requests.]

Theorem [IKM08]
Impatient Arbiter: An SRAM cache of size Qb(2 + ln Q) bytes is sufficient to guarantee that a byte is always available when requested. The algorithm is called MDQF (Most Deficit Queue First).
Examples:
1.  40 Gb/s linecard, b = 640, Q = 128: SRAM = 560 kBytes
2.  160 Gb/s linecard, b = 2560, Q = 512: SRAM = 10 MBytes
[IKM08] Designing Packet Buffers for Router Line Cards, In TON 2008. Please see my webpage for the paper.

Packet Scheduling

The problems caused by FIFO queues in routers
1.  In order to maximize its chances of success, a source has an incentive to maximize the rate at which it transmits.
2.  (Related to #1) When many flows pass through it, a FIFO queue is "unfair": it favors the most greedy flow.
3.  It is hard to control the delay of packets through a network of FIFO queues.
[Figure labels: Fairness, Delay Guarantees]

Max-Min Fairness
A common way to allocate flows
N flows share a link of rate C. Flow f wishes to send at rate W(f), and is allocated rate R(f).
1.  Pick the flow, f, with the smallest requested rate.
2.  If W(f) ≤ C/N, then set R(f) = W(f).
3.  If W(f) > C/N, then set R(f) = C/N.
4.  Set N = N - 1. C = C - R(f).
5.  If N > 0 goto 1.
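
The five steps above translate almost directly into Python. This is a minimal sketch: the function name, the flow names and the requested rates are made up for illustration.

```python
def max_min_fair(requests, capacity):
    """Allocate `capacity` among flows; `requests` maps flow -> W(f)."""
    alloc = {}
    remaining = capacity
    n = len(requests)
    # Step 1: visit flows from the smallest requested rate upward.
    for flow, want in sorted(requests.items(), key=lambda kv: kv[1]):
        fair_share = remaining / n            # C/N for the flows still left
        alloc[flow] = min(want, fair_share)   # steps 2 and 3
        remaining -= alloc[flow]              # step 4: C = C - R(f)
        n -= 1                                # step 4: N = N - 1
    return alloc

print(max_min_fair({"a": 1, "b": 3, "c": 10, "d": 10}, capacity=12))
# {'a': 1, 'b': 3, 'c': 4.0, 'd': 4.0}
```

Flows a and b ask for less than their share and get what they ask for; the capacity they leave unused is split evenly between c and d.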

Deficit Round Robin
•  Provides excellent bandwidth guarantees
•  One major problem:
 –  Poor delay bounds
•  Implementation complexity
 –  Need to skip a lot of queues to find the next active queue
 –  We can use an active list for maintaining this
 –  However, it can lead to inactive queues not accumulating their fair share.
to
provide
delay
guarantees
?
 •  Fair
queuing
has
good
delay
bounds
 •  MDRR
tries
to
provide
some
delay
guarantees,
but
is
 an
ad
hoc
soluJon
 •  Classic
way
to
provide
delay
bounds
is
to
use
earliest
 deadline
first
(EDF)
algorithm
 –  Schedule
the
packet
with
earliest
deadline
 –  ImplementaJon
using
virtual
clock
[Zha91]

Virtual clock
•  Short-term unfairness.
 –  Since flow 2 was not using the bandwidth between 0 and 100, it gets to use up a lot of short-term bandwidth
[Figure: Flow 1 (R = 0.5) and Flow 2 (R = 0.5) on a time axis from 1 to 100; deadlines shown: 200, 200 and 2]

Scalable fair queuing
•  Aggregation
 –  IP lookups scaled by using only 150,000 prefixes for over 100 million nodes
 –  Apply aggregation in the context of fair queuing
 –  Focus on scheduling aggregates instead of individual flows
•  Random aggregation
•  Edge aggregation

Stochastic fair queuing [McKenney]
•  Routers keep state for a fixed number of flows, say 100,000, on which they do DRR
•  A packet can then be hashed based on its header fields to map to one of several queues
•  Multiple flows map to a given queue
 –  200,000 flows ~ 2 flows share the same class
•  Problems:
 –  Flows compete with different flows at different routers
 –  No explicit differentiation between flows
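
The hashing step can be sketched as follows. The choice of SHA-256, the five header fields and the per-router `salt` are assumptions made for illustration; the slide only says that header fields are hashed to pick a queue.

```python
import hashlib

def queue_index(src, dst, sport, dport, proto, num_queues, salt=b""):
    """Map a flow's header fields to one of `num_queues` DRR queues."""
    key = f"{src}|{dst}|{sport}|{dport}|{proto}".encode()
    digest = hashlib.sha256(salt + key).digest()
    return int.from_bytes(digest[:8], "big") % num_queues

flow = ("10.0.0.1", "10.0.0.2", 1234, 80, "tcp")
print(queue_index(*flow, num_queues=100_000))
print(queue_index(*flow, num_queues=100_000, salt=b"router-2"))
```

Every packet of a flow lands in the same queue at a given router, but a router with a different salt groups flows differently, which matches the first problem listed above.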

Edge aggregation via Diffserv
•  Differentiated services (Diffserv) also aggregates flows into classes
•  Edge routers mark packet class by using a standardized value in the IP TOS field.
•  Expedited service
 –  Certain bandwidth reserved for this class
•  Assured service
 –  Lower drop rate for RED in output queues

Active queue management
•  Queue Management
 –  Drop as a way to feed back to TCP sources
 –  Part of a closed loop
•  Traditional Queue Management
 –  Drop Tail
 –  Problems
•  Active Queue Management
 –  RED
 –  CHOKe
 –  AFD

Random Early Detection (RED)
For each arriving packet:
1.  AvgQsize > Minth? If no, admit the new packet.
2.  If yes: AvgQsize > Maxth? If yes, drop the new packet.
3.  If no, admit the packet with a probability p.

Extending RED for Flow Isolation
•  Problem: what to do with non-cooperative flows?
•  Fair queuing achieves isolation using per-flow state, which is expensive at backbone routers
 –  How can we isolate unresponsive flows without per-flow state?
•  RED penalty box
 –  Monitor history for packet drops, identify flows that use disproportionate bandwidth
 –  Isolate and punish those flows
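
For reference, the basic RED admission test that these extensions build on can be sketched as below. The linear ramp of the drop probability between the two thresholds and the `max_p` default are the usual RED convention and are assumptions here; the slides only say that a packet between the thresholds is admitted with some probability. The function name is illustrative.

```python
import random

def red_admit(avg_qsize, min_th, max_th, max_p=0.1, rng=random.random):
    """Return True to admit the arriving packet, False to drop it."""
    if avg_qsize <= min_th:
        return True                      # below Minth: always admit
    if avg_qsize > max_th:
        return False                     # above Maxth: always drop
    # Between the thresholds: drop probability grows linearly up to max_p.
    drop_p = max_p * (avg_qsize - min_th) / (max_th - min_th)
    return rng() >= drop_p

print(red_admit(5, min_th=10, max_th=20))    # True
print(red_admit(25, min_th=10, max_th=20))   # False
print(red_admit(15, min_th=10, max_th=20))   # random: True about 95% of the time
```

Because the decision depends only on the average queue size, an unresponsive flow sees the same drop probability as everyone else, which is the gap the penalty box tries to close.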

CHOose and Keep for Responsive flows (CHOKe)
For each arriving packet:
1.  AvgQsize > Minth? If no, admit the new packet.
2.  If yes, draw a packet at random from the queue.
3.  Flow id same as the new packet's flow id? If yes, drop both matched packets.
4.  If no: AvgQsize > Maxth? If yes, drop the new packet; if no, admit the packet with a probability p.

Traffic Shaping and Policing
•  Can we add bandwidth guarantees for flows that are placed in the common queues without segregation?
 –  E.g., an ISP wants to restrict NEWS traffic to 1 Mbps
 –  UDP traffic restricted to some value.
•  Token bucket policing/shaping
 –  Uses a single queue
 –  One counter per flow
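
The "one counter per flow" idea can be sketched as a token bucket policer. This is a minimal illustration: the class name, the byte-based units and the explicit `now` argument are assumptions, with `burst` playing the role of the bucket size σ and `rate` the token rate ρ.

```python
class TokenBucket:
    """Per-flow policer: one token counter, refilled lazily on each packet."""

    def __init__(self, rate, burst):
        self.rate = rate        # rho: tokens (bytes) added per second
        self.burst = burst      # sigma: bucket depth in bytes
        self.tokens = burst     # start with a full bucket
        self.last = 0.0         # time of the previous update

    def conforms(self, now, size):
        """Return True if a packet of `size` bytes at time `now` is in profile."""
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False            # a policer drops or marks; a shaper would queue

tb = TokenBucket(rate=1000, burst=1500)
print(tb.conforms(0.0, 1500))   # True: the initial burst is allowed
print(tb.conforms(0.0, 100))    # False: the bucket is empty
print(tb.conforms(0.5, 500))    # True: 500 tokens have accumulated
```

No per-flow queue is needed: the packets stay in the common queue and only the counter and a timestamp are kept per flow.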

How the user/flow can conform to the (σ,ρ) regulation
Leaky bucket as a "shaper"
[Figure: variable bit-rate compression feeds a token bucket of size σ, filled with tokens at rate ρ, before traffic goes to the network at rate C; three plots of bytes versus time show the traffic at each stage]