HOWTO Packet Shaping

From Gentoo Linux Wiki

Introduction

Peer-to-peer programs keep growing in popularity. Their use can saturate an internet connection and crowd out all other traffic on it; when that happens, browsing web pages is a pain. BitTorrent and other P2P clients usually saturate a connection's upload, and even though clients can limit their upload, they don't always do so. The solution is traffic shaping.

The Solution

First of all, you need to understand something about queueing. This explanation is somewhat simplified, but you'll get an idea of what's happening. Our scenario is an internet connection of 2048/128 kbit/s (download/upload), a Linux box with two ethernet cards, and a network with several clients on it.

  • The DSL-modem has the internal ip 192.168.2.1
  • The linux box has ips: eth0 192.168.1.1, eth1 192.168.2.2 and 192.168.2.1 as gateway
  • The clients have ips: 192.168.1.16, 192.168.1.17, 192.168.1.18 and 192.168.1.1 as gateway

This is what happens when a client sends a packet to the internet:

  • The packet leaves the client (192.168.1.16)
  • The packet arrives at the gateway eth0 (192.168.1.1)
  • The gateway sends the packet through and into the output queue
  • The packet leaves the gateway eth1 (192.168.2.2)
  • The packet arrives at the DSL-modem (192.168.2.1)
  • The packet enters the DSL-modem's output queue
  • The packet leaves the DSL-modem

Important to understand here is the difference between latency and bandwidth. Latency is a measure of the time that it takes a packet to get from point A to point B. Bandwidth measures the amount of data that got from A to B in a certain time. So, if I were to take a dictionary to my friend on the other side of town, my bandwidth would be good, but the latency would be bad (the time spent driving, to be exact). However, if I were to phone my friend and start reading the dictionary to him, the latency would be lower, but the bandwidth would be substantially less than in the first example. Also important to note is that bandwidth and latency are not directly connected. If it took me the same time to read the dictionary on the phone and drive it over, then the bandwidth in both cases would be equal. However, the latency will not change!

Back to the DSL-modem example. As defined above, the upload speed of the DSL-modem is only 128kbit. This means that the modem will only send 128kbits of data per second (bandwidth). If the gateway sends more data than that, the packets are sent to an output queue to wait in line for their turn to be sent, creating a backlog. What happens next is that the DSL modem's output queue is filled. If it takes a packet 5 seconds to get from the bottom of the queue to the top, we have a latency of 5 seconds. That's bad for interactive sessions.
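
To put numbers on that backlog, here is a minimal sketch that estimates how long a packet waits behind a full queue. The function name `queue_delay_ms` and the 80000-byte queue size are assumptions chosen for illustration; the real buffer size of the modem is not known.

```shell
# Estimate the time (in milliseconds) needed to drain a queue.
# $1: bytes waiting in the queue (hypothetical value)
# $2: link rate in kbit/s (1 kbit = 1000 bit)
queue_delay_ms() (
    bytes=$1
    kbit=$2
    # bytes * 8 = bits; bits divided by kbit/s gives milliseconds
    echo $(( bytes * 8 / kbit ))
)

# 80000 bytes queued in front of a 128 kbit/s uplink
queue_delay_ms 80000 128
```

With these assumed numbers the result is 5000 ms, which corresponds to the five-second latency mentioned above.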

Since we have no control over how the DSL-modem works, we need to move the speed-limiting queue from the DSL-modem to the linux box. By lowering the output speed of eth1 on the linux box to a speed slightly lower than the upload speed of the DSL-modem, the packets will be queued in the Linux box before being shipped off to the DSL-modem, which would now have an empty output queue, so it should (hopefully) immediately transmit them to the Internet.

Once the queue has been moved to the linux box, we have control over it, and the ability to shape outbound traffic.

Requirements

You don't need much. A computer running Gentoo Linux with two network interfaces should suffice. I have been running this on a 200 MHz machine, and I believe even slower machines can handle it.

Kernel

Warning: (Not optimal) Most of the mentioned requirements are not actually required and are based on preferences and custom needs.

First get the latest 2.4 or 2.6 kernel and put it into /usr/src. Then make the link /usr/src/linux point to it.

Next, for a 2.4 kernel, you must get the POM (patch-o-matic) patches from http://netfilter.org/ and patch the kernel. The CVS password is: cvs. (Access via cvs doesn't work at the moment...)

 cvs -d :pserver:cvs@pserver.netfilter.org:/cvspublic login
 cvs -d :pserver:cvs@pserver.netfilter.org:/cvspublic co netfilter/userspace netfilter/patch-o-matic
 ./netfilter/patch-o-matic/runme extra

When patching is done you must enable some options in your kernel. If the options don't exist, run the POM patch once again.

Linux Kernel Configuration: Kernel 2.4
Networking options  --->
  QoS and/or fair queueing  --->
    [*] QoS and/or fair queueing
    <M>   HTB packet scheduler
    <M>   SFQ queue
    [*]   QoS support
    [*]     Rate estimator
    [*]   Packet classifier API
    <M> Firewall based classifier
    [*] Traffic policing (needed for in/egress)
  IP: Netfilter Configuration  --->
    <M> Connection tracking (required for masq/NAT)
    <M> IP tables support (required for filtering/masq/NAT)
    <M>   limit match support
    <M>   MAC address match support
    <M>   Packet type match support
    <M>   netfilter MARK match support
    <M>   Multiple port match support
    <M>   TOS match support
    <M>   random match support
    <M>   recent match support
    <M>   ECN match support
    <M>   DSCP match support
    <M>   AH/ESP match support
    <M>   LENGTH match support
    <M>   TTL match support
    <M>   tcpmss match support
    <M>   Helper match support
    <M>   Connection state match support
    <M>   Connection mark match support
    <M>   Connection tracking match support
    <M>   Unclean match support (EXPERIMENTAL)
    <M>   Owner match support (EXPERIMENTAL)
    <M>   Packet filtering
    <M>     REJECT target support
    <M>     MIRROR target support (EXPERIMENTAL)
    <M>   Full NAT
    <M>     MASQUERADE target support
    <M>     REDIRECT target support
    <M>     Basic SNMP-ALG support (EXPERIMENTAL)
    <M>   Packet mangling
    <M>     TOS target support
    <M>     ECN target support
    <M>     DSCP target support
    <M>     MARK target support
    <M>   LOG target support
    <M>   CONNMARK target support
    <M>   ULOG target support
    <M>   TCPMSS target support
    <M> ARP tables support
    <M>   ARP packet filtering
    <M>   ARP payload mangling


Linux Kernel Configuration: Kernel 2.6 (Ex. gentoo-sources 2.6.11-gentoo-r6)
Device Drivers  --->
  Networking support  --->
    Networking options  --->
      QoS and/or fair queueing  --->
        <M>   HTB packet scheduler
        <M>   SFQ queue
        [*]   QoS support
        [*]     Rate estimator
        [*]   Packet classifier API
        <M> Firewall based classifier
        [*] Traffic policing (needed for in/egress)
      [*] Network packet filtering (replaces ipchains)  --->
        IP: Netfilter Configuration  --->
          <*> Connection tracking (required for masq/NAT)
          <*> Userspace queueing via NETLINK
          <*> IP tables support (required for filtering/masq/NAT)
          <*>   limit match support
          <*>   IP range match support
          <*>   MAC address match support
          <*>   Packet type match support
          <*>   netfilter MARK match support
          <*>   Multiple port match support
          <*>   TOS match support
          <*>   recent match support
          <*>   ECN match support
          <*>   DSCP match support
          <*>   AH/ESP match support
          <*>   LENGTH match support
          <*>   TTL match support
          <*>   tcpmss match support
          <*>   Helper match support
          <*>   Connection state match support
          <*>   Connection tracking match support
          <*>   Owner match support
          <*>   Packet filtering
          <*>     REJECT target support
          <*>   LOG target support
          <*>   ULOG target support
          <*>   TCPMSS target support
          <*>   Full NAT
          <*>     MASQUERADE target support
          <*>     REDIRECT target support
          <*>     NETMAP target support
          <*>     SAME target support
          <*>   Packet mangling
          <*>     TOS target support
          <*>     ECN target support
          <*>     DSCP target support
          <*>     MARK target support
          <*>     CLASSIFY target support
          <M>   raw table support (required for NOTRACK/TRACE)
          <M>     NOTRACK target support
          <*> ARP tables support
          <*>   ARP packet filtering
          <*>   ARP payload mangling


In kernel 2.6.14 and above the settings are arranged a bit differently. Here's how to enable them.

Linux Kernel Configuration: Kernel 2.6.14 (and above)
Networking  --->
  Networking options  --->
    [*] Network packet filtering (replaces ipchains)  --->
      IP: Netfilter Configuration  --->
        <*> Connection tracking (required for masq/NAT)
        <*> Userspace queueing via NETLINK
        <*> IP tables support (required for filtering/masq/NAT)
        <*>   limit match support
        <*>   IP range match support
        <*>   MAC address match support
        <*>   Packet type match support
        <*>   netfilter MARK match support
        <*>   Multiple port match support
        <*>   TOS match support
        <*>   recent match support
        <*>   ECN match support
        <*>   DSCP match support
        <*>   AH/ESP match support
        <*>   LENGTH match support
        <*>   TTL match support
        <*>   tcpmss match support
        <*>   Helper match support
        <*>   Connection state match support
        <*>   Connection tracking match support
        <*>   Owner match support
        <*>   Packet filtering
        <*>     REJECT target support
        <*>   LOG target support
        <*>   ULOG target support
        <*>   TCPMSS target support
        <*>   Full NAT
        <*>     MASQUERADE target support
        <*>     REDIRECT target support
        <*>     NETMAP target support
        <*>     SAME target support
        <*>   Packet mangling
        <*>     TOS target support
        <*>     ECN target support
        <*>     DSCP target support
        <*>     MARK target support
        <*>     CLASSIFY target support
        <M>   raw table support (required for NOTRACK/TRACE)
        <M>     NOTRACK target support
        <*> ARP tables support
        <*>   ARP packet filtering
        <*>   ARP payload mangling
    QoS and/or fair queueing  --->
      <M>   HTB packet scheduler
      <M>   SFQ queue
      [*]   QoS support
      [*]     Rate estimator
      [*]   Packet classifier API
    <M> Firewall based classifier
    [*] Traffic policing (needed for in/egress)

Compile your kernel, install it and boot it.

Shaping strategy

When a packet comes in you can make iptables do several things. Here are some examples of what you can shape by:

  • Port
  • Packet size
  • Traffic type
  • User

Let's say you want to give bittorrent a lower priority. You know it uses ports 6881 through 6889. Iptables can easily match on this, but as soon as a user finds out these ports have a low priority, he configures bittorrent to use another port. There is no reliable way to know which ports P2P programs are running on.

Shaping by packet size does have some advantages. You can give smaller packets higher priority than larger packets. Since sending lots of data is best accomplished by sending large packets, this is exactly what P2P programs do. But then again, a client could lower the MTU on his outgoing interface, thereby sending only small packets.

What you really want is to be able to recognize a packet by its contents. You need to make iptables look into every packet and analyze the contents to figure out whether it's from a P2P program or not. As of now I know of two projects that do this: ipp2p and l7-filter.

Both of these are good projects and I use ipp2p myself. I've only tried it on bittorrent, but it works nicely. Check the last section for these or follow the links.

Shaping by user can be interesting for small home networks. It allows you to give every user a fair share of available bandwidth which they can then use for whatever they like, removing the case where one user takes all bandwidth away from other users altogether.

The various approaches can also be combined using a classful scheduler like HTB. It allows you to first shape by user and then prioritize by traffic type for every user.

Prioritizing

In this howto we'll create four priorities:

  1. Interactive
  2. Misc
  3. Browsing
  4. P2P
  • Interactive is for small packets that require very low latency. This could for instance be icmp or ssh. Small TCP acknowledgement packets belong here as well.
  • Misc is for packets that fit nowhere else.
  • Browsing is for packets that should have lower latency than P2P, but shouldn't really take priority over ssh. This could for instance be http or smtp.
  • P2P is for P2P programs and other programs that try to upload a lot of data. These have the lowest priority possible.

Note that, for example, giving P2P a lower priority than Browsing does not mean that P2P will get less bandwidth. It means that the system prefers to transmit Browsing packets over P2P packets. Only if you are saturating your upload stream will P2P take a hit in available bandwidth.

You might want to think about how you use your network and define the priorities accordingly, or change the order, but this example is a good starting point.

Iptables

It should come as no surprise that you need iptables:

 emerge net-firewall/iptables

We will use iptables to mark packets for shaping later on. However, we first should set up a basic NAT router. This setup is NOT secure at all, it is merely an example showing how to set up NAT:

File: insecure_firewall.sh
# Constants
LOCALNET="192.168.1.0/255.255.255.0"

# Setting policy (the default policy is ACCEPT so you don't really need
# this section unless you set the default policy to DROP; that policy is
# NOT recommended for other chains but the INPUT and FORWARD chains
# in the filter table, and SOMETIMES in the OUTPUT)
iptables -P INPUT ACCEPT
iptables -P OUTPUT ACCEPT
iptables -P FORWARD ACCEPT
iptables -t nat -P POSTROUTING ACCEPT
iptables -t nat -P PREROUTING ACCEPT

# Flushing all tables
iptables -t filter -F
iptables -t mangle -F
iptables -t nat    -F
iptables -t raw    -F # (optional)

# Masquerading
iptables -t nat -A POSTROUTING -s $LOCALNET -o eth1 -j MASQUERADE
iptables -A FORWARD -m state --state ESTABLISHED,RELATED -d $LOCALNET -j ACCEPT

# Enable kernel forwarding
echo 1 > /proc/sys/net/ipv4/ip_forward


Again, you really should not be using the above script. It is included only for completeness. You should probably read Linux 2.4 Stateful Firewall Design (good for 2.6 kernels too) to aid you in creating a properly secured firewall. Shorewall is a package that configures iptables for you, and you should use that (or something like it) if you don't want to get knee-deep in iptables syntax.

Next is marking packets with priorities:

File: marking_packets.sh
MARKPRIO1="1"
MARKPRIO2="2"
MARKPRIO3="3"
MARKPRIO4="4"

# Setting priority marks

# Prio 1
# icmp
iptables -t mangle -A FORWARD -p icmp -j MARK --set-mark $MARKPRIO1
iptables -t mangle -A OUTPUT -p icmp -j MARK --set-mark $MARKPRIO1
# ssh
iptables -t mangle -A FORWARD -p tcp --dport 22 -j MARK --set-mark $MARKPRIO1
iptables -t mangle -A OUTPUT -p tcp --dport 22 -j MARK --set-mark $MARKPRIO1
# non tcp
iptables -t mangle -A FORWARD -p ! tcp -j MARK --set-mark $MARKPRIO1
iptables -t mangle -A OUTPUT -p ! tcp -j MARK --set-mark $MARKPRIO1

# Prio 2

# Prio 3
# http
iptables -t mangle -A FORWARD -p tcp --dport 80 -j MARK --set-mark $MARKPRIO3
iptables -t mangle -A OUTPUT -p tcp --dport 80 -j MARK --set-mark $MARKPRIO3
# https
iptables -t mangle -A FORWARD -p tcp --dport 443 -j MARK --set-mark $MARKPRIO3
iptables -t mangle -A OUTPUT -p tcp --dport 443 -j MARK --set-mark $MARKPRIO3
# smtp
iptables -t mangle -A FORWARD -p tcp --dport 25 -j MARK --set-mark $MARKPRIO3
iptables -t mangle -A OUTPUT -p tcp --dport 25 -j MARK --set-mark $MARKPRIO3

# Prio 4
# packets > 1024 bytes
iptables -t mangle -A FORWARD -p tcp -m length --length 1024: -j MARK --set-mark $MARKPRIO4
# bittorrent
iptables -t mangle -A FORWARD -i eth0 -p tcp --sport 6881:6889 -j MARK --set-mark $MARKPRIO4
iptables -t mangle -A FORWARD -i eth0 -p tcp --dport 6881:6889 -j MARK --set-mark $MARKPRIO4

# Remaining packets are marked according to TOS
iptables -t mangle -A FORWARD -p tcp -m tos --tos Minimize-Delay -m mark --mark 0 -j MARK --set-mark $MARKPRIO1
iptables -t mangle -A FORWARD -p tcp -m tos --tos Maximize-Throughput -m mark --mark 0 -j MARK --set-mark $MARKPRIO2
iptables -t mangle -A FORWARD -p tcp -m tos --tos Minimize-Cost -m mark --mark 0 -j MARK --set-mark $MARKPRIO4

Explanation and notes:

  • -t mangle: We want to mangle (change) packets, by marking them.
  • -A FORWARD/OUTPUT: The rule-chains packets are travelling through. In OUTPUT are packets coming out of this machine, and in FORWARD are packets that we are sending for other machines
  • -p icmp: icmp packets only (same for tcp etc)
  • -p ! tcp: only packets that aren't tcp. Note that icmp isn't tcp, yet it has its own rule. This is unneeded, but improves readability
  • --dport 22: Match (tcp, obviously) packets going to port 22 (Destination Port)
  • -j MARK --set-mark $MARKPRIO1: -j indicates the action we want to take, and --set-mark tells iptables what to mark the packet with
  • -m tos --tos ... : Match on the tos of the packet
  • -m mark --mark 0: Match packets that haven't been marked yet
  • The bittorrent example included will probably not work as intended. Besides hardcoding the interface (eth0), probably to make sure that incoming traffic is not marked, it's important to note that bittorrent does not always use ports 6881-6889. See layer7 and ipp2p later on for better ways to solve this issue.
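
Most ports in the script need the same rule twice, once for FORWARD and once for OUTPUT. As a sketch, a small helper can print both rules for review before anything is applied. The helper name `mark_port_rules` is hypothetical; it only echoes the commands and does not call iptables itself.

```shell
# Print the two MARK rules (FORWARD and OUTPUT) for one TCP destination port.
# $1: destination port, $2: mark value
# Hypothetical helper: it prints the commands instead of running them.
mark_port_rules() (
    port=$1
    mark=$2
    for chain in FORWARD OUTPUT; do
        echo "iptables -t mangle -A $chain -p tcp --dport $port -j MARK --set-mark $mark"
    done
)

# Same rules as the ssh entry in marking_packets.sh
mark_port_rules 22 1
```

Once the printed rules look right, they can be run as root, for example by piping the output to sh.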

Alternate method: CLASSIFY target

Instead of using the MARK target in the FORWARD or OUTPUT chains, you can use the CLASSIFY target in the POSTROUTING chain. The following is an example of classifying outgoing ssh traffic (source port 22, i.e. packets sent by an ssh server) to HTB class 1:101 (high priority as you will see later in this howto):

iptables -t mangle -A POSTROUTING -p tcp --sport 22 -j CLASSIFY --set-class 1:101

For more information on the CLASSIFY target:

Iptables Tutorial: CLASSIFY Target

Understanding HTB

What is it that HTB really does? HTB is a system that divides bandwidth between separate queues. It is important to remember that HTB is made to guarantee bandwidth and NOT to guarantee interactivity. HTB doesn't count packets, it counts bytes! This is why it requires some cleverness to get interactivity out of it. Here's a short explanation of some of the inner workings.

Qdiscs

Qdisc is short for Queue Discipline, meaning a specific strategy used to manage a queue. The queue in the post-office and the queue at an emergency room are both queues in the sense they are both lines of "items", but the strategy (or qdisc) used to manage them is very different.

Classes

The HTB qdisc organizes packets into classes, using filters. In our example we will filter using marks. Each class is a queue in its own right, and therefore uses yet another qdisc (SFQ in the example). You can think of classes as doors through which the bandwidth passes: each kind of traffic must be classified into the door that limits it.

Rates

A rate is the amount of bandwidth a class is guaranteed. For example, in an ideal world, an upload rate of 128kbit/s would mean that the ISP will always give us at least that amount of bandwidth, or more if available.

Ceil

"Ceil" (bandwidth ceiling) is the maximum amount of bandwidth a class can have. Continuing from the previous example, in the real world, an "upload rate" of 128kbit/s really means that the ISP has set a limit on the maximum bandwidth that we can use, even if more is available.

Bursts

There are two types of bursts: burst and cburst.

"burst" is the number of bytes by which a class can exceed its rate, while "cburst" is the same, but for exceeding the ceil.


The manual page at http://lartc.org/manpages/tc-htb.html explains burst only briefly: it describes it as the amount of bytes that can be sent at ceil speed, in excess of the configured rate. Since ceil is higher than rate anyway, this can be confusing at first.

Burst is the number of bytes a class is allowed to send at rate ceil. Important to remember is that ceil is a rate (bytes/sec) and burst is an amount (bytes). You can look at burst as the size of the bucket, ceil as the maximum speed at which you can take tokens from the bucket, and rate as the speed at which the bucket refills. So if we have a burst of 5000, a rate of 1000 and a ceil of 2500, then we can sustain a connection at 1000 bytes/sec; if at some point a sudden "burst" of data is available, it is possible to send at 2500 bytes/sec for roughly (5000/2500 =) 2 seconds, not counting the tokens that arrive in the meantime. After that the bucket is empty and starts filling up at 1000 bytes/sec.

Opinions on burst sizes differ. Bursts below half a kilobyte have been reported to work here, but many people use htb-tools, which uses a minimum burst of 2k and now recommends 30% of ceil.
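
The bucket picture can be checked with a little arithmetic. The sketch below uses the same numbers (burst 5000, rate 1000, ceil 2500) and reports how long the class can send at ceil, first ignoring the refill and then accounting for it. The function names are made up for this example, and the model is the simplified bucket described above, not the exact kernel implementation.

```shell
# Milliseconds a full bucket lasts at ceil speed, ignoring the refill.
# $1: burst (bytes), $2: ceil (bytes/s)
burst_ms_simple() (
    echo $(( $1 * 1000 / $2 ))
)

# Same, but the bucket keeps refilling at rate while it drains at ceil.
# $1: burst (bytes), $2: rate (bytes/s), $3: ceil (bytes/s); needs ceil > rate
burst_ms_refill() (
    echo $(( $1 * 1000 / ($3 - $2) ))
)

burst_ms_simple 5000 2500
burst_ms_refill 5000 1000 2500
```

The first call gives 2000 ms. The second gives 3333 ms, because in this model the bucket only empties at the difference between ceil and rate.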


Quantum

Quantum describes how bandwidth is divided between classes. It works like this:

  • The quantums of all classes are added together and the sum is remembered
  • Each class gets a share according to $\frac{\text{quantum}}{\text{sum}}$

This is used when you have two classes with the same rate and ceil, but want to give them different shares.

When using SFQ, the amount of data is measured in bytes, not in packets; the quantum value therefore equals the number of bytes that a particular qdisc is allowed to send in one cycle.

It is important to remember that to get the finest precision you should select as small a quantum as possible, while still larger than the MTU. When classes want to borrow bandwidth, each is given the number of bytes in its quantum before another competing class is served. Too large a quantum can create long response times.


For example, take two classes A and B whose quantums stand in a ratio of 2:1. For every 2 units that A may send, B may send 1, so A gets about 66% of the bandwidth (2+1=3 and 2/3=0.66...). Because quantum counts bytes and not packets, with an MTU of 1500 bytes this would be a quantum of 3000 for A and 1500 for B.
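
One plausible reading of the quantum values used in the HTB script later in this howto (12187, 8625, 5062 and 1500) is that the class with the lowest rate (8kbit) gets exactly one MTU of 1500 bytes, and the other classes are scaled in proportion to their rates. This is an inference from the numbers, not something the script states. A sketch, with a hypothetical function name:

```shell
# Scale a quantum so that the class with the lowest rate gets one MTU.
# $1: class rate in kbit/s, $2: lowest class rate in kbit/s, $3: MTU in bytes
quantum_for_rate() (
    echo $(( $1 * $3 / $2 ))
)

# Rates of the four classes in the HTB script (65, 46, 27 and 8 kbit)
for rate in 65 46 27 8; do
    quantum_for_rate "$rate" 8 1500
done
```

Integer division rounds down, which reproduces the four values from the script.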

r2q

The r2q value (10 by default) can be specified when you create an HTB root. r2q means "rate to quantum" and is the conversion factor used to calculate the quantum value from the rate specified for a class. You can always override this by explicitly specifying a quantum for a class.

The quantum is thus calculated as quantum = rate (in bytes) / r2q. The quantum values must be bigger than the MTU for your setup, which is 1500 in most cases (that is the MTU of ethernet). The quantum values must also be smaller than 60000, a value hard-coded to prevent poor prioritizing. If your quantum values are wrong, you will get error messages in your kern.log like "HTB: quantum of class 10101 is big. Consider r2q change." In such a case, calculate a better r2q value using your rates (set it so that every quantum ends up being >1500 but <60000). If the quantum values are wrong, the division of bandwidth will not be proper.
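
As a sketch of that rule, the helpers below derive the automatic quantum from a rate and r2q and report whether it falls inside the window given above. The function names are hypothetical, 1 kbit is taken as 1000 bit (which is how tc interprets the kbit suffix), and a quantum exactly equal to the MTU is accepted here.

```shell
# Automatic quantum in bytes: rate (bytes/s) divided by r2q.
# $1: rate in kbit/s, $2: r2q
auto_quantum() (
    echo $(( $1 * 1000 / 8 / $2 ))
)

# Classify a quantum against the limits quoted in the text.
# $1: quantum in bytes, $2: MTU in bytes
check_quantum() (
    if [ "$1" -lt "$2" ]; then
        echo "too small"
    elif [ "$1" -gt 60000 ]; then
        echo "too big"
    else
        echo "ok"
    fi
)

q=$(auto_quantum 8 1)       # 8kbit class with r2q 1
echo "$q $(check_quantum "$q" 1500)"
q=$(auto_quantum 152 10)    # 152kbit class with the default r2q of 10
echo "$q $(check_quantum "$q" 1500)"
```

An 8kbit class with r2q 1 would get an automatic quantum of only 1000 bytes, which may be why the script in the Iproute2 section sets a quantum explicitly for every class.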

Prio

If a class has a higher ceil than rate, it is allowed to borrow bandwidth from other classes. By default, the bandwidth priority of a class is weighted by its rate, i.e. a class with twice the rate of another class can also borrow twice as much bandwidth. This behavior can be overridden with the prio parameter. The smaller the number, the higher the priority. Classes with higher priority will borrow first, classes with lower priority can borrow only if there still is bandwidth left. A class that can't borrow bandwidth will not be able to exceed its rate. The prio parameter is often misunderstood as it does not actually affect the order in which packets are sent out.

Understanding SFQ

SFQ stands for Stochastic Fairness Queueing. It queues packets that belong to different connections and tries to let every connection send the same amount of data, so that, for example, several concurrent FTP uploads all run at the same speed and none chokes the others. As such, SFQ does great work on its own, balancing "everything" without any additional configuration. The downside to SFQ is that it is a queue. As such, it delays packets and introduces lag. Attaching several SFQ qdiscs to HTB classes makes this problem worse, as every instance of SFQ keeps its own queue, increasing the total number of packets that will be delayed.

limit

The default queue size of SFQ is 128 packets. With the limit parameter (which is only available in newer kernels / tc versions) you can set your own queue size. Smaller queues make the stochastic fairness less accurate, but improve latency at the same time.
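
To get a feeling for suitable values, the sketch below computes how many full-size packets fit into a queue if a packet should wait no longer than a given time. The function name and the 500 ms target are assumptions for illustration only.

```shell
# Largest number of MTU-sized packets that drain within a target delay.
# $1: target delay in ms, $2: link rate in kbit/s, $3: MTU in bytes
sfq_limit_for_delay() (
    # ms * kbit/s = bits that can be sent; one packet is 8 * MTU bits
    echo $(( $1 * $2 / (8 * $3) ))
)

# 500 ms at 152 kbit/s with 1500-byte packets
sfq_limit_for_delay 500 152 1500
```

By the same arithmetic, a full default queue of 128 such packets would take about ten seconds to drain at that rate.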

Iproute2

To set up HTB you need iproute2:

emerge sys-apps/iproute2

What you actually need is the tc program, which is included in the iproute2 package.

Run this script to create the four classes and their qdiscs and set them up:

File: iptables_quotas.sh
#Constants

# Interface you want to do shaping on
# eth2, eth1 for direct connection; ppp0 or so for dsl
# and other dialup connections (check ifconfig)
IFACE=eth2

# Priority marks
MARKPRIO1="1"
MARKPRIO2="2"
MARKPRIO3="3"
MARKPRIO4="4"

# Rates
UPRATE="152kbit"
#P2PRATE=$UPRATE
P2PRATE="128kbit"
PRIORATE1="65kbit"
PRIORATE2="46kbit"
PRIORATE3="27kbit"
PRIORATE4="8kbit"

# Quantum
QUANTUM1="12187"
QUANTUM2="8625"
QUANTUM3="5062"
QUANTUM4="1500"

# Burst
BURST1="6k"
BURST2="4k"
BURST3="2k"
BURST4="0k"
CBURST1="3k"
CBURST2="2k"
CBURST3="1k"
CBURST4="0k"

# Set queue length for IFACE
ifconfig $IFACE txqueuelen 16

# Specify queue discipline (HTB root; unmarked traffic goes to class 1:103)
tc qdisc add dev $IFACE root handle 1:0 htb default 103 r2q 1

# Set root class
tc class add dev $IFACE parent 1:0 classid 1:1 htb rate $UPRATE burst $BURST1 cburst $CBURST1
# Specify sub classes
tc class add dev $IFACE parent 1:1 classid 1:101 htb rate $PRIORATE1 ceil $UPRATE quantum $QUANTUM1 burst $BURST1 cburst $CBURST1 prio 0
tc class add dev $IFACE parent 1:1 classid 1:102 htb rate $PRIORATE2 ceil $UPRATE quantum $QUANTUM2 burst $BURST2 cburst $CBURST2 prio 1
tc class add dev $IFACE parent 1:1 classid 1:103 htb rate $PRIORATE3 ceil $UPRATE quantum $QUANTUM3 burst $BURST3 cburst $CBURST3 prio 2
tc class add dev $IFACE parent 1:1 classid 1:104 htb rate $PRIORATE4 ceil $P2PRATE quantum $QUANTUM4 burst $BURST4 cburst $CBURST4 prio 3

# Filter packets
tc filter add dev $IFACE parent 1:0 protocol ip prio 0 handle $MARKPRIO1 fw classid 1:101
tc filter add dev $IFACE parent 1:0 protocol ip prio 1 handle $MARKPRIO2 fw classid 1:102
tc filter add dev $IFACE parent 1:0 protocol ip prio 2 handle $MARKPRIO3 fw classid 1:103
tc filter add dev $IFACE parent 1:0 protocol ip prio 3 handle $MARKPRIO4 fw classid 1:104

# Add queuing disciplines
tc qdisc add dev $IFACE parent 1:101 sfq perturb 16 quantum $QUANTUM1
tc qdisc add dev $IFACE parent 1:102 sfq perturb 16 quantum $QUANTUM2
tc qdisc add dev $IFACE parent 1:103 sfq perturb 16 quantum $QUANTUM3
tc qdisc add dev $IFACE parent 1:104 sfq perturb 16 quantum $QUANTUM4
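
HTB can only honour its guarantees if the child rates together do not exceed the rate of the parent class. A minimal sketch of that sanity check, using the rates from the script above (the function name is hypothetical):

```shell
# Print the sum of the child rates and succeed if it fits into the parent rate.
# $1: parent rate in kbit/s, remaining arguments: child rates in kbit/s
rates_fit() (
    parent=$1
    shift
    sum=0
    for r in "$@"; do
        sum=$(( sum + r ))
    done
    echo "$sum"
    [ "$sum" -le "$parent" ]
)

# UPRATE 152 against PRIORATE1..4
rates_fit 152 65 46 27 8 && echo "rates fit"
```

Here the four guaranteed rates add up to 146kbit, which leaves a little headroom below the 152kbit of the root class.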

l7-filter

L7-filter attempts to be a more general classifier than ipp2p. The pattern definitions are stored in user space so as to be easily modified without kernel recompilation. As mentioned above I haven't tried l7-filter. Anyways, you can install it with

emerge l7-protocols
emerge l7-filter

Once it has been installed, you should learn how to use it and add your experience to this howto ;-)

After installing the l7-filter, you have to do

echo "net-firewall/iptables extensions" >> /etc/portage/package.use
emerge --newuse iptables

to get the shared objects you need to use the l7-filter. You also have to activate the l7 match in your Kernel (e.g. as a module)

Linux Kernel Configuration:
Device Drivers -->
 [*] Network support
   Network options -->
    [*] Network packet filtering -->
      IP: Netfilter configuration -->
       [*]   Connection tracking flow accounting
       <M> FTP protocol support
       <M> Userspace queueing via NETLINK
       <M> Layer 7 match support (EXPERIMENTAL)

Then you have to recompile your kernel (e.g. with genkernel):

genkernel --no-clean --no-mrproper all

After this, you have to add the rules. But be careful, you have to add the rules so that packets going in both directions pass through an l7-filter rule. This means, it has to be e.g. in INPUT and also in OUTPUT. This is one of the trickiest things you have to solve ;-)

Here is one example that should work.

iptables -t filter -A INPUT -m layer7 --l7proto edonkey -j ACCEPT
iptables -t filter -A OUTPUT -m layer7 --l7proto edonkey -j ACCEPT

Or for administrators who don't want to allow p2p at all:

iptables -t filter -A FORWARD -m layer7 --l7proto edonkey -j DROP

This is dangerous, however, because there is the risk of false positives dropping useful traffic. Read l7-filter's protocols page to get an idea of whether you want to risk it or not. It is safer to limit the bandwidth instead. First mark the packets:

iptables -t mangle -A FORWARD -m layer7 --l7proto edonkey -j MARK --set-mark 123

Then use tc to match that mark.
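
As a sketch, such a filter can follow the same pattern as the fw filters in the HTB script. The helper below only prints the command; its name is hypothetical, and the choice of interface eth1, class 1:104 (the P2P class) and filter prio 4 is an assumption.

```shell
# Print a tc fw filter that sends packets carrying a given mark to an HTB class.
# $1: interface, $2: mark, $3: classid, $4: filter priority
fw_filter_cmd() (
    echo "tc filter add dev $1 parent 1:0 protocol ip prio $4 handle $2 fw classid $3"
)

# Mark 123 (set by the l7-filter rule above) goes to the P2P class
fw_filter_cmd eth1 123 1:104 4
```

Check the printed command against your own class layout before running it as root.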

For more information, see http://l7-filter.sourceforge.net

Have fun.

Notice: l7-filter does not work with hardened-sources 2.6.11-r15.

Example Script

These scripts are what I use for shaping on my Gentoo router. They work well on my home (Comcast cable) connection and currently only limit egress/upload traffic.

The first script creates the HTB/SFQ and Ingress policies:

File: /etc/init.d/qos.htb
#!/bin/bash
#  Created/Hacked together by Rudy Grigar.
#  2008-04-26
#
#     NOTE: This script needs kernel support for
#            SFQ, HTB, and tc from the iproute2 package.
#            This script doesn't mark packets, it only
#            shapes already marked traffic. See qos.iptables
#            for examples of marking traffic.
#
#  SOURCES: http://www.tldp.org/HOWTO/ADSL-Bandwidth-Management-HOWTO/implementation.html
#           http://gentoo-wiki.com/HOWTO_Packet_Shaping
#           http://lartc.org/
#           http://lartc.org/wondershaper/wondershaper-1.1a.tar.gz 

## Device
# This is the interface we want to do shaping on
# (i.e. eth1 is directly connected to my cable modem)
DEV=eth1

## Rates - Set these to match your set up!
# Note: ACTUAL rates almost always differ from advertised rates,
#        test your connection speed and tweak UPRATE and DOWNRATE
#        to your needs.
UPRATE="365kbit"    # This is the maximum ACTUAL upload rate available
P2PRATE="215kbit"   # This is the arbitrary ceiling used for priority 4 / p2p applications
PRIORATE1="155kbit" # Guarantee 155kbit to prio1 traffic
PRIORATE2="123kbit" # Guarantee 123kbit to prio2 traffic
PRIORATE3="60kbit"  # Guarantee 60kbit to prio3 traffic
PRIORATE4="27kbit"  # Guarantee 27kbit to prio4 traffic

# Note: DOWNRATE is only used if I can figure out a way to set it up
#        without IMQ, since IMQ isn't in the kernel.  I have seen
#        a few ways to mimic IMQ with a dummy device that I'm still
#        working on adding to the script.
DOWNRATE="8500kbit" # This is the maximum ACTUAL download rate available
# Note: Since I'm too lazy to set up imq/dummy device ingress shaping
#        I will just use DOWNTHROTTLE to limit all downloads at a certain
#        speed. This appears to work fine for my home network, but it is
#        not an ideal solution since ingress UDP and ICMP traffic should 
#        have priority over ingress TCP. DOWNTHROTTLE should be *less* 
#        than actual DOWNRATE (1000kbit less works for me, your mileage
#        may vary).
DOWNTHROTTLE="7500kbit"

## Allow us to view the status of our QoS setup quickly
# /etc/init.d/qos.htb status
if [ "$1" = "status" ]
then
        echo "[qdisc]"
        tc -s qdisc show dev $DEV
        echo "[class]"
        tc -s class show dev $DEV
        echo "[filter]"
        tc -s filter show dev $DEV
        exit
fi

## Reset everything to a known state (cleared)
# Remove previous tc rules
tc qdisc del dev $DEV root > /dev/null 2>&1
tc qdisc del dev $DEV ingress > /dev/null 2>&1

## Exit if asked to stop, otherwise continue
if [ "$1" = "stop" ] 
then 
        echo "HTB/QOS Shaping removed on $DEV."
        exit
fi

# Priority marks -
# just for cleanliness
MARKPRIO1="1"
MARKPRIO2="2"
MARKPRIO3="3"
MARKPRIO4="4"

##############
## Don't mess with this stuff unless you know what you're doing...
## I've tried to explain it a little bit, though :)
##############
# Set queue length for DEV
ifconfig $DEV txqueuelen 512

## Set up the queue
# Note:  For a better explanation of how HTB works
#        visit http://www.opalsoft.net/qos/DS-28.htm
#
#         .-  UPRATE  -.        - maximum ACTUAL uprate we specify (365k)
#        /    /    \    \
#     PRIO1 PRIO2 PRIO3 PRIO4   - rates we specified for priorate{1-4}
#      155k  123k   60k   27k   - these are the guaranteed rates
#     CEIL  CEIL  CEIL  CEIL    - if we aren't maxing out each priority
#      365k  365k  365k  215k     we can borrow up to the ceil, but as
#                                 soon as a higher priority needs bandwidth
#                                 it will be able to 'steal' it back

# Specify queue discipline (HTB)
# http://www.docum.org/docum.org/faq/cache/10.html has some info on shaping
# rules, but so does google. "default 103" tells the root qdisc that
# unmarked traffic should be placed in the PRIO3 bucket.
tc qdisc add dev $DEV root handle 1: htb default 103

# Set root class
# Note:  This sets the top/root of the queue tree
tc class add dev $DEV parent 1: classid 1:1 htb rate $UPRATE ceil $UPRATE burst 8k
# Specify sub classes
# Note:  These are the prio{1-4} nodes from the diagram above
tc class add dev $DEV parent 1:1 classid 1:101 htb rate $PRIORATE1 ceil $UPRATE burst 2k prio 0
tc class add dev $DEV parent 1:1 classid 1:102 htb rate $PRIORATE2 ceil $UPRATE burst 2k prio 1
tc class add dev $DEV parent 1:1 classid 1:103 htb rate $PRIORATE3 ceil $UPRATE burst 2k prio 2
tc class add dev $DEV parent 1:1 classid 1:104 htb rate $PRIORATE4 ceil $P2PRATE burst 2k prio 3

# Filter packets
# Note:  This puts the packets in the proper priority class
tc filter add dev $DEV parent 1: protocol ip prio 0 handle $MARKPRIO1 fw classid 1:101
tc filter add dev $DEV parent 1: protocol ip prio 1 handle $MARKPRIO2 fw classid 1:102
tc filter add dev $DEV parent 1: protocol ip prio 2 handle $MARKPRIO3 fw classid 1:103
tc filter add dev $DEV parent 1: protocol ip prio 3 handle $MARKPRIO4 fw classid 1:104

# Add queuing disciplines
tc qdisc add dev $DEV parent 1:101 handle 101: sfq 
tc qdisc add dev $DEV parent 1:102 handle 102: sfq
tc qdisc add dev $DEV parent 1:103 handle 103: sfq
tc qdisc add dev $DEV parent 1:104 handle 104: sfq

####
# Ingress shaping, not sure how well this actually works yet..
####

# Example iptables rule:
#  iptables -A PREROUTING -i $DEV -t mangle -p tcp --sport 80 -j MARK --set-mark 1
tc qdisc add dev $DEV handle ffff: ingress
# Match all traffic...
tc filter add dev $DEV parent ffff: protocol ip prio 5 u32 match ip src 0.0.0.0/0 police rate $DOWNTHROTTLE burst 32k drop flowid :1

echo "Outbound shaping added to $DEV.  Rate: ${UPRATE}/sec."
echo "                             P2P Rate: ${P2PRATE}/sec."
echo "                           PRIO1 Rate: ${PRIORATE1}/sec."
echo "                           PRIO2 Rate: ${PRIORATE2}/sec."
echo "                           PRIO3 Rate: ${PRIORATE3}/sec."
echo "                           PRIO4 Rate: ${PRIORATE4}/sec."
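
HTB is easiest to reason about when the guaranteed rates of the child classes add up to no more than the rate of their parent class. The following is a minimal sketch of a sanity check that could be run before the tc commands. The function names kbit and check_rates are hypothetical, and the sketch assumes every rate is written as a whole number with the kbit suffix, as in the script above.

```bash
#!/bin/bash
# Hypothetical helper: strip the "kbit" suffix and print the number.
# Assumes rates are always given as "<integer>kbit".
kbit() {
    echo "${1%kbit}"
}

# Hypothetical helper: the first argument is the parent rate, the
# remaining arguments are the guaranteed child rates.
# Prints a summary and returns 0 if the children fit inside the parent.
check_rates() {
    local parent total=0 r
    parent=$(kbit "$1")
    shift
    for r in "$@"; do
        total=$(( total + $(kbit "$r") ))
    done
    echo "guaranteed: ${total}kbit, available: ${parent}kbit"
    [ "$total" -le "$parent" ]
}

# Values copied from the script above (UPRATE, then PRIORATE1..4).
check_rates "365kbit" "155kbit" "123kbit" "60kbit" "27kbit" \
    || echo "Warning: guaranteed rates exceed UPRATE"
```

If you change UPRATE after measuring your own connection, rerun the check with the new values and scale the PRIORATE values accordingly.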

Next are the iptables rules used to mark packets with a certain priority:

File: /etc/init.d/qos.iptables
#!/bin/bash
#  Created/Hacked together by Rudy Grigar.
#  2008-04-26
#
#     NOTE: This script only marks packets for queues (using iptables).
#            TC, SFQ, and HTB are all needed to shape the
#            marked packets. The U32 classifier from TC could also
#            be used to mark packets, but it's overly complex and iptables works fine.
#
#  SOURCES: http://www.tldp.org/HOWTO/ADSL-Bandwidth-Management-HOWTO/implementation.html
#           http://gentoo-wiki.com/HOWTO_Packet_Shaping
#           http://lartc.org/
#           http://lartc.org/wondershaper/wondershaper-1.1a.tar.gz 

## Device
# This is the interface we want to do shaping on
# (i.e. eth1 is directly connected to my cable modem)
DEV=eth1

## Allow us to view the status of our QoS setup quickly
# /etc/init.d/qos.iptables status
if [ "$1" = "status" ]
then
        echo "[iptables]"
        iptables -t mangle -L -v -x 2> /dev/null
        exit
fi

# Delete the mangle iptables rules
iptables -t mangle -F 2> /dev/null > /dev/null

## Exit if asked to stop, otherwise continue
if [ "$1" = "stop" ] 
then 
        echo "QoS iptables marking removed on $DEV."
        exit
fi

# Priority marks -
# just for cleanliness
MARKPRIO1="1"
MARKPRIO2="2"
MARKPRIO3="3"
MARKPRIO4="4"

## Setting priority marks with iptables
## Prio 1
# icmp
iptables -t mangle -A FORWARD -p icmp -j MARK --set-mark $MARKPRIO1
iptables -t mangle -A OUTPUT -p icmp -j MARK --set-mark $MARKPRIO1
# ssh
iptables -t mangle -A INPUT -p tcp --dport 22 -j MARK --set-mark $MARKPRIO1
iptables -t mangle -A FORWARD -p tcp --dport 22 -j MARK --set-mark $MARKPRIO1
iptables -t mangle -A OUTPUT -p tcp --dport 22 -j MARK --set-mark $MARKPRIO1
# non tcp (this means games that use UDP will always be prio1)
iptables -t mangle -A FORWARD ! -p tcp -j MARK --set-mark $MARKPRIO1
iptables -t mangle -A OUTPUT ! -p tcp -j MARK --set-mark $MARKPRIO1
### - End Priority 1


## Prio 2 - GAMES
# CS:S (appears to only care about udp traffic)
# WoW
iptables -t mangle -A FORWARD -p tcp --sport 3724 -j MARK --set-mark $MARKPRIO2
iptables -t mangle -A FORWARD -p tcp --dport 3724 -j MARK --set-mark $MARKPRIO2
# Warcraft III 
iptables -t mangle -A FORWARD -p tcp --dport 6112 -j MARK --set-mark $MARKPRIO2
iptables -t mangle -A FORWARD -p tcp --sport 6112 -j MARK --set-mark $MARKPRIO2
# note: this is a nonstandard wc3 port (used for hosting with more than 1 box)
iptables -t mangle -A FORWARD -p tcp --dport 6119 -j MARK --set-mark $MARKPRIO2
iptables -t mangle -A FORWARD -p tcp --sport 6119 -j MARK --set-mark $MARKPRIO2
### - End Priority 2


## Prio 3
# http
iptables -t mangle -A FORWARD -p tcp --dport 80 -j MARK --set-mark $MARKPRIO3
iptables -t mangle -A OUTPUT -p tcp --dport 80 -j MARK --set-mark $MARKPRIO3
# https
iptables -t mangle -A FORWARD -p tcp --dport 443 -j MARK --set-mark $MARKPRIO3
iptables -t mangle -A OUTPUT -p tcp --dport 443 -j MARK --set-mark $MARKPRIO3
# smtp
iptables -t mangle -A FORWARD -p tcp --dport 25 -j MARK --set-mark $MARKPRIO3
iptables -t mangle -A OUTPUT -p tcp --dport 25 -j MARK --set-mark $MARKPRIO3
### - End Priority 3


## Prio 4
# packets of 1024 bytes or more
iptables -t mangle -A FORWARD -p tcp -m length --length 1024: -j MARK --set-mark $MARKPRIO4
# bittorrent - defaults
# these also double as the blizzard downloader ports
iptables -t mangle -A FORWARD -p tcp --sport 6881:6889 -j MARK --set-mark $MARKPRIO4
iptables -t mangle -A OUTPUT -p tcp --sport 6881:6889 -j MARK --set-mark $MARKPRIO4
iptables -t mangle -A FORWARD -p tcp --dport 6881:6889 -j MARK --set-mark $MARKPRIO4
iptables -t mangle -A OUTPUT -p tcp --dport 6881:6889 -j MARK --set-mark $MARKPRIO4

# bittorrent - network specific
# these are the ports used for bittorrent on MY network; unless you use the exact same
# ones, I recommend you change them.
iptables -t mangle -A FORWARD -p tcp --dport 53331 -j MARK --set-mark $MARKPRIO4
iptables -t mangle -A FORWARD -p tcp --dport 50002 -j MARK --set-mark $MARKPRIO4
iptables -t mangle -A FORWARD -p tcp --sport 53331 -j MARK --set-mark $MARKPRIO4
iptables -t mangle -A FORWARD -p tcp --sport 50002 -j MARK --set-mark $MARKPRIO4
# these are the bt ports used on the router (hence the INPUT chain)
iptables -t mangle -A INPUT -p tcp --dport 53341:53351 -j MARK --set-mark $MARKPRIO4
### - End Priority 4


## Remaining packets are marked according to TOS
iptables -t mangle -A FORWARD -p tcp -m tos --tos Minimize-Delay -m mark --mark 0 -j MARK --set-mark $MARKPRIO1
iptables -t mangle -A FORWARD -p tcp -m tos --tos Maximize-Throughput -m mark --mark 0 -j MARK --set-mark $MARKPRIO2
iptables -t mangle -A FORWARD -p tcp -m tos --tos Minimize-Cost -m mark --mark 0 -j MARK --set-mark $MARKPRIO4
### - End TOS
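
Most of the rules above come in FORWARD/OUTPUT pairs that differ only in port and mark. The following is a minimal sketch of a helper that generates such pairs. The function name mark_port and the IPT variable are hypothetical and not part of the original script. IPT defaults to a dry run that only prints the commands, so the result can be reviewed before anything is applied.

```bash
#!/bin/bash
# Dry run by default: print the commands instead of executing them.
# Set IPT="iptables" in the environment to apply them (needs root).
IPT="${IPT:-echo iptables}"

# Hypothetical helper: mark_port <proto> <port> <mark>
# Marks traffic going to <port> in both FORWARD (routed clients) and
# OUTPUT (the router itself), like the http/https/smtp rules above.
mark_port() {
    local proto=$1 port=$2 mark=$3 chain
    for chain in FORWARD OUTPUT; do
        # $IPT is deliberately unquoted so "echo iptables" splits
        # into a command and its first argument.
        $IPT -t mangle -A "$chain" -p "$proto" --dport "$port" \
            -j MARK --set-mark "$mark"
    done
}

# Equivalent to the six Prio 3 rules (http, https, smtp).
mark_port tcp 80 3
mark_port tcp 443 3
mark_port tcp 25 3
```

The helper only covers destination-port rules. Rules that match on source ports, packet length, or TOS would still be written out by hand.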

Testing

Graphs

The setup is done, but does it actually work? To answer that, I wrote a Perl script that draws some graphs. It is always running on my own server, where you can watch the graphs and download the script.

Setting the upload rate

One important thing is to set your upload rate right. If you set it too high, the queue moves to the DSL modem, where the shaper has no control over it. If you set it too low, you don't utilize your bandwidth. It has to be just right.

To test this:

  • Fire up bittorrent
  • Start pinging some server
  • Ssh into a server

Now keep an eye on the ping times as you change the upload rate. Try typing something in your ssh session. I promise, you'll notice when the upload rate is too high.
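
To compare ping times between two candidate upload rates with numbers rather than by eye, the average round-trip time can be pulled out of the ping summary. The following is a minimal sketch. The function name avg_rtt is hypothetical, and it assumes the summary line format printed by ping on Linux (rtt min/avg/max/mdev = ...).

```bash
#!/bin/bash
# Hypothetical helper: read ping output on stdin and print the
# average round-trip time in milliseconds.
# Assumes a summary line of the form
#   rtt min/avg/max/mdev = <min>/<avg>/<max>/<mdev> ms
# Splitting that line on "/" puts the average in field 5.
avg_rtt() {
    awk -F'/' '/^(rtt|round-trip)/ { print $5 }'
}

# Usage while bittorrent is uploading (replace the host with a
# server near you):
#   ping -c 20 some.server.example | avg_rtt
```

Run it once per UPRATE value you try. The rate at which the average starts to climb sharply is a sign that the queue has moved into the modem.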

Finally, you can run a test with the game Enemy Territory, which needs to send a lot of small packets in a hurry. Hold down Tab to see your latency.

Links

Are you confused or do you just want to know more? I can understand that. The documentation for QoS is scattered across the net and in some places inaccurate. While learning QoS myself, I gathered these links:

Thinking of shaping traffic on multiple interfaces as one in kernel 2.6? Or do you want to shape ingress traffic? IMQ could be the answer, but it might not be stable, and there is next to no online documentation.
