Bug #401

major retries in the VI AMPDU queue on ath9k

Added by Dave Täht 11 months ago. Updated 2 months ago.

Status: Closed
Priority: Urgent
Assignee: Dave Täht
Category: Networking
Target version: 1st Public Cerowrt release
Start date: 07/08/2012
Due date:
% Done: 0%
Spent time: 20.00 hours

Description

I find the AMPDU retry statistics to be massively high in the case of VI and still quite high in the case of BE.

(The radios happen to be in MCS4 most of the time, if that matters.
There are only three radios on the network, with no competing traffic at all.)

root@davesroof:/sys/kernel/debug/ieee80211/phy0/ath9k# cat xmit
    Num-Tx-Queues: 10  tx-queues-setup: 0x10f poll-work-seen: 36206
                                BE         BK        VI        VO

    MPDUs Queued:            27436        174     15066     12730
    MPDUs Completed:         27322        174     15062     12708
    MPDUs XRetried:            114          0         4        22
    Aggregates:              94194          5         0         0
    AMPDUs Queued HW:       620975         49     21536         0
    AMPDUs Queued SW:       438841          7         0         0
    AMPDUs Completed:      1045702         34      6476         0
    AMPDUs Retried:         438217        728    413300         0
    AMPDUs XRetried:         14114         22     15059         0
    FIFO Underrun:               0          0         0         0
    TXOP Exceeded:               0          0         0         0

So I don't see TXOP exceeded, but I do see exorbitant numbers
of retries in the AMPDU VI queue for some reason. This disturbs me.
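
To put a number on "exorbitant", here is a minimal sketch that parses the table above and divides the retried count by the completed count for each queue. The helper names and the choice of ratio are illustrative assumptions, not something the driver reports itself, and the sample is trimmed to the rows that matter.

```python
# Sketch: per-queue AMPDU retry ratio from ath9k's debugfs "xmit" output.
# Helper names and the ratio (Retried / Completed) are illustrative choices.

SAMPLE = """\
                            BE         BK        VI        VO

MPDUs Queued:            27436        174     15066     12730
AMPDUs Completed:      1045702         34      6476         0
AMPDUs Retried:         438217        728    413300         0
"""

QUEUES = ("BE", "BK", "VI", "VO")


def parse_xmit(text):
    """Return {row label: {queue: count}} for rows with four integer columns."""
    rows = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or len(fields) != len(QUEUES):
            continue  # header line, blank line, or a differently shaped row
        try:
            counts = [int(f) for f in fields]
        except ValueError:
            continue  # e.g. hex rows such as txq-memory-address
        rows[label.strip()] = dict(zip(QUEUES, counts))
    return rows


def retry_ratio(rows, queue):
    """Retried count divided by completed count; None if nothing completed."""
    completed = rows["AMPDUs Completed"][queue]
    if completed == 0:
        return None
    return rows["AMPDUs Retried"][queue] / completed


rows = parse_xmit(SAMPLE)
for q in QUEUES:
    r = retry_ratio(rows, q)
    print(q, "n/a" if r is None else f"{r:.2f}")
```

With the figures from this dump, VI comes out at roughly 64 retries per completed AMPDU, against about 0.42 for BE.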

In xmit.c I see:

  static u32 ath_lookup_rate(struct ath_softc *sc, struct ath_buf *bf,
                             struct ath_atx_tid *tid)
  {
          ...
          /*
           * Find the lowest frame length among the rate series that will have a
           * 4ms transmit duration.
           * TODO - TXOP limit needs to be considered.
           */
          max_4ms_framelen = ATH_AMPDU_LIMIT_MAX;

Related issues

related to Cerowrt - Bug #379: 3.3.4-3 router crashes under heavy load New
related to Cerowrt - Bug #372: wifi throwing lots of errors New 04/21/2012

History

Updated by Dave Täht 11 months ago

Andrew McGregor: You'd be right to call that dubious
12:24 AM Either the limit has to be the minimum amongst all the queues, or it has to be per queue
me: Perhaps I'm still misunderstanding - at MCS4 what is the max aggregate?
1:57 AM Andrew McGregor: I don't know, it's expressed in units of time
1:58 AM It is prescribed by the queue's parameters...
1:59 AM There's a table here: http://wifi-insider.com/wlan/wmm.htm
2:01 AM So, the way you'd implement that is to query the driver for the transmit time for the AMPDU as you're building it up
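
The approach Andrew describes (check the transmit time while the aggregate is being built) can be sketched roughly as follows. Everything here is hypothetical: the function names are made up, the airtime estimate counts payload bits only (no preamble, delimiters, or block-ack time), and the rate and TXOP figures are assumptions: 39 Mbit/s for MCS4 at 20 MHz with long guard interval and one stream, and 3008 usec as the usual WMM default for VI.

```python
# Sketch: stop adding subframes once the estimated airtime would exceed
# the queue's TXOP limit. All names are hypothetical; a real driver would
# ask its rate code for the duration instead of this payload-only estimate.

def airtime_us(nbytes, rate_mbps):
    """Payload-only transmit time in microseconds at a given PHY rate."""
    return nbytes * 8 / rate_mbps


def build_aggregate(frame_lens, rate_mbps, txop_limit_us):
    """Take frames in order while the running airtime fits in the TXOP."""
    chosen = []
    total_bytes = 0
    for length in frame_lens:
        if airtime_us(total_bytes + length, rate_mbps) > txop_limit_us:
            break
        chosen.append(length)
        total_bytes += length
    return chosen


# Assumed figures, see the note above: MCS4 ~ 39 Mbit/s, VI TXOP 3008 us.
frames = [1500] * 32
agg = build_aggregate(frames, rate_mbps=39.0, txop_limit_us=3008)
print(len(agg), sum(agg))
```

Under those assumptions only 9 of the 32 queued 1500-byte frames fit, which gives a feel for how much a per-queue limit could shrink an aggregate compared with a fixed 4 ms cap.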

Updated by Dave Täht 11 months ago

<nbd> hey dtaht [20:20]
<nbd> i've been looking into that VI queue txop mess some more
<nbd> looks like i found an interesting bug (in addition to the stuff you already uncovered) [20:21]
<nbd> seems that the configured txop limit in the hw is off by a factor 32 ;)
<nbd> if i'm reading this stuff correctly
<nbd> only affects VI and VO of course [22:00]
<nbd> it would limit the maximum transmission duration for VI to 94 usec and VO to 47 [22:02]
<nbd> maybe the hw adds some extra time on top of that
<nbd> my patches seem to work ;) [22:09]
<nbd> i didn't test pushing traffic to the VI queue
<nbd> but i tested adjusting the BE queue to the same txop limit
<nbd> committed [22:11]
<nbd> have fun with that, i'm going to get some sleep now [22:12]
<nbd> i'll submit this stuff to linux-wireless@ tomorrow [22:13]
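
For scale, here is a small sketch of the factor-32 mix-up nbd describes. EDCA TXOP limits are specified in units of 32 usec, so 94 and 47 are meant to be 3008 and 1504 usec. The names below are illustrative and not taken from ath9k, the 39 Mbit/s rate is an assumption (MCS4, 20 MHz, long guard interval, one stream), and the byte counts ignore all framing overhead.

```python
# Sketch of the unit mix-up: TXOP limits are given in 32 us units, so
# treating the raw value as microseconds shrinks the window by 32x.
# Constant and function names are illustrative, not from ath9k.

TXOP_UNIT_US = 32
TXOP_LIMIT_UNITS = {"VI": 94, "VO": 47}  # values quoted in the log above


def txop_us(units):
    """Convert a TXOP limit from 32 us units to microseconds."""
    return units * TXOP_UNIT_US


def payload_bytes(duration_us, rate_mbps):
    """Bytes that fit in a duration at a PHY rate, ignoring all overhead."""
    return int(duration_us * rate_mbps / 8)


RATE_MBPS = 39.0  # assumption: MCS4, 20 MHz, long GI, one stream

for ac, units in TXOP_LIMIT_UNITS.items():
    intended = txop_us(units)
    misread = units  # raw value treated as microseconds
    print(ac, intended, payload_bytes(intended, RATE_MBPS),
          misread, payload_bytes(misread, RATE_MBPS))
```

At the assumed rate, a 94 usec window holds fewer than 500 payload bytes, so not even one full-size frame fits, while the intended 3008 usec holds about 14.6 KB. Whether that alone accounts for the retry counts is not established in this report.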

Updated by Dave Täht 11 months ago

  • Priority changed from Normal to Urgent

Well, I see better behavior in the VI queue, using

[1] Done netperf -l 60 -Y CS0,CS0 -H 172.20.1.1
[2] Done netperf -l 60 -Y CS5,CS5 -H 172.20.1.1
[3]+ Done netperf -l 60 -Y EF,EF -H 172.20.1.1

According to tc, the BK queue does indeed drop packets; the VO and VI queues did not.

Before I got that far, I saw this:

                            BE         BK        VI        VO

MPDUs Queued:             1308        161      1364     39399
MPDUs Completed:          1308        161      1364     39397
MPDUs XRetried:              0          0         0         2
Aggregates:               2975        962      9326         0
AMPDUs Queued HW:        14134       4331    282818         0
AMPDUs Queued SW:         6773       2314     31996         0
AMPDUs Completed:        19780       6470    313444         0
AMPDUs Retried:          34959       4825     84898         0
AMPDUs XRetried:          1126        175      1370         0
FIFO Underrun:               0          0         0         0
TXOP Exceeded:               0          0         0         0
TXTIMER Expiry:              0          0         0         0
DESC CFG Error:              0          0         0         0
DATA Underrun:               0          0         0         0
DELIM Underrun:              0          0         0         0
TX-Pkts-All:             22214       6806    316178     39399
TX-Bytes-All:          2431000     602745  27746600   3522718
hw-put-tx-buf:               1          1         1         1
hw-tx-start:             53190      10240    390746     39399
hw-tx-proc-desc:         53189      10240    390746     39399
TX-Failed:                   0          0         0         0
txq-memory-address:   8280a1b4   8280a230  8280a138  8280a0bc
axq-qnum:                    2          3         1         0
axq-depth:                   1          0         0         0
axq-ampdu_depth:             1          0         0         0
axq-stopped                  0          0         0         0
tx-in-progress               0          0         0         0
pending-frames               1          0         0         0
txq_headidx:                 0          0         0         0
txq_tailidx:                 0          0         0         0
axq_q empty:                   0          0         0         0
axq_acq empty:                 1          1         1         1
txq_fifo[0] empty:             1          1         1         1
txq_fifo[1] empty:             1          1         1         1
txq_fifo[2] empty:             1          1         1         1
txq_fifo[3] empty:             1          1         1         1
txq_fifo[4] empty:             1          1         1         1
txq_fifo[5] empty:             1          1         1         1
txq_fifo[6] empty:             1          1         1         1
txq_fifo[7] empty:             1          1         1         1

However, running the same tests in the reverse direction crashes the router.
This could be an out-of-memory condition, or something else in fq_codel or the driver.

This was with netperf 2.6, run from my laptop, connected over the mesh in ad-hoc mode, at rates around 120Mbit:

netperf -l 60 -Y EF,EF -H 172.20.1.1 -t TCP_MAERTS &
netperf -l 60 -Y CS5,CS5 -H 172.20.1.1 -t TCP_MAERTS &
netperf -l 60 -Y CS0,CS0 -H 172.20.1.1 -t TCP_MAERTS &

This thoroughly exercises the VO, VI, and BE queues all at once (which is not something that happens in real life).

Updated by David Taht 2 months ago

  • Status changed from New to Closed
