commit 8f41958bdd577731f7411c9605cfaa9db6766809 Merge: bc06cff... 0909fca... Author: Linus Torvalds Date: Sun Jul 15 16:56:12 2007 -0700 Merge git://git.infradead.org/battery-2.6 * git://git.infradead.org/battery-2.6: git-battery vs git-acpi Power supply class and drivers: remove non obligatory return statements pda_power: clean up irq, timer MAINTAINERS: Add maintainers for power supply subsystem and drivers Fixed up trivial conflict in drivers/w1/slaves/w1_ds2760.c manually commit bc06cffdec85d487c77109dffcd2f285bdc502d3 Merge: d3502d7... 9413d7b... Author: Linus Torvalds Date: Sun Jul 15 16:51:54 2007 -0700 Merge master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (166 commits) [SCSI] ibmvscsi: convert to use the data buffer accessors [SCSI] dc395x: convert to use the data buffer accessors [SCSI] ncr53c8xx: convert to use the data buffer accessors [SCSI] sym53c8xx: convert to use the data buffer accessors [SCSI] ppa: coding police and printk levels [SCSI] aic7xxx_old: remove redundant GFP_ATOMIC from kmalloc [SCSI] i2o: remove redundant GFP_ATOMIC from kmalloc from device.c [SCSI] remove the dead CYBERSTORMIII_SCSI option [SCSI] don't build scsi_dma_{map,unmap} for !HAS_DMA [SCSI] Clean up scsi_add_lun a bit [SCSI] 53c700: Remove printk, which triggers because of low scsi clock on SNI RMs [SCSI] sni_53c710: Cleanup [SCSI] qla4xxx: Fix underrun/overrun conditions [SCSI] megaraid_mbox: use mutex instead of semaphore [SCSI] aacraid: add 51245, 51645 and 52245 adapters to documentation. [SCSI] qla2xxx: update version to 8.02.00-k1. [SCSI] qla2xxx: add support for NPIV [SCSI] stex: use resid for xfer len information [SCSI] Add Brownie 1200U3P to blacklist [SCSI] scsi.c: convert to use the data buffer accessors ... commit d3502d7f25b22cfc9762bf1781faa9db1bb3be2e Merge: d2a9a8d... 0a9f2a4... 
Author: Linus Torvalds Date: Sun Jul 15 16:50:46 2007 -0700 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 * 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (53 commits) [TCP]: Verify the presence of RETRANS bit when leaving FRTO [IPV6]: Call inet6addr_chain notifiers on link down [NET_SCHED]: Kill CONFIG_NET_CLS_POLICE [NET_SCHED]: act_api: qdisc internal reclassify support [NET_SCHED]: sch_dsmark: act_api support [NET_SCHED]: sch_atm: act_api support [NET_SCHED]: sch_atm: Lindent [IPV6]: MSG_ERRQUEUE messages do not pass to connected raw sockets [IPV4]: Cleanup call to __neigh_lookup() [NET_SCHED]: Revert "avoid transmit softirq on watchdog wakeup" optimization [NETFILTER]: nf_conntrack: UDPLITE support [NETFILTER]: nf_conntrack: mark protocols __read_mostly [NETFILTER]: x_tables: add connlimit match [NETFILTER]: Lower *tables printk severity [NETFILTER]: nf_conntrack: Don't track locally generated special ICMP error [NETFILTER]: nf_conntrack: Introduces nf_ct_get_tuplepr and uses it [NETFILTER]: nf_conntrack: make l3proto->prepare() generic and renames it [NETFILTER]: nf_conntrack: Increment error count on parsing IPv4 header [NET]: Add ethtool support for NETIF_F_IPV6_CSUM devices. [AF_IUCV]: Add lock when updating accept_q ... commit d2a9a8ded48bec153f08ee87a40626c8d0737f79 Merge: 2d896c7... 0af8887... Author: Linus Torvalds Date: Sun Jul 15 16:44:53 2007 -0700 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs: 9p: fix a race condition bug in umount which caused a segfault 9p: re-enable mount time debug option 9p: cache meta-data when cache=loose net/9p: set error to EREMOTEIO if trans->write returns zero net/9p: change net/9p module name to 9pnet 9p: Reorganization of 9p file system code commit 2d896c780db9cda5dc102bf7a0a2cd4394c1342e Merge: 2a9915c... 0f1145c... 
Author: Linus Torvalds Date: Sun Jul 15 16:43:43 2007 -0700 Merge branch 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6 * 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6: (37 commits) [XFS] Fix lockdep annotations for xfs_lock_inodes [LIB]: export radix_tree_preload() [XFS] Fix XFS_IOC_FSBULKSTAT{,_SINGLE} & XFS_IOC_FSINUMBERS in compat mode [XFS] Compat ioctl handler for handle operations [XFS] Compat ioctl handler for XFS_IOC_FSGEOMETRY_V1. [XFS] Clean up function name handling in tracing code [XFS] Quota inode has no parent. [XFS] Concurrent Multi-File Data Streams [XFS] Use uninitialized_var macro to stop warning about rtx [XFS] XFS should not be looking at filp reference counts [XFS] Use is_power_of_2 instead of open coding checks [XFS] Reduce shouting by removing unnecessary macros from dir2 code. [XFS] Simplify XFS min/max macros. [XFS] Kill off xfs_count_bits [XFS] Cancel transactions on xfs_itruncate_start error. [XFS] Use do_div() on 64 bit types. [XFS] Fix remount,readonly path to flush everything correctly. [XFS] Cleanup inode extent size hint extraction [XFS] Prevent ENOSPC from aborting transactions that need to succeed [XFS] Prevent deadlock when flushing inodes on unmount ... commit 2a9915c8a2e532f6c9b435e5f90008d601a87b90 Author: Al Viro Date: Sun Jul 15 21:37:16 2007 +0100 make i2c-acorn tristate It depends on tristate I2C and it's trivial to make modular. The current Kconfig allows I2C=m, I2C_ACORN=y, which doesn't work at all; alternatives are dependency on I2C=y and making I2C_ACORN itself a tristate. The latter is the right thing to do... 
Signed-off-by: Al Viro Signed-off-by: Linus Torvalds Acked-by: Russell King commit ba5b55d0498bd56b9d60d85c5f654cd7b291e9c8 Author: Al Viro Date: Sun Jul 15 21:01:32 2007 +0100 icside: devm_iounmap() needs linux/io.h Signed-off-by: Al Viro Signed-off-by: Linus Torvalds Acked-by: Russell King commit 05bd711ea2862cb1c754903326b7858bc700b2e9 Author: Al Viro Date: Sun Jul 15 21:01:22 2007 +0100 missing argument in bin_attribute ->read()/->write() Fallout from commit 91a6902958f052358899f58683d44e36228d85c2 ('sysfs: add parameter "struct bin_attribute *" ...') Signed-off-by: Al Viro Signed-off-by: Linus Torvalds commit ececfdee1cc287123149c801af201e41c7c3cc84 Author: Al Viro Date: Sun Jul 15 21:01:12 2007 +0100 fallout from constified seq_operations Signed-off-by: Al Viro Signed-off-by: Linus Torvalds commit 8ca7ee6bcc542395cc68202d319a0404ad92b41d Author: Al Viro Date: Sun Jul 15 21:01:02 2007 +0100 fallout from Auke's pci ->revision patch Signed-off-by: Al Viro Acked-by: Jeff Garzik Signed-off-by: Linus Torvalds commit 2832e856fb7238dae0f385e309e66626d81ca0fa Author: Al Viro Date: Sun Jul 15 21:00:51 2007 +0100 ax88796: dev_dbg() wants device, not platform device Signed-off-by: Al Viro Acked-by: Jeff Garzik Signed-off-by: Linus Torvalds commit 22bb3e9e24e08a59efcb49943609ae88b6f628d0 Author: Al Viro Date: Sun Jul 15 21:00:41 2007 +0100 pass -msize-long to sparse on s390 s390 is the only 32bit with unsigned long for size_t (usual for those is unsigned int). Tell sparse... 
Signed-off-by: Al Viro Signed-off-by: Linus Torvalds commit b4a06918c2534fc9655424e75c879bf523aa5b06 Author: Al Viro Date: Sun Jul 15 21:00:31 2007 +0100 frv: missing __clear_user() Signed-off-by: Al Viro Signed-off-by: Linus Torvalds commit c248725b616179c43d944ee9f31d507fa483e6c8 Author: Al Viro Date: Sun Jul 15 21:00:21 2007 +0100 zd1211rw: too early inclusion of asm/unaligned.h Signed-off-by: Al Viro Signed-off-by: Linus Torvalds commit 4381ca3c23b07ba5b567f72325003020ddca0341 Author: Al Viro Date: Sun Jul 15 21:00:11 2007 +0100 fix return type of skb_checksum_complete() It returns __sum16, not unsigned int Signed-off-by: Al Viro Signed-off-by: Linus Torvalds commit 5f17c70fe6d0b8562bf396f7e5638f243ebe4da8 Author: Al Viro Date: Sun Jul 15 21:00:01 2007 +0100 PDA_POWER depends on having request_irq() ... so all proud owners of s390-based PDAs will have to live without that one Signed-off-by: Al Viro Signed-off-by: Linus Torvalds commit 51ec138c6416e9ed2ba0eae3af5f0ea8a90ae44b Author: Al Viro Date: Sun Jul 15 20:59:51 2007 +0100 ieee1394: forgotten dereference... Going through the string and waiting for _pointer_ to become '\0' is not what the authors meant... Signed-off-by: Al Viro Acked-by: Ben Collins Signed-off-by: Linus Torvalds commit 0e81c666dbf95546b3d9ea6ff7d29ea19b988950 Author: Al Viro Date: Sun Jul 15 20:59:41 2007 +0100 the wrong variable checked after request_irq() Signed-off-by: Al Viro Acked-by: Benjamin Herrenschmidt Signed-off-by: Linus Torvalds commit 7c9e3c2e3b0437d10a09b77769baf325b94aa436 Author: Al Viro Date: Sun Jul 15 20:59:31 2007 +0100 wrong order of arguments of ->readdir() Shows how many people are testing coda - the bug had been there for 5 years and results of stepping on it are not subtle. 
Signed-off-by: Al Viro Signed-off-by: Linus Torvalds commit 53b67950026ee642b43615f46df22ec3d36b4a53 Author: Al Viro Date: Sun Jul 15 20:59:22 2007 +0100 minimal fixes for drivers/usb/gadget/m66592-udc.c still looks racy (and definitely leaks) Signed-off-by: Al Viro Signed-off-by: Linus Torvalds commit 0909fca51346d0ece688532c54d41ebc986aef7f Author: Andrew Morton Date: Sun Jul 15 22:37:03 2007 +0400 git-battery vs git-acpi drivers/w1/slaves/w1_ds2760.c:85: warning: initialization from incompatible pointer type The ACPI guys changed the bin_attr APIs (commit 91a6902958f052358899f58683d44e36228d85c2) Cc: Anton Vorontsov Cc: Len Brown Signed-off-by: Andrew Morton commit 7b3d54a8c30d2c524889a05d0c1334813d516b93 Author: Anton Vorontsov Date: Sun Jul 15 05:18:25 2007 +0400 Power supply class and drivers: remove non obligatory return statements Per Jeff Garzik's request. Signed-off-by: Jeff Garzik Signed-off-by: Anton Vorontsov commit 5ebf6e6a96e41220edec23a90e4140985d1a5732 Author: Jeff Garzik Date: Sat Jul 14 19:12:04 2007 -0400 pda_power: clean up irq, timer Clean up pda_power interrupt handling: Prior to this patch, the driver would pass information it needed to the interrupt handler dev_id pointer, and then promptly forget it ever did so, recreating that same information after a couple passes through the timer-based state machine. This patch removes the redundant checks by passing the pda_power_supply[] pointer through the state machine. The current code passed 'irq' through the state machine, as an index to recreate the pointer, when we could more simply pass around the pointer itself. This patch makes it easier to remove the 'irq' argument in the future, in addition to cleaning up the driver today. 
Signed-off-by: Jeff Garzik commit 3be86148e7b394b5de2aeb720004f9788a66c300 Author: Anton Vorontsov Date: Sun Jul 15 04:43:36 2007 +0400 MAINTAINERS: Add maintainers for power supply subsystem and drivers Signed-off-by: Anton Vorontsov Acked-by: David Woodhouse commit 9413d7b8aa777dd1fc7db9563ce5e80d769fe7b5 Author: FUJITA Tomonori Date: Sat May 26 00:32:58 2007 +0900 [SCSI] ibmvscsi: convert to use the data buffer accessors - remove the unnecessary map_single path. - convert to use the new accessors for the sg lists and the parameters. Signed-off-by: FUJITA Tomonori Cc: Santiago Leon Signed-off-by: James Bottomley commit a862ea31655a382488315b701e0820a74faf120d Author: FUJITA Tomonori Date: Sat May 26 02:07:09 2007 +0900 [SCSI] dc395x: convert to use the data buffer accessors - remove the unnecessary map_single path. - convert to use the new accessors for the sg lists and the parameters. Jens Axboe did the for_each_sg cleanup. Signed-off-by: FUJITA Tomonori Cc: Jamie Lenehan Signed-off-by: James Bottomley commit 69eca4f52b044edd9890aaa25d74f52e5fcd170e Author: FUJITA Tomonori Date: Mon May 14 19:22:21 2007 +0900 [SCSI] ncr53c8xx: convert to use the data buffer accessors - remove the unnecessary map_single path. - convert to use the new accessors for the sg lists and the parameters. Jens Axboe did the for_each_sg cleanup. Signed-off-by: FUJITA Tomonori Cc: Matthew Wilcox Signed-off-by: James Bottomley commit 938febd62b860447247eb9b1c3b6bbc99d2c7f81 Author: FUJITA Tomonori Date: Sat May 26 02:31:17 2007 +0900 [SCSI] sym53c8xx: convert to use the data buffer accessors - remove the unnecessary map_single path. - convert to use the new accessors for the sg lists and the parameters. Jens Axboe did the for_each_sg cleanup. 
Signed-off-by: FUJITA Tomonori Cc: Matthew Wilcox Signed-off-by: James Bottomley commit cebadc5c97547aa8567ee2d628b187a34869b389 Author: Alan Cox Date: Mon Jul 9 12:00:10 2007 -0700 [SCSI] ppa: coding police and printk levels Add printk levels Clean up some oddities of formatting Fix goto labels Signed-off-by: Alan Cox Signed-off-by: Andrew Morton Signed-off-by: James Bottomley commit 75a1099c0bfc5545088cdc9eae2fd98015656595 Author: Satyam Sharma Date: Mon Jul 9 12:00:07 2007 -0700 [SCSI] aic7xxx_old: remove redundant GFP_ATOMIC from kmalloc drivers/scsi/aic7xxx_old.c:aic7xxx_slave_alloc() unnecessarily passes GFP_ATOMIC (along with GFP_KERNEL) to kmalloc() from a context that is not atomic. Remove the pointless GFP_ATOMIC. Signed-off-by: Satyam Sharma Signed-off-by: Andrew Morton Cc: Doug Ledford Signed-off-by: James Bottomley commit a3f249a242e804b61a14518ec9d5ec8ee48720d7 Author: Satyam Sharma Date: Mon Jul 9 12:00:07 2007 -0700 [SCSI] i2o: remove redundant GFP_ATOMIC from kmalloc from device.c drivers/message/i2o/device.c:i2o_parm_field_get() unnecessarily passes GFP_ATOMIC (along with GFP_KERNEL) to kmalloc() from a context that is not atomic. Remove the pointless GFP_ATOMIC. Signed-off-by: Satyam Sharma Signed-off-by: Andrew Morton Signed-off-by: James Bottomley commit 3021c710cbf87c4ac528b737908d0c0669e8098e Author: Adrian Bunk Date: Mon Jul 9 12:00:10 2007 -0700 [SCSI] remove the dead CYBERSTORMIII_SCSI option Not converted to the 2.6 kconfig system and no code in the tree. Signed-off-by: Adrian Bunk Signed-off-by: Andrew Morton Signed-off-by: James Bottomley commit 0a9f2a467d8dacaf7e97469dba99ed2d07287d80 Author: Ilpo Järvinen Date: Sun Jul 15 00:19:29 2007 -0700 [TCP]: Verify the presence of RETRANS bit when leaving FRTO For yet unknown reason, something cleared SACKED_RETRANS bit underneath FRTO. Signed-off-by: Ilpo Järvinen Signed-off-by: David S. 
Miller commit 063ed369c97f8de4cce23bf93bebd7ffacb542ff Author: Vlad Yasevich Date: Sun Jul 15 00:16:35 2007 -0700 [IPV6]: Call inet6addr_chain notifiers on link down Currently if the link is brought down via ip link or ifconfig down, the inet6addr_chain notifiers are not called even though all the addresses are removed from the interface. This caused SCTP to add duplicate addresses to its list. Signed-off-by: Vlad Yasevich Signed-off-by: David S. Miller commit c3bc7cff8fddb6ff9715be8bfc3d911378c4d69d Author: Patrick McHardy Date: Sun Jul 15 00:03:05 2007 -0700 [NET_SCHED]: Kill CONFIG_NET_CLS_POLICE The NET_CLS_ACT option is now a full replacement for NET_CLS_POLICE, remove the old code. The config option will be kept around to select the equivalent NET_CLS_ACT options for a short time to allow easier upgrades. Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit 73ca4918fbb98311421259d82ef4ab44feeace43 Author: Patrick McHardy Date: Sun Jul 15 00:02:31 2007 -0700 [NET_SCHED]: act_api: qdisc internal reclassify support The behaviour of NET_CLS_POLICE for TC_POLICE_RECLASSIFY was to return it to the qdisc, which could handle it internally or ignore it. With NET_CLS_ACT however, tc_classify starts over at the first classifier and never returns it to the qdisc. This makes it impossible to support qdisc-internal reclassification, which in turn makes it impossible to remove the old NET_CLS_POLICE code without breaking compatibility since we have two qdiscs (CBQ and ATM) that support this. This patch adds a tc_classify_compat function that handles reclassification the old way and changes CBQ and ATM to use it. This again is of course not fully backwards compatible with the previous NET_CLS_ACT behaviour. Unfortunately there is no way to fully maintain compatibility *and* support qdisc internal reclassification with NET_CLS_ACT, but this seems like the better choice over keeping the two incompatible options around forever. 
Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit f6853e2df3de82c1dac8f62ddcf3a8dfa302419e Author: Patrick McHardy Date: Sun Jul 15 00:02:10 2007 -0700 [NET_SCHED]: sch_dsmark: act_api support Handle act_api classification results. Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit 9210080445b0c51a73b488750a26eb17177d8684 Author: Patrick McHardy Date: Sun Jul 15 00:01:49 2007 -0700 [NET_SCHED]: sch_atm: act_api support Handle act_api classification results. The ATM scheduler behaves slightly different than other schedulers in that it only handles policer results for successful classifications, this behaviour is retained for the act_api case. Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit b0188d4dbe5f4285372dd033acf7c92a97006629 Author: Patrick McHardy Date: Sun Jul 15 00:01:25 2007 -0700 [NET_SCHED]: sch_atm: Lindent Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit f13ec93fba60d339dc1663eb47b2fb801225d2d2 Author: Dmitry Butskoy Date: Sat Jul 14 23:53:08 2007 -0700 [IPV6]: MSG_ERRQUEUE messages do not pass to connected raw sockets From: Dmitry Butskoy Taken from http://bugzilla.kernel.org/show_bug.cgi?id=8747 Problem Description: It is related to the possibility to obtain MSG_ERRQUEUE messages from the udp and raw sockets, both connected and unconnected. There is a little typo in net/ipv6/icmp.c code, which prevents such messages to be delivered to the errqueue of the correspond raw socket, when the socket is CONNECTED. The typo is due to swap of local/remote addresses. Consider __raw_v6_lookup() function from net/ipv6/raw.c. When a raw socket is looked up usual way, it is something like: sk = __raw_v6_lookup(sk, nexthdr, daddr, saddr, IP6CB(skb)->iif); where "daddr" is a destination address of the incoming packet (IOW our local address), "saddr" is a source address of the incoming packet (the remote end). 
But when the raw socket is looked up for some icmp error report, in net/ipv6/icmp.c:icmpv6_notify() , daddr/saddr are obtained from the echoed fragment of the "bad" packet, i.e. "daddr" is the original destination address of that packet, "saddr" is our local address. Hence, for icmpv6_notify() must use "saddr, daddr" in its arguments, not "daddr, saddr" ... Steps to reproduce: Create some raw socket, connect it to an address, and cause some error situation: f.e. set ttl=1 where the remote address is more than 1 hop to reach. Set IPV6_RECVERR . Then send something and wait for the error (f.e. poll() with POLLERR|POLLIN). You should receive "time exceeded" icmp message (because of "ttl=1"), but the socket do not receive it. If you do not connect your raw socket, you will receive MSG_ERRQUEUE successfully. (The reason is that for unconnected socket there are no actual checks for local/remote addresses). Signed-off-by: Andrew Morton Signed-off-by: David S. Miller commit d09f51b6997f3f443c5741bc696651e479576715 Merge: 1b1ac75... e559e91... Author: David S. Miller Date: Sat Jul 14 23:47:04 2007 -0700 Merge master.kernel.org:/pub/scm/linux/kernel/git/herbert/crypto-2.6 Conflicts: crypto/Kconfig commit 1b1ac759d7c6bba6e5f4731ef6ea720b6636e27c Author: Jean Delvare Date: Sat Jul 14 20:51:44 2007 -0700 [IPV4]: Cleanup call to __neigh_lookup() Back in the times of Linux 2.2, negative values for the creat parameter of __neigh_lookup() had a particular meaning, but no longer, so we should pass 1 instead. Signed-off-by: Jean Delvare Signed-off-by: David S. Miller commit 0621ed2e4edbe2f6f83dafbf85eecefae7aaf2e8 Author: Patrick McHardy Date: Sat Jul 14 20:49:26 2007 -0700 [NET_SCHED]: Revert "avoid transmit softirq on watchdog wakeup" optimization As noticed by Ranko Zivojnovic , calling qdisc_run from the timer handler can result in deadlock: > CPU#0 > > qdisc_watchdog() fires and gets dev->queue_lock > qdisc_run()...qdisc_restart()... 
> -> releases dev->queue_lock and enters dev_hard_start_xmit() > > CPU#1 > > tc del qdisc dev ... > qdisc_graft()...dev_graft_qdisc()...dev_deactivate()... > -> grabs dev->queue_lock ... > > qdisc_reset()...{cbq,hfsc,htb,netem,tbf}_reset()...qdisc_watchdog_cancel()... > -> hrtimer_cancel() - waiting for the qdisc_watchdog() to exit, while still > holding dev->queue_lock > > CPU#0 > > dev_hard_start_xmit() returns ... > -> wants to get dev->queue_lock(!) > > DEADLOCK! The entire optimization is a bit questionable IMO, it moves potentially large parts of NET_TX_SOFTIRQ work to TIMER_SOFTIRQ/HRTIMER_SOFTIRQ, which kind of defeats the separation of them. Signed-off-by: Patrick McHardy Acked-by: Ranko Zivojnovic Signed-off-by: David S. Miller commit 59eecdfb166f6846ae356ddc744abed5820ad965 Author: Patrick McHardy Date: Sat Jul 14 20:48:44 2007 -0700 [NETFILTER]: nf_conntrack: UDPLITE support Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit 61075af51f252913401c41fbe94075b46c94e9f1 Author: Patrick McHardy Date: Sat Jul 14 20:48:19 2007 -0700 [NETFILTER]: nf_conntrack: mark protocols __read_mostly Also remove two unnecessary EXPORT_SYMBOLs and move the nf_conntrack_l3proto_ipv4 declaration to the correct file. Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit 370786f9cfd430cb424f00ce4110e75bb1b95a19 Author: Jan Engelhardt Date: Sat Jul 14 20:47:26 2007 -0700 [NETFILTER]: x_tables: add connlimit match ipt_connlimit has been sitting in POM-NG for a long time. Here is a new shiny xt_connlimit with: * xtables'ified * will request the layer3 module (previously it hotdropped every packet when it was not loaded) * fixed: there was a deadlock in case of an OOM condition * support for any layer4 protocol (e.g. UDP/SCTP) * using jhash, as suggested by Eric Dumazet * ipv6 support Signed-off-by: Jan Engelhardt Signed-off-by: Patrick McHardy Signed-off-by: David S. 
Miller commit a887c1c148ffb3eb1c193e9869ca5297c6e22078 Author: Patrick McHardy Date: Sat Jul 14 20:46:15 2007 -0700 [NETFILTER]: Lower *tables printk severity Lower ip6tables, arptables and ebtables printk severity similar to Dan Aloni's patch for iptables. Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit 130e7a83d7ec8c5c673225e0fa8ea37b1ed507a5 Author: Yasuyuki Kozakai Date: Sat Jul 14 20:45:41 2007 -0700 [NETFILTER]: nf_conntrack: Don't track locally generated special ICMP error The conntrack assigned to locally generated ICMP error is usually the one assigned to the original packet which has caused the error. But if the original packet is handled as invalid by nf_conntrack, no conntrack is assigned to the original packet. Then nf_ct_attach() cannot assign any conntrack to the ICMP error packet. In that case the current nf_conntrack_icmp assigns appropriate conntrack to it. But the current code mistakes the direction of the packet. As a result, NAT code mistakes the address to be mangled. To fix the bug, this changes nf_conntrack_icmp not to assign conntrack to such ICMP error. Actually no address is necessary to be mangled in this case. Spotted by Jordan Russell. Signed-off-by: Yasuyuki Kozakai Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit e2a3123fbe58da9fd3f35cd242087896ace6049f Author: Yasuyuki Kozakai Date: Sat Jul 14 20:45:14 2007 -0700 [NETFILTER]: nf_conntrack: Introduces nf_ct_get_tuplepr and uses it nf_ct_get_tuple() requires the offset to transport header and that bothers callers such as icmp[v6] l4proto modules. This introduces new function to simplify them. Signed-off-by: Yasuyuki Kozakai Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit ffc30690480bdd337e4914302b926d24870b56b2 Author: Yasuyuki Kozakai Date: Sat Jul 14 20:44:50 2007 -0700 [NETFILTER]: nf_conntrack: make l3proto->prepare() generic and renames it The icmp[v6] l4proto modules parse headers in ICMP[v6] error to get tuple. 
But they have to find the offset to transport protocol header before that. Their processings are almost same as prepare() of l3proto modules. This makes prepare() more generic to simplify icmp[v6] l4proto module later. Signed-off-by: Yasuyuki Kozakai Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit d87d8469e2dd19a3a134b99f78288d41854c614b Author: Yasuyuki Kozakai Date: Sat Jul 14 20:44:23 2007 -0700 [NETFILTER]: nf_conntrack: Increment error count on parsing IPv4 header Signed-off-by: Yasuyuki Kozakai Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit 6460d948f3ebf7d5040328a60a0ab7221f69945b Author: Michael Chan Date: Sat Jul 14 19:07:52 2007 -0700 [NET]: Add ethtool support for NETIF_F_IPV6_CSUM devices. Add ethtool utility function to set or clear IPV6_CSUM feature flag. Modify tg3.c and bnx2.c to use this function when doing ethtool -K to change tx checksum. Signed-off-by: Michael Chan Signed-off-by: David S. Miller commit febca281f677a775c61cd0572c2f35e4ead9e7d5 Author: Ursula Braun Date: Sat Jul 14 19:04:25 2007 -0700 [AF_IUCV]: Add lock when updating accept_q The accept_queue of an af_iucv socket will be corrupted, if adding and deleting of entries in this queue occurs at the same time (connect request from one client, while accept call is processed for another client). Solution: add locking when updating accept_q Signed-off-by: Ursula Braun Acked-by: Frank Pavlic Signed-off-by: David S. Miller commit 13fdc9a74df0fec70f421c6891e184ed8c3b9088 Author: Ursula Braun Date: Sat Jul 14 19:03:41 2007 -0700 [AF_IUCV]: Avoid deadlock between iucv_path_connect and tasklet. An iucv deadlock may occur, where one CPU is spinning on the iucv_table_lock for iucv_tasklet_fn(), while another CPU is holding the iucv_table_lock for an iucv_path_connect() and is waiting for the first CPU in an smp_call_function. Solution: replace spin_lock in iucv_tasklet_fn by spin_trylock and reschedule tasklet in case of non-granted lock. 
Signed-off-by: Ursula Braun Acked-by: Frank Pavlic Signed-off-by: David S. Miller commit da7de31cc50796a53593785d4508b7b7ffa9a9b2 Author: Jennifer Hunt Date: Sat Jul 14 19:03:00 2007 -0700 [AF_IUCV]: Improve description of IUCV and AFIUCV configuration options. Signed-off-by: Jennifer Hunt Signed-off-by: Ursula Braun Acked-by: Frank Pavlic Signed-off-by: David S. Miller commit acd159b6b5828175be6b9ccccd9b054239ec63e9 Author: Adrian Bunk Date: Sat Jul 14 19:00:59 2007 -0700 [INET_SOCK]: make net/ipv4/inet_timewait_sock.c:__inet_twsk_kill() static This patch makes the needlessly global __inet_twsk_kill() static. Signed-off-by: Adrian Bunk Signed-off-by: David S. Miller commit cf3842ec5015c862f4869e3641a8549393bb958e Merge: b3b0b68... 63fc33c... Author: David S. Miller Date: Sat Jul 14 18:58:49 2007 -0700 Merge branch 'upstream-davem' of master.kernel.org:/pub/scm/linux/kernel/git/linville/wireless-2.6 commit b3b0b681b12478a7afa7d1f3d58be96830e16c7d Author: Stephen Hemminger Date: Sat Jul 14 18:57:19 2007 -0700 [TCP]: tcp probe add back ssthresh field Sangtae noticed the ssthresh got missed. Signed-off-by: Stephen Hemminger Signed-off-by: David S. Miller commit a7ecfc866578e665e20004a2f5fff5b73e8be3bc Author: Patrick McHardy Date: Sat Jul 14 18:56:30 2007 -0700 [VLAN]: Fix memset length Fix sizeof(ETH_ALEN) Introduced by my rtnl_link patches. Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit b863ceb7ddcea8c55fcf1d7b2ac591d50aa7ed53 Author: Patrick McHardy Date: Sat Jul 14 18:55:06 2007 -0700 [NET]: Add macvlan driver Add macvlan driver, which allows to create virtual ethernet devices based on MAC address. Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit 56addd6eeeb4e11f5a0af7093ca078e0f29140e0 Author: Patrick McHardy Date: Sat Jul 14 18:53:28 2007 -0700 [VLAN]: Use multicast list synchronization helpers Signed-off-by: Patrick McHardy Signed-off-by: David S. 
Miller commit 6c78dcbd47a68a7d25d2bee7a6c74b9136cb5fde Author: Patrick McHardy Date: Sat Jul 14 18:52:56 2007 -0700 [VLAN]: Fix promiscous/allmulti synchronization races The set_multicast_list function may be called without holding the rtnl mutex, resulting in races when changing the underlying device's promiscuous and allmulti state. Use the change_rx_mode hook, which is always invoked under the rtnl. Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit a0a400d79e3dd7843e7e81baa3ef2957bdc292d0 Author: Patrick McHardy Date: Sat Jul 14 18:52:02 2007 -0700 [NET]: dev_mcast: add multicast list synchronization helpers The method drivers currently use to synchronize multicast lists is not very pretty: - walk the multicast list - search each entry on a copy of the previous list - if new add to lower device - walk the copy of the previous list - search each entry on the current list - if removed delete from lower device - copy entire list This patch adds a new field to struct dev_addr_list to store the synchronization state and adds two helper functions for synchronization and cleanup. Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit 24023451c8df726692e2f52288a20870d13b501f Author: Patrick McHardy Date: Sat Jul 14 18:51:31 2007 -0700 [NET]: Add net_device change_rx_mode callback Currently the set_multicast_list (and set_rx_mode) callbacks are responsible for configuring the device according to the IFF_PROMISC, IFF_MULTICAST and IFF_ALLMULTI flags and the mc_list (and uc_list in case of set_rx_mode). These callbacks can be invoked from BH context without the rtnl_mutex by dev_mc_add/dev_mc_delete, which makes reading the device flags and promiscuous/allmulti count racy. For real hardware drivers that just commit all changes to the hardware this is not a real problem since the stack guarantees to call them for every change, so at least the final call will not race and commit the correct configuration to the hardware. 
For software devices that want to synchronize promiscuous and multicast state to an underlying device however this can cause corruption of the underlying device's flags or promisc/allmulti counts. When the software device is concurrently put in promiscuous or allmulti mode while set_multicast_list is invoked from bottom half context, the device might synchronize the change to the underlying device without holding the rtnl_mutex, which races with concurrent changes to the underlying device. Add a dev->change_rx_flags hook that is invoked when any of the flags that affect rx filtering change (under the rtnl_mutex), which allows drivers to perform synchronization immediately and only synchronize the address lists in set_multicast_list/set_rx_mode. Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit e6c9116d1dc984cb7ecf1b0fe26ca4a8ab36bb57 Author: Ingo Molnar Date: Sat Jul 14 18:50:15 2007 -0700 [RFKILL]: fix net/rfkill/rfkill-input.c bug on 64-bit systems Subject: [patch] net/input: fix net/rfkill/rfkill-input.c bug on 64-bit systems this recent commit: commit cf4328cd949c2086091c62c5685f1580fe9b55e4 Author: Ivo van Doorn Date: Mon May 7 00:34:20 2007 -0700 [NET]: rfkill: add support for input key to control wireless radio added this 64-bit bug: .... unsigned int flags; spin_lock_irqsave(&task->lock, flags); .... irq 'flags' must be unsigned long, not unsigned int. The -rt tree has strict checks about this on 64-bit so this triggered a build failure. Signed-off-by: Ingo Molnar Signed-off-by: David S. 
Miller commit 7689e82efdb636e8a076a1293b977bce313110c5 Author: Cornelia Huck Date: Mon Jul 9 11:59:59 2007 -0700 [SCSI] don't build scsi_dma_{map,unmap} for !HAS_DMA With dma-mapping-prevent-dma-dependent-code-from-linking-on.patch scsi fails to build on !HAS_DMA architectures: drivers/built-in.o(.text+0x20af6): In function `scsi_dma_map': : undefined reference to `dma_map_sg' drivers/built-in.o(.text+0x20b5c): In function `scsi_dma_unmap': : undefined reference to `dma_unmap_sg' I split those functions out into a new file. Builds on s390 and i386. Move scsi_dma_{map,unmap} into scsi_lib_dma.c which is only built if HAS_DMA is set. Signed-off-by: Cornelia Huck Cc: Martin Schwidefsky Cc: Heiko Carstens Cc: James Bottomley Cc: Jeff Garzik Cc: Christoph Hellwig Cc: Dan Williams Signed-off-by: Andrew Morton Signed-off-by: James Bottomley commit 6d877688ef411313c94aa3c83c7473fbec6db32c Author: Matthew Wilcox Date: Wed Jul 11 12:54:55 2007 -0600 [SCSI] Clean up scsi_add_lun a bit This patch tidies up scsi_add_lun a bit. I rewrote the kerneldoc to match the actual parameters, moved the check for RBC and MMC REPORT_LUN devices away from the switch(), changed the setup of sdev->type to account for BLIST_ISROM, moved the check for BLIST_NO_ULD_ATTACH further down in the function, removed a bogus comment and fixed some whitespace issues. 
Signed-off-by: Matthew Wilcox Signed-off-by: James Bottomley commit 0cba35e42ce58a5b20319f9f57f9aa4ce37daf76 Author: Thomas Bogendoerfer Date: Wed Jul 11 19:08:31 2007 +0200 [SCSI] 53c700: Remove printk, which triggers because of low scsi clock on SNI RMs remove printk, which triggers because of low scsi clock on SNI RMs Signed-off-by: Thomas Bogendoerfer Signed-off-by: James Bottomley commit 2da8658910580787ced4abc668adfb0f09710efb Author: Thomas Bogendoerfer Date: Wed Jul 11 19:09:36 2007 +0200 [SCSI] sni_53c710: Cleanup - base address is now a physical address; no need to convert it - remove not needed error printk in module init function Signed-off-by: Thomas Bogendoerfer Signed-off-by: James Bottomley commit 6ea7e33ee1b74de9b60327fec1a0cd39afac3983 Author: David C Somayajulu Date: Mon Jul 9 12:44:36 2007 -0700 [SCSI] qla4xxx: Fix underrun/overrun conditions On Wed, 2007-06-06 at 11:55 -0700, David C Somayajulu wrote: This patch fixes the code handling underrun and overrun conditions. Also fixed coding style as per Mike Christie's advice. Signed-off-by: David Somayajulu Signed-off-by: James Bottomley commit 0c2cc4337968f7aab91a91b8d5889982e3a3bd0d Author: Matthias Kaehlcke Date: Mon Jul 9 12:00:11 2007 -0700 [SCSI] megaraid_mbox: use mutex instead of semaphore The Megaraid Mailbox driver uses a semaphore as mutex. Use the mutex API instead of the (binary) semaphore. Signed-off-by: Matthias Kaehlcke Acked-by: "Patro, Sumant" Signed-off-by: Andrew Morton Signed-off-by: James Bottomley commit 5fa0f5e47a87ceb8a3269b28fa14764b37364f63 Author: Salyzyn, Mark Date: Mon Jul 9 09:57:11 2007 -0400 [SCSI] aacraid: add 51245, 51645 and 52245 adapters to documentation. Adding Adaptec 51245 (16 port), 51645 (20 port) and 52445 (28 port) Universal Serial RAID controllers to the aacraid documentation. 
Signed-off-by: Mark Salyzyn Signed-off-by: James Bottomley commit cad7d7858bde1c1a6c1132ec8cad2b3abfacbd0d Author: Seokmann Ju Date: Thu Jul 5 13:17:01 2007 -0700 [SCSI] qla2xxx: update version to 8.02.00-k1. Following patch bump up the driver version reflecting NPIV addition to the qla2xxx. - version changed from 8.01.07-k7 to 8.02.00-k1. Signed-off-by: Seokmann Ju Signed-off-by: James Bottomley commit 2c3dfe3f6ad8daff5acdb01713e4f2b116e78136 Author: Seokmann Ju Date: Thu Jul 5 13:16:51 2007 -0700 [SCSI] qla2xxx: add support for NPIV Following patch adds support for NPIV (N-Port ID Virtualization) to the qla2xxx. - supported within switched-fabric topologies only. - supports up to 63 virtual ports on each physical port. Signed-off-by: Seokmann Ju Signed-off-by: James Bottomley commit 968a5763fb7247feb0e69573a2975a7a0c094267 Author: Ed Lin Date: Thu Jul 5 12:09:06 2007 -0700 [SCSI] stex: use resid for xfer len information The original implementation in stex_ys_commands() is inappropriate. For xfer len information, we should use resid instead. Signed-off-by: Ed Lin Signed-off-by: James Bottomley commit 80dc3e062a8f82acd5852df15f6b4bc3359de871 Author: Matthew Wilcox Date: Thu Jul 5 08:57:50 2007 -0600 [SCSI] Add Brownie 1200U3P to blacklist The Brownie 1200U3P has the same problem with REPORT LUNS as the 1600U3P. Add it to the blacklist. Signed-off-by: Matthew Wilcox Signed-off-by: James Bottomley commit a73e45b3da2be548c8f45cf887e4171a8f39a3c3 Author: Boaz Harrosh Date: Wed Jul 4 21:26:01 2007 +0300 [SCSI] scsi.c: convert to use the data buffer accessors - a couple of prints, they can use the accessors Signed-off-by: Boaz Harrosh Signed-off-by: James Bottomley commit 0ab179bcf31fd54c7b34b4191ea8591267641e92 Author: Boaz Harrosh Date: Wed Jul 4 21:18:55 2007 +0300 [SCSI] tmscsim: Further clean-up of the driver - The saved sg_count was a leftover from the time the driver was doing dma mapping by himself. 
But now that scsi-ml is called for the mapping it is not the driver's responsibility. Signed-off-by: Boaz Harrosh Acked-by: G. Liakhovetski Signed-off-by: James Bottomley commit cde760856ce3a88bcceb02f208bcd259c2a71c4c Author: Geert Uytterhoeven Date: Thu Jun 28 13:53:02 2007 +0200 [SCSI] CONFIG_SCSI_FD_8xx no longer exists CONFIG_SCSI_FD_8xx no longer exists. Apparently it was renamed to CONFIG_SCSI_SEAGATE, but the Makefile was not correctly updated. Signed-off-by: Geert Uytterhoeven Signed-off-by: James Bottomley commit da3962fe63eae4f490356cb54e4700eac752541b Author: Akinobu Mita Date: Tue Jun 26 00:39:33 2007 +0900 [SCSI] sr: fix error handling in module_init Sweep registered blkdev when scsi_register_driver has failed. Cc: Jens Axboe Signed-off-by: Akinobu Mita Signed-off-by: James Bottomley commit a57850379e389829a2fc569733b41da3d52bf366 Author: James Bottomley Date: Sat Jul 14 18:47:04 2007 -0500 [SCSI] lpfc: Fix NPIV compile problem drivers/scsi/lpfc/lpfc_init.c: In function 'lpfc_create_port': drivers/scsi/lpfc/lpfc_init.c:1573: error: 'struct kobject' has no member named 'dentry' Just remove the if check on this ... lpfc shouldn't be poking around in kobject structures. drivers/scsi/lpfc/lpfc_init.c: In function 'lpfc_pci_probe_one': drivers/scsi/lpfc/lpfc_init.c:1723: warning: unused variable 'retval' And remove the unused variable. Cc: James Smart Signed-off-by: James Bottomley commit c59fd9ebc46da8d48b76955d4d48e3597f8c8726 Author: FUJITA Tomonori Date: Wed Jul 4 06:03:11 2007 -0700 [SCSI] lpfc: fix NPIV mapping problems This patch uses dma_map_sg with phba->pcidev->dev instead of scsi_dma_map. scsi_dma_map doesn't work for NPIV since fc_vport->dev isn't fully initialized. check_addr() in arch/x86_64/kernel/pci-nommu.c leads to the crash since dev->dma_mask is NULL. 
For more details: http://marc.info/?l=linux-scsi&m=118312448030633&w=2 Signed-off-by: FUJITA Tomonori Acked-by: James Smart Signed-off-by: James Bottomley commit d4bd4cd0630060a64681590b9405b87e43c11f14 Author: Boaz Harrosh Date: Tue Jul 3 18:12:35 2007 +0300 [SCSI] lpfc: add missed data buffer accessor This is an addendum to: commit a0b4f78f9a4c869e9b29f254054ad7441cb40bbf Author: FUJITA Tomonori [SCSI] lpfc: convert to use the data buffer accessors One place was missed in the merge Signed-off-by: Boaz Harrosh Acked-by: James Smart Signed-off-by: James Bottomley commit d0f656cad313bb04a151273bb57e108b2cc9876f Author: Priyanka Gupta Date: Tue Jun 19 14:02:10 2007 -0700 [SCSI] Remove unused method scsi_device_cancel Removes an obsolete method scsi_device_cancel which isn't being used anywhere in the kernel. Signed-off-by: Priyanka Gupta Acked-by: Grant Grundler Signed-off-by: James Bottomley commit 0af8887ebf4556a76680a61b0bb156d934702c63 Author: Eric Van Hensbergen Date: Fri Jul 13 16:47:58 2007 -0500 9p: fix a race condition bug in umount which caused a segfault Umounting partitions after heavy activity would sometimes trigger a segmentation violation. This fix appears to remove that problem. Fix originally provided by Latchesar Ionkov. Signed-off-by: Eric Van Hensbergen commit 9e2f6688c0b52882496aff576b009bc1f7eea0b8 Author: Eric Van Hensbergen Date: Fri Jul 13 13:05:21 2007 -0500 9p: re-enable mount time debug option During reorganization, the mount time debug option was removed in favor of module-load-time parameters. However, the mount time option is still a useful feature during debugging and for user-fault isolation when the module is compiled into the kernel. 
Signed-off-by: Eric Van Hensbergen commit 9523a841b109765f8779236d28be6458ee3a6824 Author: Eric Van Hensbergen Date: Fri Jul 13 13:01:27 2007 -0500 9p: cache meta-data when cache=loose This patch expands the impact of the loose cache mode to allow for cached metadata increasing the performance of directory listings and other metadata read operations. Signed-off-by: Eric Van Hensbergen commit 1d6b5602381524c339af2c2fdfe42ad0a01464a4 Author: Latchesar Ionkov Date: Wed Jul 11 15:14:46 2007 -0600 net/9p: set error to EREMOTEIO if trans->write returns zero If trans->write returns 0, p9_write_work goes through the error path, but sets the error code to zero. This patch sets the error code to EREMOTEIO if trans->write returns zero value. Signed-off-by: Latchesar Ionkov commit e46662be7fddde3464bf208317542c2f8df13d0b Author: Latchesar Ionkov Date: Wed Jul 11 15:13:54 2007 -0600 net/9p: change net/9p module name to 9pnet Change module name of net/9p module from 9p.ko to 9pnet.ko. fs/9p module already uses 9p.ko name. Signed-off-by: Latchesar Ionkov commit bd238fb431f31989898423c8b6496bc8c4204a86 Author: Latchesar Ionkov Date: Tue Jul 10 17:57:28 2007 -0500 9p: Reorganization of 9p file system code This patchset moves non-filesystem interfaces of v9fs from fs/9p to net/9p. It moves the transport, packet marshalling and connection layers to net/9p leaving only the VFS related files in fs/9p. This work is being done in preparation for in-kernel 9p servers as well as alternate 9p clients (other than VFS). 
Signed-off-by: Latchesar Ionkov Signed-off-by: Eric Van Hensbergen commit 0f1145cc18e970ebe37da114fc34c297f135e062 Author: David Chinner Date: Fri Jun 29 17:26:09 2007 +1000 [XFS] Fix lockdep annotations for xfs_lock_inodes SGI-PV: 967035 SGI-Modid: xfs-linux-melb:xfs-kern:29026a Signed-off-by: David Chinner Signed-off-by: Tim Shimmin commit d7f0923d83dcabfc063257a281529cdbcd5eedb5 Author: David Chinner Date: Sat Jul 14 16:05:04 2007 +1000 [LIB]: export radix_tree_preload() XFS filestreams functionality uses radix trees and the preload functions. XFS can be built as a module and hence we need radix_tree_preload() exported. radix_tree_preload_end() is a static inline, so it doesn't need exporting. Signed-Off-By: Dave Chinner Signed-Off-By: Tim Shimmin commit faa63e9584df41020440756b8b90b7b63f95e4f6 Author: Michal Marek Date: Wed Jul 11 11:10:19 2007 +1000 [XFS] Fix XFS_IOC_FSBULKSTAT{,_SINGLE} & XFS_IOC_FSINUMBERS in compat mode * 32bit struct xfs_fsop_bulkreq has different size and layout of members, no matter the alignment. Move the code out of the #else branch (why was it there in the first place?). Define _32 variants of the ioctl constants. * 32bit struct xfs_bstat is different because of time_t and on i386 because of different padding. Make xfs_bulkstat_one() accept a custom "output formatter" in the private_data argument which takes care of the xfs_bulkstat_one_compat() that takes care of the different layout in the compat case. * i386 struct xfs_inogrp has different padding. Add a similar "output formatter" mechanism to xfs_inumbers(). SGI-PV: 967354 SGI-Modid: xfs-linux-melb:xfs-kern:29102a Signed-off-by: Michal Marek Signed-off-by: David Chinner Signed-off-by: Tim Shimmin commit 1fa503df66f7bffc0ff62662626897eec79446c2 Author: Michal Marek Date: Wed Jul 11 11:10:09 2007 +1000 [XFS] Compat ioctl handler for handle operations 32bit struct xfs_fsop_handlereq has different size and offsets (due to pointers). 
TODO: case XFS_IOC_{FSSETDM,ATTRLIST,ATTRMULTI}_BY_HANDLE still not handled. SGI-PV: 967354 SGI-Modid: xfs-linux-melb:xfs-kern:29101a Signed-off-by: Michal Marek Signed-off-by: David Chinner Signed-off-by: Tim Shimmin commit 547e00c3c681265b1fe5e34c7643f3ddac748ba0 Author: Michal Marek Date: Wed Jul 11 11:09:57 2007 +1000 [XFS] Compat ioctl handler for XFS_IOC_FSGEOMETRY_V1. i386 struct xfs_fsop_geom_v1 has no padding after the last member, so the size is different. SGI-PV: 967354 SGI-Modid: xfs-linux-melb:xfs-kern:29100a Signed-off-by: Michal Marek Signed-off-by: David Chinner Signed-off-by: Tim Shimmin commit 3a59c94c4b48878c6af047cdfc8c137d0fa7a0f0 Author: Eric Sandeen Date: Wed Jul 11 11:09:47 2007 +1000 [XFS] Clean up function name handling in tracing code Remove the hardcoded "fnames" for tracing, and just embed them in tracing macros via __FUNCTION__. Kills a lot of #ifdefs too. SGI-PV: 967353 SGI-Modid: xfs-linux-melb:xfs-kern:29099a Signed-off-by: Eric Sandeen Signed-off-by: David Chinner Signed-off-by: Tim Shimmin commit b11f94d537e6b69f13770143fd7ded3d09fdbaab Author: David Chinner Date: Wed Jul 11 11:09:33 2007 +1000 [XFS] Quota inode has no parent. Avoid using a special "zero inode" as the parent of the quota inode as this can confuse the filestreams code into thinking the quota inode has a parent. We do not want the quota inode to follow filestreams allocation rules, so pass a NULL as the parent inode and detect this condition when doing stream associations. SGI-PV: 964469 SGI-Modid: xfs-linux-melb:xfs-kern:29098a Signed-off-by: David Chinner Signed-off-by: Tim Shimmin commit 2a82b8be8a8dacb48cb7371449a7a9daa558b4a8 Author: David Chinner Date: Wed Jul 11 11:09:12 2007 +1000 [XFS] Concurrent Multi-File Data Streams In media spaces, video is often stored in a frame-per-file format. When dealing with uncompressed realtime HD video streams in this format, it is crucial that files do not get fragmented and that multiple files are placed contiguously on disk. 
When multiple streams are being ingested and played out at the same time, it is critical that the filesystem does not cross the streams and interleave them together as this creates seek and readahead cache miss latency and prevents both ingest and playout from meeting frame rate targets. This patch set introduces a "stream of files" concept into the allocator to place all the data from a single stream contiguously on disk so that RAID array readahead can be used effectively. Each additional stream gets placed in different allocation groups within the filesystem, thereby ensuring that we don't cross any streams. When an AG fills up, we select a new AG for the stream that is not in use. The core of the functionality is the stream tracking - each inode that we create in a directory needs to be associated with the directory's stream. Hence every time we create a file, we look up the directory's stream object and associate the new file with that object. Once we have a stream object for a file, we use the AG that the stream object points to for allocations. If we can't allocate in that AG (e.g. it is full) we move the entire stream to another AG. Other inodes in the same stream are moved to the new AG on their next allocation (i.e. lazy update). Stream objects are kept in a cache and hold a reference on the inode. Hence the inode cannot be reclaimed while there is an outstanding stream reference. This means that on unlink we need to remove the stream association and we also need to flush all the associations on certain events that want to reclaim all unreferenced inodes (e.g. filesystem freeze). 
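The stream tracking described above can be modelled in a few lines. What follows is only a toy sketch in Python, not kernel C and not the XFS implementation: the names Stream, FileStreams, create and allocate are hypothetical, free space is reduced to one block counter per AG, and the stream cache, inode references and unlink handling are left out.

```python
# Toy model of the "stream of files" allocation policy.
# All names (Stream, FileStreams, create, allocate) are hypothetical and
# are not taken from the XFS source.

class Stream:
    """One stream per directory; it records the AG currently in use."""

    def __init__(self, ag):
        self.ag = ag


class FileStreams:
    def __init__(self, ag_free_blocks):
        # ag_free_blocks[i] is the number of free blocks in allocation group i.
        self.ag_free = list(ag_free_blocks)
        self.dir_stream = {}    # directory -> Stream
        self.file_stream = {}   # (directory, name) -> Stream, set at create time

    def _pick_unused_ag(self, blocks):
        # Choose an AG with enough free space that no stream is currently using.
        in_use = {s.ag for s in self.dir_stream.values()}
        for ag, free in enumerate(self.ag_free):
            if ag not in in_use and free >= blocks:
                return ag
        raise OSError("ENOSPC: no unused AG has enough free space")

    def create(self, directory, name):
        # Look up (or start) the directory's stream and associate the new file.
        stream = self.dir_stream.get(directory)
        if stream is None:
            stream = Stream(self._pick_unused_ag(1))
            self.dir_stream[directory] = stream
        self.file_stream[(directory, name)] = stream

    def allocate(self, directory, name, blocks):
        stream = self.file_stream[(directory, name)]
        if self.ag_free[stream.ag] < blocks:
            # The AG cannot satisfy the request: move the whole stream.
            # Other files in the stream pick up the new AG on their next
            # allocation, because they share the same Stream object.
            stream.ag = self._pick_unused_ag(blocks)
        self.ag_free[stream.ag] -= blocks
        return stream.ag
```

In this model two directories created on a three-AG filesystem land in different AGs, and once one stream's AG runs short the next allocation by any file in that directory moves the whole stream to the remaining unused AG.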
SGI-PV: 964469 SGI-Modid: xfs-linux-melb:xfs-kern:29096a Signed-off-by: David Chinner Signed-off-by: Barry Naujok Signed-off-by: Donald Douwsma Signed-off-by: Christoph Hellwig Signed-off-by: Tim Shimmin Signed-off-by: Vlad Apostolov commit 0892ccd6fe13e08ad9e57007afbb78fe02d66005 Author: Andrew Morton Date: Thu Jun 28 16:46:56 2007 +1000 [XFS] Use uninitialized_var macro to stop warning about rtx Appease gcc in regards to "warning: 'rtx' is used uninitialized in this function". SGI-PV: 907752 SGI-Modid: xfs-linux-melb:xfs-kern:29007a Signed-off-by: Andrew Morton Signed-off-by: Tim Shimmin commit fbf3ce8d8ec508f6bd99b36de034d2ae3e1ae7ac Author: Christoph Hellwig Date: Thu Jun 28 16:46:47 2007 +1000 [XFS] XFS should not be looking at filp reference counts A check for file_count is always a bad idea. Linux has the ->release method to deal with cleanups on last close and ->flush is only for the very rare case where we want to perform an operation on every drop of a reference to a file struct. This patch gets rid of vop_close and surrounding code in favour of simply doing the page flushing from ->release. SGI-PV: 966562 SGI-Modid: xfs-linux-melb:xfs-kern:28952a Signed-off-by: Christoph Hellwig Signed-off-by: David Chinner Signed-off-by: Tim Shimmin commit 16a087d8e1af9b974125870dceb9e4a35249ad1d Author: Vignesh Babu Date: Thu Jun 28 16:46:37 2007 +1000 [XFS] Use is_power_of_2 instead of open coding checks SGI-PV: 966576 SGI-Modid: xfs-linux-melb:xfs-kern:28950a Signed-off-by: Vignesh Babu Signed-off-by: David Chinner Signed-off-by: Tim Shimmin commit bbaaf53808c778bda24f8245a440c5ceacc1a37d Author: Christoph Hellwig Date: Thu Jun 28 16:43:50 2007 +1000 [XFS] Reduce shouting by removing unnecessary macros from dir2 code. 
SGI-PV: 966505 SGI-Modid: xfs-linux-melb:xfs-kern:28947a Signed-off-by: Christoph Hellwig Signed-off-by: David Chinner Signed-off-by: Tim Shimmin commit 54aa8e26e90da882b145fcd33ed752431d6b318b Author: David Chinner Date: Thu Jun 28 16:43:39 2007 +1000 [XFS] Simplify XFS min/max macros. SGI-PV: 964547 SGI-Modid: xfs-linux-melb:xfs-kern:28945a Signed-off-by: David Chinner Signed-off-by: Nathan Scott Signed-off-by: Tim Shimmin commit 24ad33ff714bd117cab30e71e2ad41e4e1185108 Author: Eric Sandeen Date: Thu Jun 28 16:43:30 2007 +1000 [XFS] Kill off xfs_count_bits xfs_count_bits is only called once, and is then compared to 0. IOW, what it really wants to know is, is the bitmap empty. This can be done more simply, certainly. SGI-PV: 966503 SGI-Modid: xfs-linux-melb:xfs-kern:28944a Signed-off-by: Eric Sandeen Signed-off-by: David Chinner Signed-off-by: Tim Shimmin commit 87ae3c2411cfd280e8289e232b718fae9f63950b Author: Jesper Juhl Date: Thu Jun 28 16:43:14 2007 +1000 [XFS] Cancel transactions on xfs_itruncate_start error. SGI-PV: 966502 SGI-Modid: xfs-linux-melb:xfs-kern:28943a Signed-off-by: Jesper Juhl Signed-off-by: David Chinner Signed-off-by: Tim Shimmin commit 39726be2a2e6e61f352852da2c3a807773e33346 Author: Christoph Hellwig Date: Mon Jun 18 17:57:45 2007 +1000 [XFS] Use do_div() on 64 bit types. SGI-PV: 966145 SGI-Modid: xfs-linux-melb:xfs-kern:28889a Signed-off-by: Christoph Hellwig Signed-off-by: David Chinner Signed-off-by: Tim Shimmin commit 516b2e7c2661615ba5d5ad9fb584f068363502d3 Author: David Chinner Date: Mon Jun 18 16:50:48 2007 +1000 [XFS] Fix remount,readonly path to flush everything correctly. The remount readonly path can fail to writeback properly because we still have active transactions after calling xfs_quiesce_fs(). Further investigation shows that this path is broken in the same ways that the xfs freeze path was broken so fix it the same way. 
SGI-PV: 964464 SGI-Modid: xfs-linux-melb:xfs-kern:28869a Signed-off-by: David Chinner Signed-off-by: Christoph Hellwig Signed-off-by: Tim Shimmin commit 957d0ebed04239b734552c7da3fae9094b6f090c Author: David Chinner Date: Mon Jun 18 16:50:37 2007 +1000 [XFS] Cleanup inode extent size hint extraction SGI-PV: 966004 SGI-Modid: xfs-linux-melb:xfs-kern:28866a Signed-off-by: David Chinner Signed-off-by: Christoph Hellwig Signed-off-by: Tim Shimmin commit 84e1e99f112dead8f9ba036c02d24a9f5ce7f544 Author: David Chinner Date: Mon Jun 18 16:50:27 2007 +1000 [XFS] Prevent ENOSPC from aborting transactions that need to succeed During delayed allocation extent conversion or unwritten extent conversion, we need to reserve some blocks for transaction reservations. We need to reserve these blocks in case a btree split occurs and we need to allocate some blocks. Unfortunately, we've only ever reserved the number of data blocks we are allocating, so in both the unwritten and delalloc case we can get ENOSPC on the transaction reservation. This is bad because in both cases we cannot report the failure to the writing application. The fix is two-fold: 1 - leverage the reserved block infrastructure XFS already has to reserve a small pool of blocks by default to allow specially marked transactions to dip into when we are at ENOSPC. Default setting is min(5%, 1024 blocks). 2 - convert critical transaction reservations to be allowed to dip into this pool. Spots changed are delalloc conversion, unwritten extent conversion and growing a filesystem at ENOSPC. This also allows growing the filesystem to succeed at ENOSPC. SGI-PV: 964468 SGI-Modid: xfs-linux-melb:xfs-kern:28865a Signed-off-by: David Chinner Signed-off-by: Tim Shimmin commit 641c56fbfeae85d5ec87fee90a752f7b7224f236 Author: David Chinner Date: Mon Jun 18 16:50:17 2007 +1000 [XFS] Prevent deadlock when flushing inodes on unmount When we are unmounting the filesystem, we flush all the inodes to disk. 
Unfortunately, if we have an inode cluster that has just been freed and marked stale sitting in an incore log buffer (i.e. hasn't been flushed to disk), it will be holding all the flush locks on the inodes in that cluster. xfs_iflush_all() which is called during unmount walks all the inodes trying to reclaim them, and in doing so calls xfs_finish_reclaim() on each inode. If the inode is dirty, it grabs the flush lock and flushes it. Unfortunately, we find dirty inodes that already have their flush lock held and so we sleep. At this point in the unmount process, we are running single-threaded. There is nothing more that can push on the log to force the transaction holding the inode flush locks to disk and hence we deadlock. The fix is to issue a log force before flushing the inodes on unmount so that all the flush locks will be released before we start flushing the inodes. SGI-PV: 964538 SGI-Modid: xfs-linux-melb:xfs-kern:28862a Signed-off-by: David Chinner Signed-off-by: Tim Shimmin commit 0164af51cedf46e1d58fd53854373f544150c597 Author: Tim Shimmin Date: Mon Jun 18 16:50:08 2007 +1000 [XFS] Log the agf_length change in xfs_growfs_data_private(). SGI-PV: 963528 SGI-Modid: xfs-linux-melb:xfs-kern:28856a Signed-off-by: Tim Shimmin Signed-off-by: David Chinner Signed-off-by: Christoph Hellwig commit effd120edb7609069cca9f3d1cb4bfae464b2f85 Author: David Chinner Date: Mon Jun 18 16:49:58 2007 +1000 [XFS] Map unwritten extents correctly for I/O completion processing If we have multiple unwritten extents within a single page, we fail to tell the I/O completion construction handlers we need a new handle for the second and subsequent blocks in the page. While we still issue the I/O correctly, we do not have the correct ranges recorded in the ioend structures and hence when we go to convert the unwritten extents we screw it up. Make sure we start a new ioend every time the mapping changes so that we convert the correct ranges on I/O completion. 
SGI-PV: 964647 SGI-Modid: xfs-linux-melb:xfs-kern:28797a Signed-off-by: David Chinner Signed-off-by: Christoph Hellwig Signed-off-by: Tim Shimmin commit 45c34141126a89da07197d5b89c04c6847f1171a Author: David Chinner Date: Mon Jun 18 16:49:44 2007 +1000 [XFS] Apply transaction delta counts atomically to incore counters With the per-cpu superblock counters, batch updates are no longer atomic across the entire batch of changes. This is not an issue if each individual change in the batch is applied atomically. Unfortunately, free block count changes are not applied atomically, and they are applied in a manner guaranteed to cause problems. Essentially, the free block count reservation that the transaction took initially is returned to the in core counters before a second delta takes away what is used. Because these two operations are not atomic, we can race with another thread that can use the returned transaction reservation before the transaction takes the space away again and we can then get ENOSPC being reported in a spot where we don't have an ENOSPC condition, nor should we ever see one there. Fix it up by rolling the two deltas into the one so it can be applied safely (i.e. atomically) to the incore counters. SGI-PV: 964465 SGI-Modid: xfs-linux-melb:xfs-kern:28796a Signed-off-by: David Chinner Signed-off-by: Christoph Hellwig Signed-off-by: Tim Shimmin commit b2826136a1fc3ea451bcbb73a75ca50b3231aa8f Author: David Chinner Date: Tue Jun 5 16:24:44 2007 +1000 [XFS] Handle null returned from xfs_vtoi() in xfs_setfilesize(). SGI-PV: 965636 SGI-Modid: xfs-linux-melb:xfs-kern:28777a Signed-off-by: David Chinner Signed-off-by: Olaf Weber Signed-off-by: Tim Shimmin commit e927af90aaa7d75543edbbd9c2810e6963d0443f Author: David Chinner Date: Tue Jun 5 16:24:36 2007 +1000 [XFS] Block on unwritten extent conversion during synchronous direct I/O. 
Currently we do not wait on extent conversion to occur, and hence we can return to userspace from a synchronous direct I/O write without having completed all the actions in the write. Hence a read after the write may see zeroes (unwritten extent) rather than the data that was written. Block the I/O completion by triggering a synchronous workqueue flush to ensure that the conversion has occurred before we return to userspace. SGI-PV: 964092 SGI-Modid: xfs-linux-melb:xfs-kern:28775a Signed-off-by: David Chinner Signed-off-by: Tim Shimmin commit f4a9f28a909debe97cd3f6ca30e82e5811125bff Author: David Chinner Date: Tue Jun 5 16:24:27 2007 +1000 [XFS] Flush the block device before closing it on unmount. SGI-PV: 965630 SGI-Modid: xfs-linux-melb:xfs-kern:28774a Signed-off-by: David Chinner Signed-off-by: Christoph Hellwig Signed-off-by: Tim Shimmin commit 4e5ae8386b55677bde05bbd38b8fc82c67ad4564 Author: David Chinner Date: Tue Jun 5 16:24:15 2007 +1000 [XFS] xfs_bmapi fails to update the previous extent pointer When processing multiple extent maps, xfs_bmapi needs to keep track of the extent behind the one it is currently working on to be able to trim extent ranges correctly. Failing to update the previous pointer can result in corrupted extent lists in memory and this will result in panics or assert failures. Update the previous pointer correctly when we move to the next extent to process. SGI-PV: 965631 SGI-Modid: xfs-linux-melb:xfs-kern:28773a Signed-off-by: David Chinner Signed-off-by: Vlad Apostolov Signed-off-by: Tim Shimmin commit 210c6f1caa451623e14a7cd71000d2c2e0d9cc43 Author: David Chinner Date: Thu May 24 15:26:51 2007 +1000 [XFS] Fix the transaction flags to make lazy superblock counters work. 
SGI-PV: 964999 SGI-Modid: xfs-linux-melb:xfs-kern:28653a Signed-off-by: David Chinner Signed-off-by: Christoph Hellwig Signed-off-by: Tim Shimmin commit 92821e2ba4ae26887223326fb0b95cdab963b768 Author: David Chinner Date: Thu May 24 15:26:31 2007 +1000 [XFS] Lazy Superblock Counters When we have a couple of hundred transactions on the fly at once, they all typically modify the on disk superblock in some way. create/unlink/mkdir/rmdir modify inode counts, allocation/freeing modify free block counts. When these counts are modified in a transaction, they must eventually lock the superblock buffer and apply the mods. The buffer then remains locked until the transaction is committed into the incore log buffer. The result of this is that with enough transactions on the fly the incore superblock buffer becomes a bottleneck. The result of contention on the incore superblock buffer is that transaction rates fall - the more pressure that is put on the superblock buffer, the slower things go. The key to removing the contention is to not require the superblock fields in question to be locked. We do that by not marking the superblock dirty in the transaction. IOWs, we modify the incore superblock but do not modify the cached superblock buffer. In short, we do not log superblock modifications to critical fields in the superblock on every transaction. In fact we only do it just before we write the superblock to disk every sync period or just before unmount. This creates an interesting problem - if we don't log or write out the fields in every transaction, then how do the values get recovered after a crash? The answer is simple - we keep enough duplicate, logged information in other structures that we can reconstruct the correct count after log recovery has been performed. 
It is the AGF and AGI structures that contain the duplicate information; after recovery, we walk every AGI and AGF and sum their individual counters to get the correct value, and we do a transaction into the log to correct them. An optimisation of this is that if we have a clean unmount record, we know the value in the superblock is correct, so we can avoid the summation walk under normal conditions and so mount/recovery times do not change under normal operation. One wrinkle that was discovered during development was that the blocks used in the freespace btrees are never accounted for in the AGF counters. This was once a valid optimisation to make; when the filesystem is full, the free space btrees are empty and consume no space. Hence when it matters, the "accounting" is correct. But that means that when we do the AGF summations, we would not have a correct count and xfs_check would complain. Hence a new counter was added to track the number of blocks used by the free space btrees. This is an *on-disk format change*. As a result of this, lazy superblock counters are a mkfs option and at the moment on linux there is no way to convert an old filesystem. This is possible - xfs_db can be used to twiddle the right bits and then xfs_repair will do the format conversion for you. Similarly, you can convert backwards as well. At some point we'll add functionality to xfs_admin to do the bit twiddling easily.... SGI-PV: 964999 SGI-Modid: xfs-linux-melb:xfs-kern:28652a Signed-off-by: David Chinner Signed-off-by: Christoph Hellwig Signed-off-by: Tim Shimmin commit 3260f78ad6d5b788e78ea709d377f58e569bee41 Author: Andrew Morton Date: Thu May 24 15:25:42 2007 +1000 [XFS] Use generic shrinker interfaces in XFS. 
SGI-PV: 964986 SGI-Modid: xfs-linux-melb:xfs-kern:28642a Signed-Off-By: Andrew Morton Signed-off-by: David Chinner Signed-off-by: Tim Shimmin commit 92dfe8d266eaf35a50607a0e0dcf525e1d367c80 Author: David Chinner Date: Thu May 24 15:22:19 2007 +1000 [XFS] Make hole punching at EOF atomic. If hole punching at EOF is done as two steps (i.e. truncate then extend) the file is in a transient state between the two steps where an application can see the incorrect file size. Punching a hole to EOF needs to be treated in the same way as all other hole punching cases so that the file size is never seen to change. SGI-PV: 962012 SGI-Modid: xfs-linux-melb:xfs-kern:28641a Signed-off-by: David Chinner Signed-off-by: Vlad Apostolov Signed-off-by: Tim Shimmin commit 511105b3d7c2440ee84fc3f90d200569aac88162 Author: David Chinner Date: Thu May 24 15:21:57 2007 +1000 [XFS] Fix vmalloc leak on mount/unmount. When setting the length of the iclogbuf to write out we should just be changing the desired byte count rather than completely reassociating the buffer memory with the buffer. Reassociating the buffer memory changes the apparent length of the buffer and hence when we free the buffer, we don't free all the vmap()d space we originally allocated. SGI-PV: 964983 SGI-Modid: xfs-linux-melb:xfs-kern:28640a Signed-off-by: David Chinner Signed-off-by: Christoph Hellwig Signed-off-by: Tim Shimmin commit ca165b88927e41ad18908d7b37f08ef81eae0bf8 Author: Christoph Hellwig Date: Thu May 24 15:21:11 2007 +1000 [XFS] Fix double free in xfs_buf_get_noaddr error handling path SGI-PV: 964983 SGI-Modid: xfs-linux-melb:xfs-kern:28639a Signed-off-by: Christoph Hellwig Signed-off-by: David Chinner Signed-off-by: Tim Shimmin commit 3db296f341b5902c4f9317022ae5d4da2d59d598 Author: David Chinner Date: Mon May 14 18:24:16 2007 +1000 [XFS] Fix use-after-free during log unmount. Don't reference the log buffer after running the callbacks as the callback can trigger the log buffers to be freed during unmount. 
SGI-PV: 964545 SGI-Modid: xfs-linux-melb:xfs-kern:28567a Signed-off-by: David Chinner Signed-off-by: Christoph Hellwig Signed-off-by: Tim Shimmin commit 40095b64f5da601a8ab61fbe4b40feb46830052e Author: David Chinner Date: Mon May 14 18:24:09 2007 +1000 [XFS] Sleeping with the ilock waiting for I/O completion is Bad. Recent fixes to the filesystem freezing code introduced a vn_iowait call in the middle of the sync code. Unfortunately, at the point where this call was added we are holding the ilock. The ilock is needed by I/O completion for unwritten extent conversion and now updating the file size. Hence I/O cannot complete if we hold the ilock while waiting for I/O completion. Fix up the bug and clean the code up around it. SGI-PV: 963674 SGI-Modid: xfs-linux-melb:xfs-kern:28566a Signed-off-by: David Chinner Signed-off-by: Christoph Hellwig Signed-off-by: Tim Shimmin commit 4cc929ee305c69573cb842aade059dbe2a93940c Author: Nathan Scott Date: Mon May 14 18:24:02 2007 +1000 [XFS] Don't grow filesystems past the size they can index. When growing a filesystem we don't check to see if the new size overflows the page cache index range, so we can do silly things like grow a filesystem past 16TB on a 32-bit system. Check new filesystem sizes against the limits the kernel can support. SGI-PV: 957886 SGI-Modid: xfs-linux-melb:xfs-kern:28563a Signed-Off-By: Nathan Scott Signed-off-by: David Chinner Signed-off-by: Tim Shimmin commit 1fa40b01ae4d1b00e366d4949edcc230f5cd6d99 Author: Christoph Hellwig Date: Mon May 14 18:23:50 2007 +1000 [XFS] Only use refcounted pages for I/O Many block drivers (aoe, iscsi) really want refcountable pages in bios, which is what almost everyone sends down. XFS unfortunately has a few places where it sends down buffers that may come from kmalloc, which breaks them. Fix the places that use kmalloc()d buffers. 
SGI-PV: 964546 SGI-Modid: xfs-linux-melb:xfs-kern:28562a Signed-Off-By: Christoph Hellwig Signed-off-by: David Chinner Signed-off-by: Tim Shimmin commit 8d9107e8c50e1c4ff43c91c8841805833f3ecfb9 Author: Linus Torvalds Date: Fri Jul 13 16:53:18 2007 -0700 Revert "SELinux: use SECINITSID_NETMSG instead of SECINITSID_UNLABELED for NetLabel" This reverts commit 9faf65fb6ee2b4e08325ba2d69e5ccf0c46453d0. It bit people like Michal Piotrowski: "My system is too secure, I can not login :)" because it changed how CONFIG_NETLABEL worked, and broke older SElinux policies. As a result, quoth James Morris: "Can you please revert this patch? We thought it only affected people running MLS, but it will affect others. Sorry for the hassle." Cc: James Morris Cc: Stephen Smalley Cc: Michal Piotrowski Cc: Paul Moore Signed-off-by: Linus Torvalds commit 16cefa8c3863721fd40445a1b34dea18cd16ccfe Merge: 4fbef20... d8558f9... Author: Linus Torvalds Date: Fri Jul 13 16:46:18 2007 -0700 Merge git://git.linux-nfs.org/pub/linux/nfs-2.6 * git://git.linux-nfs.org/pub/linux/nfs-2.6: (122 commits) sunrpc: drop BKL around wrap and unwrap NFSv4: Make sure unlock is really an unlock when cancelling a lock NLM: fix source address of callback to client SUNRPC client: add interface for binding to a local address SUNRPC server: record the destination address of a request SUNRPC: cleanup transport creation argument passing NFSv4: Make the NFS state model work with the nosharedcache mount option NFS: Error when mounting the same filesystem with different options NFS: Add the mount option "nosharecache" NFS: Add support for mounting NFSv4 file systems with string options NFS: Add final pieces to support in-kernel mount option parsing NFS: Introduce generic mount client API NFS: Add enums and match tables for mount option parsing NFS: Improve debugging output in NFS in-kernel mount client NFS: Clean up in-kernel NFS mount NFS: Remake nfsroot_mount as a permanent part of NFS client SUNRPC: Add a convenient 
default for the hostname when calling rpc_create() SUNRPC: Rename rpcb_getport to be consistent with new rpcb_getport_sync name SUNRPC: Rename rpcb_getport_external routine SUNRPC: Allow rpcbind requests to be interrupted by a signal. ... commit 4fbef206daead133085fe33905f5e842d38fb8da Author: Jens Axboe Date: Fri Jul 13 22:42:20 2007 +0200 nfsd: fix nfsd_vfs_read() splice actor setup When nfsd was transitioned to use splice instead of sendfile() for data transfers, a line setting the page index was lost. Restore it, so that nfsd is functional when that path is used. Signed-off-by: Jens Axboe Signed-off-by: Linus Torvalds commit 4fd885170bf13841ada921495b7b00c4b9971cf9 Author: Thomas Gleixner Date: Fri Jul 13 21:43:55 2007 +0200 CFS: Fix missing digit off in wmult table Roman Zippel noticed another inconsistency of the wmult table. wmult[16] has a missing digit. Signed-off-by: Thomas Gleixner Signed-off-by: Linus Torvalds commit af09f1e4b3214569de93bc9309c35014e5c8a3d0 Merge: e030dbf... 9a60ddb... Author: Linus Torvalds Date: Fri Jul 13 16:06:30 2007 -0700 Merge master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq * master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq: [CPUFREQ] Fix typos in powernow-k8 printk's. [CPUFREQ] Restore previously used governor on a hot-replugged CPU [CPUFREQ] bugfix cpufreq in combination with performance governor [CPUFREQ] powernow-k8 compile fix. [CPUFREQ] the overdue removal of X86_SPEEDSTEP_CENTRINO_ACPI [CPUFREQ] Longhaul - Option to disable ACPI C3 support Fixed up arch/i386/kernel/cpu/cpufreq/powernow-k8.c due to revert that got fixed differently in the cpufreq branch. Signed-off-by: Linus Torvalds commit e030dbf91a87da7e8be3be3ca781558695bea683 Merge: 12a2296... 3039f07... 
Author: Linus Torvalds Date: Fri Jul 13 10:52:27 2007 -0700 Merge branch 'ioat-md-accel-for-linus' of git://lost.foo-projects.org/~dwillia2/git/iop * 'ioat-md-accel-for-linus' of git://lost.foo-projects.org/~dwillia2/git/iop: (28 commits) ioatdma: add the unisys "i/oat" pci vendor/device id ARM: Add drivers/dma to arch/arm/Kconfig iop3xx: surface the iop3xx DMA and AAU units to the iop-adma driver iop13xx: surface the iop13xx adma units to the iop-adma driver dmaengine: driver for the iop32x, iop33x, and iop13xx raid engines md: remove raid5 compute_block and compute_parity5 md: handle_stripe5 - request io processing in raid5_run_ops md: handle_stripe5 - add request/completion logic for async expand ops md: handle_stripe5 - add request/completion logic for async read ops md: handle_stripe5 - add request/completion logic for async check ops md: handle_stripe5 - add request/completion logic for async compute ops md: handle_stripe5 - add request/completion logic for async write ops md: common infrastructure for running operations with raid5_run_ops md: raid5_run_ops - run stripe operations outside sh->lock raid5: replace custom debug PRINTKs with standard pr_debug raid5: refactor handle_stripe5 and handle_stripe6 (v3) async_tx: add the async_tx api xor: make 'xor_blocks' a library routine for use with async_tx dmaengine: make clients responsible for managing channels dmaengine: refactor dmaengine around dma_async_tx_descriptor ... commit 12a22960549979c10a95cc97f8ec63b461c55692 Merge: 31c4ab4... 51a92c0... 
Author: Linus Torvalds Date: Fri Jul 13 10:51:07 2007 -0700 Merge branch 'splice-2.6.23' of git://git.kernel.dk/data/git/linux-2.6-block * 'splice-2.6.23' of git://git.kernel.dk/data/git/linux-2.6-block: splice: fix offset mangling with direct splicing (sendfile) security: revalidate rw permissions for sys_splice and sys_vmsplice relay: fixup kerneldoc comment relay: fix bogus cast in subbuf_splice_actor() commit 31c4ab430a448cfb13fc88779d8a870c7af9f72b Merge: 8b69ad0... f24ae12... Author: Linus Torvalds Date: Fri Jul 13 10:44:45 2007 -0700 Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus * 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus: [MIPS] Workaround for a sparse warning in include/asm-mips/mach-tx4927/ioremap.h [MIPS] Make show_code static and add __user tag [MIPS] Workaround for a sparse warning in include/asm-mips/compat.h [MIPS] Add some __user tags [MIPS] math-emu minor cleanup [MIPS] Kill CONFIG_TX4927BUG_WORKAROUND [MIPS] Alchemy: Remove code wrapped by dead symbol CONFIG_FB_XPERT98 [MIPS] Alchemy: Remove code wrapped by dead symbol CONFIG_AU1000_SRC_CLK [MIPS] Alchemy: Remove code wrapped by dead symbol CONFIG_AU1000_USE32K [MIPS] Alchemy: Remove code wrapped by dead symbol CONFIG_AU1XXX_PSC_SPI [CHAR] Delete leftovers of old Alchemy UART driver commit 8b69ad0e690eb5f38c23087247a12e5fde1baeff Author: Linus Torvalds Date: Fri Jul 13 10:43:52 2007 -0700 Revert "[CPUFREQ] powernow-k8: clarify number of cores." This reverts commit 904f7a3f042b5c6aa9e53ce83f2c9de5e33170ff. As noted by Peter Anvin: "It causes build failures on i386. Yet another case of unnecessary divergence between i386 and x86-64 I'm afraid..." Signed-off-by: Linus Torvalds commit aba2da66cfbf7790ad79d4dee95871127d5ddf5e Merge: 7732089... f787a50... 
Author: Linus Torvalds Date: Fri Jul 13 10:12:21 2007 -0700 Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched * git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched: [PATCH] sched: small topology.h cleanup [PATCH] sched: fix show_task()/show_tasks() output [PATCH] sched: remove stale version info from kernel/sched_debug.c [PATCH] sched: allow larger granularity [PATCH] sched: fix prio_to_wmult[] for nice 1 [ I re-did the commits to get rid of some bogus merge commit that Ingo had. - Linus ] Signed-off-by: Linus Torvalds commit f787a50306680c187cf2896a8017937c1bf6dc7e Author: Ingo Molnar Date: Wed Jul 11 21:21:47 2007 +0200 [PATCH] sched: small topology.h cleanup trivial cleanup: LOCAL_DISTANCE and REMOTE_DISTANCE are only used in topology.h and inside an #ifndef section - limit their existence to that #ifndef. Signed-off-by: Ingo Molnar Signed-off-by: Linus Torvalds commit 4bd77321a833077c5c9ac7b9d284e261e4a8906e Author: Ingo Molnar Date: Wed Jul 11 21:21:47 2007 +0200 [PATCH] sched: fix show_task()/show_tasks() output fix show_task()/show_tasks() output: - there's no sibling info anymore - the fields were not aligned properly with the description - get rid of the lazy-TLB output: it's been quite some time since we last had a bug there, and when we had a bug it wasn't helped a bit by this debug output. Signed-off-by: Ingo Molnar Signed-off-by: Linus Torvalds commit 45f384a64f0769bb9a3caf0516de88a629f48e61 Author: Ingo Molnar Date: Wed Jul 11 21:21:47 2007 +0200 [PATCH] sched: remove stale version info from kernel/sched_debug.c kernel/sched_debug.c referred to CFS -v20, but there's no CFS versioning needed within the upstream kernel. Signed-off-by: Ingo Molnar Signed-off-by: Linus Torvalds commit a5968df8737eda477d9d1038f5428ebd4d0884e1 Author: Ingo Molnar Date: Wed Jul 11 21:21:47 2007 +0200 [PATCH] sched: allow larger granularity Allow granularity up to 100 msecs, instead of 10 msecs.
(needed on larger boxes) Signed-off-by: Ingo Molnar Signed-off-by: Linus Torvalds commit e127031f4f76dc367c5d2f9d883715730dd82f7d Author: Mike Galbraith Date: Wed Jul 11 21:21:47 2007 +0200 [PATCH] sched: fix prio_to_wmult[] for nice 1 There's a typo in the values in prio_to_wmult[] for nice level 1. While it did not cause bad CPU distribution, it caused more rescheduling between nice-0 and nice-1 tasks than necessary. Signed-off-by: Ingo Molnar Signed-off-by: Linus Torvalds commit f24ae12b3eeb1b956b752d4d5907e311cfa95a1a Author: Atsushi Nemoto Date: Sat Jul 14 00:06:44 2007 +0900 [MIPS] Workaround for a sparse warning in include/asm-mips/mach-tx4927/ioremap.h include2/asm/mach-tx49xx/ioremap.h:39:52: warning: cast truncates bits from constant value (fff000000 becomes ff000000) Signed-off-by: Atsushi Nemoto Signed-off-by: Ralf Baechle commit e1bb828906e54ffb7e8b358516158ffdcf9581b8 Author: Atsushi Nemoto Date: Fri Jul 13 23:51:46 2007 +0900 [MIPS] Make show_code static and add __user tag Signed-off-by: Atsushi Nemoto Signed-off-by: Ralf Baechle commit 01bebc66793f2cc65104452dc319a8a99f005934 Author: Atsushi Nemoto Date: Fri Jul 13 23:51:38 2007 +0900 [MIPS] Workaround for a sparse warning in include/asm-mips/compat.h Cast to a __user pointer via "unsigned long" to get rid of this warning: include2/asm/compat.h:135:10: warning: cast adds address space to expression () Signed-off-by: Atsushi Nemoto Signed-off-by: Ralf Baechle commit 5e0373b8e449b0c72495a6d8401c53f678b71988 Author: Atsushi Nemoto Date: Fri Jul 13 23:02:42 2007 +0900 [MIPS] Add some __user tags Signed-off-by: Atsushi Nemoto Signed-off-by: Ralf Baechle commit e70dfc10b99ebffa1f464b1b9290df2589284f70 Author: Atsushi Nemoto Date: Fri Jul 13 23:02:29 2007 +0900 [MIPS] math-emu minor cleanup Declaring emulpc and contpc as "unsigned long" can get rid of some casts. This also gets rid of some sparse warnings.
Signed-off-by: Atsushi Nemoto Signed-off-by: Ralf Baechle commit e50e1c744df20e704441a99eef7c6414e6920712 Author: Atsushi Nemoto Date: Fri Jul 13 02:00:26 2007 +0900 [MIPS] Kill CONFIG_TX4927BUG_WORKAROUND Kill workarounds for very early chip (perhaps pre-TX4927A). Signed-off-by: Atsushi Nemoto Signed-off-by: Ralf Baechle commit b58f4b7aaf5ddaf2fdc13dfeb3ce6e61d51c3ac5 Author: Ralf Baechle Date: Fri Jul 13 10:40:23 2007 +0100 [MIPS] Alchemy: Remove code wrapped by dead symbol CONFIG_FB_XPERT98 Noticed by Robert P. J. Day (rpjday@mindspring.com). Signed-off-by: Ralf Baechle commit 85a882bc3553636930bef7773201955c0e2fcf99 Author: Ralf Baechle Date: Fri Jul 13 06:45:48 2007 +0100 [MIPS] Alchemy: Remove code wrapped by dead symbol CONFIG_AU1000_SRC_CLK Noticed by Robert P. J. Day (rpjday@mindspring.com). Signed-off-by: Ralf Baechle commit 8f597acab2742b7ae9a556613c389ffa914cdbbd Author: Ralf Baechle Date: Fri Jul 13 06:42:36 2007 +0100 [MIPS] Alchemy: Remove code wrapped by dead symbol CONFIG_AU1000_USE32K Noticed by Robert P. J. Day (rpjday@mindspring.com). Signed-off-by: Ralf Baechle commit 6fec2e1727049ce6a404f4af61461d860594d5db Author: Ralf Baechle Date: Fri Jul 13 06:33:09 2007 +0100 [MIPS] Alchemy: Remove code wrapped by dead symbol CONFIG_AU1XXX_PSC_SPI Noticed by Robert P. J. Day (rpjday@mindspring.com). 
Signed-off-by: Ralf Baechle commit 33f60da0dad0884256f888c19303e16c95d2086d Author: Ralf Baechle Date: Fri Jul 13 17:39:59 2007 +0100 [CHAR] Delete leftovers of old Alchemy UART driver Signed-off-by: Ralf Baechle commit 3039f0735a280b54c7364fbfe6a9287f7f0b510a Author: Dan Williams Date: Fri Jul 13 08:06:19 2007 -0700 ioatdma: add the unisys "i/oat" pci vendor/device id Cc: John Magolan Signed-off-by: Shannon Nelson Signed-off-by: Dan Williams commit 5816815f7850509ed51ab94eb4f644e405ccb865 Author: Dan Williams Date: Tue Jan 2 11:10:43 2007 -0700 ARM: Add drivers/dma to arch/arm/Kconfig Cc: Russell King Signed-off-by: Dan Williams commit 2492c845189a961a92d8537a44d233e8e1e45c6d Author: Dan Williams Date: Tue Jan 2 13:52:31 2007 -0700 iop3xx: surface the iop3xx DMA and AAU units to the iop-adma driver Adds the platform device definitions and the architecture specific support routines (i.e. register initialization and descriptor formats) for the iop-adma driver. Changelog: * add support for > 1k zero sum buffer sizes * added dma/aau platform devices to iq80321 and iq80332 setup * fixed the calculation in iop_desc_is_aligned * support xor buffer sizes larger than 16MB * fix places where software descriptors are assumed to be contiguous, only hardware descriptors are contiguous for up to a PAGE_SIZE buffer size * convert to async_tx * add interrupt support * add platform devices for 80219 boards * do not call platform register macros in driver code * remove switch() statements for compatible register offsets/layouts * change over to bitmap based capabilities * remove unnecessary ARM assembly statement * checkpatch.pl fixes * gpl v2 only correction * phys move to dma_async_tx_descriptor Cc: Russell King Signed-off-by: Dan Williams commit 39a8d7d13c113e4a98bfdfc45c7233188e4d715f Author: Dan Williams Date: Tue Jan 2 13:52:31 2007 -0700 iop13xx: surface the iop13xx adma units to the iop-adma driver Adds the platform device definitions and the architecture specific support 
routines (i.e. register initialization and descriptor formats) for the iop-adma driver. Changelog: * added 'descriptor pool size' to the platform data * add base support for buffer sizes larger than 16MB (hw max) * build error fix from Kirill A. Shutemov * rebase for async_tx changes * add interrupt support * do not call platform register macros in driver code * remove unnecessary ARM assembly statement * checkpatch.pl fixes * gpl v2 only correction Cc: Russell King Signed-off-by: Dan Williams commit c211092313b90f898dec61f35207fc282d1eadc3 Author: Dan Williams Date: Tue Jan 2 13:52:26 2007 -0700 dmaengine: driver for the iop32x, iop33x, and iop13xx raid engines The Intel(R) IOP series of i/o processors integrate an Xscale core with raid acceleration engines. The capabilities per platform are: iop219: (2) copy engines iop321: (2) copy engines (1) xor and block fill engine iop33x: (2) copy and crc32c engines (1) xor, xor zero sum, pq, pq zero sum, and block fill engine iop34x (iop13xx): (2) copy, crc32c, xor, xor zero sum, and block fill engines (1) copy, crc32c, xor, xor zero sum, pq, pq zero sum, and block fill engine The driver supports the features of the async_tx api: * asynchronous notification of operation completion * implicit (interrupt triggered) handling of inter-channel transaction dependencies The driver adapts to the platform it is running on by two methods. 1/ #include which defines the hardware specific iop_chan_* and iop_desc_* routines as a series of static inline functions 2/ The private platform data attached to the platform_device defines the capabilities of the channels 20070626: Callbacks are run in a tasklet. Given the recent discussion on LKML about killing tasklets in favor of workqueues I did a quick conversion of the driver. Raid5 resync performance dropped from 50MB/s to 30MB/s, so the tasklet implementation remains until a generic softirq interface is available.
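The per-platform engine list above can be pictured as one capability mask per channel that is consulted when an engine is picked. The following is a minimal userspace sketch under that assumption, not the iop-adma code: the CAP_* bits, the channel labels, and find_chan() are hypothetical names, and the table only restates the iop33x entry from the list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capability bits; the kernel's dmaengine has its own mask type. */
enum {
	CAP_COPY         = 1u << 0,
	CAP_CRC32C       = 1u << 1,
	CAP_XOR          = 1u << 2,
	CAP_XOR_ZERO_SUM = 1u << 3,
	CAP_PQ           = 1u << 4,
	CAP_PQ_ZERO_SUM  = 1u << 5,
	CAP_FILL         = 1u << 6
};

struct chan_caps {
	const char *name;	/* illustrative label only */
	unsigned int caps;	/* OR of CAP_* bits */
};

/* iop33x as listed above: two copy+crc32c engines, one xor/pq/fill engine. */
static const struct chan_caps iop33x_chans[] = {
	{ "copy0", CAP_COPY | CAP_CRC32C },
	{ "copy1", CAP_COPY | CAP_CRC32C },
	{ "xor0",  CAP_XOR | CAP_XOR_ZERO_SUM | CAP_PQ | CAP_PQ_ZERO_SUM | CAP_FILL }
};

/* Return the index of the first channel offering every bit in 'need', or -1. */
static int find_chan(const struct chan_caps *chans, size_t n, unsigned int need)
{
	size_t i;

	assert(chans != NULL || n == 0);
	for (i = 0; i < n; i++)
		if ((chans[i].caps & need) == need)
			return (int)i;
	return -1;
}
```

A request for copy plus xor on this table finds no single channel, which is the situation where a transaction has to move between engines.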
Changelog: * fixed a slot allocation bug in do_iop13xx_adma_xor that caused too few slots to be requested eventually leading to data corruption * enabled the slot allocation routine to attempt to free slots before returning -ENOMEM * switched the cleanup routine to solely use the software chain and the status register to determine if a descriptor is complete. This is necessary to support other IOP engines that do not have status writeback capability * make the driver iop generic * modified the allocation routines to understand allocating a group of slots for a single operation * added a null xor initialization operation for the xor only channel on iop3xx * support xor operations on buffers larger than the hardware maximum * split the do_* routines into separate prep, src/dest set, submit stages * added async_tx support (dependent operations initiation at cleanup time) * simplified group handling * added interrupt support (callbacks via tasklets) * brought the pending depth in line with ioat (i.e.
4 descriptors) * drop dma mapping methods, suggested by Chris Leech * don't use inline in C files, Adrian Bunk * remove static tasklet declarations * make iop_adma_alloc_slots easier to read and remove chances for a corrupted descriptor chain * fix locking bug in iop_adma_alloc_chan_resources, Benjamin Herrenschmidt * convert capabilities over to dma_cap_mask_t * fixup sparse warnings * add descriptor flush before iop_chan_enable * checkpatch.pl fixes * gpl v2 only correction * move set_src, set_dest, submit to async_tx methods * move group_list and phys to async_tx Cc: Russell King Signed-off-by: Dan Williams commit f6dff381af01006ffae3c23cd2e07e30584de0ec Author: Dan Williams Date: Tue Jan 2 13:52:31 2007 -0700 md: remove raid5 compute_block and compute_parity5 replaced by raid5_run_ops Signed-off-by: Dan Williams Acked-By: NeilBrown commit 830ea01673a397798d1281d2022615559f5001bb Author: Dan Williams Date: Tue Jan 2 13:52:31 2007 -0700 md: handle_stripe5 - request io processing in raid5_run_ops I/O submission requests were already handled outside of the stripe lock in handle_stripe. Now that handle_stripe is only tasked with finding work, this logic belongs in raid5_run_ops. Signed-off-by: Dan Williams Acked-By: NeilBrown commit f0a50d3754c7f1b7f05f45b1c0b35d20445316b5 Author: Dan Williams Date: Tue Jan 2 13:52:31 2007 -0700 md: handle_stripe5 - add request/completion logic for async expand ops When a stripe is being expanded bulk copying takes place to move the data from the old stripe to the new. Since raid5_run_ops only operates on one stripe at a time these bulk copies are handled in-line under the stripe lock. In the dma offload case we poll for the completion of the operation. After the data has been copied into the new stripe the parity needs to be recalculated across the new disks. We reuse the existing postxor functionality to carry out this calculation. 
By setting STRIPE_OP_POSTXOR without setting STRIPE_OP_BIODRAIN the completion path in handle stripe can differentiate expand operations from normal write operations. Signed-off-by: Dan Williams Acked-By: NeilBrown commit b5e98d65d34a1c11a2135ea8a9b2619dbc7216c8 Author: Dan Williams Date: Tue Jan 2 13:52:31 2007 -0700 md: handle_stripe5 - add request/completion logic for async read ops When a read bio is attached to the stripe and the corresponding block is marked R5_UPTODATE, then a read (biofill) operation is scheduled to copy the data from the stripe cache to the bio buffer. handle_stripe flags the blocks to be operated on with the R5_Wantfill flag. If new read requests arrive while raid5_run_ops is running they will not be handled until handle_stripe is scheduled to run again. Changelog: * cleanup to_read and to_fill accounting * do not fail reads that have reached the cache Signed-off-by: Dan Williams Acked-By: NeilBrown commit e89f89629b5de76e504d1be75c82c4a6b2419583 Author: Dan Williams Date: Tue Jan 2 13:52:31 2007 -0700 md: handle_stripe5 - add request/completion logic for async check ops Check operations are scheduled when the array is being resynced or an explicit 'check/repair' command was sent to the array. Previously check operations would destroy the parity block in the cache such that even if parity turned out to be correct the parity block would be marked !R5_UPTODATE at the completion of the check. When the operation can be carried out by a dma engine the assumption is that it can check parity as a read-only operation. If raid5_run_ops notices that the check was handled by hardware it will preserve the R5_UPTODATE status of the parity disk. When a check operation determines that the parity needs to be repaired we reuse the existing compute block infrastructure to carry out the operation. 
Repair operations imply an immediate write back of the data, so to differentiate a repair from a normal compute operation the STRIPE_OP_MOD_REPAIR_PD flag is added. Changelog: * remove test_and_set/test_and_clear BUG_ONs, Neil Brown Signed-off-by: Dan Williams Acked-By: NeilBrown commit f38e12199a94ca458e4f03c5a2c984fb80adadc5 Author: Dan Williams Date: Tue Jan 2 13:52:30 2007 -0700 md: handle_stripe5 - add request/completion logic for async compute ops handle_stripe will compute a block when a backing disk has failed, or when it determines it can save a disk read by computing the block from all the other up-to-date blocks. Previously a block would be computed under the lock and subsequent logic in handle_stripe could use the newly up-to-date block. With the raid5_run_ops implementation the compute operation is carried out at a later time outside the lock. To preserve the old functionality we take advantage of the dependency chain feature of async_tx to flag the block as R5_Wantcompute and then let other parts of handle_stripe operate on the block as if it were up-to-date. raid5_run_ops guarantees that the block will be ready before it is used in another operation. However, this only works in cases where the compute and the dependent operation are scheduled at the same time. If a previous call to handle_stripe sets the R5_Wantcompute flag there is no facility to pass the async_tx dependency chain across successive calls to raid5_run_ops. The req_compute variable protects against this case. Changelog: * remove the req_compute BUG_ON Signed-off-by: Dan Williams Acked-By: NeilBrown commit e33129d84130459dbb764a1a52a4bfceab3da978 Author: Dan Williams Date: Tue Jan 2 13:52:30 2007 -0700 md: handle_stripe5 - add request/completion logic for async write ops After handle_stripe5 decides whether it wants to perform a read-modify-write, or a reconstruct write it calls handle_write_operations5.
A read-modify-write operation will perform an xor subtraction of the blocks marked with the R5_Wantprexor flag, copy the new data into the stripe (biodrain) and perform a postxor operation across all up-to-date blocks to generate the new parity. A reconstruct write is run when all blocks are already up-to-date in the cache so all that is needed is a biodrain and postxor. On the completion path STRIPE_OP_PREXOR will be set if the operation was a read-modify-write. The STRIPE_OP_BIODRAIN flag is used in the completion path to differentiate write-initiated postxor operations versus expansion-initiated postxor operations. Completion of a write triggers i/o to the drives. Changelog: * make the 'rcw' parameter to handle_write_operations5 a simple flag, Neil Brown * remove test_and_set/test_and_clear BUG_ONs, Neil Brown Signed-off-by: Dan Williams Acked-By: NeilBrown commit d84e0f10d38393f617227f0c831a99c69294651f Author: Dan Williams Date: Tue Jan 2 13:52:30 2007 -0700 md: common infrastructure for running operations with raid5_run_ops All the handle_stripe operations that are to be transitioned to use raid5_run_ops need a method to coherently gather work under the stripe-lock and hand that work off to raid5_run_ops. The 'get_stripe_work' routine runs under the lock to read all the bits in sh->ops.pending that do not have the corresponding bit set in sh->ops.ack. This modified 'pending' bitmap is then passed to raid5_run_ops for processing. The transition from 'ack' to 'completion' does not need similar protection as the existing release_stripe infrastructure will guarantee that handle_stripe will run again after a completion bit is set, and handle_stripe can tolerate a sh->ops.completed bit being set while the lock is held. A call to async_tx_issue_pending_all() is added to raid5d to kick the offload engines once all pending stripe operations work has been submitted. This enables batching of the submission and completion of operations. 
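The gathering step described above reduces to a mask operation on the two bitmaps. A minimal userspace sketch, assuming plain unsigned long bitmaps: the struct, the OP_* bits, and gather_new_work() are hypothetical stand-ins rather than the kernel's get_stripe_work, and acknowledging the bits inside the helper is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operation bits, for illustration only. */
enum {
	OP_BIOFILL = 1u << 0,
	OP_PREXOR  = 1u << 1,
	OP_POSTXOR = 1u << 2
};

/* Stand-in for the pending/ack pair kept in sh->ops. */
struct ops_state {
	unsigned long pending;	/* operations requested so far */
	unsigned long ack;	/* operations already handed off for processing */
};

/*
 * Return the requested operations that have not been acknowledged yet and
 * mark them acknowledged. The caller is assumed to hold the stripe lock.
 */
static unsigned long gather_new_work(struct ops_state *ops)
{
	unsigned long work;

	assert(ops != NULL);
	work = ops->pending & ~ops->ack;
	ops->ack |= work;
	return work;
}
```

Calling the helper twice without new requests returns an empty mask the second time, so an operation is handed off only once.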
Signed-off-by: Dan Williams Acked-By: NeilBrown commit 91c00924846a0034020451c280c76baa4299f9dc Author: Dan Williams Date: Tue Jan 2 13:52:30 2007 -0700 md: raid5_run_ops - run stripe operations outside sh->lock When the raid acceleration work was proposed, Neil laid out the following attack plan: 1/ move the xor and copy operations outside spin_lock(&sh->lock) 2/ find/implement an asynchronous offload api The raid5_run_ops routine uses the asynchronous offload api (async_tx) and the stripe_operations member of a stripe_head to carry out xor+copy operations asynchronously, outside the lock. To perform operations outside the lock a new set of state flags is needed to track new requests, in-flight requests, and completed requests. In this new model handle_stripe is tasked with scanning the stripe_head for work, updating the stripe_operations structure, and finally dropping the lock and calling raid5_run_ops for processing. The following flags outline the requests that handle_stripe can make of raid5_run_ops: STRIPE_OP_BIOFILL - copy data into request buffers to satisfy a read request STRIPE_OP_COMPUTE_BLK - generate a missing block in the cache from the other blocks STRIPE_OP_PREXOR - subtract existing data as part of the read-modify-write process STRIPE_OP_BIODRAIN - copy data out of request buffers to satisfy a write request STRIPE_OP_POSTXOR - recalculate parity for new data that has entered the cache STRIPE_OP_CHECK - verify that the parity is correct STRIPE_OP_IO - submit i/o to the member disks (note this was already performed outside the stripe lock, but it made sense to add it as an operation type) The flow is: 1/ handle_stripe sets STRIPE_OP_* in sh->ops.pending 2/ raid5_run_ops reads sh->ops.pending, sets sh->ops.ack, and submits the operation to the async_tx api 3/ async_tx triggers the completion callback routine to set sh->ops.complete and release the stripe 4/ handle_stripe runs again to finish the operation and optionally submit new operations that were
previously blocked Note this patch just defines raid5_run_ops, subsequent commits (one per major operation type) modify handle_stripe to take advantage of this routine. Changelog: * removed ops_complete_biodrain in favor of ops_complete_postxor and ops_complete_write. * removed the raid5_run_ops workqueue * call bi_end_io for reads in ops_complete_biofill, saves a call to handle_stripe * explicitly handle the 2-disk raid5 case (xor becomes memcpy), Neil Brown * fix race between async engines and bi_end_io call for reads, Neil Brown * remove unnecessary spin_lock from ops_complete_biofill * remove test_and_set/test_and_clear BUG_ONs, Neil Brown * remove explicit interrupt handling for channel switching, this feature was absorbed (i.e. it is now implicit) by the async_tx api * use return_io in ops_complete_biofill Signed-off-by: Dan Williams Acked-By: NeilBrown commit 45b4233caac05da0118b608a9fc2a40a9fc580cd Author: Dan Williams Date: Mon Jul 9 11:56:43 2007 -0700 raid5: replace custom debug PRINTKs with standard pr_debug Replaces PRINTK with pr_debug, and kills the RAID5_DEBUG definition in favor of the global DEBUG definition. To get local debug messages just add '#define DEBUG' to the top of the file. Signed-off-by: Dan Williams Acked-By: NeilBrown commit a445685647e825c713175d180ffc8dd54d90589b Author: Dan Williams Date: Mon Jul 9 11:56:43 2007 -0700 raid5: refactor handle_stripe5 and handle_stripe6 (v3) handle_stripe5 and handle_stripe6 have very deep logic paths handling the various states of a stripe_head. By introducing the 'stripe_head_state' and 'r6_state' objects, large portions of the logic can be moved to sub-routines. 'struct stripe_head_state' consumes all of the automatic variables that previously stood alone in handle_stripe5,6. 'struct r6_state' contains the handle_stripe6 specific variables like p_failed and q_failed. 
One of the nice side effects of the 'stripe_head_state' change is that it allows for further reductions in code duplication between raid5 and raid6. The following new routines are shared between raid5 and raid6: handle_completed_write_requests handle_requests_to_failed_array handle_stripe_expansion Changes: * v2: fixed 'conf->raid_disk-1' for the raid6 'handle_stripe_expansion' path * v3: removed the unused 'dirty' field from struct stripe_head_state * v3: coalesced open coded bi_end_io routines into return_io() Signed-off-by: Dan Williams Acked-By: NeilBrown commit 9bc89cd82d6f88fb0ca39b30445c329a430fd66b Author: Dan Williams Date: Tue Jan 2 11:10:44 2007 -0700 async_tx: add the async_tx api The async_tx api provides methods for describing a chain of asynchronous bulk memory transfers/transforms with support for inter-transactional dependencies. It is implemented as a dmaengine client that smooths over the details of different hardware offload engine implementations. Code that is written to the api can optimize for asynchronous operation and the api will fit the chain of operations to the available offload resources. I imagine that any piece of ADMA hardware would register with the 'async_*' subsystem, and a call to async_X would be routed as appropriate, or be run in-line. - Neil Brown async_tx exploits the capabilities of struct dma_async_tx_descriptor to provide an api of the following general format: struct dma_async_tx_descriptor * async_(..., struct dma_async_tx_descriptor *depend_tx, dma_async_tx_callback cb_fn, void *cb_param) { struct dma_chan *chan = async_tx_find_channel(depend_tx, ); struct dma_device *device = chan ? chan->device : NULL; int int_en = cb_fn ? 1 : 0; struct dma_async_tx_descriptor *tx = device ? device->device_prep_dma_(chan, len, int_en) : NULL; if (tx) { /* run asynchronously */ ... tx->tx_set_dest(addr, tx, index); ... tx->tx_set_src(addr, tx, index); ... 
async_tx_submit(chan, tx, flags, depend_tx, cb_fn, cb_param); } else { /* run synchronously */ ... ... async_tx_sync_epilog(flags, depend_tx, cb_fn, cb_param); } return tx; } async_tx_find_channel() returns a capable channel from its pool. The channel pool is organized as a per-cpu array of channel pointers. The async_tx_rebalance() routine is tasked with managing these arrays. In the uniprocessor case async_tx_rebalance() tries to spread responsibility evenly over channels of similar capabilities. For example if there are two copy+xor channels, one will handle copy operations and the other will handle xor. In the SMP case async_tx_rebalance() attempts to spread the operations evenly over the cpus, e.g. cpu0 gets copy channel0 and xor channel0 while cpu1 gets copy channel 1 and xor channel 1. When a dependency is specified async_tx_find_channel defaults to keeping the operation on the same channel. A xor->copy->xor chain will stay on one channel if it supports both operation types, otherwise the transaction will transition between a copy and a xor resource. Currently the raid5 implementation in the MD raid456 driver has been converted to the async_tx api. A driver for the offload engines on the Intel Xscale series of I/O processors, iop-adma, is provided in a later commit. With the iop-adma driver and async_tx, raid456 is able to offload copy, xor, and xor-zero-sum operations to hardware engines. On iop342 tiobench showed higher throughput for sequential writes (20 - 30% improvement) and sequential reads to a degraded array (40 - 55% improvement). For the other cases performance was roughly equal, +/- a few percentage points. On a x86-smp platform the performance of the async_tx implementation (in synchronous mode) was also +/- a few percentage points of the original implementation. According to 'top' on iop342 CPU utilization drops from ~50% to ~15% during a 'resync' while the speed according to /proc/mdstat doubles from ~25 MB/s to ~50 MB/s. 
    The tiobench command line used for testing was: tiobench --size 2048
    --block 4096 --block 131072 --dir /mnt/raid --numruns 5
    * iop342 had 1GB of memory available

    Details:
    * if CONFIG_DMA_ENGINE=n the asynchronous path is compiled away by making
      async_tx_find_channel a static inline routine that always returns NULL
    * when a callback is specified for a given transaction an interrupt will
      fire at operation completion time and the callback will occur in a
      tasklet. If the channel does not support interrupts then a live
      polling wait will be performed
    * the api is written as a dmaengine client that requests all available
      channels
    * In support of dependencies the api implicitly schedules channel-switch
      interrupts. The interrupt triggers the cleanup tasklet which causes
      pending operations to be scheduled on the next channel
    * Xor engines treat an xor destination address differently than a software
      xor routine. To the software routine the destination address is an
      implied source, whereas engines treat it as a write-only destination.
      This patch modifies the xor_blocks routine to take an explicit
      destination address to mirror the hardware.
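The difference between the two destination semantics in the last item can be sketched in user-space C. This is only an illustration: the names xor_implied_src and xor_explicit_dest are hypothetical and are not the kernel's xor_blocks interface. The first routine reads the destination before writing it (the destination is an implied source), while the second ignores the destination's previous contents, as a write-only hardware destination would.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for illustration; not the kernel's xor_blocks API. */

/* Software-style xor: the destination is an implied source,
 * so the result is dest ^ srcs[0] ^ ... ^ srcs[src_cnt - 1]. */
static void xor_implied_src(unsigned char *dest, size_t len,
                            unsigned char *const srcs[], int src_cnt)
{
    assert(src_cnt >= 1);
    for (int s = 0; s < src_cnt; s++)
        for (size_t i = 0; i < len; i++)
            dest[i] ^= srcs[s][i];
}

/* Engine-style xor: the destination is write-only,
 * so the result is srcs[0] ^ ... ^ srcs[src_cnt - 1] and the
 * previous contents of dest do not matter. */
static void xor_explicit_dest(unsigned char *dest, size_t len,
                              unsigned char *const srcs[], int src_cnt)
{
    assert(src_cnt >= 1);
    for (size_t i = 0; i < len; i++) {
        unsigned char acc = 0;
        for (int s = 0; s < src_cnt; s++)
            acc ^= srcs[s][i];
        dest[i] = acc;
    }
}
```

With sources {0x0f, 0xf0} and {0xff, 0x01} and a destination pre-filled with 0xaa, the first routine leaves 5a 5b in the destination and the second leaves f0 f1. To get the engine-style result from the software-style routine, a caller would have to zero the destination first or pass the first source as the destination.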
    Changelog:
    * fixed a leftover debug print
    * don't allow callbacks in async_interrupt_cond
    * fixed xor_block changes
    * fixed usage of ASYNC_TX_XOR_DROP_DEST
    * drop dma mapping methods, suggested by Chris Leech
    * printk warning fixups from Andrew Morton
    * don't use inline in C files, Adrian Bunk
    * select the API when MD is enabled
    * BUG_ON xor source counts <= 1
    * implicitly handle hardware concerns like channel switching and
      interrupts, Neil Brown
    * remove the per operation type list, and distribute operation
      capabilities evenly amongst the available channels
    * simplify async_tx_find_channel to optimize the fast path
    * introduce the channel_table_initialized flag to prevent early calls to
      the api
    * reorganize the code to mimic crypto
    * include mm.h as not all archs include it in dma-mapping.h
    * make the Kconfig options non-user visible, Adrian Bunk
    * move async_tx under crypto since it is meant as 'core' functionality,
      and the two may share algorithms in the future
    * move large inline functions into c files
    * checkpatch.pl fixes
    * gpl v2 only correction

    Cc: Herbert Xu
    Signed-off-by: Dan Williams
    Acked-By: NeilBrown

commit 685784aaf3cd0e3ff5e36c7ecf6f441cdbf57f73
Author: Dan Williams
Date:   Mon Jul 9 11:56:42 2007 -0700

    xor: make 'xor_blocks' a library routine for use with async_tx

    The async_tx api tries to use a dma engine for an operation, but will
    fall back to an optimized software routine otherwise. Xor support is
    implemented using the raid5 xor routines. For organizational purposes
    this routine is moved to a common area.
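The "use an engine if one is available, otherwise run in software" behaviour described above can be modelled in a few lines of user-space C. This is a sketch under stated assumptions, not kernel code: struct xor_engine, fake_engine_xor, soft_xor and xor_with_fallback are all hypothetical names, and the "engine" here is just a function pointer standing in for real offload hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel symbols. */
struct xor_engine {
    /* Returns 0 if the engine accepted and completed the operation. */
    int (*do_xor)(unsigned char *dest, unsigned char *const srcs[],
                  int src_cnt, size_t len);
};

enum xor_path { XOR_OFFLOADED, XOR_SOFTWARE };

/* Software routine: dest = srcs[0] ^ ... ^ srcs[src_cnt - 1]. */
static void soft_xor(unsigned char *dest, unsigned char *const srcs[],
                     int src_cnt, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char acc = 0;
        for (int s = 0; s < src_cnt; s++)
            acc ^= srcs[s][i];
        dest[i] = acc;
    }
}

/* Stand-in for hardware: computes the same result and reports success. */
static int fake_engine_xor(unsigned char *dest, unsigned char *const srcs[],
                           int src_cnt, size_t len)
{
    soft_xor(dest, srcs, src_cnt, len);
    return 0;
}

/* Try the engine first; fall back to software when no engine is
 * available or when the engine declines the request. */
static enum xor_path xor_with_fallback(const struct xor_engine *eng,
                                       unsigned char *dest,
                                       unsigned char *const srcs[],
                                       int src_cnt, size_t len)
{
    assert(src_cnt >= 2); /* a single-source xor is not meaningful here */
    if (eng && eng->do_xor && eng->do_xor(dest, srcs, src_cnt, len) == 0)
        return XOR_OFFLOADED;
    soft_xor(dest, srcs, src_cnt, len);
    return XOR_SOFTWARE;
}
```

Passing NULL for the engine models the case where no capable channel is found: the caller gets the same result either way and only the returned path differs.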
    The following fixes are also made:
    * rename xor_block => xor_blocks, suggested by Adrian Bunk
    * ensure that xor.o initializes before md.o in the built-in case
    * checkpatch.pl fixes
    * mark calibrate_xor_blocks __init, Adrian Bunk

    Cc: Adrian Bunk
    Cc: NeilBrown
    Cc: Herbert Xu
    Signed-off-by: Dan Williams

commit d379b01e9087a582d58f4b678208a4f8d8376fe7
Author: Dan Williams
Date:   Mon Jul 9 11:56:42 2007 -0700

    dmaengine: make clients responsible for managing channels

    The current implementation assumes that a channel will only be used by
    one client at a time. In order to enable channel sharing the dmaengine
    core is changed to a model where clients subscribe to
    channel-available-events. Instead of tracking how many channels a client
    wants and how many it has received the core just broadcasts the
    available channels and lets the clients optionally take a reference.
    The core learns about the clients' needs at dma_event_callback time.

    In support of multiple operation types, clients can specify a capability
    mask to only be notified of channels that satisfy a certain set of
    capabilities.

    Changelog:
    * removed DMA_TX_ARRAY_INIT, no longer needed
    * dma_client_chan_free -> dma_chan_release: switch to global reference
      counting only at device unregistration time, before it was also
      happening at client unregistration time
    * clients now return dma_state_client to dmaengine (ack, dup, nak)
    * checkpatch.pl fixes
    * fixup merge with git-ioat

    Cc: Chris Leech
    Signed-off-by: Shannon Nelson
    Signed-off-by: Dan Williams
    Acked-by: David S. Miller

commit 7405f74badf46b5d023c5d2b670b4471525f6c91
Author: Dan Williams
Date:   Tue Jan 2 11:10:43 2007 -0700

    dmaengine: refactor dmaengine around dma_async_tx_descriptor

    The current dmaengine interface defines multiple routines per operation,
    i.e. dma_async_memcpy_buf_to_buf, dma_async_memcpy_buf_to_page etc.
    Adding more operation types (xor, crc, etc) to this model would result
    in an unmanageable number of method permutations.
        Are we really going to add a set of hooks for each DMA engine
        whizbang feature? - Jeff Garzik

    The descriptor creation process is refactored using the new common
    dma_async_tx_descriptor structure. Instead of per driver
    do_<operation>_<dest>_to_<src> methods, drivers integrate
    dma_async_tx_descriptor into their private software descriptor and then
    define a 'prep' routine per operation. The prep routine allocates a
    descriptor and ensures that the tx_set_src, tx_set_dest, tx_submit
    routines are valid. Descriptor creation and submission becomes:

    struct dma_device *dev;
    struct dma_chan *chan;
    struct dma_async_tx_descriptor *tx;

    tx = dev->device_prep_dma_<operation>(chan, len, int_flag)
    tx->tx_set_src(dma_addr_t, tx, index /* for multi-source ops */)
    tx->tx_set_dest(dma_addr_t, tx, index)
    tx->tx_submit(tx)

    In addition to the refactoring, dma_async_tx_descriptor also lays the
    groundwork for defining cross-channel-operation dependencies, and a
    callback facility for asynchronous notification of operation completion.

    Changelog:
    * drop dma mapping methods, suggested by Chris Leech
    * fix ioat_dma_dependency_added, also caught by Andrew Morton
    * fix dma_sync_wait, change from Andrew Morton
    * uninline large functions, change from Andrew Morton
    * add tx->callback = NULL to dmaengine calls to interoperate with
      async_tx calls
    * hookup ioat_tx_submit
    * convert channel capabilities to a 'cpumask_t like' bitmap
    * removed DMA_TX_ARRAY_INIT, no longer needed
    * checkpatch.pl fixes
    * make set_src, set_dest, and tx_submit descriptor specific methods
    * fixup git-ioat merge
    * move group_list and phys to dma_async_tx_descriptor

    Cc: Jeff Garzik
    Cc: Chris Leech
    Signed-off-by: Shannon Nelson
    Signed-off-by: Dan Williams
    Acked-by: David S. Miller

commit 51a92c0f6ce8fa85fa0e18ecda1d847e606e8066
Author: Jens Axboe
Date:   Fri Jul 13 14:11:43 2007 +0200

    splice: fix offset mangling with direct splicing (sendfile)

    If the output actor doesn't transfer the full amount of data, we will
    increment ppos too much.
    Two related bugs in there:

    - We need to break out and return actor() retval if it is shorter than
      what we spliced into the pipe.

    - Adjust ppos only according to actor() return.

    Also fix loop problem in generic_file_splice_read(), it should not keep
    going when data has already been transferred.

    Signed-off-by: Jens Axboe

commit 29ce20586be54ceba49c55ae049541398cd2c416
Author: James Morris
Date:   Fri Jul 13 11:44:32 2007 +0200

    security: revalidate rw permissions for sys_splice and sys_vmsplice

    Revalidate read/write permissions for splice(2) and vmsplice(2), in case
    security policy has changed since the files were opened.

    Acked-by: Stephen Smalley
    Signed-off-by: James Morris
    Signed-off-by: Jens Axboe

commit d3f35d98b3b87d2506289320375687c6e9bc53ed
Author: Tom Zanussi
Date:   Thu Jul 12 08:12:05 2007 +0200

    relay: fixup kerneldoc comment

    Change comment from kerneldoc to normal.

    Signed-off-by: Tom Zanussi
    Signed-off-by: Jens Axboe

commit 24da24de2eae0c277b85836e2b4b09cfafeea995
Author: Tom Zanussi
Date:   Thu Jul 12 08:12:04 2007 +0200

    relay: fix bogus cast in subbuf_splice_actor()

    The current code that sets the read position in subbuf_splice_actor may
    give erroneous results if the buffer size isn't a power of 2. This patch
    fixes the problem.

    Signed-off-by: Tom Zanussi
    Signed-off-by: Jens Axboe

commit 9a60ddbcb710ff78cd8c772681723a04e3f5aba3
Author: Dave Jones
Date:   Fri Jul 13 01:34:10 2007 -0400

    [CPUFREQ] Fix typos in powernow-k8 printk's.

    Based on a patch from Joachim which didn't apply, so I fixed it up by
    hand, and also corrected the surrounding indentation a little.

    Signed-off-by: Joachim.Deguara
    Acked-by: Mark Langsdorf
    Signed-off-by: Dave Jones

commit 084f34939424161669467c19280dbcf637730314
Author: Thomas Renninger
Date:   Mon Jul 9 11:35:28 2007 -0700

    [CPUFREQ] Restore previously use