commit 0a9d91dca3b9f797f2fc615486c12afa59f19a3b Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Date: Thu Aug 7 13:54:57 2014 -0700 Linux 3.4.102 commit a066704c52563ec7300dd7fe8cdc9dad23a3e654 Author: Gao feng <gaofeng@cn.fujitsu.com> Date: Fri Jan 24 16:29:11 2014 +0800 ipv6: reallocate addrconf router for ipv6 address when lo device up commit 33d99113b1102c2d2f8603b9ba72d89d915c13f5 upstream. commit 25fb6ca4ed9cad72f14f61629b68dc03c0d9713f "net IPv6 : Fix broken IPv6 routing table after loopback down-up" allocates addrconf router for ipv6 address when lo device up. but commit a881ae1f625c599b460cc8f8a7fcb1c438f699ad "ipv6:don't call addrconf_dst_alloc again when enable lo" breaks this behavior. Since the addrconf router is moved to the garbage list when lo device down, we should release this router and rellocate a new one for ipv6 address when lo device up. This patch solves bug 67951 on bugzilla https://bugzilla.kernel.org/show_bug.cgi?id=67951 change from v1: use ip6_rt_put to repleace ip6_del_rt, thanks Hannes! change code style, suggested by Sergei. CC: Sabrina Dubroca <sd@queasysnail.net> CC: Hannes Frederic Sowa <hannes@stressinduktion.org> Reported-by: Weilong Chen <chenweilong@huawei.com> Signed-off-by: Weilong Chen <chenweilong@huawei.com> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> [weilong: s/ip6_rt_put/dst_release] Signed-off-by: Chen Weilong <chenweilong@huawei.com> Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 21dbd45fb3355f60ac73371e4612a3aa26d07124 Author: Vlastimil Babka <vbabka@suse.cz> Date: Mon Apr 7 15:37:50 2014 -0700 mm: try_to_unmap_cluster() should lock_page() before mlocking commit 57e68e9cd65b4b8eb4045a1e0d0746458502554c upstream. A BUG_ON(!PageLocked) was triggered in mlock_vma_page() by Sasha Levin fuzzing with trinity. The call site try_to_unmap_cluster() does not lock the pages other than its check_page parameter (which is already locked). The BUG_ON in mlock_vma_page() is not documented and its purpose is somewhat unclear, but apparently it serializes against page migration, which could otherwise fail to transfer the PG_mlocked flag. This would not be fatal, as the page would be eventually encountered again, but NR_MLOCK accounting would become distorted nevertheless. This patch adds a comment to the BUG_ON in mlock_vma_page() and munlock_vma_page() to that effect. The call site try_to_unmap_cluster() is fixed so that for page != check_page, trylock_page() is attempted (to avoid possible deadlocks as we already have check_page locked) and mlock_vma_page() is performed only upon success. If the page lock cannot be obtained, the page is left without PG_mlocked, which is again not a problem in the whole unevictable memory design. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Bob Liu <bob.liu@oracle.com> Reported-by: Sasha Levin <sasha.levin@oracle.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Michel Lespinasse <walken@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 9c98268072b611fab39eee633e3f2e7b2d8c64da Author: Boris Ostrovsky <boris.ostrovsky@oracle.com> Date: Wed Jul 9 13:18:18 2014 -0400 x86/espfix/xen: Fix allocation of pages for paravirt page tables commit 8762e5092828c4dc0f49da5a47a644c670df77f3 upstream. init_espfix_ap() is currently off by one level when informing hypervisor that allocated pages will be used for ministacks' page tables. The most immediate effect of this on a PV guest is that if 'stack_page = __get_free_page()' returns a non-zeroed-out page the hypervisor will refuse to use it for a page table (which it shouldn't be anyway). This will result in warnings by both Xen and Linux. More importantly, a subsequent write to that page (again, by a PV guest) is likely to result in fatal page fault. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: http://lkml.kernel.org/r/1404926298-5565-1-git-send-email-boris.ostrovsky@oracle.com Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit cca69958eee770a0b618bc73fec41b32d2b50387 Author: Minfei Huang <huangminfei@ucloud.cn> Date: Wed Jun 4 16:11:53 2014 -0700 lib/btree.c: fix leak of whole btree nodes commit c75b53af2f0043aff500af0a6f878497bef41bca upstream. I use btree from 3.14-rc2 in my own module. When the btree module is removed, a warning arises: kmem_cache_destroy btree_node: Slab cache still has objects CPU: 13 PID: 9150 Comm: rmmod Tainted: GF O 3.14.0-rc2 #1 Hardware name: Inspur NF5270M3/NF5270M3, BIOS CHEETAH_2.1.3 09/10/2013 Call Trace: dump_stack+0x49/0x5d kmem_cache_destroy+0xcf/0xe0 btree_module_exit+0x10/0x12 [btree] SyS_delete_module+0x198/0x1f0 system_call_fastpath+0x16/0x1b The cause is that it doesn't release the last btree node, when height = 1 and fill = 1. [akpm@linux-foundation.org: remove unneeded test of NULL] Signed-off-by: Minfei Huang <huangminfei@ucloud.cn> Cc: Joern Engel <joern@logfs.org> Cc: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 56a2a6f68b6cf7dec64a53c8c7497fa5b36cbb15 Author: Sasha Levin <sasha.levin@oracle.com> Date: Mon Jul 14 17:02:31 2014 -0700 net/l2tp: don't fall back on UDP [get|set]sockopt commit 3cf521f7dc87c031617fd47e4b7aa2593c2f3daf upstream. The l2tp [get|set]sockopt() code has fallen back to the UDP functions for socket option levels != SOL_PPPOL2TP since day one, but that has never actually worked, since the l2tp socket isn't an inet socket. As David Miller points out: "If we wanted this to work, it'd have to look up the tunnel and then use tunnel->sk, but I wonder how useful that would be" Since this can never have worked so nobody could possibly have depended on that functionality, just remove the broken code and return -EINVAL. Reported-by: Sasha Levin <sasha.levin@oracle.com> Acked-by: James Chapman <jchapman@katalix.com> Acked-by: David Miller <davem@davemloft.net> Cc: Phil Turnbull <phil.turnbull@oracle.com> Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: Willy Tarreau <w@1wt.eu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 68199d62034a736cae12fee49b0dad5342612583 Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Date: Mon Aug 4 21:42:10 2014 -0700 Revert: "net: ip, ipv6: handle gso skbs in forwarding path" This reverts commit 29a3cd46644ec8098dbe1c12f89643b5c11831a9 which is commit fe6cc55f3a9a053482a76f5a6b2257cee51b4663 upstream. Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Marcelo Ricardo Leitner <mleitner@redhat.com> Cc: Florian Westphal <fw@strlen.de> Cc: David S. Miller <davem@davemloft.net> Cc: Thomas Jarosch <thomas.jarosch@intra2net.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 473d8a237b359c41225458f3d7c6aa2e5beb40d5 Author: Andy Lutomirski <luto@amacapital.net> Date: Wed Jul 23 08:34:11 2014 -0700 x86_64/entry/xen: Do not invoke espfix64 on Xen commit 7209a75d2009dbf7745e2fd354abf25c3deb3ca3 upstream. This moves the espfix64 logic into native_iret. To make this work, it gets rid of the native patch for INTERRUPT_RETURN: INTERRUPT_RETURN on native kernels is now 'jmp native_iret'. This changes the 16-bit SS behavior on Xen from OOPSing to leaking some bits of the Xen hypervisor's RSP (I think). [ hpa: this is a nonzero cost on native, but probably not enough to measure. Xen needs to fix this in their own code, probably doing something equivalent to espfix64. ] Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/7b8f1d8ef6597cb16ae004a43c56980a7de3cf94.1406129132.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 4a0db8af65056ec1a60607d4286b782934a2a03b Author: H. Peter Anvin <hpa@zytor.com> Date: Sun May 4 10:36:22 2014 -0700 x86, espfix: Make it possible to disable 16-bit support commit 34273f41d57ee8d854dcd2a1d754cbb546cb548f upstream. Embedded systems, which may be very memory-size-sensitive, are extremely unlikely to ever encounter any 16-bit software, so make it a CONFIG_EXPERT option to turn off support for any 16-bit software whatsoever. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 7b6354ea6bfbe20f68c248ee5a846b1d3f76863d Author: H. Peter Anvin <hpa@zytor.com> Date: Sun May 4 10:00:49 2014 -0700 x86, espfix: Make espfix64 a Kconfig option, fix UML commit 197725de65477bc8509b41388157c1a2283542bb upstream. Make espfix64 a hidden Kconfig option. This fixes the x86-64 UML build which had broken due to the non-existence of init_espfix_bsp() in UML: since UML uses its own Kconfig, this option does not appear in the UML build. This also makes it possible to make support for 16-bit segments a configuration option, for the people who want to minimize the size of the kernel. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Richard Weinberger <richard@nod.at> Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 111f0ce4598291dcd2d354a20ab94de834c1e109 Author: H. Peter Anvin <hpa@linux.intel.com> Date: Fri May 2 11:33:51 2014 -0700 x86, espfix: Fix broken header guard commit 20b68535cd27183ebd3651ff313afb2b97dac941 upstream. Header guard is #ifndef, not #ifdef... Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 18e63ea35d8424f9b99ad4c53cd51f777ea17fca Author: H. Peter Anvin <hpa@linux.intel.com> Date: Thu May 1 14:12:23 2014 -0700 x86, espfix: Move espfix definitions into a separate header file commit e1fe9ed8d2a4937510d0d60e20705035c2609aea upstream. Sparse warns that the percpu variables aren't declared before they are defined. Rather than hacking around it, move espfix definitions into a proper header file. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 3989298cbdafdaf0536f23677df9f29bfc44f8ad Author: H. Peter Anvin <hpa@linux.intel.com> Date: Tue Apr 29 16:46:09 2014 -0700 x86-64, espfix: Don't leak bits 31:16 of %esp returning to 16-bit stack commit 3891a04aafd668686239349ea58f3314ea2af86b upstream. The IRET instruction, when returning to a 16-bit segment, only restores the bottom 16 bits of the user space stack pointer. This causes some 16-bit software to break, but it also leaks kernel state to user space. We have a software workaround for that ("espfix") for the 32-bit kernel, but it relies on a nonzero stack segment base which is not available in 64-bit mode. In checkin: b3b42ac2cbae x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels we "solved" this by forbidding 16-bit segments on 64-bit kernels, with the logic that 16-bit support is crippled on 64-bit kernels anyway (no V86 support), but it turns out that people are doing stuff like running old Win16 binaries under Wine and expect it to work. This works around this by creating percpu "ministacks", each of which is mapped 2^16 times 64K apart. When we detect that the return SS is on the LDT, we copy the IRET frame to the ministack and use the relevant alias to return to userspace. The ministacks are mapped readonly, so if IRET faults we promote #GP to #DF which is an IST vector and thus has its own stack; we then do the fixup in the #DF handler. (Making #GP an IST exception would make the msr_safe functions unsafe in NMI/MC context, and quite possibly have other effects.) Special thanks to: - Andy Lutomirski, for the suggestion of using very small stack slots and copy (as opposed to map) the IRET frame there, and for the suggestion to mark them readonly and let the fault promote to #DF. - Konrad Wilk for paravirt fixup and testing. - Borislav Petkov for testing help and useful comments. Reported-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Andrew Lutomriski <amluto@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Dirk Hohndel <dirk@hohndel.org> Cc: Arjan van de Ven <arjan.van.de.ven@intel.com> Cc: comex <comexk@gmail.com> Cc: Alexander van Heukelum <heukelum@fastmail.fm> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 6c7e43ada5875cb1592046242f22f2fa54b09ca0 Author: H. Peter Anvin <hpa@zytor.com> Date: Wed May 21 10:22:59 2014 -0700 Revert "x86-64, modify_ldt: Make support for 16-bit segments a runtime option" commit 7ed6fb9b5a5510e4ef78ab27419184741169978a upstream. This reverts commit fa81511bb0bbb2b1aace3695ce869da9762624ff in preparation of merging in the proper fix (espfix64). Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit fbbb7208969e8bfbd07782bbec069878a29d3267 Author: Jan Kara <jack@suse.cz> Date: Fri Aug 1 12:20:02 2014 +0200 timer: Fix lock inversion between hrtimer_bases.lock and scheduler locks commit 504d58745c9ca28d33572e2d8a9990b43e06075d upstream. clockevents_increase_min_delta() calls printk() from under hrtimer_bases.lock. That causes lock inversion on scheduler locks because printk() can call into the scheduler. Lockdep puts it as: ====================================================== [ INFO: possible circular locking dependency detected ] 3.15.0-rc8-06195-g939f04b #2 Not tainted ------------------------------------------------------- trinity-main/74 is trying to acquire lock: (&port_lock_key){-.....}, at: [<811c60be>] serial8250_console_write+0x8c/0x10c but task is already holding lock: (hrtimer_bases.lock){-.-...}, at: [<8103caeb>] hrtimer_try_to_cancel+0x13/0x66 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #5 (hrtimer_bases.lock){-.-...}: [<8104a942>] lock_acquire+0x92/0x101 [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e [<8103c918>] __hrtimer_start_range_ns+0x1c/0x197 [<8107ec20>] perf_swevent_start_hrtimer.part.41+0x7a/0x85 [<81080792>] task_clock_event_start+0x3a/0x3f [<810807a4>] task_clock_event_add+0xd/0x14 [<8108259a>] event_sched_in+0xb6/0x17a [<810826a2>] group_sched_in+0x44/0x122 [<81082885>] ctx_sched_in.isra.67+0x105/0x11f [<810828e6>] perf_event_sched_in.isra.70+0x47/0x4b [<81082bf6>] __perf_install_in_context+0x8b/0xa3 [<8107eb8e>] remote_function+0x12/0x2a [<8105f5af>] smp_call_function_single+0x2d/0x53 [<8107e17d>] task_function_call+0x30/0x36 [<8107fb82>] perf_install_in_context+0x87/0xbb [<810852c9>] SYSC_perf_event_open+0x5c6/0x701 [<810856f9>] SyS_perf_event_open+0x17/0x19 [<8142f8ee>] syscall_call+0x7/0xb -> #4 (&ctx->lock){......}: [<8104a942>] lock_acquire+0x92/0x101 [<8142f04c>] _raw_spin_lock+0x21/0x30 [<81081df3>] __perf_event_task_sched_out+0x1dc/0x34f [<8142cacc>] __schedule+0x4c6/0x4cb [<8142cae0>] schedule+0xf/0x11 [<8142f9a6>] work_resched+0x5/0x30 -> #3 (&rq->lock){-.-.-.}: [<8104a942>] lock_acquire+0x92/0x101 [<8142f04c>] _raw_spin_lock+0x21/0x30 [<81040873>] __task_rq_lock+0x33/0x3a [<8104184c>] wake_up_new_task+0x25/0xc2 [<8102474b>] do_fork+0x15c/0x2a0 [<810248a9>] kernel_thread+0x1a/0x1f [<814232a2>] rest_init+0x1a/0x10e [<817af949>] start_kernel+0x303/0x308 [<817af2ab>] i386_start_kernel+0x79/0x7d -> #2 (&p->pi_lock){-.-...}: [<8104a942>] lock_acquire+0x92/0x101 [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e [<810413dd>] try_to_wake_up+0x1d/0xd6 [<810414cd>] default_wake_function+0xb/0xd [<810461f3>] __wake_up_common+0x39/0x59 [<81046346>] __wake_up+0x29/0x3b [<811b8733>] tty_wakeup+0x49/0x51 [<811c3568>] uart_write_wakeup+0x17/0x19 [<811c5dc1>] serial8250_tx_chars+0xbc/0xfb [<811c5f28>] serial8250_handle_irq+0x54/0x6a [<811c5f57>] serial8250_default_handle_irq+0x19/0x1c [<811c56d8>] serial8250_interrupt+0x38/0x9e [<810510e7>] handle_irq_event_percpu+0x5f/0x1e2 [<81051296>] handle_irq_event+0x2c/0x43 [<81052cee>] handle_level_irq+0x57/0x80 [<81002a72>] handle_irq+0x46/0x5c [<810027df>] do_IRQ+0x32/0x89 [<8143036e>] common_interrupt+0x2e/0x33 [<8142f23c>] _raw_spin_unlock_irqrestore+0x3f/0x49 [<811c25a4>] uart_start+0x2d/0x32 [<811c2c04>] uart_write+0xc7/0xd6 [<811bc6f6>] n_tty_write+0xb8/0x35e [<811b9beb>] tty_write+0x163/0x1e4 [<811b9cd9>] redirected_tty_write+0x6d/0x75 [<810b6ed6>] vfs_write+0x75/0xb0 [<810b7265>] SyS_write+0x44/0x77 [<8142f8ee>] syscall_call+0x7/0xb -> #1 (&tty->write_wait){-.....}: [<8104a942>] lock_acquire+0x92/0x101 [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e [<81046332>] __wake_up+0x15/0x3b [<811b8733>] tty_wakeup+0x49/0x51 [<811c3568>] uart_write_wakeup+0x17/0x19 [<811c5dc1>] serial8250_tx_chars+0xbc/0xfb [<811c5f28>] serial8250_handle_irq+0x54/0x6a [<811c5f57>] serial8250_default_handle_irq+0x19/0x1c [<811c56d8>] serial8250_interrupt+0x38/0x9e [<810510e7>] handle_irq_event_percpu+0x5f/0x1e2 [<81051296>] handle_irq_event+0x2c/0x43 [<81052cee>] handle_level_irq+0x57/0x80 [<81002a72>] handle_irq+0x46/0x5c [<810027df>] do_IRQ+0x32/0x89 [<8143036e>] common_interrupt+0x2e/0x33 [<8142f23c>] _raw_spin_unlock_irqrestore+0x3f/0x49 [<811c25a4>] uart_start+0x2d/0x32 [<811c2c04>] uart_write+0xc7/0xd6 [<811bc6f6>] n_tty_write+0xb8/0x35e [<811b9beb>] tty_write+0x163/0x1e4 [<811b9cd9>] redirected_tty_write+0x6d/0x75 [<810b6ed6>] vfs_write+0x75/0xb0 [<810b7265>] SyS_write+0x44/0x77 [<8142f8ee>] syscall_call+0x7/0xb -> #0 (&port_lock_key){-.....}: [<8104a62d>] __lock_acquire+0x9ea/0xc6d [<8104a942>] lock_acquire+0x92/0x101 [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e [<811c60be>] serial8250_console_write+0x8c/0x10c [<8104e402>] call_console_drivers.constprop.31+0x87/0x118 [<8104f5d5>] console_unlock+0x1d7/0x398 [<8104fb70>] vprintk_emit+0x3da/0x3e4 [<81425f76>] printk+0x17/0x19 [<8105bfa0>] clockevents_program_min_delta+0x104/0x116 [<8105c548>] clockevents_program_event+0xe7/0xf3 [<8105cc1c>] tick_program_event+0x1e/0x23 [<8103c43c>] hrtimer_force_reprogram+0x88/0x8f [<8103c49e>] __remove_hrtimer+0x5b/0x79 [<8103cb21>] hrtimer_try_to_cancel+0x49/0x66 [<8103cb4b>] hrtimer_cancel+0xd/0x18 [<8107f102>] perf_swevent_cancel_hrtimer.part.60+0x2b/0x30 [<81080705>] task_clock_event_stop+0x20/0x64 [<81080756>] task_clock_event_del+0xd/0xf [<81081350>] event_sched_out+0xab/0x11e [<810813e0>] group_sched_out+0x1d/0x66 [<81081682>] ctx_sched_out+0xaf/0xbf [<81081e04>] __perf_event_task_sched_out+0x1ed/0x34f [<8142cacc>] __schedule+0x4c6/0x4cb [<8142cae0>] schedule+0xf/0x11 [<8142f9a6>] work_resched+0x5/0x30 other info that might help us debug this: Chain exists of: &port_lock_key --> &ctx->lock --> hrtimer_bases.lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(hrtimer_bases.lock); lock(&ctx->lock); lock(hrtimer_bases.lock); lock(&port_lock_key); *** DEADLOCK *** 4 locks held by trinity-main/74: #0: (&rq->lock){-.-.-.}, at: [<8142c6f3>] __schedule+0xed/0x4cb #1: (&ctx->lock){......}, at: [<81081df3>] __perf_event_task_sched_out+0x1dc/0x34f #2: (hrtimer_bases.lock){-.-...}, at: [<8103caeb>] hrtimer_try_to_cancel+0x13/0x66 #3: (console_lock){+.+...}, at: [<8104fb5d>] vprintk_emit+0x3c7/0x3e4 stack backtrace: CPU: 0 PID: 74 Comm: trinity-main Not tainted 3.15.0-rc8-06195-g939f04b #2 00000000 81c3a310 8b995c14 81426f69 8b995c44 81425a99 8161f671 8161f570 8161f538 8161f559 8161f538 8b995c78 8b142bb0 00000004 8b142fdc 8b142bb0 8b995ca8 8104a62d 8b142fac 000016f2 81c3a310 00000001 00000001 00000003 Call Trace: [<81426f69>] dump_stack+0x16/0x18 [<81425a99>] print_circular_bug+0x18f/0x19c [<8104a62d>] __lock_acquire+0x9ea/0xc6d [<8104a942>] lock_acquire+0x92/0x101 [<811c60be>] ? serial8250_console_write+0x8c/0x10c [<811c6032>] ? wait_for_xmitr+0x76/0x76 [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e [<811c60be>] ? serial8250_console_write+0x8c/0x10c [<811c60be>] serial8250_console_write+0x8c/0x10c [<8104af87>] ? lock_release+0x191/0x223 [<811c6032>] ? wait_for_xmitr+0x76/0x76 [<8104e402>] call_console_drivers.constprop.31+0x87/0x118 [<8104f5d5>] console_unlock+0x1d7/0x398 [<8104fb70>] vprintk_emit+0x3da/0x3e4 [<81425f76>] printk+0x17/0x19 [<8105bfa0>] clockevents_program_min_delta+0x104/0x116 [<8105cc1c>] tick_program_event+0x1e/0x23 [<8103c43c>] hrtimer_force_reprogram+0x88/0x8f [<8103c49e>] __remove_hrtimer+0x5b/0x79 [<8103cb21>] hrtimer_try_to_cancel+0x49/0x66 [<8103cb4b>] hrtimer_cancel+0xd/0x18 [<8107f102>] perf_swevent_cancel_hrtimer.part.60+0x2b/0x30 [<81080705>] task_clock_event_stop+0x20/0x64 [<81080756>] task_clock_event_del+0xd/0xf [<81081350>] event_sched_out+0xab/0x11e [<810813e0>] group_sched_out+0x1d/0x66 [<81081682>] ctx_sched_out+0xaf/0xbf [<81081e04>] __perf_event_task_sched_out+0x1ed/0x34f [<8104416d>] ? __dequeue_entity+0x23/0x27 [<81044505>] ? pick_next_task_fair+0xb1/0x120 [<8142cacc>] __schedule+0x4c6/0x4cb [<81047574>] ? trace_hardirqs_off_caller+0xd7/0x108 [<810475b0>] ? trace_hardirqs_off+0xb/0xd [<81056346>] ? rcu_irq_exit+0x64/0x77 Fix the problem by using printk_deferred() which does not call into the scheduler. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit f73ff697833654ee578b606ea746d15dc1220aab Author: John Stultz <john.stultz@linaro.org> Date: Wed Jun 4 16:11:40 2014 -0700 printk: rename printk_sched to printk_deferred commit aac74dc495456412c4130a1167ce4beb6c1f0b38 upstream. After learning we'll need some sort of deferred printk functionality in the timekeeping core, Peter suggested we rename the printk_sched function so it can be reused by needed subsystems. This only changes the function name. No logic changes. Signed-off-by: John Stultz <john.stultz@linaro.org> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: Jan Kara <jack@suse.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jiri Bohac <jbohac@suse.cz> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 7f6c1deb02e6ac110645c87fd2446594803a8a72 Author: David Rientjes <rientjes@google.com> Date: Wed Jul 30 16:08:24 2014 -0700 mm, thp: do not allow thp faults to avoid cpuset restrictions commit b104a35d32025ca740539db2808aa3385d0f30eb upstream. The page allocator relies on __GFP_WAIT to determine if ALLOC_CPUSET should be set in allocflags. ALLOC_CPUSET controls if a page allocation should be restricted only to the set of allowed cpuset mems. Transparent hugepages clears __GFP_WAIT when defrag is disabled to prevent the fault path from using memory compaction or direct reclaim. Thus, it is unfairly able to allocate outside of its cpuset mems restriction as a side-effect. This patch ensures that ALLOC_CPUSET is only cleared when the gfp mask is truly GFP_ATOMIC by verifying it is also not a thp allocation. Signed-off-by: David Rientjes <rientjes@google.com> Reported-by: Alex Thorlton <athorlton@sgi.com> Tested-by: Alex Thorlton <athorlton@sgi.com> Cc: Bob Liu <lliubbo@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Hedi Berriche <hedi@sgi.com> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 0e04ec4d3c1ada2049c9b852ab63fba416c6df8d Author: James Bottomley <JBottomley@Parallels.com> Date: Thu Jul 3 19:17:34 2014 +0200 scsi: handle flush errors properly commit 89fb4cd1f717a871ef79fa7debbe840e3225cd54 upstream. Flush commands don't transfer data and thus need to be special cased in the I/O completion handler so that we can propagate errors to the block layer and filesystem. Signed-off-by: James Bottomley <JBottomley@Parallels.com> Reported-by: Steven Haber <steven@qumulo.com> Tested-by: Steven Haber <steven@qumulo.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 2e926a6b962cfc154818fd0796c4815df6ed09d0 Author: Konstantin Khlebnikov <k.khlebnikov@samsung.com> Date: Fri Jul 25 09:17:12 2014 +0100 ARM: 8115/1: LPAE: reduce damage caused by idmap to virtual memory layout commit 811a2407a3cf7bbd027fbe92d73416f17485a3d8 upstream. On LPAE, each level 1 (pgd) page table entry maps 1GiB, and the level 2 (pmd) entries map 2MiB. When the identity mapping is created on LPAE, the pgd pointers are copied from the swapper_pg_dir. If we find that we need to modify the contents of a pmd, we allocate a new empty pmd table and insert it into the appropriate 1GB slot, before then filling it with the identity mapping. However, if the 1GB slot covers the kernel lowmem mappings, we obliterate those mappings. When replacing a PMD, first copy the old PMD contents to the new PMD, so that we preserve the existing mappings, particularly the mappings of the kernel itself. [rewrote commit message and added code comment -- rmk] Fixes: ae2de101739c ("ARM: LPAE: Add identity mapping support for the 3-level page table format") Signed-off-by: Konstantin Khlebnikov <k.khlebnikov@samsung.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 44ce8f87254b3aea12296aa8275b76fa86c07ca6 Author: Milan Broz <gmazyland@gmail.com> Date: Tue Jul 29 18:41:09 2014 +0000 crypto: af_alg - properly label AF_ALG socket commit 4c63f83c2c2e16a13ce274ee678e28246bd33645 upstream. Th AF_ALG socket was missing a security label (e.g. SELinux) which means that socket was in "unlabeled" state. This was recently demonstrated in the cryptsetup package (cryptsetup v1.6.5 and later.) See https://bugzilla.redhat.com/show_bug.cgi?id=1115120 This patch clones the sock's label from the parent sock and resolves the issue (similar to AF_BLUETOOTH protocol family). Signed-off-by: Milan Broz <gmazyland@gmail.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>