author: jpk
date: Fri, 24 Mar 2006 12:29:20 -0800
changeset 1676: 37f4a3e2bd99
parent 1414: b4126407ac5b
child 5084: 7d838c5c0eed
permissions: -rw-r--r--
/*
 * CDDL HEADER START
 *
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License, Version 1.0 only
 * (the "License"). You may not use this file except in compliance
 * with the License.
 *
 * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
 * or http://www.opensolaris.org/os/licensing.
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each
 * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 *
 * CDDL HEADER END
 */
/*
 * Copyright 2006 Sun Microsystems, Inc. All rights reserved.
 * Use is subject to license terms.
 */

#pragma ident "%Z%%M% %I% %E% SMI"

/*
 * When the operating system detects that it is in an invalid state, a panic
 * is initiated in order to minimize potential damage to user data and to
 * facilitate debugging. There are three major tasks to be performed in
 * a system panic: recording information about the panic in memory (and thus
 * making it part of the crash dump), synchronizing the file systems to
 * preserve user file data, and generating the crash dump. We define the
 * system to be in one of four states with respect to the panic code:
 *
 *     CALM    - the state of the system prior to any thread initiating a panic
 *
 *     QUIESCE - the state of the system when the first thread to initiate
 *               a system panic records information about the cause of the panic
 *               and renders the system quiescent by stopping other processors
 *
 *     SYNC    - the state of the system when we synchronize the file systems
 *     DUMP    - the state when we generate the crash dump.
 *
 * The transitions between these states are irreversible: once we begin
 * panicking, we only make one attempt to perform the actions associated with
 * each state.
 *
 * The panic code itself must be re-entrant because actions taken during any
 * state may lead to another system panic. Additionally, any Solaris
 * thread may initiate a panic at any time, and so we must have synchronization
 * between threads which attempt to initiate a state transition simultaneously.
 * The panic code makes use of a special locking primitive, a trigger, to
 * perform this synchronization. A trigger is simply a word which is set
 * atomically and can only be set once. We declare three triggers, one for
 * each transition between the four states. When a thread enters the panic
 * code it attempts to set each trigger; if it fails it moves on to the
 * next trigger. A special case is the first trigger: if two threads race
 * to perform the transition to QUIESCE, the losing thread may execute before
 * the winner has a chance to stop its CPU. To solve this problem, we have
 * the loser look ahead to see if any other triggers are set; if not, it
 * presumes a panic is underway and simply spins. Unfortunately, since we
 * are panicking, it is not possible to know this with absolute certainty.
 *
 * There are two common reasons for re-entering the panic code once a panic
 * has been initiated: (1) after we debug_enter() at the end of QUIESCE,
 * the operator may type "sync" instead of "go", and the PROM's sync callback
 * routine will invoke panic(); (2) if the clock routine decides that sync
 * or dump is not making progress, it will invoke panic() to force a timeout.
 * The design assumes that a third possibility, another thread causing an
 * unrelated panic while sync or dump is still underway, is extremely unlikely.
 * If this situation occurs, we may end up triggering dump while sync is
 * still in progress. This third case is considered extremely unlikely because
 * all other CPUs are stopped and low-level interrupts have been blocked.
 *
 * The panic code is entered via a call directly to the vpanic() function,
 * or its varargs wrappers panic() and cmn_err(9F). The vpanic routine
 * is implemented in assembly language to record the current machine
 * registers, attempt to set the trigger for the QUIESCE state, and
 * if successful, switch stacks on to the panic_stack before calling into
 * the common panicsys() routine. The first thread to initiate a panic
 * is allowed to make use of the reserved panic_stack so that executing
 * the panic code itself does not overwrite valuable data on that thread's
 * stack *ahead* of the current stack pointer. This data will be preserved
 * in the crash dump and may prove invaluable in determining what this
 * thread has previously been doing. The first thread, saved in panic_thread,
 * is also responsible for stopping the other CPUs as quickly as possible,
 * and then setting the various panic_* variables. Most important among
 * these is panicstr, which allows threads to subsequently bypass held
 * locks so that we can proceed without ever blocking. We must stop the
 * other CPUs *prior* to setting panicstr in case threads running there are
 * currently spinning to acquire a lock; we want that state to be preserved.
 * Every thread which initiates a panic has its T_PANIC flag set so we can
 * identify all such threads in the crash dump.
 *
 * The panic_thread is also allowed to make use of the special memory buffer
 * panicbuf, which on machines with appropriate hardware is preserved across
 * reboots. We allow the panic_thread to store its register set and panic
 * message in this buffer, so even if we fail to obtain a crash dump we will
 * be able to examine the machine after reboot and determine some of the
 * state at the time of the panic. If we do get a dump, the panic buffer
 * data is structured so that a debugger can easily consume the information
 * therein (see <sys/panic.h>).
 *
 * Each platform or architecture is required to implement the functions
 * panic_savetrap() to record trap-specific information to panicbuf,
 * panic_saveregs() to record a register set to panicbuf, panic_stopcpus()
 * to halt all CPUs but the panicking CPU, panic_quiesce_hw() to perform
 * miscellaneous platform-specific tasks *after* panicstr is set,
 * panic_showtrap() to print trap-specific information to the console,
 * and panic_dump_hw() to perform platform tasks prior to calling dumpsys().
 *
 * A Note on Word Formation, courtesy of the Oxford Guide to English Usage:
 *
 * Words ending in -c interpose k before suffixes which otherwise would
 * indicate a soft c, and thus the verb and adjective forms of 'panic' are
 * spelled "panicked", "panicking", and "panicky" respectively. Use of
 * the ill-conceived "panicing" and "panic'd" is discouraged.
 */
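
/*
 * Illustrative sketch, not part of the original file. The trigger described
 * in the comment above could be modeled in portable C11 roughly as follows.
 * The names trigger_t, trigger_set(), panic_enter_sketch() and the DID_*
 * constants are hypothetical, and <stdatomic.h> is only a stand-in for
 * whatever atomic primitive the kernel's panic_trigger() relies on. The
 * sketch deliberately ignores the special case in which the loser of the
 * first race looks ahead and spins.
 */

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical trigger type: a word that is set atomically, at most once. */
typedef atomic_int trigger_t;

/* Hypothetical bit flags naming the transition a given entry performed. */
enum { DID_QUIESCE = 1, DID_SYNC = 2, DID_DUMP = 4 };

/*
 * Try to set the trigger. Exactly one caller sees the word change from
 * 0 to 1 and gets a return value of 1; every later caller gets 0.
 */
static int
trigger_set(trigger_t *tp)
{
	int expected = 0;

	return (atomic_compare_exchange_strong(tp, &expected, 1) ? 1 : 0);
}

/*
 * Model one entry into the panic code: attempt each trigger in order and
 * report, as a bit mask, which transitions this particular entry won.
 */
static int
panic_enter_sketch(trigger_t *qp, trigger_t *sp, trigger_t *dp)
{
	int did = 0;

	if (trigger_set(qp))
		did |= DID_QUIESCE;
	if (trigger_set(sp))
		did |= DID_SYNC;
	if (trigger_set(dp))
		did |= DID_DUMP;
	return (did);
}

/* Self-check: the first entry wins every transition, a re-entry wins none. */
static void
trigger_selfcheck(void)
{
	trigger_t q = 0, s = 0, d = 0;

	assert(panic_enter_sketch(&q, &s, &d) ==
	    (DID_QUIESCE | DID_SYNC | DID_DUMP));
	assert(panic_enter_sketch(&q, &s, &d) == 0);
}
```

/*
 * In this model, an entry that arrives while SYNC is already underway (the
 * first two triggers set) wins only the DUMP transition, which mirrors the
 * re-entry behaviour the comment above describes.
 */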
|
#include <sys/types.h>
#include <sys/varargs.h>
#include <sys/sysmacros.h>
#include <sys/cmn_err.h>
#include <sys/cpuvar.h>
#include <sys/thread.h>
#include <sys/t_lock.h>
#include <sys/cred.h>
#include <sys/systm.h>
#include <sys/uadmin.h>
#include <sys/callb.h>
#include <sys/vfs.h>
#include <sys/log.h>
#include <sys/disp.h>
#include <sys/param.h>
#include <sys/dumphdr.h>
#include <sys/ftrace.h>
#include <sys/reboot.h>
#include <sys/debug.h>
#include <sys/stack.h>
#include <sys/spl.h>
#include <sys/errorq.h>
#include <sys/panic.h>
#include <sys/fm/util.h>
|
/*
 * Panic variables which are set once during the QUIESCE state by the
 * first thread to initiate a panic. These are examined by post-mortem
 * debugging tools; the inconsistent use of 'panic' versus 'panic_' in
 * the variable naming is historical and allows legacy tools to work.
 */
#pragma align STACK_ALIGN(panic_stack)
char panic_stack[PANICSTKSIZE];	/* reserved stack for panic_thread */
kthread_t *panic_thread;	/* first thread to call panicsys() */
cpu_t panic_cpu;		/* cpu from first call to panicsys() */
label_t panic_regs;		/* setjmp label from panic_thread */
struct regs *panic_reg;		/* regs struct from first panicsys() */
char *volatile panicstr;	/* format string to first panicsys() */
va_list panicargs;		/* arguments to first panicsys() */
clock_t panic_lbolt;		/* lbolt at time of panic */
int64_t panic_lbolt64;		/* lbolt64 at time of panic */
hrtime_t panic_hrtime;		/* hrtime at time of panic */
timespec_t panic_hrestime;	/* hrestime at time of panic */
int panic_ipl;			/* ipl on panic_cpu at time of panic */
ushort_t panic_schedflag;	/* t_schedflag for panic_thread */
cpu_t *panic_bound_cpu;		/* t_bound_cpu for panic_thread */
char panic_preempt;		/* t_preempt for panic_thread */

/*
 * Panic variables which can be set via /etc/system or patched while
 * the system is in operation. Again, the stupid names are historic.
 */
char *panic_bootstr = NULL;	/* mdboot string to use after panic */
int panic_bootfcn = AD_BOOT;	/* mdboot function to use after panic */
int halt_on_panic = 0;		/* halt after dump instead of reboot? */
int nopanicdebug = 0;		/* reboot instead of call debugger? */
int in_sync = 0;		/* skip vfs_syncall() and just dump? */

/*
 * The do_polled_io flag is set by the panic code to inform the SCSI subsystem
 * to use polled mode instead of interrupt-driven i/o.
 */
int do_polled_io = 0;

/*
 * The panic_forced flag is set by the uadmin A_DUMP code to inform the
 * panic subsystem that it should not attempt an initial debug_enter.
 */
int panic_forced = 0;

/*
 * Triggers for panic state transitions:
 */
int panic_quiesce;		/* trigger for CALM -> QUIESCE */
int panic_sync;			/* trigger for QUIESCE -> SYNC */
int panic_dump;			/* trigger for SYNC -> DUMP */
|
void
panicsys(const char *format, va_list alist, struct regs *rp, int on_panic_stack)
{
	int s = spl8();
	kthread_t *t = curthread;
	cpu_t *cp = CPU;

	caddr_t intr_stack = NULL;
	uint_t intr_actv;

	ushort_t schedflag = t->t_schedflag;
	cpu_t *bound_cpu = t->t_bound_cpu;
	char preempt = t->t_preempt;

	(void) setjmp(&t->t_pcb);
	t->t_flag |= T_PANIC;

	t->t_schedflag |= TS_DONT_SWAP;
	t->t_bound_cpu = cp;
	t->t_preempt++;

	panic_enter_hw(s);

	/*
	 * If we're on the interrupt stack and an interrupt thread is available
	 * in this CPU's pool, preserve the interrupt stack by detaching an
	 * interrupt thread and making its stack the intr_stack.
	 */
	if (CPU_ON_INTR(cp) && cp->cpu_intr_thread != NULL) {
		kthread_t *it = cp->cpu_intr_thread;

		intr_stack = cp->cpu_intr_stack;
		intr_actv = cp->cpu_intr_actv;

		cp->cpu_intr_stack = thread_stk_init(it->t_stk);
		cp->cpu_intr_thread = it->t_link;

		/*
		 * Clear only the high level bits of cpu_intr_actv.
		 * We want to indicate that high-level interrupts are
		 * not active without destroying the low-level interrupt
		 * information stored there.
		 */
		cp->cpu_intr_actv &= ((1 << (LOCK_LEVEL + 1)) - 1);
	}
|
	/*
	 * Record one-time panic information and quiesce the other CPUs.
	 * Then print out the panic message and stack trace.
	 */
	if (on_panic_stack) {
		panic_data_t *pdp = (panic_data_t *)panicbuf;

		pdp->pd_version = PANICBUFVERS;
		pdp->pd_msgoff = sizeof (panic_data_t) - sizeof (panic_nv_t);

		if (t->t_panic_trap != NULL)
			panic_savetrap(pdp, t->t_panic_trap);
		else
			panic_saveregs(pdp, rp);

		(void) vsnprintf(&panicbuf[pdp->pd_msgoff],
		    PANICBUFSIZE - pdp->pd_msgoff, format, alist);

		/*
		 * Call into the platform code to stop the other CPUs.
		 * We currently have all interrupts blocked, and expect that
		 * the platform code will lower ipl only as far as needed to
		 * perform cross-calls, and will acquire as *few* locks as is
		 * possible -- panicstr is not set so we can still deadlock.
		 */
		panic_stopcpus(cp, t, s);

		panicstr = (char *)format;
		va_copy(panicargs, alist);
		panic_lbolt = lbolt;
		panic_lbolt64 = lbolt64;
		panic_hrestime = hrestime;
		panic_hrtime = gethrtime_waitfree();
		panic_thread = t;
		panic_regs = t->t_pcb;
		panic_reg = rp;
		panic_cpu = *cp;
		panic_ipl = spltoipl(s);
		panic_schedflag = schedflag;
		panic_bound_cpu = bound_cpu;
		panic_preempt = preempt;

		if (intr_stack != NULL) {
			panic_cpu.cpu_intr_stack = intr_stack;
			panic_cpu.cpu_intr_actv = intr_actv;
		}
|
		/*
		 * Lower ipl to 10 to keep clock() from running, but allow
		 * keyboard interrupts to enter the debugger. These callbacks
		 * are executed with panicstr set so they can bypass locks.
		 */
		splx(ipltospl(CLOCK_LEVEL));
		panic_quiesce_hw(pdp);
		(void) FTRACE_STOP();
		(void) callb_execute_class(CB_CL_PANIC, NULL);

		if (log_intrq != NULL)
			log_flushq(log_intrq);

		/*
		 * If log_consq has been initialized and syslogd has started,
		 * print any messages in log_consq that haven't been consumed.
		 */
		if (log_consq != NULL && log_consq != log_backlogq)
			log_printq(log_consq);

		fm_banner();
		errorq_panic();

		printf("\n\rpanic[cpu%d]/thread=%p: ", cp->cpu_id, (void *)t);
		vprintf(format, alist);
		printf("\n\n");

		if (t->t_panic_trap != NULL) {
			panic_showtrap(t->t_panic_trap);
			printf("\n");
		}

		traceregs(rp);
		printf("\n");

		if (((boothowto & RB_DEBUG) || obpdebug) &&
		    !nopanicdebug && !panic_forced) {
			if (dumpvp != NULL) {
				debug_enter("panic: entering debugger "
				    "(continue to save dump)");
			} else {
				debug_enter("panic: entering debugger "
				    "(no dump device, continue to reboot)");
			}
		}

	} else if (panic_dump != 0 || panic_sync != 0 || panicstr != NULL) {
		printf("\n\rpanic[cpu%d]/thread=%p: ", cp->cpu_id, (void *)t);
		vprintf(format, alist);
		printf("\n");
	} else
		goto spin;
|
	/*
	 * Prior to performing sync or dump, we make sure that do_polled_io is
	 * set, but we'll leave ipl at 10; deadman(), a CY_HIGH_LEVEL cyclic,
	 * will re-enter panic if we are not making progress with sync or dump.
	 */

	/*
	 * Sync the filesystems. Reset t_cred if not set because much of
	 * the filesystem code depends on CRED() being valid.
	 */
	if (!in_sync && panic_trigger(&panic_sync)) {
		if (t->t_cred == NULL)
			t->t_cred = kcred;
		splx(ipltospl(CLOCK_LEVEL));
		do_polled_io = 1;
		vfs_syncall();
	}

	/*
	 * Take the crash dump. If the dump trigger is already set, try to
	 * enter the debugger again before rebooting the system.
	 */
	if (panic_trigger(&panic_dump)) {
		panic_dump_hw(s);
		splx(ipltospl(CLOCK_LEVEL));
		do_polled_io = 1;
		dumpsys();
	} else if (((boothowto & RB_DEBUG) || obpdebug) && !nopanicdebug) {
		debug_enter("panic: entering debugger (continue to reboot)");
	} else
		printf("dump aborted: please record the above information!\n");

	if (halt_on_panic)
		mdboot(A_REBOOT, AD_HALT, NULL, B_FALSE);
	else
		mdboot(A_REBOOT, panic_bootfcn, panic_bootstr, B_FALSE);
spin:
	/*
	 * Restore ipl to at most CLOCK_LEVEL so we don't end up spinning
	 * and unable to jump into the debugger.
	 */
	splx(MIN(s, ipltospl(CLOCK_LEVEL)));
	for (;;);
}
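
/*
 * Illustrative sketch, not part of the original file. It works through the
 * mask arithmetic used by panicsys() when it clears the high-level bits of
 * cpu_intr_actv. SKETCH_LOCK_LEVEL is an assumed value of 10 chosen for the
 * example only; the real LOCK_LEVEL is defined by the platform headers.
 * clear_high_level_bits() and mask_selfcheck() are hypothetical names.
 */

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value for illustration; the real LOCK_LEVEL is platform-defined. */
#define	SKETCH_LOCK_LEVEL	10

/* Keep bits 0 through SKETCH_LOCK_LEVEL and clear every higher bit. */
static uint32_t
clear_high_level_bits(uint32_t intr_actv)
{
	return (intr_actv & ((1u << (SKETCH_LOCK_LEVEL + 1)) - 1));
}

static void
mask_selfcheck(void)
{
	/* Bits 15, 14, 10 and 4 are set; only bits 10 and 4 survive. */
	assert(clear_high_level_bits(0xC410u) == 0x0410u);

	/* With an assumed level of 10, the mask itself is 0x7FF. */
	assert(clear_high_level_bits(0xFFFFFFFFu) == 0x07FFu);
}
```

/*
 * Under that assumption the expression keeps the eleven low-order bits, so
 * the record of low-level interrupt activity is untouched while every bit
 * above the lock level reads as inactive.
 */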
|
void
panic(const char *format, ...)
{
	va_list alist;

	va_start(alist, format);
	vpanic(format, alist);
	va_end(alist);
}
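
/*
 * Illustrative sketch, not part of the original file. panic() above is a
 * thin varargs wrapper around the va_list worker vpanic(). The same pattern
 * can be tried in user space as shown here. sketch_msg, vsketch_report(),
 * sketch_report() and report_selfcheck() are hypothetical names, and the
 * worker merely formats into a buffer rather than doing anything that the
 * real vpanic() does.
 */

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

static char sketch_msg[64];	/* hypothetical stand-in message buffer */

/* va_list worker: does the formatting on behalf of the varargs wrapper. */
static void
vsketch_report(const char *format, va_list alist)
{
	(void) vsnprintf(sketch_msg, sizeof (sketch_msg), format, alist);
}

/* Varargs wrapper: collects the arguments and hands them to the worker. */
static void
sketch_report(const char *format, ...)
{
	va_list alist;

	va_start(alist, format);
	vsketch_report(format, alist);
	va_end(alist);
}

static void
report_selfcheck(void)
{
	sketch_report("cpu%d: %s", 3, "bad trap");
	assert(strcmp(sketch_msg, "cpu3: bad trap") == 0);
}
```

/*
 * Keeping the worker separate lets other entry points that already hold a
 * va_list call it directly, as the block comment at the top of this file
 * notes for vpanic().
 */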