Author: | Ljubuncic Igor |
Title: | Problem-solving in high performance computing : a situational awareness approach with Linux / Igor Ljubuncic |
Publication: | Waltham, MA : Morgan Kaufmann, [2015] |
©2015 | |
Edition: | 1st edition |
Physical description: | 1 online resource (xxii, 298 pages) : illustrations (some color) |
Dewey classification: | 005.432 |
Topical subject: | High performance computing |
General notes: | Includes index. |
Bibliography note: | Includes bibliographical references and index. |
Contents note: | Identification of a problem; If a tree falls in a forest, and no one hears it fall; Step-by-step identification; Always use simple tools first; Too much knowledge leads to mistakes; Problem definition; Problem that happens now or that may be; Outage size and severity versus business imperative; Known versus unknown; Problem reproduction; Can you isolate the problem?; Sporadic problems need special treatment; Plan how to control the chaos; Letting go is the hardest thing; Cause and effect; Do not get hung up on symptoms; Chicken and egg: what came first? |
Do not make environment changes until you understand the nature of the problem; If you make a change, make sure you know what the expected outcome is; Conclusions; References; Chapter 2 - The investigation begins; Isolating the problem; Move from production to test; Rerun the minimal set needed to get results; Ignore biased information; avoid assumptions; Comparison to a healthy system and known references; It is not a bug, it is a feature; Compare expected results to a healthy system; Performance and behavior references are a must; Linear versus nonlinear response to changes | |
One variable at a time; Problems with linear complexity; Nonlinear problems; Response may be delayed or masked; Y to X rather than X to Y; Component search; Conclusions; Chapter 3 - Basic investigation; Profile the system status; Environment monitors; Machine accessibility, responsiveness, and uptime; Local and remote login and management console; The monitor that cried wolf; Read the system messages and logs; Using ps and top; System logs; Process accounting; Examine pattern of command execution; Correlate to problem manifestation; Avoid quick conclusions; Statistics to your aid; Vmstat | |
Iostat; System activity report (SAR); Conclusions; References; Chapter 4 - A deeper look into the system; Working with /proc; Hierarchy; Per-process variables; Kernel data; Process space; Examine kernel tunables; Sys subsystem; Memory management; Filesystem management; Network management; SunRPC; Kernel; Sysctl; Conclusions; References; Chapter 5 - Getting geeky - tracing and debugging applications; Working with strace and ltrace; Strace; Options; What you need to know before using strace; Strace from the standpoint of a system administrator; Strace has friends; Basic usage; Test case 1 | |
Test case 2 | |
Summary/abstract: | Problem-Solving in High Performance Computing: A Situational Awareness Approach with Linux focuses on understanding giant computing grids as cohesive systems. Unlike other titles on general problem-solving or system administration, this book offers a cohesive approach to complex, layered environments. It highlights the difference between troubleshooting a standalone system and solving complex problems in large, mission-critical environments, addresses the pitfalls of information overload and of micro and macro symptoms, and includes methods for managing problems in large computing ecosystems. |
Authorized title: | Problem-solving in high performance computing |
ISBN: | 0-12-801064-9 |
0-12-801019-3 | |
Format: | Printed material |
Bibliographic level: | Monograph |
Language of publication: | English |
Record no.: | 9910797571203321 |
Available at: | Univ. Federico II |