
Problem-solving in high performance computing : a situational awareness approach with Linux / Igor Ljubuncic




Author: Ljubuncic, Igor
Title: Problem-solving in high performance computing : a situational awareness approach with Linux / Igor Ljubuncic
Publication: Waltham, MA : Morgan Kaufmann, [2015]
©2015
Edition: 1st edition
Physical description: 1 online resource (xxii, 298 pages) : illustrations (some color)
Discipline: 005.432
Topical subject: High performance computing
General notes: Includes index.
Bibliography note: Includes bibliographical references and index.
Contents note: Identification of a problem; If a tree falls in a forest, and no one hears it fall; Step-by-step identification; Always use simple tools first; Too much knowledge leads to mistakes; Problem definition; Problem that happens now or that may be; Outage size and severity versus business imperative; Known versus unknown; Problem reproduction; Can you isolate the problem?; Sporadic problems need special treatment; Plan how to control the chaos; Letting go is the hardest thing; Cause and effect; Do not get hung up on symptoms; Chicken and egg: what came first?;
Do not make environment changes until you understand the nature of the problem; If you make a change, make sure you know what the expected outcome is; Conclusions; References; Chapter 2 - The investigation begins; Isolating the problem; Move from production to test; Rerun the minimal set needed to get results; Ignore biased information; avoid assumptions; Comparison to a healthy system and known references; It is not a bug, it is a feature; Compare expected results to a healthy system; Performance and behavior references are a must; Linear versus nonlinear response to changes;
One variable at a time; Problems with linear complexity; Nonlinear problems; Response may be delayed or masked; Y to X rather than X to Y; Component search; Conclusions; Chapter 3 - Basic investigation; Profile the system status; Environment monitors; Machine accessibility, responsiveness, and uptime; Local and remote login and management console; The monitor that cried wolf; Read the system messages and logs; Using ps and top; System logs; Process accounting; Examine pattern of command execution; Correlate to problem manifestation; Avoid quick conclusions; Statistics to your aid; Vmstat;
Iostat; System activity report (SAR); Conclusions; References; Chapter 4 - A deeper look into the system; Working with /proc; Hierarchy; Per-process variables; Kernel data; Process space; Examine kernel tunables; Sys subsystem; Memory management; Filesystem management; Network management; SunRPC; Kernel; Sysctl; Conclusions; References; Chapter 5 - Getting geeky - tracing and debugging applications; Working with strace and ltrace; Strace; Options; What you need to know before using strace; Strace from the standpoint of a system administrator; Strace has friends; Basic usage; Test case 1;
Test case 2
Summary/abstract: Problem-Solving in High Performance Computing: A Situational Awareness Approach with Linux focuses on understanding giant computing grids as cohesive systems. Unlike other titles on general problem-solving or system administration, this book offers a cohesive approach to complex, layered environments. It highlights the difference between standalone system troubleshooting and complex problem-solving in large, mission-critical environments, addresses the pitfalls of information overload and of micro and macro symptoms, and includes methods for managing problems in large computing ecosystems.
Authorized title: Problem-solving in high performance computing
ISBN: 0-12-801064-9
0-12-801019-3
Format: Printed material
Bibliographic level: Monograph
Language of publication: English
Record no.: 9910797571203321
Available at: Univ. Federico II