Title: 2010 IEEE International Symposium on Performance Analysis of Systems and Software
Publisher: [Place of publication not identified] : IEEE, 2010
Physical description: 1 online resource (ix, 248 pages) : illustrations
Language: English
Note: Bibliographic Level Mode of Issuance: Monograph
ISBN: 9781424460243; 1424460247
Additional ISBN: 9781424460236; 1424460239

Summary: As processor performance continues to outgrow memory capacity and bandwidth, system and application performance has become constrained by the memory subsystem. Promising new technologies like Phase Change Memory (PCM) and Flash have emerged which may add capacity more cheaply than conventional DRAM, but at the cost of added latency and poor endurance. It is likely that systems leveraging these new memory technologies in the memory subsystem would require an innovative memory system architecture to gain the benefit of added capacity while mitigating the costs of latency and potential device wear-out. One such proposed architecture is a hierarchical memory subsystem with a faster but costly memory (e.g., DRAM) acting as a cache for a slower but cheaper memory (e.g., solid state memory like NAND flash, NOR flash or PCM). The memory subsystem is now a hybrid of two different memory technologies, combining the cost effectiveness and non-volatility of solid state memory devices with the speed of traditional DRAM. In order to study the performance tradeoffs of such hierarchical architectures, one first needs to study the effect of having a last level cache that is much larger than the caches in existing systems. Existing tools and methodologies for cache evaluation fall short.
We develop a multi-processor system prototype that runs applications with a coherently-attached FPGA which can emulate different memory architectures for long periods of time. The output of the system is not a memory trace, but the performance results of the emulated memory system design which may be used at any time to evaluate the design tradeoffs. The large cache will filter out references going to the solid state memory. Thus the miss ratio of the large cache is an important metric. The sensitivity of the miss ratio to configuration parameters like cache size and line size needs to be evaluated to identify the right set of parameters.

Subjects: Computer systems -- Evaluation -- Congresses; Computer systems -- Evaluation
Dewey classification: 004.24
Corporate author: IEEE Staff
Material type: Proceeding