Adaptive Sequential Prefetching & Cache Simulator Project

Link

https://mshort2.github.io/15418project/

Summary

Our project is to build a shared-address-space multiprocessor cache simulator that implements the MESI and MOESI protocols, and then measure how adding a cache coherence extension known as adaptive sequential prefetching affects performance.

Background

In class we covered snooping-based cache coherence protocols such as MSI, MESI, MOESI, and MESIF.

We also read about adaptive sequential prefetching, an extension that can be added to a cache coherence protocol to decrease the cache miss rate. From the paper “Performance Evaluation and Cost Analysis of Cache Protocol Extensions for Shared-Memory Multiprocessors” by Fredrik Dahlgren and Michael Dubois: “Adaptive sequential prefetching cuts the number of read misses by fetching a number of consecutive blocks into the cache in anticipation of future misses. The number of prefetched blocks is adapted according to a dynamic measure of prefetching effectiveness which reflects the amount of spatial locality at different times … Moreover, as opposed to simply adopting a larger block size, it does not affect the false sharing miss rate.”

Challenge

The main challenge in building a multiprocessor cache simulator is maintaining cache coherence across transactions. Implementing the coherence protocols is nontrivial because of the number of messages exchanged and state transitions involved, so we need to verify that our simulator implements each protocol correctly. We will also need to design a communication system between the simulated “processors” and make sure they send the correct messages to each other.
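
To give a sense of the state movements involved, the MESI transitions for a single cache line could be sketched roughly as follows. This is a minimal Python illustration, not the CADSS code: the function names and the bus message names (`BusRd`, `BusRdX`, `BusUpgr`) are assumptions chosen for the example.

```python
from enum import Enum

class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"

def on_processor(state, op, others_have_copy):
    """Return (next_state, bus_message) for a local 'read' or 'write'."""
    if op == "read":
        if state is State.I:
            # A read miss loads the line Shared if another cache holds it,
            # otherwise Exclusive.
            return (State.S if others_have_copy else State.E), "BusRd"
        return state, None              # M, E, and S all hit on a read
    if op == "write":
        if state is State.I:
            return State.M, "BusRdX"    # fetch the line with ownership
        if state is State.S:
            return State.M, "BusUpgr"   # invalidate the other sharers
        return State.M, None            # E upgrades silently, M stays M
    raise ValueError(op)

def on_snoop(state, msg):
    """Return (next_state, must_flush) when another cache's message is seen."""
    if state is State.I:
        return State.I, False
    if msg == "BusRd":
        return State.S, state is State.M    # only a dirty line is flushed
    if msg in ("BusRdX", "BusUpgr"):
        return State.I, state is State.M
    raise ValueError(msg)
```

For MOESI, the main difference in this sketch would be that a Modified line observing `BusRd` moves to Owned and keeps supplying the data instead of dropping to Shared.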

Additionally, we need to figure out how to add adaptive sequential prefetching to plain MESI and MOESI while preserving their correctness. Implementing adaptive sequential prefetching involves adding extra bits to each cache line and additional counters to each cache.
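
The per-line bits and per-cache counters could fit together along the lines of the sketch below. It is a simplified Python illustration under our own assumptions, not the exact mechanism from the paper: the class name, the window size, the high and low thresholds, the maximum degree, and the floor of one prefetched block are all hypothetical values picked for the example.

```python
class AdaptivePrefetcher:
    """Tracks prefetch usefulness for one cache and adapts the prefetch degree."""

    def __init__(self, window=16, high=12, low=4, max_degree=8):
        self.degree = 1            # consecutive blocks prefetched per read miss
        self.window, self.high, self.low = window, high, low
        self.max_degree = max_degree
        self.prefetched = 0        # per-cache counter: prefetches issued
        self.useful = 0            # per-cache counter: prefetched lines later used
        self.prefetch_bit = {}     # per-line bit: prefetched but not yet referenced

    def on_read_miss(self, block):
        """Return the consecutive blocks to prefetch after a miss on `block`."""
        targets = [block + i for i in range(1, self.degree + 1)]
        for b in targets:
            self.prefetch_bit[b] = True
            self.prefetched += 1
            self._maybe_adapt()
        return targets

    def on_hit(self, block):
        """The first reference to a prefetched line counts as a useful prefetch."""
        if self.prefetch_bit.get(block):
            self.prefetch_bit[block] = False
            self.useful += 1

    def _maybe_adapt(self):
        """After every `window` prefetches, raise or lower the degree."""
        if self.prefetched < self.window:
            return
        if self.useful >= self.high:
            self.degree = min(self.degree + 1, self.max_degree)
        elif self.useful <= self.low:
            # Simplification: never drop to zero, so the sketch can recover.
            self.degree = max(self.degree - 1, 1)
        self.prefetched = self.useful = 0
```

In this sketch the coherence state of a prefetched block is handled by the underlying protocol exactly as for a demand read; the prefetcher only decides how many extra blocks to request.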

We also need a way to test our simulator and a set of metrics for measuring performance. Our plan is to use the framework from Professor Railing's Computer Architecture Design Simulator for Students (CADSS). We will need to implement the protocols within this framework, which requires learning how the simulator is structured and how to use it. We also need programs to run our cache simulators on, and we plan to find traces of example memory accesses similar to those we used in 15-213’s Cache Lab. We should also pick programs that show when adaptive sequential prefetching is most and least useful so that we can best analyze its performance.
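
As an example of traces that should sit at the two extremes for prefetching, the sketch below generates a sequential walk (high spatial locality) and a scattered access pattern (low spatial locality). The `op address,size` line format is an assumption modeled loosely on Cache Lab traces and is not necessarily what CADSS expects; the function names are hypothetical.

```python
import random

def sequential_trace(start, count, stride=8):
    """Loads that walk memory in order: high spatial locality."""
    return [("L", start + i * stride) for i in range(count)]

def random_trace(count, addr_space=1 << 20, seed=0):
    """Loads scattered over a large range: little spatial locality."""
    rng = random.Random(seed)  # fixed seed keeps the trace reproducible
    return [("L", rng.randrange(0, addr_space, 8)) for _ in range(count)]

def format_trace(trace, size=8):
    """Render accesses as 'op address,size' lines with hex addresses."""
    return [f"{op} {addr:x},{size}" for op, addr in trace]
```

For instance, `format_trace(sequential_trace(0x1000, 3))` yields the lines `L 1000,8`, `L 1008,8`, and `L 1010,8`.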

Resources

We are using CADSS for the cache simulator and testing, as suggested by Professor Mowry. We are referring to the Dahlgren and Dubois paper mentioned above (link) for adding adaptive sequential prefetching. We have not yet decided what to use for traces, but we are considering trace generation software to produce memory traces.

Goals

PLAN TO ACHIEVE:

Design and implement a multiprocessor cache simulator that supports both MESI and MOESI protocols.
Develop a robust communication system for processor transactions to ensure proper cache coherence.
Integrate adaptive sequential prefetching by adding necessary extra bits and counters, while preserving protocol correctness.
Validate the simulator using a well-defined set of memory traces and testing programs based on the CADSS framework.

HOPE TO ACHIEVE:

Achieve improved simulation accuracy and performance benchmarking through the use of adaptive sequential prefetching.
Demonstrate scenarios where adaptive prefetching significantly improves cache performance and where its benefits are limited.

Platform Choice

We will use the Gates machines. Our system does not require very complicated hardware since it is just a simulator of a cache, but we will require a multiprocessor machine like the ones in Gates since we need to simulate a multiprocessor cache.

Schedule

April 12: Get basic cache coherence protocols working
April 15: Develop traces to validate our current coherence setup
April 20: Implement adaptive sequential prefetching
April 24: Generate exhaustive benchmarking data for ASP
April 26: Create final report + poster and draw conclusions about when ASP is effective

Milestone Report

Updated Schedule

April 12: Get basic cache coherence protocols working
April 15: Develop traces to validate our current coherence setup (Partially Completed)
April 20: Develop large-scale traces for validation using an external tool (New bullet point)
April 22: Implement adaptive sequential prefetching (Originally April 20)
April 24: Generate exhaustive benchmarking data for ASP
April 26: Create final report + poster and draw conclusions about when ASP is effective

Work Completed:

In the five days we have worked on the project since it was approved, we have implemented the MSI, MESI, and MOESI protocols on top of the already implemented MI protocol. We cloned Professor Railing’s CADSS repository and read through the modules and starter code relevant to our project before modifying the files in the coherence folders. This involved adding three pairs of cache/snoop functions, one pair per protocol, to handle transitions between states based on bus traffic. We have begun validating our implementations on a small set of traces but are still looking into large-scale trace generators. The two options we are researching at the moment are Intel Pin and ScalaMemTrace.

Revised Schedule:

We are still on track to produce all deliverables for the project. We are also optimistic that we will be able to meet the “Hope to Achieve” goals.

Poster Sessions

We intend to present graphs of miss rate, hit rate, bus traffic, and similar metrics across a variety of traces. These traces will be designed to cover as wide a range of access patterns as possible so that we can present findings for different machine configurations.
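
The metrics above can be derived from a few per-cache counters. The following is a minimal sketch; the `CacheStats` class and its field names are hypothetical and do not correspond to CADSS structures.

```python
from dataclasses import dataclass

@dataclass
class CacheStats:
    hits: int = 0
    misses: int = 0
    bus_messages: int = 0   # coherence messages this cache placed on the bus

    @property
    def accesses(self):
        return self.hits + self.misses

    @property
    def miss_rate(self):
        # Guard against division by zero for a cache that saw no accesses.
        return self.misses / self.accesses if self.accesses else 0.0

    @property
    def hit_rate(self):
        return self.hits / self.accesses if self.accesses else 0.0

    @property
    def bus_messages_per_access(self):
        return self.bus_messages / self.accesses if self.accesses else 0.0
```

Comparing these values for the same trace with and without prefetching is what the graphs would plot.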

Preliminary Results

We do not yet have any data to report other than successful validation of small traces.

Concerning Issues

Right now we are most concerned about trace generation for validation. We have spent more than half of our working time so far researching the tools mentioned earlier and are still struggling to get results from them. This is not a major problem yet because we can use existing traces and generate medium-sized ones on our own, but we would like to figure out large-scale generation sooner rather than later. The remaining implementation seems to be a matter of doing the work: we did not find the cache coherence protocols too hard to implement, and we think we have a good grasp of adaptive prefetching.

Final Report

This section will present the final results of the project, including performance comparisons, conclusions on the effectiveness of adaptive sequential prefetching, and potential future work. (Content coming soon.)