Flash-based storage not so energy efficient? 
Tom's Hardware tested current Solid State Disks (SSDs) for both performance and power consumption. While the performance numbers were as expected, it turns out that the flash-based drives' energy use is no better than that of a traditional 7200 RPM hard disk for a practical workload based on the MobileMark benchmark. The authors contend that this is because hard disks reach their maximum power draw only when seeking, whereas flash storage uses full power during any IO activity.
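
The authors' argument can be illustrated with a rough time-weighted power model. This is only a sketch: the state names, wattages, and time fractions below are made-up placeholders, not figures from the Tom's Hardware article, and the model assumes both drives spend the same share of time busy.

```python
import math

def average_power(time_fractions, state_power_watts):
    """Time-weighted average power (watts) over a device's power states."""
    if not math.isclose(sum(time_fractions.values()), 1.0):
        raise ValueError("time fractions must sum to 1")
    return sum(frac * state_power_watts[state]
               for state, frac in time_fractions.items())

# Hypothetical numbers, chosen only to illustrate the reasoning.
# The hard disk hits peak power only while seeking.
hdd_power = {"idle": 1.0, "transfer": 2.0, "seek": 3.0}
hdd_mix = {"idle": 0.6, "transfer": 0.3, "seek": 0.1}

# The flash drive is assumed to draw full power during any IO.
ssd_power = {"idle": 1.0, "io": 2.5}
ssd_mix = {"idle": 0.6, "io": 0.4}

hdd_avg = average_power(hdd_mix, hdd_power)
ssd_avg = average_power(ssd_mix, ssd_power)
print(f"HDD average: {hdd_avg:.2f} W, SSD average: {ssd_avg:.2f} W")
```

With these placeholder values the flash drive averages slightly more power than the hard disk, even though its peak draw is lower, because it sits at its peak for the whole busy period.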
AMD Barcelona architecture rundown 
The Barcelona (aka "K10") microarchitecture is the latest design from AMD for both the server and desktop markets. The Phenom is the quad-core desktop variant, the Athlon X2 series includes the dual-core variant, and the 23xx and 83xx Opterons are the quad-core server variants.

The key changes over the previous line are covered in brief here and in greater detail here. Most of the interesting features require the use of an upgraded CPU socket denoted by a "+" (e.g. Socket AM2+ or Socket F+), though the CPU will work in non-plus sockets on current motherboards. Some of the "plus socket" features are:

Separated voltage planes allow the CPU to have a different voltage/frequency for each core and the northbridge.

HyperTransport 3.0, allowing greater bus bandwidth. The platform also adds support for DDR2-1066 memory.

In addition, Barcelona introduces a shared L3 cache, which should have a major impact on HPC applications.

One major issue, however, is an L3 TLB bug present in the first generation of this architecture. The problem can be worked around by disabling part of the L3 TLB system in the BIOS or via software (with a 10% performance penalty), or by using a Linux patch that routes around the problem with limited slowdown (but the patch is not intended for production use). See the Phenom Wikipedia article for details.

In short, while Intel retains the upper hand in horsepower now, the AMD Barcelona design seems to sport many of the features predicted for future system design.

More information:

Wikipedia's Barcelona article covers the architecture in depth.

Anandtech benchmarking puts the chip through its paces.

To find a Barcelona-based chip, see Wikipedia:

Phenom quad-cores
Barcelona-based dual-core Athlons (scroll to "Phenom based")
Barcelona-based quad-core Opterons (23xx and 83xx)

"The Coming Utility Computing Revolution" 
This recent "Innovations" article highlights "utility computing", the idea that virtualization, shared storage, and other technologies will come together to commoditize business computing.

While I agree with the general idea, the author predicts that this will marginalize IT as a field, which seems counter-intuitive. This kind of computing does allow fewer people to manage more systems, but it also makes that management much more complicated. Further, IT has always been about helping users as much as maintaining infrastructure. So I don't see the general IT realm getting eaten by other fields, but rather splintering into specialists in networking, storage, (virtual) system administration, support, etc.

Finally, I found this quote pretty funny:

Teenagers entering higher education today are already skilled at building personal application spaces on Facebook using software modules. It’s a small step to apply those principles to business applications.


"A small step" to go from Facebook to a crucial business application? Seems unlikely.
Comparison of memory model between Xen and OpenVZ 
The article explains the difference in detail. Now I understand why it is so difficult to run a Java web server on OpenVZ when evaluating the TPC-W benchmark.


Power management research at Berkeley 
There is some good information about adaptive power management at Berkeley. It appears to be a position statement or paper under construction, arguing for power management in data centers.
Linux Gains Two New Virtualization Solutions 
By way of Slashdot, we find that the Linux kernel (2.6.23 and up) now sports three virtualization techniques out of the box: KVM, Xen (just merged), and Lguest (also recently merged).

Lguest in particular looks interesting: unlike KVM, it doesn't require hardware virtualization support, and it is as simple as a single modprobe (as opposed to the Xen behemoth). Performance isn't too great right now, though (about a 30% slowdown).
Power Plays: How power consumption will shape the future of computing 
This Ars Technica article spotlights the development of power-aware technologies at the chip, system, network, and data center levels. It analyzes recent developments in terms of granularity, i.e. the frequency of reaction. Overall, a well-written article.
Development and Optimization Techniques for Multi-Core Processors 
This article discusses performance issues on multi-core systems. It basically recommends using parallel programming tools, such as OpenMP, to take full advantage of the cores. It also describes the common issues that limit performance. I think it's a very good summary of the ideas.

http://www.devx.com/go-parallel/Article/34428
Summary of Power-aware computing research in OpenMP applications 
Here is a list of papers proposing research that uses OpenMP in the power-aware computing area.

1. Chun Liu, et al., "Exploiting Barriers to Optimize Power Consumption of CMPs", IPDPS 2005.
This work exploits slack time among processors. By estimating each processor's stall time at the end of each iteration, it reduces the frequency to save power without degrading performance. The evaluation in the paper is done only in a simulator, not on real hardware. SpecOMP is used to verify the idea.

2. Matthew Curtis-Maury, et al., "Online Power-Performance Adaptation of Multithreaded Programs using Hardware Event-Based Prediction", ICS 2006.
This paper designs and implements a framework that adaptively regulates the concurrency level during program execution. The processor/thread configuration is changed at run time to achieve near-optimal energy efficiency. It builds power/performance models and uses hardware counters. The evaluation uses four hyperthreaded Intel processors.

3. Jian Li, et al., "Dynamic Power-Performance Adaptation of Parallel Computation on Chip Multiprocessors", HPCA 2006.
This paper proposes a heuristic method to determine the number of processors and the frequency level on one CMP node. All evaluations are performed in a simulator. It does not extend the approach to multiple CMP nodes.
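
The barrier-slack idea from the first paper can be sketched in a few lines. This is not the algorithm from the paper: it assumes each thread's work per iteration is already known (the paper has to estimate stall time), that run time scales inversely with frequency, and the function name and frequency levels are hypothetical.

```python
def slack_frequencies(work_gcycles, available_ghz):
    """Pick, per thread, the lowest frequency that still reaches the barrier
    no later than the slowest thread running at full speed.

    work_gcycles  -- work per thread before the barrier, in billions of cycles
    available_ghz -- discrete frequency levels the processor supports, in GHz
    """
    levels = sorted(available_ghz)
    f_max = levels[-1]
    # The barrier cannot open before the most loaded thread arrives.
    critical_time = max(work_gcycles) / f_max
    chosen = []
    for cycles in work_gcycles:
        # Lowest level whose run time fits in the critical time; f_max always fits.
        chosen.append(next(f for f in levels
                           if cycles / f <= critical_time + 1e-12))
    return chosen

# Hypothetical workload: four threads with unequal work, three frequency levels.
print(slack_frequencies([4.0, 2.0, 3.0, 1.0], [1.0, 1.5, 2.0]))
# prints [2.0, 1.0, 1.5, 1.0]
```

Only the most loaded thread stays at full speed. The others slow down to fill their slack, so the barrier opens at the same time as before.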

Hybrid OpenMP / MPI programming  
This presentation explains why hybrid MPI/OpenMP programming is needed.
It comes with examples and strategies.
It also discusses when hybrid mode performs better.

http://www.nersc.gov/nusers/services/tr ... hybrid.ppt

Also, here is a list of papers focusing on performance in hybrid MPI/OpenMP applications.

1. Felix Wolf, et al., "Automatic performance analysis of hybrid MPI/OpenMP applications", Journal of Systems Architecture 2003.

2. Laksono Adhianto, et al., "Performance Modeling of Communication and Computation in Hybrid MPI/OpenMP applications", ICPADS 2006.

3. Edmond Chow, et al., "Assessing Performance of Hybrid MPI/OpenMP Programs on SMP Clusters"