<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: The RAM per CPU wall</title>
	<atom:link href="http://basraayman.com/2009/12/29/the-ram-per-cpu-wall/feed/" rel="self" type="application/rss+xml" />
	<link>http://basraayman.com/2009/12/29/the-ram-per-cpu-wall/</link>
	<description>My personal thoughts on technology and its obfuscation</description>
	<lastBuildDate>Wed, 09 May 2012 18:47:44 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
	<item>
		<title>By: Martin</title>
		<link>http://basraayman.com/2009/12/29/the-ram-per-cpu-wall/#comment-27</link>
		<dc:creator><![CDATA[Martin]]></dc:creator>
		<pubDate>Tue, 29 Dec 2009 21:04:22 +0000</pubDate>
		<guid isPermaLink="false">http://basraayman.com/?p=115#comment-27</guid>
		<description><![CDATA[Hello Bas,

[don&#039;t know how to quote things here, so I will just use the plain old method]

I think so, too: as soon as you provide more of a resource X, a need will be generated to use X up completely.
You are also right that CPU power efficiency keeps getting better. But as soon as we pack more cores into a box, power and compute density keep rising. To actually make use of all this (assuming we can, which we currently cannot), we need software, yet to be written, that can exploit the incredible 500k IOPS PCIe SSDs.
My understanding is that the more we increase the core count, the more we will see that we simply cannot make use of it anyway, and that we need a fundamental change in our approach.

There was a famous paper recently (about the Barrelfish OS) that basically stated that on systems with 16 or more cores, shared memory no longer scales. Modelling the hardware as distinct CPU entities that communicate over a shared bus needs fewer CPU cycles for inter-core communication than using shared memory at all. This is quite an important and also shocking result, as it shows that our scale-up-by-increasing-core-counts approach is doomed.
Basically, we can either choose to waste more and more cycles as each new CPU generation brings more and more cores (this is somewhat like the inversion of Moore&#039;s law!), or we can start solving the problem by scaling our systems down and distributing load across more efficient low-power entities.

Regarding the &quot;wimpy nodes approach&quot;: correct, you will not be able to run, say, Windows 7 on all of the systems currently being tested. But there is some hope: ARM is said to be building an Atom equivalent with a hardware virtualization instruction set, paired with at least double the performance and a power draw of &lt;5W for a 2-core system. There are rumours of Dell shipping these systems to some of its large-scale hosting customers (I guess you read this at El Reg, too), who run standard LAMP VMs on them.
This might be a viable approach for the future. From a space and power-consumption perspective, these systems are basically an intelligent hard drive: powerful enough for the task, and that&#039;s it. This is quite a neat approach, and quite a fundamental one.

To sum it up: I think we have a software and software-design crisis rather than a hardware crisis. We should seriously start thinking about scaling down our per-node performance and scaling up our networking gear accordingly.]]></description>
		<content:encoded><![CDATA[<p>Hello Bas,</p>
<p>[don't know how to quote things here, so I will just use the plain old method]</p>
<p>I think so, too: as soon as you provide more of a resource X, a need will be generated to use X up completely.<br />
You are also right that CPU power efficiency keeps getting better. But as soon as we pack more cores into a box, power and compute density keep rising. To actually make use of all this (assuming we can, which we currently cannot), we need software, yet to be written, that can exploit the incredible 500k IOPS PCIe SSDs.<br />
My understanding is that the more we increase the core count, the more we will see that we simply cannot make use of it anyway, and that we need a fundamental change in our approach.</p>
<p>There was a famous paper recently (about the Barrelfish OS) that basically stated that on systems with 16 or more cores, shared memory no longer scales. Modelling the hardware as distinct CPU entities that communicate over a shared bus needs fewer CPU cycles for inter-core communication than using shared memory at all. This is quite an important and also shocking result, as it shows that our scale-up-by-increasing-core-counts approach is doomed.<br />
Basically, we can either choose to waste more and more cycles as each new CPU generation brings more and more cores (this is somewhat like the inversion of Moore&#8217;s law!), or we can start solving the problem by scaling our systems down and distributing load across more efficient low-power entities.</p>
<p>Regarding the &#8220;wimpy nodes approach&#8221;: correct, you will not be able to run, say, Windows 7 on all of the systems currently being tested. But there is some hope: ARM is said to be building an Atom equivalent with a hardware virtualization instruction set, paired with at least double the performance and a power draw of &lt;5W for a 2-core system. There are rumours of Dell shipping these systems to some of its large-scale hosting customers (I guess you read this at El Reg, too), who run standard LAMP VMs on them.<br />
This might be a viable approach for the future. From a space and power-consumption perspective, these systems are basically an intelligent hard drive: powerful enough for the task, and that&#8217;s it. This is quite a neat approach, and quite a fundamental one.</p>
<p>To sum it up: I think we have a software and software-design crisis rather than a hardware crisis. We should seriously start thinking about scaling down our per-node performance and scaling up our networking gear accordingly.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Online Storage Optimization &#187; Blog Archive &#187; Happy New Year</title>
		<link>http://basraayman.com/2009/12/29/the-ram-per-cpu-wall/#comment-26</link>
		<dc:creator><![CDATA[Online Storage Optimization &#187; Blog Archive &#187; Happy New Year]]></dc:creator>
		<pubDate>Tue, 29 Dec 2009 19:57:23 +0000</pubDate>
		<guid isPermaLink="false">http://basraayman.com/?p=115#comment-26</guid>
		<description><![CDATA[[...] Bas Raayman sees CPU power hitting the wall: The RAM per CPU wall [...]]]></description>
		<content:encoded><![CDATA[<p>[...] Bas Raayman sees CPU power hitting the wall: The RAM per CPU wall [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bas Raayman</title>
		<link>http://basraayman.com/2009/12/29/the-ram-per-cpu-wall/#comment-25</link>
		<dc:creator><![CDATA[Bas Raayman]]></dc:creator>
		<pubDate>Tue, 29 Dec 2009 16:39:20 +0000</pubDate>
		<guid isPermaLink="false">http://basraayman.com/?p=115#comment-25</guid>
		<description><![CDATA[Martin,

let me start by thanking you for the reply and for offering a different perspective on all of this.

I agree that power draw and density are major problems (and let&#039;s not forget cooling). On the other hand, we have seen a big dip in power consumption on a per-core basis. Newer CPUs have features such as Cool&#039;n&#039;Quiet, but consolidation and virtualization initiatives have made these features inherently useless, since we keep overprovisioning until our CPUs are near the 80% utilization limit (sometimes even over).

Basic problem: as soon as we provide more power, we use it. Technologies like virtual machines or hardware/software partitioning (zones and such) only make the problem worse.

And yes, I know about wimpy nodes, and they can work great. Take the example of the US Army buying PS3s for their Cell architecture and implementing special software for a custom purpose: I can only say that this is a great idea and a great implementation, but it&#039;s not generic enough for most use cases. Custom development would drive costs up in a major way.

And again I agree that there are fundamental changes we need to make, but this is a general performance/utilization problem in which the technology provided just does not grow as fast as demand.

Also looking forward to your response,
Bas]]></description>
		<content:encoded><![CDATA[<p>Martin,</p>
<p>let me start by thanking you for the reply and for offering a different perspective on all of this.</p>
<p>I agree that power draw and density are major problems (and let&#8217;s not forget cooling). On the other hand, we have seen a big dip in power consumption on a per-core basis. Newer CPUs have features such as Cool&#8217;n&#8217;Quiet, but consolidation and virtualization initiatives have made these features inherently useless, since we keep overprovisioning until our CPUs are near the 80% utilization limit (sometimes even over).</p>
<p>Basic problem: as soon as we provide more power, we use it. Technologies like virtual machines or hardware/software partitioning (zones and such) only make the problem worse.</p>
<p>And yes, I know about wimpy nodes, and they can work great. Take the example of the US Army buying PS3s for their Cell architecture and implementing special software for a custom purpose: I can only say that this is a great idea and a great implementation, but it&#8217;s not generic enough for most use cases. Custom development would drive costs up in a major way.</p>
<p>And again I agree that there are fundamental changes we need to make, but this is a general performance/utilization problem in which the technology provided just does not grow as fast as demand.</p>
<p>Also looking forward to your response,<br />
Bas</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: uberVU - social comments</title>
		<link>http://basraayman.com/2009/12/29/the-ram-per-cpu-wall/#comment-24</link>
		<dc:creator><![CDATA[uberVU - social comments]]></dc:creator>
		<pubDate>Tue, 29 Dec 2009 16:32:13 +0000</pubDate>
		<guid isPermaLink="false">http://basraayman.com/?p=115#comment-24</guid>
		<description><![CDATA[&lt;strong&gt;Social comments and analytics for this post...&lt;/strong&gt;

This post was mentioned on Twitter by BasRaayman: [Blog] The RAM per CPU wall: http://is.gd/5Fux7...]]></description>
		<content:encoded><![CDATA[<p><strong>Social comments and analytics for this post&#8230;</strong></p>
<p>This post was mentioned on Twitter by BasRaayman: [Blog] The RAM per CPU wall: <a href="http://is.gd/5Fux7" rel="nofollow">http://is.gd/5Fux7</a>&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Martin</title>
		<link>http://basraayman.com/2009/12/29/the-ram-per-cpu-wall/#comment-23</link>
		<dc:creator><![CDATA[Martin]]></dc:creator>
		<pubDate>Tue, 29 Dec 2009 13:52:33 +0000</pubDate>
		<guid isPermaLink="false">http://basraayman.com/?p=115#comment-23</guid>
		<description><![CDATA[Hi,


actually, I don&#039;t think servers will keep getting larger and larger. Power draw and density are already a problem in many datacenters. As core counts increase, gate-level optimizations will not scale as fast as the number of cores does.
More importantly, newer systems have a lower &quot;getting-things-done per watt&quot; ratio than specialized &quot;wimpy nodes&quot; (e.g. the FAWN project). With roughly 8 times more output for the same power draw, there is a cost argument for running such lower-powered, more efficient systems (again, with the datacenter as the constraint, etc.).
James Hamilton has published several highly recommended blog posts with all this in mind: http://perspectives.mvdirona.com/2009/11/30/2010TheYearOfMicroSliceServers.aspx

If all this is true, it has a lot of interesting implications. Our software landscape has to change fundamentally, just as the networking layer has to. Software needs to be more distributed, concurrent and parallelized; it has to treat failures as the norm, not as a special case, etc.
[I will stop here].


Looking forward to your replies,
Martin]]></description>
		<content:encoded><![CDATA[<p>Hi,</p>
<p>actually, I don&#8217;t think servers will keep getting larger and larger. Power draw and density are already a problem in many datacenters. As core counts increase, gate-level optimizations will not scale as fast as the number of cores does.<br />
More importantly, newer systems have a lower &#8220;getting-things-done per watt&#8221; ratio than specialized &#8220;wimpy nodes&#8221; (e.g. the FAWN project). With roughly 8 times more output for the same power draw, there is a cost argument for running such lower-powered, more efficient systems (again, with the datacenter as the constraint, etc.).<br />
James Hamilton has published several highly recommended blog posts with all this in mind: <a href="http://perspectives.mvdirona.com/2009/11/30/2010TheYearOfMicroSliceServers.aspx" rel="nofollow">http://perspectives.mvdirona.com/2009/11/30/2010TheYearOfMicroSliceServers.aspx</a></p>
<p>If all this is true, it has a lot of interesting implications. Our software landscape has to change fundamentally, just as the networking layer has to. Software needs to be more distributed, concurrent and parallelized; it has to treat failures as the norm, not as a special case, etc.<br />
[I will stop here].</p>
<p>Looking forward to your replies,<br />
Martin</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tweets that mention The RAM per CPU wall « BasRaayman's technical diatribe -- Topsy.com</title>
		<link>http://basraayman.com/2009/12/29/the-ram-per-cpu-wall/#comment-22</link>
		<dc:creator><![CDATA[Tweets that mention The RAM per CPU wall « BasRaayman's technical diatribe -- Topsy.com]]></dc:creator>
		<pubDate>Tue, 29 Dec 2009 13:16:35 +0000</pubDate>
		<guid isPermaLink="false">http://basraayman.com/?p=115#comment-22</guid>
		<description><![CDATA[[...] This post was mentioned on Twitter by Steve O&#039;Donnell, Bas Raayman.com. Bas Raayman.com said: [Blog] The RAM per CPU wall: http://is.gd/5Fux7 [...]]]></description>
		<content:encoded><![CDATA[<p>[...] This post was mentioned on Twitter by Steve O&#39;Donnell, Bas Raayman.com. Bas Raayman.com said: [Blog] The RAM per CPU wall: <a href="http://is.gd/5Fux7" rel="nofollow">http://is.gd/5Fux7</a> [...]</p>
]]></content:encoded>
	</item>
</channel>
</rss>

