ARM-as-a-Service (for software migration to ARM)

Boston Ltd have recently announced the availability of their ARM-as-a-Service cloud. This cloud enables dedicated remote access to preconfigured systems. It also provides an array of tutorials and videos on how to use the porting and tracing tools from Ellexus (Breeze).

The service is aimed at users and companies interested in porting their software to the ARM architecture. Boston offer a wide range of professional services around this cloud offering to train people to use the tools efficiently.

Contact Boston Cloud Sales to arrange a call with a cloud consultant.

Read the full release here

ARM vs Atom: Phoronix Benchmarks

The team over at Phoronix have completed a comprehensive suite of tests comparing the performance of a single Boston Viridis SoC to an Intel Atom D525. The tests covered a range of applications, including NAS Parallel Benchmarks, video encoding, rendering, molecular dynamics and more. A further set of tests measured how performance scales across the ARM CPU cores, and compared the performance improvements between Ubuntu 12.04 and 12.10.

“Overall, a single 1.1~1.4GHz Calxeda ECX-1000 Cortex-A9 server node proved competitive against an Intel Atom D525, a x86_64 CPU that is clocked at 1.8GHz with two physical cores plus two logical cores via Hyper Threading. While the Calxeda node did nicely against the Atom D525 in a majority of the Ubuntu Linux benchmarks, the real story is the performance-per-Watt, which unfortunately can’t be easily compared in this case due to the limitations mentioned in the introduction. If there were the power numbers, the Calxeda ARM Server would likely easily win with the SoC power consumption under load averaging 4 Watts for the 1.1GHz card and just over 6 Watts for the newer 1.4GHz variant. The Atom D525 has a rated TDP by Intel of 13 Watts.”

Further details on the test environment and compiler flags used are on the Phoronix pages below:

Power Tests (130W for 24 Nodes)

We’ve been busy optimising the power draw of our product. We have been improving airflow, tweaking low-level system settings, experimenting with PSUs and exhaustively programming fan speeds. Now it’s time for some real-world tests, measured “at the wall”!

System used for the tests:

  • Boston Viridis with 24 nodes (6 energy cards)
  • 24x 256GB SSD drives
  • 4GB RAM per node (96GB in total)

Test conditions:

We ran the tests over a 30-minute period, taking readings at 1-minute intervals once the first 15 minutes had elapsed. The readings were averaged across this window. Power was measured with a Rohde & Schwarz HAMEG HM8115-2 power meter.
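The averaging step is simple, but it is easy to get wrong by accidentally including the warm-up readings. Below is a minimal Python sketch of it. The function name `steady_state_average` and the sample readings are hypothetical and serve only as illustration. They are not our measured data, and this is not the meter's own averaging function.

```python
from statistics import mean


def steady_state_average(samples, settle_min=15.0):
    """Mean power in watts of the readings taken once the system has settled.

    samples: iterable of (minute, watts) pairs, e.g. one reading per minute
    over a 30-minute run. Readings taken before settle_min are discarded.
    """
    window = [watts for minute, watts in samples if minute >= settle_min]
    if not window:
        raise ValueError("no readings after the settling period")
    return mean(window)


# Hypothetical readings for illustration only (not measured data):
# minutes 0..30, drawing more while the fans spin up.
samples = [(m, 140.0 if m < 15 else 130.0) for m in range(31)]
print(steady_state_average(samples))  # 130.0
```

Each reading is passed in with its timestamp rather than as a bare list. That way, the settling cut-off does not depend on assuming exactly one reading per minute.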

Results:

These are excellent results, and they show what our solution is capable of when running real-world workloads. To put this in perspective, a standard x86 dual-socket server can draw up to 350W! In terms of power consumption, our 24-node configuration is roughly equivalent to a standard low-power dual-socket x86 system.
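As a quick sanity check, the arithmetic behind these figures can be sketched in Python. It uses only the 130W total from the title, the 24 nodes and the 350W upper figure quoted for a dual-socket x86 server. Note that the per-node number spreads the whole chassis draw across the nodes. This includes the SSDs, fans and PSU losses, so it is not the power of the SoC alone.

```python
TOTAL_W = 130.0            # measured at the wall, 24-node Viridis
NODES = 24
X86_DUAL_SOCKET_W = 350.0  # upper end quoted for a standard dual-socket x86 server

per_node_w = TOTAL_W / NODES          # includes shared chassis overhead
ratio = X86_DUAL_SOCKET_W / TOTAL_W   # how much more the x86 box may draw

print(f"{per_node_w:.2f} W per node")                        # 5.42 W per node
print(f"x86 upper bound is {ratio:.1f}x the Viridis draw")   # x86 upper bound is 2.7x the Viridis draw
```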

Update: further evidence for the power consumption figures above, from our friends at Calxeda:

Docking Throughput on Viridis

We would like to thank our friends at the HPC service at Imperial College London for working with us on the following tests and providing some great feedback:

<snip>

The test performed was one of molecular docking using the Vina code: http://vina.scripps.edu/

Docking could potentially be a pretty good fit for this type of system because it’s the sort of thing that’s often run in ensembles, so it is throughput-oriented. It’s CPU-intensive, with a mix of integer and floating-point work.

On the ARM system, I compiled with the system Boost, g++ 4.6.3 and compiler flags:

-O3 -mcpu=cortex-a9 -mfpu=neon -ftree-vectorize -mfloat-abi=hard -ffast-math -fpermissive

On my x86 system (dual E5-2620, turbo enabled, HT enabled) I used the distributed vina binary.

The test model is HIV protease and a ligand from the DUD docking test set.

Vina was run with:


vina --seed 0 --size_x 59.358 --center_x 4.486 --size_y 35.873 --center_y 0.8825 --size_z 38.609 --center_z 17.8075 \
  --receptor receptor.pdbqt \
  --ligand ligand.pdbqt \
  --cpu 4

I elected to run it with 4 threads. This is not the most efficient setting for maximising throughput, because there’s a serial component at the start of the test. However, I wanted a threaded component in the test, and I’ll correct for that in the analysis by using CPU time rather than elapsed wall-clock time.

Here are the timings:

ARM

1 run @4px: 2777.86 user, 12:18.62 elapsed, 376% CPU

For 6 TASKS: 278 minutes of CPU time

x86

6 runs @4px: individual average 1192.94 user, 5:18.70 elapsed, 374% CPU

For 6 TASKS: 19.9 minutes CPU time, 5:18 of walltime

So that’s a throughput difference of ~14x between the dual E5-2620 (24 threads) and the 4-core Viridis SoC.
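The ~14x figure can be reproduced from the elapsed times in the timings above. The sketch assumes that, on the ARM side, the six runs execute one after another on the single SoC, which matches the energy estimate in the next paragraph. On the x86 side, all six run at once. The helper `to_seconds` is our own. It only handles the m:ss.cc form of GNU time's elapsed field, not runs longer than an hour.

```python
def to_seconds(elapsed: str) -> float:
    """Convert a GNU time 'elapsed' value such as '12:18.62' (m:ss.cc) to seconds."""
    minutes, seconds = elapsed.split(":")
    return int(minutes) * 60 + float(seconds)


RUNS = 6
arm_wall = RUNS * to_seconds("12:18.62")   # assumed: six runs back-to-back on one SoC
x86_wall = to_seconds("5:18.70")           # six runs side by side on the dual E5-2620
arm_cpu_min = RUNS * 2777.86 / 60          # user CPU time of six ARM runs, in minutes

print(f"throughput ratio ~{arm_wall / x86_wall:.1f}x")      # throughput ratio ~13.9x
print(f"ARM CPU time for 6 tasks ~{arm_cpu_min:.0f} min")   # ARM CPU time for 6 tasks ~278 min
```

The second line confirms that the 278-minute CPU-time figure is six times the single ARM run's user time.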

Looking at power, an estimate of the energy required to do 6 repetitions:
Viridis = 7W * 12:18m * 6runs =~ 31kJ
x86 = 200W * 5:18m * 6/6 (all runs simultaneous ) =~ 64kJ

The ARM system is about twice as power efficient as the x86. It might be low power, but it takes a long time getting to the end.
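The energy estimate above is simply power multiplied by time. A minimal Python sketch with the same inputs (7W per SoC, 200W per x86 node, and the elapsed times above) is:

```python
def energy_kj(watts: float, seconds: float) -> float:
    """Energy in kilojoules for a constant power draw over a duration."""
    return watts * seconds / 1000.0


ARM_RUN_S = 12 * 60 + 18.62   # one run, 12:18.62 elapsed
X86_BATCH_S = 5 * 60 + 18.70  # all six runs together, 5:18.70 elapsed

viridis_kj = energy_kj(7.0, ARM_RUN_S * 6)  # ~7 W SoC, six runs in sequence
x86_kj = energy_kj(200.0, X86_BATCH_S)      # ~200 W node, six runs at once

print(f"Viridis ~{viridis_kj:.0f} kJ, x86 ~{x86_kj:.0f} kJ, "
      f"x86/Viridis ~{x86_kj / viridis_kj:.2f}")
# Viridis ~31 kJ, x86 ~64 kJ, x86/Viridis ~2.05
```

Both power figures are the email's estimates rather than measurements. The ~2x conclusion is therefore only as good as those two numbers.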

What does this mean in practice? Imagine building a cluster to do nothing but run this code:

*) A Boston Viridis cluster built to the same power budget as an x86 one would
– have (200W/7W) ~ 28x the number of nodes
– a throughput (28 / 14x ) = ~2x that of the x86.

*) A Calxeda cluster built to match the throughput of the x86 one will
– need 14x the number of nodes
– require ~0.5x the power

*) A Calxeda cluster built to the same volume as an x86 one will have
– 36x # nodes (72/U / 2/U)
– 6x # cores (counting HT threads on the x86)
– ~1.3x the power draw ((72 * 7W) / (2 * 200W))
– ~2.6x the throughput of the x86 (36x / 14x)
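The three cluster scenarios follow from a handful of inputs: 7W per SoC, 200W per x86 node, the ~14x throughput gap, and the rack densities of 72 and 2 nodes per U given in the email. This Python sketch recomputes each scenario from those inputs. It assumes that throughput scales linearly with node count, which holds for independent docking runs but not necessarily for other workloads.

```python
NODE_W = 7.0                   # one Viridis SoC under load (email estimate)
X86_W = 200.0                  # one dual-socket x86 node (email estimate)
SLOWDOWN = 14.0                # x86 node throughput / one SoC's throughput
ARM_PER_U, X86_PER_U = 72, 2   # rack densities as given in the email
ARM_CORES, X86_THREADS = 4, 24

# 1) Same power budget as one x86 node
nodes_same_power = X86_W / NODE_W               # ~28.6 SoCs
tput_same_power = nodes_same_power / SLOWDOWN   # ~2.0x the x86 throughput

# 2) Same throughput as one x86 node: 14 SoCs
power_same_tput = SLOWDOWN * NODE_W / X86_W     # ~0.49x the x86 power

# 3) Same rack volume (1U)
nodes_same_vol = ARM_PER_U / X86_PER_U                     # 36x the nodes
cores_same_vol = nodes_same_vol * ARM_CORES / X86_THREADS  # 6x the cores
power_same_vol = ARM_PER_U * NODE_W / (X86_PER_U * X86_W)  # ~1.26x the power
tput_same_vol = nodes_same_vol / SLOWDOWN                  # ~2.57x the throughput

for label, value in [("same power: nodes", nodes_same_power),
                     ("same power: throughput", tput_same_power),
                     ("same throughput: power", power_same_tput),
                     ("same volume: power", power_same_vol),
                     ("same volume: throughput", tput_same_vol)]:
    print(f"{label}: {value:.2f}x")
```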

To conclude:
The Boston Viridis has a ~2x energy advantage on this throughput-oriented, compute-intensive workload. This is substantially lower than the *power* advantage (~28x) would suggest, because performance is markedly reduced (~1/14x) relative to the x86.

P.S: (Sell these things for Raspberry Pi prices and I’ll buy a container-load).

x86 running on ARM!

Today marked an important milestone in our product testing and development for our Viridis platform here at Boston. We can now officially confirm that we have run x86 binaries on our ARM-based Viridis platform!

Over the last few weeks, we have been working with a group of engineers from Eltechs who are developing software to run x86 programs on ARM-based servers. This software could help lower one of the largest barriers to adopting ARM SoCs as alternatives to Intel x86 processors in the datacentre.

Eltechs has developed a binary translator that acts as an emulator. The software currently delivers, on average, around 45% of native ARM performance. During our tests on the Viridis platform we observed up to 65% of native performance. We ran 6 tests covering a range of workloads, though the details cannot be published at this time. We will be working further with Eltechs on our Viridis platform. Eltechs believe the software could reach 80% of native ARM performance or more in the future.

We were delighted to hear how our platform compared with the other ARM products Eltechs has tested:

“The Boston server has been the fastest platform we have tested to date.” (Vadim Gimpelson, CEO of Eltechs)

We will continue to work with Eltechs on testing and validating our platform, and we hope to see further improvements as the software matures. Building on our successful initial tests, we will be adding this software to the Boston ARM Wrestle program. If you have a particular code or application that hasn’t been ported to ARM, please get in touch with us at hpc@boston.co.uk to discuss benchmarking on our test cluster.

ApacheBench on Viridis

Our friends over at Calxeda have recently posted some interesting Apache benchmarks on the EnergyCore cards:

[John Mao of Calxeda] It’s the middle of June, which means we’re smack in the middle of tradeshow and conference season for the IT industry. We were at Computex in Taipei two weeks ago, and this week we’re participating in International Supercomputing in Hamburg, and GigaOM’s Structure conference in San Francisco. In fact, our CEO, Barry Evans, is on a panel to discuss fabric technologies and their role in the evolution of datacenters. Should be a good one!
In spite of the hectic season, it hasn’t stopped us from moving forward with what everyone is really waiting for: benchmarks! Well, I’m happy to be able to share some preliminary results of both performance and power consumption for those of you looking for more efficient web servers.

Fedora and Ubuntu running at 5W

Following the successful initial tests of the system (see “First SOCs powered on!”), we now move on to provisioning. Two of the first operating systems being validated for use with the Viridis system are Red Hat Fedora 17 and Ubuntu 12.10.

First up, Fedora 17. We obtained the image from here and wrote it to disk using xzcat Fedora-17-armhfp-higbank-sda.img.xz > /dev/sdX. All OK so far; now to boot…
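The xzcat command simply stream-decompresses the image onto the raw disk. Here is a rough Python equivalent using the standard `lzma` and `shutil` modules, for cases where a script is more convenient than a shell pipeline. The function name `write_xz_image` and the 4 MiB chunk size are our own choices, not part of any Fedora tooling. Writing to a real /dev/sdX requires root and destroys whatever is on that disk.

```python
import lzma
import os
import shutil


def write_xz_image(image_path: str, target_path: str,
                   chunk_size: int = 4 * 1024 * 1024) -> None:
    """Stream-decompress an .xz disk image into target_path.

    Equivalent in effect to `xzcat image.img.xz > target_path`; the image
    is never fully held in memory.
    """
    with lzma.open(image_path, "rb") as src, open(target_path, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
        dst.flush()
        os.fsync(dst.fileno())  # make sure the data has reached the device
```

For example, `write_xz_image("Fedora-17-armhfp-higbank-sda.img.xz", "/dev/sdX")`, with /dev/sdX replaced by the actual target disk.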

[root@fedora-arm ~]# uname -a
Linux fedora-arm 3.4.2-3.fc17.armv7hl.highbank #1 SMP Tue Jun 12 19:27:16 UTC 2012 armv7l armv7l armv7l GNU/Linux
[root@fedora-arm ~]# dmesg | grep -i 'armv7 proc'
[ 0.000000] CPU: ARMv7 Processor [413fc090] revision 0 (ARMv7), cr=10c5387d
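The bracketed value in the dmesg line is the CPU's Main ID Register (MIDR). According to the ARMv7 architecture's MIDR layout, its bit fields identify the implementer, variant, architecture, part number and revision. The small Python sketch below decodes it. Its lookup tables are deliberately partial and contain only the entries needed for this value.

```python
IMPLEMENTERS = {0x41: "ARM"}      # partial table, enough for this example
ARM_PARTS = {0xC09: "Cortex-A9"}  # ARM (0x41) part numbers only, partial


def decode_midr(midr: int) -> dict:
    """Split an ARMv7 Main ID Register value into its fields."""
    implementer = (midr >> 24) & 0xFF
    part = (midr >> 4) & 0xFFF
    return {
        "implementer": IMPLEMENTERS.get(implementer, hex(implementer)),
        "variant": (midr >> 20) & 0xF,       # the "r" in rNpM
        "architecture": (midr >> 16) & 0xF,  # 0xF = described by CPUID registers
        "part": ARM_PARTS.get(part, hex(part)),
        "revision": midr & 0xF,              # the "p" in rNpM; dmesg's "revision"
    }


info = decode_midr(0x413FC090)  # value from the dmesg line above
print(f"{info['implementer']} {info['part']} r{info['variant']}p{info['revision']}")
# ARM Cortex-A9 r3p0
```

The "revision 0" that the kernel prints is just the low four bits of this register. The variant field shows that the core is an r3 Cortex-A9.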

It’s not just a minimal system either: there is a full repository of all the usual packages found on the x86_64 architecture.

First SOCs powered on!

Today we were like children on Christmas morning, excitedly unwrapping our first sample of the Calxeda EnergyCard and powering it on.

Fortunately for us, they don’t require batteries that weren’t included, and we were able to fire up all 4 EnergyCore SOCs first time (although they could quite easily run on batteries!).

As all good techies know, manuals are well worth a read 🙂 Here’s the quick start guide (in its current entirety): “Each SOC’s ipmi will dhcp”. (I do enjoy manuals that cut straight to the point!)

With no time to waste, we connected the system to the network, ran tail -f on our Cobbler server’s DHCP logs and connected the power. We were poised for a long afternoon of debugging and troubleshooting when we witnessed a rare thing of beauty with new systems: they booted as expected! Each SOC got its IPMI address, and ‘ipmitool sol activate’ revealed lots of nice boot messages confirming that everything was in order. Next step: OS provisioning…
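Watching the DHCP log is a quick way to confirm that every SOC's IPMI controller has picked up an address. The Python sketch below follows a log file like `tail -f` and extracts the acknowledged leases. It assumes that Cobbler is driving ISC dhcpd, whose syslog lines read "DHCPACK on <ip> to <mac> ... via <iface>". A dnsmasq-based Cobbler setup logs in a different format, so the regex would need adjusting. The function names are hypothetical.

```python
import os
import re
import time

# Assumed ISC dhcpd syslog format, e.g.
# "... dhcpd: DHCPACK on 10.0.0.21 to 00:11:22:33:44:55 (host) via eth0"
DHCPACK_RE = re.compile(
    r"DHCPACK on (\d+\.\d+\.\d+\.\d+) to ((?:[0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2})")


def leases_from_lines(lines):
    """Map MAC address -> most recently acknowledged IP from dhcpd log lines."""
    leases = {}
    for line in lines:
        match = DHCPACK_RE.search(line)
        if match:
            ip, mac = match.groups()
            leases[mac.lower()] = ip
    return leases


def follow(path, poll_s=0.5):
    """Yield complete lines appended to path, like `tail -f` (no log rotation)."""
    with open(path) as fh:
        fh.seek(0, os.SEEK_END)
        pending = ""
        while True:
            chunk = fh.readline()
            if not chunk:
                time.sleep(poll_s)
                continue
            pending += chunk
            if pending.endswith("\n"):  # only hand out whole lines
                yield pending
                pending = ""
```

For example, `for line in follow(log_path): print(leases_from_lines([line]))`, where `log_path` is wherever your distribution's syslog writes dhcpd messages.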