
Posts Tagged ‘Bulldozer’

The AMD FX (Bulldozer) Scheduling Hotfixes Tested

January 27th, 2012

The basic building block of Bulldozer is the dual-core module, pictured below. AMD wanted better performance than simple SMT (à la Hyper-Threading) would allow, but without resorting to the full duplication of resources found in a traditional dual-core CPU. The result duplicates the integer execution resources and L1 data caches but shares the front end and FPU. AMD still refers to this module as dual-core, although it's a departure from the more traditional definition of the word. In the early days of multi-core x86 processors, dual-core designs were simply two single-core processors stuck on the same package. Today we still see simple duplication of identical cores in a single processor, but moving forward it's likely that we'll see more heterogeneous multi-core systems. AMD's Bulldozer architecture may be unusual, but it challenges the conventional definition of a core in a way that we're probably going to face one way or another in the not-too-distant future.

Figure: A four-module, eight-core Bulldozer

The bigger issue with Bulldozer isn't one of core semantics, but rather how threads get scheduled on those cores. Ideally, threads with shared data sets would get scheduled on the same module, while threads that share no data would be scheduled on separate modules. The former allows more efficient use of a module's L2 cache, while the latter guarantees each thread has access to all of a module's resources when there's no tangible benefit to sharing.
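The placement rule described above can be sketched in a few lines of Python. This is a hypothetical illustration, not the Windows or AMD scheduler: it assumes each thread is tagged with a made-up `data_set` label (a real scheduler would have to infer sharing from the memory addresses a thread touches), and the function name `place_by_shared_data` is invented for the example. Threads that share a data set are paired on one module; every other thread gets an empty module while one is still available.

```python
from collections import defaultdict

CORES_PER_MODULE = 2  # each Bulldozer module has two integer cores


def place_by_shared_data(threads, num_modules=4):
    """Hypothetical data-aware placement (not the actual Windows policy).

    threads: list of (thread_name, data_set) pairs, where data_set is an
    assumed label for the data a thread works on.
    Returns one list of thread names per module.
    """
    # Group threads by the data set they touch.
    groups = defaultdict(list)
    for name, data_set in threads:
        groups[data_set].append(name)

    modules = [[] for _ in range(num_modules)]
    singles = []
    next_module = 0

    # Pass 1: pair threads that share a data set on the same module,
    # so they can make use of that module's shared L2 cache.
    for names in groups.values():
        i = 0
        while len(names) - i >= CORES_PER_MODULE and next_module < num_modules:
            modules[next_module] = names[i:i + CORES_PER_MODULE]
            next_module += 1
            i += CORES_PER_MODULE
        singles.extend(names[i:])

    # Pass 2: give each remaining thread a module of its own if one is free,
    # otherwise use the spare core of a partly filled module.
    for name in singles:
        target = next((m for m in modules if not m), None)
        if target is None:
            target = next((m for m in modules if len(m) < CORES_PER_MODULE), None)
        if target is None:
            raise ValueError("more threads than cores")
        target.append(name)
    return modules


# Threads "a" and "b" share data set "X"; "c" and "d" work on unrelated data.
print(place_by_shared_data([("a", "X"), ("b", "X"), ("c", "Y"), ("d", "Z")]))
# [['a', 'b'], ['c'], ['d'], []]
```

The two-pass order is a design choice for the sketch: sharing pairs are placed first so they are never split, and independent threads only double up on a module once every module is busy.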

This ideal scenario isn't how threads are scheduled on Bulldozer today. Instead of intelligent core/module scheduling based on the memory addresses touched by a thread, Windows 7 currently just schedules threads on Bulldozer in order. Starting from core 0 and going up to core 7 in an eight-core FX-8150, Windows 7 schedules two threads on the first module, then moves to the next module, and so on. If the threads happen to be working on the same data, Windows 7's scheduling approach makes sense. If the threads are working on different data sets, however, Windows 7's current treatment of Bulldozer is suboptimal.
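To make the difference concrete, here is a small Python sketch comparing the in-order policy described above with a module-spreading alternative on an eight-core FX-8150. It assumes cores 2k and 2k+1 belong to module k (so cores 0 and 1 form the first module, as in the description of Windows 7's behavior); the function names are hypothetical and model only which cores get used, not the real Windows scheduler.

```python
def core_to_module(core):
    # Assumption: cores 0-1 are module 0, cores 2-3 are module 1, and so on.
    return core // 2


def in_order_cores(n_threads, n_cores=8):
    # In-order policy as described for Windows 7: core 0, 1, 2, ...
    return list(range(min(n_threads, n_cores)))


def spread_cores(n_threads, n_cores=8):
    # Spreading policy: the first core of every module, then the second cores.
    order = list(range(0, n_cores, 2)) + list(range(1, n_cores, 2))
    return order[:n_threads]


def modules_used(cores):
    return sorted({core_to_module(c) for c in cores})


for policy in (in_order_cores, spread_cores):
    cores = policy(4)
    print(policy.__name__, cores, "modules:", modules_used(cores))
# in_order_cores [0, 1, 2, 3] modules: [0, 1]
# spread_cores [0, 2, 4, 6] modules: [0, 1, 2, 3]
```

With four independent threads, the in-order policy puts two threads on each of two modules, so each pair competes for a shared front end and FPU while two modules sit idle; the spreading policy gives every thread a module of its own. Once more than four threads are running, both policies end up using every module.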

AMD and Microsoft have been working on a patch to Windows 7 that improves scheduling behavior on Bulldozer. The result is two hotfixes that should both be installed on Bulldozer systems. Read on for our take on what these hotfixes do to Bulldozer's Windows 7 performance.

Microsoft Releases Hotfix to Improve Bulldozer Performance UPDATE: Pulled

December 17th, 2011


The launch of Bulldozer in October wasn't exactly a success for AMD. In our review, Anand ended up recommending the Intel Core i5-2500K over the AMD FX-8150. One of the reasons behind Bulldozer's poor performance is its unique design: each Bulldozer module consists of two integer cores and one floating-point core. Today's operating systems don't know how to schedule threads optimally for this design, and as a result, Bulldozer has not reached its full potential. Microsoft has released a hotfix for Windows 7 and Server 2008 R2 that should increase Bulldozer's performance.

Let's look at the problem to see what happened and how the hotfix helps address it. Before the update, Windows didn't know how to schedule threads ideally on Bulldozer. Essentially, it didn't know when it was better to place threads on a single module versus multiple modules.


The picture above explains this pretty well. Before the update, Windows placed threads more or less randomly, which meant many modules were unnecessarily active at the same time. This capped the maximum Turbo speeds, because those can only be reached when some of the modules are inactive (power gated).

VR-Zone claims that Windows sees each Bulldozer module as a single multi-threaded core, similar to an Intel Hyper-Threading core. In other words, your eight-core FX-8150 is seen as a quad-core, eight-thread CPU, just like Intel's Core i7-2600K. This runs counter to AMD's design and marketing, because Bulldozer is closer to an eight-core CPU.

We have not yet tested Bulldozer with the hotfix, but don't expect miracles: Microsoft suggests a 2-7% increase. Better scheduling will improve Bulldozer's performance a bit, but not enough to close the gap in many scenarios. Windows 8 already has the new thread scheduler, and according to AMD's own and third-party tests it improves performance by up to around 10%, but Bulldozer needs a lot more than 10% to surpass Sandy Bridge.

Update: VR-Zone reports (and we can confirm) that the download link for the hotfix is no longer functional. There were apparently unexpected performance drops in some cases after applying the hotfix and Microsoft is investigating the issues. Modifying the scheduler in Windows is not something to be done lightly, as it changes a core element of the OS, so more testing and validation for such updates is always a good idea.

Update 2: Apparently the hotfix has a second part that was not pushed live, and this first part was released prematurely.

Bulldozer for Servers: Testing AMD’s “Interlagos” Opteron 6200 Series

November 15th, 2011

Last month, AMD launched their Bulldozer architecture on desktops, and the result was rather underwhelming; however, there are plenty of indications that Bulldozer simply wasn't architected to excel at desktop use models. AMD's "Interlagos" Opteron is now available, doubling the core count of the desktop part and placing its sights firmly on the enterprise server market.

The massive Multi Chip Module (MCM) contains eight processor cores (“modules” as AMD likes to call them) and can process 16 integer and 16 floating-point threads per cycle. Each of the 16 integer threads gets its own integer cluster, complete with integer execution units, a load/store unit, and an L1 data cache. The Cluster Multi-Threading (CMT) architecture of Bulldozer should be perfectly suited to server applications that are mostly limited by memory accesses and integer processing. The 16 floating-point threads have to share eight clusters of two 128-bit FP units, but those units can process FMAC and AVX instructions; recompile your HPC application with an FMAC- and/or AVX-capable compiler and the chip could become an HPC monster as well.

Server applications also like large caches, and Interlagos has plenty of SRAM cells. The Interlagos package has 32MB of cache on board (L2 and exclusive L3 combined). If all caching fails, it can access four channels of DDR3-1600 memory, good for 51.2GB/s of theoretical bandwidth per chip. AMD also added power gating to the cores, so inactive cores can enter a very deep (C6) sleep state and save quite a bit of power. This should significantly reduce power consumption at idle and under light loads.
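The 51.2GB/s figure follows from the standard peak-bandwidth arithmetic: channels × transfer rate × bytes per transfer. The Python below assumes the usual 64-bit (8-byte) DDR3 channel width and decimal gigabytes (10^9 bytes); `peak_bandwidth_gbs` is a name chosen for illustration.

```python
def peak_bandwidth_gbs(channels, mega_transfers_per_s, channel_width_bits=64):
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = channel_width_bits // 8  # 64-bit channel -> 8 bytes
    mb_per_s = channels * mega_transfers_per_s * bytes_per_transfer
    return mb_per_s / 1000  # MB/s -> GB/s


# Interlagos: four DDR3-1600 channels, i.e. 1600 million transfers/s each.
print(peak_bandwidth_gbs(4, 1600))  # 51.2
```

Each DDR3-1600 channel contributes 12.8GB/s, so four channels give the quoted 51.2GB/s; sustained bandwidth in real workloads will be lower than this theoretical peak.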


With all of that potential, the initial clock speeds that AMD could fit inside a 115W TDP envelope are a bit underwhelming. The fastest 115W Interlagos part right now, the Opteron 6276, has a 2.3GHz base clock. The current Opteron 6176 reaches the same clock speed at the same TDP using a less advanced 45nm SOI process. However, the longer pipeline of the new Bulldozer architecture allows the chip to use Turbo Core to boost to 2.6GHz when running most server workloads, and if only half of the cores are active, the chip is capable of 3.2GHz.

The initial desktop launch of Zambezi may have left us wanting more, and Interlagos might offer that. For server workloads at least, this all looks very promising. Let's see what the first "Bulldozer" based Opterons can do.

Bulldozer Breaks Frequency Record Again: Overclocked to 8.46GHz

October 29th, 2011


Just before the launch of Bulldozer, AMD demonstrated it at 8.43GHz, which was the world record back then. Now an overclocker named Andre Yang has achieved an overclock of 8.46GHz, beating AMD's record by ~30MHz. 


Above are the CPU-Z screenshots of the new and former records. The exact frequency is 8461.51MHz, which is 32.13MHz faster than the previous record. As shown in the pictures, both CPUs had only two cores enabled, and an ASUS Crosshair V Formula motherboard was used. Andre applied a core voltage of 1.992V, whereas AMD used 2.016V in its setup. The cooling method of Andre's setup is unknown, but it was most likely either liquid nitrogen or liquid helium.
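As a quick consistency check on the figures quoted above, subtracting the 32.13MHz margin from the new 8461.51MHz result recovers the previous record, and the two setups' core voltages can be compared the same way. A trivial Python sketch; the rounding only hides floating-point noise.

```python
new_record_mhz = 8461.51
margin_mhz = 32.13
andre_vcore_v = 1.992
amd_vcore_v = 2.016

# Previous record = new record minus the reported margin.
previous_record_mhz = round(new_record_mhz - margin_mhz, 2)
# How much lower Andre's core voltage was than AMD's.
vcore_gap_v = round(amd_vcore_v - andre_vcore_v, 3)

print(previous_record_mhz)  # 8429.38
print(vcore_gap_v)          # 0.024
```

The result, 8429.38MHz, matches the roughly 8.43GHz AMD demonstrated before launch, and Andre's run used about 0.024V less core voltage.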

The Bulldozer Review: AMD FX-8150 Tested

October 12th, 2011

AMD has been trailing Intel in the x86 performance space for years now. Ever since the introduction of the first Core 2 processors in 2006, AMD hasn't been able to recover and return to the heyday of the Athlon 64 and Athlon 64 X2. Instead, the company has remained relevant by driving costs down and competing largely in the sub-$200 microprocessor space. AMD's ability to hold on was largely due to its more-cores-for-less strategy. Thanks to aggressive pricing on its triple- and hexa-core parts, AMD has been delivering a lot of value over the past couple of years to users who needed tons of cores.

Recently, however, Intel has driven its per-core performance up with Sandy Bridge to the point where it's becoming increasingly difficult to recommend AMD alternatives with higher core counts. The heavily threaded desktop niche is tough to sell to, particularly when you force users to take a significant hit on single-threaded performance in order to achieve value there. For a while now AMD has needed a brand new architecture, something that could lead to dominance in heavily threaded workloads while addressing its deficiencies in lightly threaded consumer workloads. After much waiting, we get that new architecture today. Bulldozer is here.


Read on for our full review!

Categories: New Hardware

AMD launches its Interlagos Bulldozer chips

September 7th, 2011

CHIP DESIGNER AMD has announced that it has begun shipping its Bulldozer architecture chips to original equipment manufacturers (OEMs).

The firm started making its first Bulldozer chips codenamed ‘Interlagos’ last month and is now shipping to OEM customers. Many of the first batches of the Interlagos chip, which AMD said is the world’s first 16-core x86 processor, will be used in supercomputers.

Rick Bergman, SVP and general manager of AMD Products Group said, “This is a monumental moment for the industry as this first ‘Bulldozer’ core represents the beginning of unprecedented performance scaling for x86 CPUs.”

The Interlagos chip is compatible with existing AMD Opteron 6100 Series architecture and will be available in systems in the fourth quarter of this year.

Bergman added, “The flexible new ‘Bulldozer’ architecture will give Web and datacenter customers the scalability they need to handle emerging cloud and virtualisation workloads.”

We first saw AMD’s next generation processor in action at this year’s International Supercomputing Conference in Hamburg. At the time the company gave us a launch target of the third quarter, which it has hit.

This in theory puts pressure on Intel, but AMD’s rival told The INQUIRER back in June that it would be producing Sandy Bridge Xeons by the end of the year. µ

Details on AMD Bulldozer: Opterons to Feature Configurable TDP

July 15th, 2011

AMD’s new Bulldozer-based CPUs are just around the corner. AMD has said Zambezi CPUs will be released in Q3, which starts in less than 24 hours, so we could see them in as little as a few weeks. We already know quite a lot about these CPUs, but there is at least one thing we didn't know until now, and it may end up being a big deal in the server market. AMD’s John Fruehe has published an interesting blog post revealing that AMD’s upcoming server CPUs, the Opterons, will feature a user-configurable TDP. Read on for our overview of Bulldozer, our thoughts on configurable TDP, and a summary of AMD's future plans.