San Jose (CA) - Tilera today announced the next generation of their tile-based
processors. A follow-on to their previous Tile64 embedded CPUs, two new TilePro
models offer 36 and 64 cores with notably greater performance per watt. A
toolset revision called Multicore Development Environment (MDE) 2.0
allows full emulation and simulation with clock cycle granularity.

Tile64 to TilePro36 and TilePro64
Tilera uses the same multi-core design approach for both its older Tile64 and the new TilePro lines. A single core is created, perfected, validated and tested. Once it works, it is replicated as many times as needed on the silicon die.
The original Tile64 was offered only in a 64-core version. This new release introduces a 36-core version called TilePro36 in addition to the 64-core version. TilePro36 is a scaled-down implementation of the 64-core product, designed to increase yields and provide a lower-power mid-range part. Tilera is continuing to expand its design, and products with more than 64 cores are planned, TG Daily was told.
Same process technology
Tile64 and TilePro are both manufactured on a 90 nm process. Tilera claims a performance increase of 1.5x to 2.5x for TilePro, fed primarily by a doubling of the L1 cache size per tile, a doubling of the L2 associativity, the addition of a new communication channel and the benefits of added instructions that become available through recompilation, all for a 5% increase in overall power consumption. Unit pricing will increase from $435 per chip in 10K unit quantities for Tile64 to around $900 per chip for TilePro64 in 200 unit quantities. Development boards and MDE 2.0 software cost $18,000.
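To put the performance-per-watt claim in numbers, the short Python sketch below divides the claimed speedup range by the claimed power increase. It uses only the 1.5x to 2.5x and 5% figures quoted above; the function and variable names are illustrative, and the result is a back-of-the-envelope estimate rather than a Tilera-published figure.

```python
# Performance-per-watt gain implied by Tilera's claims (figures from the article).
POWER_INCREASE = 0.05   # "a 5% increase in overall power consumption"
SPEEDUPS = (1.5, 2.5)   # claimed performance range, TilePro vs. Tile64


def perf_per_watt_gain(speedup: float, power_increase: float) -> float:
    """Return the relative performance-per-watt gain for a given speedup."""
    return speedup / (1.0 + power_increase)


gains = [perf_per_watt_gain(s, POWER_INCREASE) for s in SPEEDUPS]

for speedup, gain in zip(SPEEDUPS, gains):
    print(f"{speedup:.1f}x performance -> {gain:.2f}x performance per watt")
```

Under these assumptions the claimed gain works out to roughly 1.43x to 2.38x in performance per watt.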
New instructions
Tilera introduced several new instructions with TilePro, including instructions for multimedia, unaligned loads, memory and fence hints, and offset loads and stores. The company claims the new multimedia instructions double the throughput of audio codecs and echo cancellation processing. The new offset load/store instructions speed up video encoding by 50%, and unaligned loads are now 60% faster.
Fully compatible
TilePro is both binary and socket compatible with Tile64. Existing customers can literally pop out their old chips, pop in the new ones and be up and running without any changes to software. Customers will see an immediate increase in performance due to the larger cache, according to the company. However, some features added to the new cores, such as the new instructions and the additional communications lane, require recompilation.
Tilera is still in startup mode, funded by venture capitalists. Its first commercial product was announced in August 2007, though the company has said samples were shipped as early as June 2007. That announcement took the company out of "stealth mode" even though volume products were not available until April 2008. Tilera now claims to have more than 45 customers, many of them drawn directly from high-speed technology fields, such as those that typically employ custom ASICs and FPGAs.

Tilera Comparison Chart

| Description | Tilera Tile64 | Tilera TilePro36 | Tilera TilePro64 |
|---|---|---|---|
| Available? | Yes | Yes | Yes |
| Introduced | Jul 17, 2007 | Sep 22, 2008 | Sep 22, 2008 |
| Cores | 64 | 36 | 64 |
| Core clock | 500, 700, 866 MHz | 500 MHz | 700, 866 MHz |
| DDR2 clock | 667, 800 MHz | 533 MHz | 800 MHz |
| DDR2 controllers | 4 | 3 | 4 |
| DDR2 efficiency | 55% | 70%+ | 70%+ |
| PCIe controllers | 2 | 1 | 2 |
| 10 GbE + XAUI | 2 | 1 | 2 |
| Misc I/O | 10 Gbps | 10 Gbps | 10 Gbps |
| Flexible I/O | 20 Gbps | 20 Gbps | 20 Gbps |
| Max realtime I/O | 50 Gbps | 30 Gbps | 50 Gbps |
| Max intra-die I/O | 31 Tbps | 20.9 Tbps | 37.2 Tbps |
| Mesh traffic | 32 bits/clock full duplex | 32 bits/clock full duplex | 32 bits/clock full duplex |
| "Direct-to-tile" I/O? | No | Yes | Yes |
| Max power (W) | 22 | 16 | 23 |
| L1 cache/core | 8 KB instruction, 8 KB data | 16 KB instruction, 8 KB data | 16 KB instruction, 8 KB data |
| L2 cache/core | 64 KB | 64 KB | 64 KB |
| Cache line | 64 bytes | 64 bytes | 64 bytes |
| Possible L3 cache | 4 MB | 2.3 MB | 4 MB |
| Dedicated coherency network? | No | Yes | Yes |
| 16-bit flops | 221 Gflops | 144 Gflops | 221 Gflops |
| 32-bit flops | 166 Gflops | 54 Gflops | 166 Gflops |
| 16-bit Gflops/watt | 10.05 | 7.13 | 9.61 |
| 32-bit Gflops/watt | 7.55 | 3.38 | 7.22 |
| 16-bit Gflops/core | 3.45 | 3.17 | 3.45 |
| 32-bit Gflops/core | 2.59 | 1.5 | 2.59 |
| Dual endian support? | No | Yes | Yes |
| Memory striping? | No | Yes | Yes |
| Cache distributable to other tiles? | Yes | Yes | Yes |
| ISA | 64-bit VLIW bundle | 64-bit VLIW bundle | 64-bit VLIW bundle |
| Socket | 1517 BGA | 1517 BGA | 1517 BGA |
| Package | 40 mm x 40 mm | 40 mm x 40 mm | 40 mm x 40 mm |
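
The per-watt and per-core rows in the chart appear to be derived from the peak flops, maximum wattage and core count rows. The Python sketch below recomputes them on that assumption, treating the derived rows as Gflops divided by maximum watts and by cores. The dictionary layout and function name are illustrative; the numbers are copied from the chart.

```python
# Figures copied from the comparison chart. Treating the derived rows as
# Gflops per watt and Gflops per core is an assumption, not a stated definition.
CHIPS = {
    "Tile64":    {"cores": 64, "watts": 22, "gflops16": 221, "gflops32": 166},
    "TilePro36": {"cores": 36, "watts": 16, "gflops16": 144, "gflops32": 54},
    "TilePro64": {"cores": 64, "watts": 23, "gflops16": 221, "gflops32": 166},
}


def derived_metrics(chip: dict) -> dict:
    """Compute Gflops per watt and Gflops per core for both precisions."""
    return {
        "gflops16_per_watt": chip["gflops16"] / chip["watts"],
        "gflops32_per_watt": chip["gflops32"] / chip["watts"],
        "gflops16_per_core": chip["gflops16"] / chip["cores"],
        "gflops32_per_core": chip["gflops32"] / chip["cores"],
    }


results = {name: derived_metrics(chip) for name, chip in CHIPS.items()}

for name, metrics in results.items():
    line = ", ".join(f"{key}={value:.2f}" for key, value in metrics.items())
    print(f"{name}: {line}")
```

Computed this way, the Tile64 and TilePro64 columns and the TilePro36 32-bit rows reproduce the chart. The TilePro36 16-bit rows (7.13 and 3.17) are the exception: both would follow from about 114 Gflops rather than the listed 144 Gflops, so one of those entries may be a typo in the chart.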




