Hitachi Develops Open Source Software Based Big Data Analytics Technology to Increase Speed by Up to 100 Times

For a high-speed analytics system with lower IT investment

Tokyo, November 14, 2017

Hitachi today announced the development of a technology that increases the speed of big data analytics on a Hadoop-based distributed data processing platform (1) ("Hadoop platform"), built on open source software, by up to 100 times compared with a conventional system. The technology converts the data processing procedures that conventional Hadoop generates for software execution into procedures optimized for parallel processing on hardware. This enables high-speed processing of various types of data on an FPGA (2). As a result, fewer servers are needed for high-speed big data analytics. This minimizes IT investment while enabling interactive analytics by data scientists, quick on-site business decision making and other timely information services. The technology will be applied in areas such as finance and communications and, through verification tests, will be used to support a platform for data analytics services.

In recent years, big data analytics has become increasingly important for business and services. It involves interactively analyzing large volumes of diverse data, from sources such as IoT sensors, financial transaction records and social media, under various conditions and from multiple perspectives. The open source Hadoop platform is widely used for such analytics. However, because many servers are required to raise processing speed, equipment and management costs have been an issue.

In 2016, Hitachi developed high-performance data processing technology using FPGAs (3). That technology, however, was developed for Hitachi's proprietary database. It relied on a different data management method and on customized database management software, so it could not easily be applied to the Hadoop platform.

Picture: Overview of the technology developed (credit: Hitachi)

To address this issue, Hitachi developed technology to realize high-speed data processing on the Hadoop platform utilizing FPGA (4). Features of the technology developed are outlined below.

1. Data processing procedure conversion technology to optimize FPGA processing efficiency

The data processing engine of the Hadoop platform optimizes its processing for a CPU that executes software serially, retrieving, filtering and computing in sequence. Simply executing this procedure, however, does not exploit the hardware's potential for high-speed parallel processing. To overcome this, Hitachi analyzed the Hadoop processing procedures. Taking distributed processing efficiency into account, it then developed technology that reorders the processing commands into a sequence optimized for parallel processing on the FPGA. This allows the FPGA circuitry to be used efficiently, without waste.
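The reordering idea can be illustrated with a minimal Python sketch. It is not Hitachi's implementation, which this release does not describe. It assumes a simple hypothetical query that sums the values of rows passing a filter. The names `serial_query`, `split_into_lanes`, `lane_filter_aggregate` and `parallel_query`, as well as the lane count, are illustrative only. The serial version mirrors a CPU that retrieves, filters and accumulates one row at a time. The reordered version distributes rows across independent lanes, which stand in for parallel FPGA circuits. It fuses filtering and aggregation inside each lane and merges the partial results at the end.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical record layout: (key, value)
Row = Tuple[int, int]
Predicate = Callable[[Row], bool]


def serial_query(rows: Sequence[Row], predicate: Predicate) -> int:
    """CPU-style procedure: retrieve, filter and accumulate one row at a time."""
    total = 0
    for row in rows:          # retrieve
        if predicate(row):    # filter
            total += row[1]   # compute (aggregate)
    return total


def split_into_lanes(rows: Sequence[Row], n_lanes: int) -> List[List[Row]]:
    """Distribute rows round-robin across independent lanes (stand-ins for parallel circuits)."""
    lanes: List[List[Row]] = [[] for _ in range(n_lanes)]
    for i, row in enumerate(rows):
        lanes[i % n_lanes].append(row)
    return lanes


def lane_filter_aggregate(lane: Sequence[Row], predicate: Predicate) -> int:
    """Work for one lane: filtering and partial aggregation fused into a single pass."""
    return sum(value for key, value in lane if predicate((key, value)))


def parallel_query(rows: Sequence[Row], predicate: Predicate, n_lanes: int = 8) -> int:
    """Reordered procedure: fan out to lanes, filter and aggregate per lane, then merge."""
    partials = [lane_filter_aggregate(lane, predicate)
                for lane in split_into_lanes(rows, n_lanes)]
    return sum(partials)  # merge step: combine the lanes' partial sums


if __name__ == "__main__":
    rows = [(i, i * 3) for i in range(100)]
    is_even_key = lambda row: row[0] % 2 == 0
    print(serial_query(rows, is_even_key), parallel_query(rows, is_even_key))  # prints: 7350 7350
```

Each lane's work is independent until the final merge, so hardware could run the lanes concurrently. In this Python sketch they run one after another, so it shows the restructuring of the procedure rather than any speedup.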

2. Logic circuit design to analyze various data formats and enable high-speed processing in FPGA

In conventional FPGA processing, the formats of different data types, such as dates, numerical values and character strings, were restricted to simplify processing in hardware. A dedicated processing circuit was also required for each data type. The Hadoop platform, however, must handle multiple data formats even for the same item. Dates, for example, may be expressed as UNIX epoch days or as Julian days, among other formats. Because this would require many dedicated processing circuits, conventional FPGA processing could not make effective use of the limited FPGA circuitry. To resolve this issue, Hitachi designed a logic circuit that optimizes parallel processing in the FPGA. Parser circuits identify the type and size of each data item (5). Depending on that type and size, multiple data items are then packed so that they can be processed in a single circuit. As a result, the FPGA can handle various data formats and also process data in parallel, making full use of its filtering and aggregation circuits for efficient high-speed data processing.
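The parsing and packing steps can be illustrated with a minimal Python sketch. Hitachi's circuit design is not described in this release, so everything here is hypothetical. The `Field` type tags, the `parse`, `pack` and `filter_packed` functions and the 8-byte word width are illustrative only. The parser maps Julian day numbers to UNIX epoch days (the Julian Day Number of 1970-01-01 is 2440588), so one comparison circuit can serve both date encodings. The packer then fills each word with as many values as their size allows, and the filter tests every slot of a word against the same condition.

```python
from dataclasses import dataclass
from typing import List

# Julian Day Number of 1970-01-01, the Unix epoch (JDN = Unix epoch day + 2440588).
JULIAN_DAY_OF_UNIX_EPOCH = 2440588
WORD_BYTES = 8  # hypothetical width of one processing circuit's input word


@dataclass
class Field:
    kind: str   # hypothetical type tag, e.g. "epoch_day" or "julian_day"
    size: int   # width of the value in bytes
    value: int


def parse(field: Field) -> Field:
    """Parser stage: identify the type and map alternative date encodings to Unix epoch days."""
    if field.kind == "julian_day":
        return Field("epoch_day", field.size, field.value - JULIAN_DAY_OF_UNIX_EPOCH)
    return field


def pack(values: List[int], size: int) -> List[int]:
    """Pack WORD_BYTES // size non-negative values of `size` bytes into each word."""
    per_word = WORD_BYTES // size
    bits = size * 8
    mask = (1 << bits) - 1
    words = []
    for start in range(0, len(values), per_word):
        word = 0
        for slot, value in enumerate(values[start:start + per_word]):
            word |= (value & mask) << (slot * bits)
        words.append(word)
    return words


def filter_packed(words: List[int], size: int, count: int, threshold: int) -> List[int]:
    """Filter stage: return indices of packed values >= threshold.

    In hardware the slots of a word could be compared simultaneously;
    the inner loop here only models that behaviour sequentially.
    """
    per_word = WORD_BYTES // size
    bits = size * 8
    mask = (1 << bits) - 1
    hits = []
    for w, word in enumerate(words):
        for slot in range(per_word):
            index = w * per_word + slot
            if index >= count:  # skip unused slots in the last word
                break
            value = (word >> (slot * bits)) & mask
            if value >= threshold:
                hits.append(index)
    return hits


if __name__ == "__main__":
    fields = [
        Field("epoch_day", 4, 17484),                              # 2017-11-14
        Field("julian_day", 4, JULIAN_DAY_OF_UNIX_EPOCH + 17000),  # same encoding family after parsing
        Field("epoch_day", 4, 17500),
    ]
    days = [parse(f).value for f in fields]  # all values now in Unix epoch days
    words = pack(days, 4)                    # two 4-byte values per 8-byte word
    print(filter_packed(words, 4, len(days), 17400))  # prints: [0, 2]
```

Normalizing formats in the parser keeps a single filter design for every date encoding. Packing lets narrow values share one wide input instead of each occupying a circuit of its own. The sketch treats values as unsigned, so negative epoch days (dates before 1970) would need a signed comparison.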

The technology was applied to the Hadoop platform. When analytics were run on sample data, data processing performance improved by up to 100 times. The results suggest that the cost of Hadoop-based big data analytics can be reduced, because the number of servers required for high-speed processing can be significantly reduced. Hitachi will now conduct verification tests with customers as it works towards commercializing this technology.

The technology developed will be on exhibit at SC17 - The International Conference for High Performance Computing, Networking, Storage and Analysis, to be held from 13th to 16th November 2017 in Denver, Colorado, USA.

(1) Hadoop-based distributed data processing platform: A computation platform for storing and analyzing large amounts of data on distributed servers using the open source software "Hadoop"
(2) FPGA (Field Programmable Gate Array): An integrated circuit designed to be programmed by the purchaser after manufacture. FPGAs are generally inexpensive compared with application-specific integrated circuits.
(3) 3rd August 2016 News Release: "Hitachi develops high performance data processing technology increasing data analytics speed by up to 100 times"
(4) 10 related international patents pending
(5) Supports the standard format "Parquet," commonly used in open source data processing platforms such as Hadoop
Source: Hitachi  
