For a high-speed analytics system with lower IT investment

Tokyo, November 14, 2017

Hitachi today announced the development of technology that increases the speed of big data analytics on a distributed data processing platform based on the open source software Hadoop (1) ("Hadoop platform") by up to 100 times compared with a conventional system. The technology converts the data processing procedure generated for software processing in conventional Hadoop data processing into one optimized for parallel processing on hardware, enabling high-speed processing of various types of data in FPGA (2). As a result, fewer servers will be needed for high-speed big data analytics, minimizing IT investment while enabling interactive analytics by data scientists, quick on-site business decision making, and other timely information services. The technology will be applied to areas such as finance and communication and, through verification tests, will be used to support a platform for data analytics services.

In recent years, big data analytics has become increasingly important for business and services. It involves interactively analyzing large amounts of various types of data, from sources such as IoT sensor information, financial account transaction records and social media, under various conditions and from various perspectives. The open source Hadoop platform is widely used for such analytics; however, because many servers are required to raise processing speed, equipment and management costs have been an issue.

In 2016, Hitachi developed high performance data processing technology using FPGA (3). However, as this technology was developed for Hitachi's proprietary database, it could not easily be applied to the Hadoop platform, because it employed a different data management method and used customized database management software.

Picture: Overview of the technology developed (credit: Hitachi)

To address this issue, Hitachi developed technology to realize high-speed data processing on the Hadoop platform utilizing FPGA (4). Features of the technology developed are outlined below.

1. Data processing procedure conversion technology to optimize FPGA processing efficiency

The data processing engine of the Hadoop platform optimizes data processing for a CPU that serially executes software to retrieve, filter and compute data. Simply executing this procedure, however, does not fully exploit the potential of hardware to achieve high-speed processing through parallelism. To overcome this, the Hadoop processing procedures were analyzed and, taking distributed processing efficiency into consideration, technology was developed to convert the order of the processing commands into one optimized for parallel processing on FPGA. This enables the FPGA circuitry to be used efficiently, without waste.
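The difference between a serial procedure and one arranged for parallel hardware can be illustrated with a toy sketch in Python. This is not Hitachi's conversion technology, whose details are not given here; the function names, the `lanes` parameter and the sample rows are all hypothetical, and the "lanes" are simulated in software rather than run on an FPGA. The first function filters and computes one row at a time, while the second runs each stage across a whole batch of rows before moving to the next stage, which is the shape of work that parallel circuits can process side by side.

```python
from typing import Callable, Iterable

Row = dict  # hypothetical record type: one row of a table


def serial_plan(rows: Iterable[Row],
                predicate: Callable[[Row], bool],
                value_of: Callable[[Row], int]) -> int:
    """Row-at-a-time plan: retrieve, filter and compute each row in turn."""
    total = 0
    for row in rows:
        if predicate(row):
            total += value_of(row)
    return total


def batched_plan(rows: Iterable[Row],
                 predicate: Callable[[Row], bool],
                 value_of: Callable[[Row], int],
                 lanes: int = 4) -> int:
    """Stage-at-a-time plan: each stage runs over `lanes` rows together.

    The lanes are only simulated here with list comprehensions; on parallel
    hardware every element of a batch could be handled by its own circuit.
    """
    rows = list(rows)
    total = 0
    for start in range(0, len(rows), lanes):
        batch = rows[start:start + lanes]
        mask = [predicate(r) for r in batch]    # filter stage, all lanes
        values = [value_of(r) for r in batch]   # compute stage, all lanes
        total += sum(v for v, keep in zip(values, mask) if keep)  # aggregate
    return total


# Hypothetical sample data: ten transactions split across two regions.
sample = [{"amount": i, "region": "JP" if i % 2 == 0 else "US"}
          for i in range(10)]
is_jp = lambda r: r["region"] == "JP"
amount = lambda r: r["amount"]
```

Both plans return the same total for the same input; only the order in which the operations are issued differs.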

2. Logic circuit design to analyze various data formats and enable high-speed processing in FPGA

Conventionally, to facilitate processing on the hardware, FPGA processing restricted the formats of different types of data, such as dates, numerical values and character strings, and required a dedicated processing circuit for each type of data. The Hadoop platform, however, needs to deal with multiple data formats even for the same item; dates, for example, may use the UNIX epoch day expression, the Julian day expression, or others. Because many dedicated processing circuits would therefore be needed, conventional FPGA processing could not make effective use of the limited FPGA circuitry. To resolve this issue, a logic circuit was designed to optimize parallel processing in FPGA, using parser circuits that identify the various data types and sizes (5) and, depending on the data type and size, pack multiple data items to be processed in a single circuit. As a result, it is possible not only to handle various data formats but also to realize parallel processing that fully utilizes the filtering and aggregation circuits for efficient high-speed data processing.
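The two ideas in this feature, normalizing several encodings of the same item and packing small values together, can be sketched in software. The Python below is only an illustration under stated assumptions, not the logic circuit itself: the function names, the format labels `"epoch_day"` and `"julian_day"`, and the 64-bit word size are hypothetical. The one fixed fact it relies on is that 1 January 1970 corresponds to Julian Day Number 2440588.

```python
from datetime import date, timedelta

# Julian Day Number of 1970-01-01, the origin of UNIX epoch days.
JULIAN_DAY_OF_UNIX_EPOCH = 2440588


def to_epoch_day(value: int, fmt: str) -> int:
    """Normalize a date encoded in one of two formats to UNIX epoch days.

    `fmt` is a hypothetical label a parser stage might attach to a column.
    """
    if fmt == "epoch_day":
        return value
    if fmt == "julian_day":
        return value - JULIAN_DAY_OF_UNIX_EPOCH
    raise ValueError(f"unsupported date format: {fmt}")


def pack_lanes(values: list[int], width_bits: int,
               word_bits: int = 64) -> list[int]:
    """Pack unsigned values of `width_bits` each into `word_bits`-wide words.

    Narrow values share one word (four 16-bit values per 64-bit word, two
    32-bit values, and so on), so one wide datapath carries several items.
    """
    per_word = word_bits // width_bits
    limit = (1 << width_bits) - 1
    words = []
    for start in range(0, len(values), per_word):
        word = 0
        for lane, v in enumerate(values[start:start + per_word]):
            if not 0 <= v <= limit:
                raise ValueError(f"{v} does not fit in {width_bits} bits")
            word |= v << (lane * width_bits)  # place value in its lane
        words.append(word)
    return words


# The same calendar date arriving in two encodings normalizes to one value.
same_day = to_epoch_day(2458072, "julian_day") == to_epoch_day(17484, "epoch_day")
as_date = date(1970, 1, 1) + timedelta(days=to_epoch_day(2458072, "julian_day"))
```

Once every date is in a single representation, one comparison or aggregation routine can serve all inputs, which mirrors the motivation given above for avoiding a dedicated circuit per format.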

The technology developed was applied to the Hadoop platform. When analytics was performed on sample data, it was found that data processing performance improved by up to 100 times. The results suggest it will be possible to reduce the cost of Hadoop-based big data analytics as the number of servers required for high-speed processing can be significantly reduced. Hitachi will now conduct verification tests together with customers as it works towards the commercialization of this technology.

The technology developed will be on exhibit at SC17 - The International Conference for High Performance Computing, Networking, Storage and Analysis, to be held from 13th to 16th November 2017 in Denver, Colorado, USA.

(1) Hadoop-based distributed data processing platform: A computation platform for storing and analyzing large amounts of data on distributed servers using the open source software "Hadoop"
(2) FPGA (Field Programmable Gate Array): An integrated circuit manufactured to be programmable by the purchaser. In general, FPGA is inexpensive compared to application specific circuits.
(3) 3rd August 2016 News Release: "Hitachi develops high performance data processing technology increasing data analytics speed by up to 100 times"
(4) 10 related international patents pending
(5) Supports the standard format "Parquet," generally used in open source data processing platforms such as Hadoop

Source: Hitachi
