The applicant shall:
develop an infrastructure for automated collection of big data, and
implement a framework that allows efficient (semi-supervised and partially automated) processing, segmentation and annotation of this data for deep learning methods.
The primary use case for this call is transcription and annotation of recorded VHF radio communication from various geographical regions where distinct English accents can be expected (for example Romance, Slavic and Anglo-Saxon countries). However, the framework is expected to be configurable and to also support the annotation of other kinds of data, such as biophysiological data, 3D spatial data or weather data, which can be used for use cases such as pilot state classification, object detection, or aircraft trajectory optimization and fuel burn reduction.
The structure and main outcomes of this call shall be as follows:
WP1: Live and active network of contributors (data providers, transcribers, annotators)
The annotation of large amounts of voice and/or other data represents a tremendous volume of effort. The trend in other businesses is to join forces, involve a number of contributors, share the effort and jointly benefit from access to the results as a community. The expected content of this work package shall be:
o Operational concept for the community (key persons, roles, activities, access to data, access to services, support of other community-based projects – simulation platforms, Air Traffic Control (ATC) live, language training, phraseology etc.).
Key deliverable – “Community Launch strategy” document.
o Activation of community of volunteers/contractors willing to help with data pre-processing.
Key deliverables: Public relations (PR) strategy description; active group of supporters/contributors (minimum size and other parameters still need to be defined).
o “Gamification concept” – the members of the community can develop their “credit” in the community based on the volume and quality of their contribution and/or other parameters.
The processes for quality assurance have to be defined; taking inspiration from other community projects (SW, HW, digital content) is highly recommended.
Newcomers – minor contributions, which must be reviewed by members with higher status before acceptance (promotion is connected to volume x quality metrics).
Most senior (solid contributors) – can influence the future of the community.
Key deliverable – “Contributor incentive concept proposal” implementable into the future data collection platform
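As an illustration only, the credit scheme described above could be sketched as follows. The call does not prescribe any formula or thresholds: the class name, the status names and the two threshold values are hypothetical, and credit is simply taken as volume multiplied by mean quality.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the call leaves the actual metrics to the applicant.
REVIEWER_CREDIT = 50.0   # credit needed to review newcomers' work
SENIOR_CREDIT = 500.0    # credit needed to influence the community's future

STATUS_ORDER = {"newcomer": 0, "reviewer": 1, "senior": 2}


@dataclass
class Contributor:
    name: str
    accepted_items: int = 0    # volume: number of accepted contributions
    quality_sum: float = 0.0   # sum of per-item quality scores in [0, 1]

    def record(self, quality: float) -> None:
        """Register one accepted item together with its quality score."""
        if not 0.0 <= quality <= 1.0:
            raise ValueError("quality must be in [0, 1]")
        self.accepted_items += 1
        self.quality_sum += quality

    @property
    def mean_quality(self) -> float:
        if self.accepted_items == 0:
            return 0.0
        return self.quality_sum / self.accepted_items

    @property
    def credit(self) -> float:
        # Assumed metric: volume x mean quality.
        return self.accepted_items * self.mean_quality

    @property
    def status(self) -> str:
        if self.credit >= SENIOR_CREDIT:
            return "senior"
        if self.credit >= REVIEWER_CREDIT:
            return "reviewer"
        return "newcomer"


def needs_review(author: Contributor) -> bool:
    """Newcomer contributions must be reviewed before acceptance."""
    return author.status == "newcomer"


def can_review(reviewer: Contributor, author: Contributor) -> bool:
    """A reviewer must hold a strictly higher status than the author."""
    return STATUS_ORDER[reviewer.status] > STATUS_ORDER[author.status]
```

Keeping the thresholds as named constants makes it easy to tune the promotion rules once the minimum community size and quality metrics have been agreed.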
WP2: Framework for data collection and annotation (storage, delivery process, preprocessing, etc.).
The collaborative space for the community of contributors. This work package shall define and develop an online platform for efficient collaboration. The platform shall use an available online environment (for example, utilize existing cloud services), provide scalable storage space, define scalable workflows, user management, authentication and authorisation schemes, and shall be compliant with existing legislation regarding privacy and data management. The work package shall:
o Jointly with the Topic Manager, define the processes, data flows and control means for efficient facilitation of data collection and annotation activities.
o Design, develop, implement, validate and deploy an online collaborative environment facilitating the workflow.
o Key deliverables – System description, running instance of “community portal”, implemented data collection framework (data storage, cloud services, scripts, etc.)
WP3: Toolset for innovative data annotation and processing
This work package shall develop the toolset for more effective and efficient data preprocessing and semi-supervised annotation. To accelerate the processing of raw data, some tools can pre-process them and thus reduce the volume that remains for manual processing. Due to the variable character of the modalities considered (speech, image, EEG, etc.), the main effort should be focused on developing generalized, yet efficient, unsupervised and semi-supervised methods on the back-end level. The methods can be designed using probabilistic, graph-based or deep network algorithms.
However, the computational complexity and result interpretability should be taken into account when deciding on a particular algorithm. From the front-end point of view, it is expected to design and implement domain-independent methods, for example based on morphology, for dimensionality reduction and data structuring compatible with the proposed algorithms within the back-end part. These methods can have a supportive role to existing domain-dependent features or act as the only predictor.
The work package shall deliver:
o Concept, design and implementation of the methods for semi-automated data annotation and validation
o Key deliverable – working set of tools tested by the Topic Manager on an agreed set of validation scenarios
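To make the idea of graph-based semi-supervised annotation more concrete, the following is a minimal sketch of label propagation over a similarity graph. It is one possible technique among those the work package allows, not a method prescribed by the call. The function names, the similarity matrix and the confidence threshold are assumptions made for illustration.

```python
def propagate_labels(weights, seeds, n_classes, iterations=50):
    """Spread a few manual labels over a similarity graph.

    weights: symmetric n x n list of non-negative similarities
             (zero on the diagonal).
    seeds:   dict {item index: class index} of manually annotated items.
    Returns one list of class scores per item; each list sums to 1.
    """
    n = len(weights)
    # Unlabelled items start uniform; manually labelled items are one-hot.
    scores = [[1.0 / n_classes] * n_classes for _ in range(n)]
    for node, cls in seeds.items():
        scores[node] = [1.0 if c == cls else 0.0 for c in range(n_classes)]

    for _ in range(iterations):
        updated = []
        for i in range(n):
            total = sum(weights[i])
            if i in seeds or total == 0:
                # Human labels stay fixed; isolated items cannot be updated.
                updated.append(scores[i])
                continue
            # Each item takes the similarity-weighted mean of its neighbours.
            updated.append([
                sum(weights[i][j] * scores[j][c] for j in range(n)) / total
                for c in range(n_classes)
            ])
        scores = updated
    return scores


def annotate(scores, threshold=0.8):
    """Return (label, confidence) per item; label is None below threshold."""
    result = []
    for row in scores:
        confidence = max(row)
        label = row.index(confidence) if confidence >= threshold else None
        result.append((label, confidence))
    return result
```

Items whose label comes back as None are the ones that would remain for manual processing, which is how such a tool reduces the manual volume.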
WP4: Application for the data annotation and processing
Building on the toolset in WP3, develop an application able to run in the collaborative space (defined in WP2) and continuously process the collected data with incrementally increasing performance. The application is expected to follow current user experience and usability standards. The work package shall deliver:
o Implementation, validation and deployment of the concept into the existing community framework
o Key deliverable – application integrated into the proposed framework, able to provide raw and semi-supervised annotation of data with confidence intervals.
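The call does not say how the confidence intervals should be computed. One common choice, shown here purely as a sketch, is the Wilson score interval for the share of contributors who agree with a proposed label. The function names and the 0.6 review threshold are hypothetical.

```python
import math


def wilson_interval(agreeing, total, z=1.96):
    """Wilson score interval for the agreement rate agreeing / total.

    z=1.96 corresponds to an approximate 95% confidence level.
    Returns (lower, upper), both clamped to [0, 1].
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= agreeing <= total:
        raise ValueError("agreeing must be between 0 and total")
    p = agreeing / total
    z2 = z * z
    denom = 1.0 + z2 / total
    centre = (p + z2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def needs_manual_review(agreeing, total, min_lower=0.6):
    """Flag an annotation whose lower confidence bound is too low."""
    lower, _ = wilson_interval(agreeing, total)
    return lower < min_lower
```

Unlike the simple normal approximation, the Wilson interval stays within [0, 1] and behaves sensibly for the small numbers of votes that a young community is likely to produce.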
WP5: Legal framework
The topic needs to take into account evolving legislation addressing privacy, data management, data security and other aspects related to storage, processing and provisioning of large volumes of information in cyberspace. The work package shall provide:
o Study of current legislation and regulations, constraints and key differences in EU member states (or outside the EU).
o Study and guidelines for personal data protection (GDPR).
o Consideration of ethical issues
o Guidelines for development of a virtual collaborative environment
o Intellectual property policies (what belongs to whom, who will have access to the results, how results can be used, etc.)
Key deliverable – study and report of legislative framework
Deadline for submission: 6 February 2019, 17:00:00 Brussels time
Source: The European Commission
Illustration Photo: Air traffic control at Heathrow airport, UK (credits: NATS - UK air traffic control / Flickr Creative Commons Attribution-NonCommercial-NoDerivs 2.0 Generic (CC BY-NC-ND 2.0))