This seed project aims to turn a corpus of qualitative text into quantifiable contextual relationships between diverse content, structured as graphs and clusters. Such a quantified representation allows analytic techniques to be applied to qualitative text corpora, yielding different and possibly broader insights than those available from the operational environments usually monitored through quantitative transactional and IoT data. Corpora collected from communication and reporting channels provide a wider context, covering areas such as public health, economics, politics, and social interests from around the world. The R&D work explores the use of graph and cluster analytics to structure an information space of text content into graph and cluster relationships, which are quantifiable linkages between pieces of text content. Local context in individual text passages will be leveraged to build a concept graph, that is, a set of term-to-term relationships, from the underlying domain-specific or application-specific corpus. In addition, content similarity among text passages in a corpus will be used to form contextual clusters. The textual information space, organized into graphs and clusters, becomes a set of structured contextual relationships that supports computational analytics for unveiling insights, monitoring status from multiple perspectives, or performing qualitative scenario analysis. The project also aims to develop analytical mechanisms for navigating and exploring the structured information space, helping users gain insights into their areas of interest. Context-free navigation and summarization mechanisms will be developed to gather relevant content and summarize it from multiple perspectives at different levels of abstraction, providing insights into application-specific or problem-specific user issues.
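The two structuring steps described above (a concept graph built from local context, and contextual clusters built from content similarity) can be illustrated with a minimal sketch. The project description does not specify its algorithms, so everything here is an assumption made for illustration: passage-level term co-occurrence counts stand in for term-to-term relationships, and cosine similarity over term counts with single-link thresholding stands in for contextual clustering. The function names, the stopword list, the threshold value, and the sample passages are all hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "is", "on"}


def tokenize(passage):
    """Lowercase a passage and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", passage.lower()) if t not in STOPWORDS]


def concept_graph(passages):
    """Weight each term pair by the number of passages in which both terms occur."""
    edges = Counter()
    for passage in passages:
        terms = sorted(set(tokenize(passage)))
        for a, b in combinations(terms, 2):  # pairs come out alphabetically ordered
            edges[(a, b)] += 1
    return edges


def cosine(u, v):
    """Cosine similarity between two term-count vectors stored as Counters."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def cluster_passages(passages, threshold=0.3):
    """Single-link clustering: passages joined by a chain of similar pairs share a cluster."""
    vectors = [Counter(tokenize(p)) for p in passages]
    parent = list(range(len(passages)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(passages)), 2):
        if cosine(vectors[i], vectors[j]) >= threshold:
            parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(passages)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Made-up sample passages, not taken from the project's datasets.
passages = [
    "Foreign investment supports economic growth in Hong Kong",
    "Economic growth attracts foreign investment",
    "Research funding drives innovation in logistics technology",
    "Innovation funding supports logistics research",
]

graph = concept_graph(passages)
print(graph.most_common(3))        # strongest term-to-term links
print(cluster_passages(passages))  # lists of passage indices, one list per cluster
```

In this toy setting the edge weights and cluster memberships are the "quantifiable linkages" the description refers to. A real corpus would likely call for better term weighting and a more robust clustering method than the simple threshold assumed here.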
Two experiments will be conducted to demonstrate how textual information is turned into a corresponding structured information space and how that space is navigated to obtain insights for analytics requests. One experiment will use an economic development dataset from Invest Hong Kong. The other will use an innovation development dataset from the ITF Project Database.
R&D Project Database
Graph and Cluster Based Text Analytics Technology for Discovering Meaningful Relationship Patterns from Economic and Innovation Development Corpuses

| Overview |
| Project Reference | ITP/052/21LP | 
| Hosting Institution | LSCM R&D Centre (LSCM) | 
| Project Coordinator | Dr Dorbin Ng | 
| Approved Funding Amount | HK$ 2.7 M | 
| Project Period | 1 Mar 2022 - 30 Mar 2023 | 






