Docsity
Docsity

Prepare for your exams
Prepare for your exams

Study with the several resources on Docsity


Earn points to download
Earn points to download

Earn points by helping other students or get them with a premium plan


Guidelines and tips
Guidelines and tips

Software Engineering of NLP-Based Computer-Assisted ..., Schemes and Mind Maps of Software Engineering

The development of production-quality natural language processing (NLP)-based computer-assisted coding (CAC) applications requires a process-driven approach to ...

Typology: Schemes and Mind Maps

2022/2023

Uploaded on 02/28/2023

thecoral
thecoral 🇺🇸

4.4

(28)

133 documents

1 / 5

Toggle sidebar

Related documents


Partial preview of the text

Download Software Engineering of NLP-Based Computer-Assisted ... and more Schemes and Mind Maps Software Engineering in PDF only on Docsity! Software Engineering of NLP-based Computer-assisted Coding Applications 1 Software Engineering of NLP-based Computer-assisted Coding Applications by Mark Morsch, MS; Carol Stoyla, BS, CLA; Ronald Sheffer, Jr., MA; and Brian Potter, PhD Abstract The development of production-quality natural language processing (NLP)-based computer-assisted coding (CAC) applications requires a process-driven approach to software development and quality assurance. This should include a well-defined software engineering process, with the specific phases and milestones typically consisting of requirements analysis, preliminary design, detailed design, implementation, unit testing, system testing, and deployment. However, to be successful in a demanding business climate with complex technology like NLP, an organization needs to move beyond a typical “waterfall” approach. The Capability Maturity Model (CMM) defines the key features of a formal software engineering process and provides a ranking system to measure an organization’s overall effectiveness in delivering quality software that meets customers’ needs. This paper describes the aspects of software development and approaches to testing that yield consistent and high-quality results for NLP- based CAC applications. Introduction Applications that utilize NLP present a challenge to the typical methodology in that there is no way to fully specify input requirements for unstructured human language. This can lead to errors in situations when language used in live data goes beyond that used during development and testing. Another common source of error is unintended side effects of changes. Because of the size and complexity of NLP programs, a modification to one part of the system may improve certain situations but break others. For NLP applications to be reliable, the software engineering process must be adapted to minimize these types of errors. For system design, we describe functionality that can produce robust behavior for new or unfamiliar language. An overview of the software testing process is given for a large-scale environment. Automation of test execution and analysis becomes very important in this type of environment. Thorough regression analysis and a rapid-cycle, iterative approach are two key features of the software testing process. Background The Capability Maturity Model (CMM) was developed by the Software Engineering Institute (SEI) at Carnegie Mellon University to help software development organizations measure the maturity of their software development processes and to help software acquirers in evaluating software vendors.1 More recently, starting in 2001, a new model called the Capability Maturity Model Integration (CMMI) has been developed which builds on the original CMM by integrating the various disciplines—such as software development, systems engineering, integrated product development, and software acquisition— 2 Perspectives in Health Information Management, CAC Proceedings; Fall 2006 into a unified model.2 Like the original CMM, CMMI defines five levels of maturity: (1) initial, (2) managed, (3) defined, (4) quantitatively managed, and (5) optimized. At the initial level, processes are ad hoc, and project success often depends upon the skills of the individual contributors. The higher levels of maturity correspond with more defined and repeatable processes that cover every aspect of the development process. Level 2 includes all of the core activities in the software development process to include requirements management, project planning, project monitoring and control, quality assurance, and configuration management. At level 2 these activities are focused on the project level. For level 3, an organization has defined and repeatable processes that function across multiple projects. A level 3 organization is also capable of improving project performance through product verification and validation, organizational training, and integrated product management. The activities described in this paper primarily fall in level 2 or level 3, which represent high-quality software development processes. NLP software, like other types of artificial intelligence (AI) software, has often not been developed using a software engineering process. The reasons for this are related to the type of software that is developed and the type of people doing the development. Unlike most software projects, the input requirements for NLP software cannot be fully specified because of the huge scope and variability in human language. Also the programming techniques and algorithms used in NLP software—such as lexical analyzers, parsers, neural networks, Bayesian classifiers, vector processors, and machine learning algorithms—are very complex and require specialized knowledge to create and maintain, even when they are thoroughly documented. Thus, the development of those key portions of an NLP program is often carried out in an evolutionary or experimental manner. Finally, the individuals that develop NLP software are frequently from research institutions or even from domains outside computer science, such as linguistics or cognitive science. These developers often will not have training in the software engineering process. To develop production-quality NLP software that performs consistently and reliably, there are benefits in applying more structured methodologies. However, critics of the CMM, such as James Bach,3 do raise valid issues that are of particular importance for AI technology. Three criticisms of CMM that we will mention here are (1) the emphasis of process over the individual, (2) the lack of emphasis on innovation, and (3) the emphasis on activities over results. The algorithms and techniques used in NLP require specialized skills and are typically the product of a series of innovations that are not easy to plan. The methods described in this paper attempt to structure the software development process so that performance is consistent and continuously improving over time by fostering innovative thinking while implementing a robust testing model for measuring results. Development Process Developing and maintaining NLP software for CAC applications requires a combination of skills in computer science, linguistics, and medical coding. In our company, personnel with expertise in each of the three individual areas work together to resolve bugs and implement enhancements for the CAC system.4 Typically, an individual is knowledgeable in one of the three skill areas, so the process is important to facilitate a common understanding of the requirements. In our process, we emphasize the following five practice areas: 1. Requirements Management – Utilizing a defect tracking tool, domain experts file bug reports and requests for enhancements. Each item will include example medical documents to be used for development and testing, and specification of the desired output. Items are assigned severity to assist in ranking, and a priority list for development is created at the start of each update cycle. 2. Rapid Development Cycle – A rapid cycle is motivated by the relatively short timelines to analyze, specify, and implement the regulatory coding changes. For a system in maintenance, this will consist of a series of iterations with weekly build and unit test cycles. For new
Docsity logo



Copyright © 2024 Ladybird Srl - Via Leonardo da Vinci 16, 10126, Torino, Italy - VAT 10816460017 - All rights reserved