LISP: Revolutionizing API Testing with LLM-Powered Input Space Partitioning
A technical deep dive into the ICSE ’25 research breakthrough transforming how developers test library APIs
What is LISP?
LISP (LLM-based Input Space Partitioning) represents a paradigm shift in API testing methodology. This innovative approach leverages Large Language Models (LLMs) to analyze library API source code and intelligently partition input spaces based on code semantics and domain knowledge.
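To make the core idea concrete, consider how semantic partitioning might look for a small API such as Guava's `Longs.min(long[])` (the same method used as a running example later in this guide). The sketch below is purely illustrative: the partition labels and representative inputs are hypothetical, not LISP's actual output.

```python
# Hypothetical semantic partitions an LLM might derive for
# com.google.common.primitives.Longs.min(long[]).
# Illustrative only -- not LISP's actual output or format.
partitions = {
    "empty array (documented IllegalArgumentException)": [],
    "single element": [7],
    "minimum at the front": [-3, 0, 5],
    "minimum at the back": [5, 0, -3],
    "extreme values (long boundaries)": [2**63 - 1, -(2**63)],
    "duplicate minima": [1, 1, 4],
}

# One representative input per partition exercises a distinct behavior
# class, instead of many redundant random inputs.
for label, example in partitions.items():
    print(f"{label}: {example}")
```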
Core Capabilities
- Semantic Code Analysis: LLMs directly parse API implementation code
- Intelligent Input Partitioning: Automatically identifies critical input boundaries
- Knowledge Integration: Combines programming expertise with common-sense reasoning
- Research Validation: Peer-reviewed at ICSE 2025, a premier software engineering conference, and officially certified by the ACM research artifact evaluation
Setup Guide (Step-by-Step Installation)
System Requirements
| Component | Specification |
|---|---|
| OS | Ubuntu 20.04/22.04 (x86 architecture) |
| Memory | Minimum 4 GB RAM |
| Java | JDK 11+ |
| Python | 3.10+ |
| Build Tool | Apache Maven 3.6+ |
Installation Process
1. Install Java dependencies
   Navigate to the `llm-JQF` directory and execute:

   ```
   mvn install -DskipTests
   ```

2. Configure API access
   Within the `llm-JQF` directory, run:

   ```
   sh set-key.sh -k <Your_API_Key> -b https://api.openai.com/v1
   ```

   💡 Cost Note: Full execution requires ~$40 in OpenAI token usage.

3. Install Python components
   Navigate to the `llm-seed-generator` directory:

   ```
   pip3 install -r requirements.txt
   ```

4. Prepare test APIs
   2205 pre-configured API signatures are located at:

   ```
   LISP/llm-JQF/signs/
   ├── commons-lang3
   ├── guava
   └── ...
   ```
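Each file under `signs/` lists the API signatures of one library for batch testing (described below). The artifact itself defines the exact file format; as an assumption based on the single-API signature syntax used elsewhere in this guide, a file such as `signs/guava` plausibly holds one fully qualified signature per line:

```
com.google.common.primitives.Longs.min(long[])
com.google.common.primitives.Longs.max(long[])
...
```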
Practical Implementation Guide
Core Execution Tool
Primary test runner: `./llm-JQF/bin/jqf-llm`
Key command parameters:
```
-i            # Enable coverage instrumentation
-o            # Output coverage results
-l <file>     # Signature file (batch testing)
-s <mode>     # Experiment mode:
              #   cg        → LISP-CG configuration
              #   skipUnder → Ablation study type 1
              #   skipEP    → Ablation study type 2
              #   basic     → LLM baseline
```
Single API Test Example
Testing Guava’s Longs.min method:
```
bin/jqf-llm -i -o "com.google.guava:guava:32.1-jre" \
    "com.google.common.primitives.Longs.min(long[])"
```
Sample output on success (this excerpt is from a commons-lang3 `ArrayUtils.addAll` run):

```
Semantic Fuzzing with LLM
--------------------------
Test signature: org.apache.commons.lang3.ArrayUtils.addAll(boolean[],boolean[])
Elapsed time: 3s
Number of executions: 4
Valid inputs: 4 (100.00%)
Unique failures: 0
API Coverage: 11 branches (100.00% of 11)
Total Coverage: 16 branches (100.00% of 16)
```
Batch Testing Procedure
Note: all APIs in a signature file must belong to the same library.

```
bin/jqf-llm -i -o -l signs/guava -s cg \
    "com.google.guava:guava:32.1.2-jre"
```
Results Analysis Framework
Output Directory Structure
```
result/
├── commons-lang3_cg_1737620727667.json    # Summary report
└── details/
    ├── commons-lang3/
    │   ├── cg/
    │   │   ├── ArrayUtils.addAll(...)0/
    │   │   │   ├── coverage_hash      # Coverage path hash
    │   │   │   ├── detail.json        # Granular metrics
    │   │   │   ├── graph.json         # Method call graph
    │   │   │   ├── input_generator    # LLM-generated inputs
    │   │   │   ├── llm_output.log     # LLM interaction log
    │   │   │   └── RunCode.java       # Test harness code
```
Key Data Files Explained
1. Summary Report (JSON Format)
```
{
  "coverage": 1.0,            // Branch coverage ratio
  "coveredEdge": 11,          // Covered branches
  "generatedInputsNum": 6,    // Total generated inputs
  "inputToken": 9449,         // Input token consumption
  "outputToken": 969,         // Output token consumption
  "runTime": 27950,           // Execution time (ms)
  "successAPINum": 1,         // Successfully tested APIs
  "totalEdge": 11,            // Total branches
  "unexpectedBehaviorNum": 0  // Unexpected behaviors
}
```
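To sanity-check a run programmatically, the summary report can be loaded and its headline numbers derived in a few lines. A minimal Python sketch, assuming only the fields shown above (the file name follows the `<library>_<mode>_<timestamp>.json` pattern from the directory listing):

```python
import json
from pathlib import Path

# Path to a summary report produced by a run (example name from above).
report_path = Path("result/commons-lang3_cg_1737620727667.json")
summary = json.loads(report_path.read_text())

# Derive headline metrics from the documented fields.
branch_coverage = summary["coveredEdge"] / summary["totalEdge"]
print(f"Branch coverage: {branch_coverage:.2%} "
      f"({summary['coveredEdge']}/{summary['totalEdge']})")
print(f"Generated inputs:     {summary['generatedInputsNum']}")
print(f"Unexpected behaviors: {summary['unexpectedBehaviorNum']}")
print(f"Run time:             {summary['runTime'] / 1000:.1f}s")
```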
2. Method-Level Details (detail.json)
Contains:
- Branch-by-branch coverage
- Execution path traces
- Exception stack traces
- Token usage breakdown
Research Evaluation Framework
RQ1: Code Coverage Effectiveness
- Metric: Branch coverage percentage
- Data Sources:
  - Summary report's `coverage` field
  - Branch details in `detail.json`
- Analysis Focus: Coverage comparison against traditional methods
RQ2: Defect Detection Capability
- Metric: Unexpected behavior count
- Data Sources:
  - Summary report's `unexpectedBehaviorNum` field
  - Exception traces in `detail.json`
- Analysis Focus: Boundary condition error detection
RQ3: Operational Efficiency
- Key Metrics:
  - Token consumption (`inputToken`/`outputToken`)
  - Execution duration (`runTime`)
- Optimization Focus: Reducing LLM interaction costs (see the aggregation sketch below)
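A minimal sketch for totaling these RQ3 metrics across all summary reports in `result/`, assuming every top-level `.json` file there is a summary report with the fields shown earlier:

```python
import json
from pathlib import Path

total_in = total_out = total_ms = 0
# Summary reports sit at the top level of result/ (details/ is a subdir).
for report in Path("result").glob("*.json"):
    summary = json.loads(report.read_text())
    total_in += summary["inputToken"]
    total_out += summary["outputToken"]
    total_ms += summary["runTime"]

print(f"Input tokens:  {total_in}")
print(f"Output tokens: {total_out}")
print(f"Total runtime: {total_ms / 1000:.1f}s")
```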
RQ4: Ablation Study Configurations
| Mode Parameter | Experimental Configuration |
|---|---|
| `-s cg` | Full LISP-CG implementation |
| `-s skipUnder` | Ablation study type 1 (ISP+OI) |
| `-s skipEP` | Ablation study type 2 (TDA+OI) |
| `-s basic` | LLM baseline approach |
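To reproduce all four configurations for one library, the runner can simply be invoked once per mode. A minimal driver sketch, assuming it runs from the `llm-JQF` directory and using only the flags documented above:

```python
import subprocess

# The four experiment modes from the table above.
MODES = ["cg", "skipUnder", "skipEP", "basic"]
MAVEN_COORD = "com.google.guava:guava:32.1.2-jre"

for mode in MODES:
    # Mirrors: bin/jqf-llm -i -o -l signs/guava -s <mode> <coordinates>
    subprocess.run(
        ["bin/jqf-llm", "-i", "-o", "-l", "signs/guava",
         "-s", mode, MAVEN_COORD],
        check=True,
    )
```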
Technical FAQ
Why 4GB memory minimum?
LLM inference and code analysis require substantial memory resources. Testing confirmed out-of-memory errors occur below this threshold.
Where are generated test inputs stored?
Examine the `details/<library>/<mode>/<method>/input_generator` files for the raw LLM-generated inputs.
How to estimate token costs?
Use this approximation formula:
Total Cost ≈ (inputToken/1000)×$0.01 + (outputToken/1000)×$0.03
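Applied to the example summary report above, the formula translates directly into Python. The rates here are illustrative; substitute your provider's current pricing:

```python
# Token cost estimate using the approximation formula above.
# Rates are illustrative; substitute current provider pricing.
INPUT_RATE = 0.01 / 1000    # dollars per input token
OUTPUT_RATE = 0.03 / 1000   # dollars per output token

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Values from the example summary report: 9449 input, 969 output tokens.
print(f"${estimate_cost(9449, 969):.4f}")  # ≈ $0.1236
```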
Can private methods be tested?
Currently supports public APIs only. Private methods require indirect testing via public interfaces.
What’s the purpose of coverage_hash?
Enables rapid comparison of coverage paths across runs, eliminating redundant computation.
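The exact hashing scheme is internal to the artifact, but the idea can be sketched: digest the ordered list of covered branch identifiers so that two runs compare by a single string. This is an illustrative assumption, not LISP's actual implementation:

```python
import hashlib

def coverage_hash(covered_branches: list[str]) -> str:
    """Illustrative stand-in: digest of the ordered covered-branch IDs
    (an assumption, not the artifact's actual scheme)."""
    return hashlib.sha256("\n".join(covered_branches).encode()).hexdigest()

run_a = coverage_hash(["addAll:branch0", "addAll:branch1"])
run_b = coverage_hash(["addAll:branch0"])
print(run_a == run_b)  # False -- the coverage paths differ
```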
Conclusion: The Future of Intelligent Testing
LISP signifies a fundamental transformation in API verification:
- Human-AI Collaboration: Merges LLM code comprehension with testing expertise
- Dynamic Partitioning: Semantic-based input space division
- Knowledge Integration: Incorporates domain knowledge into test generation
- Quantifiable Validation: Systematic evaluation across four research dimensions