A single conversion with GPT-5 takes around 140 seconds and consumes roughly ten times the compute of a straight language-model translation, the research team reported — a pragmatic trade-off, they say, for cutting months of manual rule rewrites into batch operations.
ARuleCon's three-stage conversion pipeline
The system the National University of Singapore and Fudan University researchers built, called ARuleCon, breaks SIEM rule translation into three steps. First it reads a source detection rule and removes platform-specific code to produce a plain-language template describing filters, time windows, thresholds and grouping conditions. That neutral template is handed to a large language model (LLM), which drafts an equivalent rule in the destination platform's query language.
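To make the idea of a neutral template concrete, here is a minimal Python sketch of what such a vendor-agnostic description might hold. The class name `RuleTemplate`, its fields and the `to_prompt` method are illustrative assumptions, not ARuleCon's actual schema or prompt format.

```python
from dataclasses import dataclass, field


@dataclass
class RuleTemplate:
    """Hypothetical vendor-neutral description of a detection rule."""
    description: str
    filters: dict[str, str] = field(default_factory=dict)  # field name -> required value
    window_seconds: int = 300                               # time window
    threshold: int = 1                                      # minimum matching events
    group_by: list[str] = field(default_factory=list)      # grouping fields

    def to_prompt(self) -> str:
        """Render the template as plain language that could be handed to an LLM."""
        conditions = " and ".join(
            f"{name} equals {value!r}" for name, value in self.filters.items()
        )
        groups = ", ".join(self.group_by) or "no grouping"
        return (
            f"{self.description}: match events where {conditions}; "
            f"alert when at least {self.threshold} events occur within "
            f"{self.window_seconds} seconds, grouped by {groups}."
        )


# Example: a brute-force login rule with no platform-specific syntax left in it.
template = RuleTemplate(
    description="Repeated failed logins",
    filters={"action": "login_failed"},
    window_seconds=600,
    threshold=5,
    group_by=["src_ip"],
)
prompt = template.to_prompt()
print(prompt)
```

The point of the sketch is that nothing in the template names a query language, so the same description could in principle be targeted at any of the destination platforms.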
Two automated checking agents then refine that draft. One agent queries official vendor documentation to verify operator names and field names for the target platform; the other translates both the original and converted rules into Python and runs them over synthetic logs generated by the system to confirm they produce the same outputs. Any mismatch triggers an automated repair loop.
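The execution-based check described above amounts to differential testing: run both versions of a rule over the same generated logs and compare what they flag. The following Python sketch illustrates the idea under stated assumptions. The two rule functions, the log fields (`src_ip`, `action`) and the helper names are hypothetical stand-ins, not code from ARuleCon, and the converted rule is deliberately given an off-by-one threshold so that the check has something to find.

```python
import random
from collections import Counter


def source_rule(events):
    """Stand-in for the original rule: flag IPs with at least 3 failed logins."""
    counts = Counter(e["src_ip"] for e in events if e["action"] == "login_failed")
    return {ip for ip, n in counts.items() if n >= 3}


def converted_rule(events):
    """Stand-in for the converted rule, with a deliberate bug (> 3 instead of >= 3)."""
    counts = Counter(e["src_ip"] for e in events if e["action"] == "login_failed")
    return {ip for ip, n in counts.items() if n > 3}


def synthetic_logs(rng, n_events=12):
    """Generate a small batch of random login events (hypothetical schema)."""
    ips = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
    actions = ["login_failed", "login_ok"]
    return [
        {"src_ip": rng.choice(ips), "action": rng.choice(actions)}
        for _ in range(n_events)
    ]


def find_mismatch(rule_a, rule_b, trials=200, seed=0):
    """Return the first synthetic batch on which the two rules disagree, else None."""
    rng = random.Random(seed)
    for _ in range(trials):
        events = synthetic_logs(rng)
        if rule_a(events) != rule_b(events):
            return events
    return None


counterexample = find_mismatch(source_rule, converted_rule)
if counterexample is not None:
    # In a repair loop, this batch would be fed back as evidence of the mismatch.
    print("source flags:   ", sorted(source_rule(counterexample)))
    print("converted flags:", sorted(converted_rule(counterexample)))
```

A check like this can only catch differences that the generated logs happen to exercise, which is the limitation the researchers themselves raise about synthetic testing.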
Benchmarking with GPT-5, DeepSeek‑V3 and LLaMA‑3
The team tested ARuleCon across nearly 1,500 rule conversions spanning five platforms: Splunk, Microsoft Sentinel, IBM QRadar, Google Chronicle and RSA NetWitness. In head-to-head benchmarking the ARuleCon workflow outperformed each underlying large language model used alone — GPT-5, DeepSeek‑V3 and LLaMA‑3 — by roughly 15% on average across structural, semantic and logical consistency measures. The researchers emphasize that the gains held regardless of which model was underneath, attributing the improvement to the system design rather than a single LLM.
Platform outcomes: Splunk and Google Chronicle excel; QRadar and NetWitness lag
Execution success rates on the target platform exceeded 90% in most cases, and conversions to Google Chronicle and Splunk were near-perfect. The researchers found IBM QRadar and RSA NetWitness harder to convert to, citing less comprehensive documentation and more complex grammar in those platforms as contributing factors.
Known failure modes and the limits of synthetic testing
The team candidly cataloged where ARuleCon can fail. The Python-based consistency check runs over logs that the system itself generates rather than the noisy, evolving data streams seen in live security operations. "Our confidence is strongest for rules whose semantics can be well-covered by generated test cases, and weaker for rules involving rare behaviors, custom schemas or complex temporal correlations," Ming Xu told ISMG.
The neutral template approach also has boundaries. It performs well for standard detection logic but breaks down when platforms differ in how they execute rules. Rules depending on stateful processing, vendor-specific data enrichment, or implicit behaviors that are not written into the rule itself can be problematic. ARuleCon treats vendor documentation as ground truth when refining conversions and has limited ability to detect when that documentation is wrong or incomplete, though the researchers consider such cases rare; the design does permit documentation updates.
What this means for detection engineers, named SIEM vendors, and NCS Group / Singtel Singapore
- Detection engineers: ARuleCon is positioned as an augmenting tool rather than a replacement. The researchers recommend staged validation: testing converted rules against historical logs and known attack traces, then running them in monitoring-only mode before activation. The validation workflow currently runs offline; integrating it into live operations is flagged as future work.
- Named SIEM vendors (Splunk, Microsoft Sentinel, IBM QRadar, Google Chronicle, RSA NetWitness): Vendors with comprehensive documentation and simpler grammar (Google Chronicle, Splunk) saw higher successful conversion rates; those with sparser docs or more complex grammars (IBM QRadar, RSA NetWitness) proved harder targets. The system's dependence on vendor documentation means documentation quality materially affects conversion outcomes.
- NCS Group and Singtel Singapore: The research team has released source code on GitHub and reports that their industry partner NCS Group's Singtel Singapore is commercializing a prototype. That path to commercialization underscores the design choice to use ARuleCon for batch work such as platform migration, rule onboarding and periodic maintenance rather than real-time alerting.
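The staged validation recommended to detection engineers in the list above can be pictured as an offline replay gate. The sketch below is one hypothetical way to structure it in Python: the `ReplayReport` class, the `replay` function, the activation criterion and the sample events are all assumptions made for illustration, not a workflow published by the researchers.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ReplayReport:
    detected_traces: int
    total_traces: int
    benign_alerts: int

    @property
    def ready_for_activation(self) -> bool:
        # Hypothetical gate: every known attack trace is caught
        # and no benign historical window raises an alert.
        return self.detected_traces == self.total_traces and self.benign_alerts == 0


def replay(rule: Callable[[list[dict]], bool],
           attack_traces: list[list[dict]],
           benign_windows: list[list[dict]]) -> ReplayReport:
    """Run a converted rule offline over labelled historical data."""
    detected = sum(1 for trace in attack_traces if rule(trace))
    benign_alerts = sum(1 for window in benign_windows if rule(window))
    return ReplayReport(detected, len(attack_traces), benign_alerts)


def converted_rule(events: list[dict]) -> bool:
    """Stand-in converted rule: alert on 3 or more failed logins in a window."""
    return sum(1 for e in events if e["action"] == "login_failed") >= 3


attack_traces = [[{"action": "login_failed"}] * 4]
benign_windows = [
    [{"action": "login_ok"}] * 5,
    [{"action": "login_failed"}],
]

report = replay(converted_rule, attack_traces, benign_windows)
print(report, report.ready_for_activation)
```

A rule that passes a gate like this would still go through a monitoring-only period before being allowed to raise live alerts.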
The researchers also make a practical cost argument. A single GPT-5 conversion takes roughly 140 seconds and uses significantly more compute than a direct LLM translation, so ARuleCon is not fast enough for live alerting. It is intended instead for the slow, error-prone work of migration, where hours or months of human labor are currently at stake. "Spending tens of seconds or even longer on a high-quality conversion can be acceptable, especially when compared with the manual effort required from detection engineers," Xu said. That, the team adds, is why ARuleCon should augment analysts rather than replace them.
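A back-of-the-envelope calculation shows why that latency suits batch work. The Python sketch below uses the roughly 140-second GPT-5 figure from the research; the rule count of 1,500 and the option of splitting work across parallel workers are assumptions chosen for illustration, not numbers or capabilities reported by the team.

```python
import math

SECONDS_PER_CONVERSION = 140  # approximate GPT-5 figure reported by the researchers
RULE_COUNT = 1500             # hypothetical migration size, for illustration only


def batch_hours(rule_count, seconds_per_rule=SECONDS_PER_CONVERSION, workers=1):
    """Estimate wall-clock hours, assuming independent conversions
    split evenly across `workers` parallel jobs."""
    rules_per_worker = math.ceil(rule_count / workers)
    return rules_per_worker * seconds_per_rule / 3600


sequential = batch_hours(RULE_COUNT)
parallel = batch_hours(RULE_COUNT, workers=8)
print(f"sequential: {sequential:.1f} h, 8 workers: {parallel:.1f} h")
```

Under these assumptions a full rule set converts in hours or days of machine time, which is far too slow per rule for real-time alerting but small next to a manual migration.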
For teams preparing to migrate SIEM platforms or absorb another organization’s detection rules, the immediate takeaway is pragmatic: automated conversion can cut conversion errors and accelerate batch onboarding — but only when followed by offline, staged validation against real historical logs and known attack traces. The research leaves open one concrete next step the authors themselves flag as future work: moving the validation workflow from offline testing toward operational integration while protecting against the system's remaining blind spots.




