

Craig is a former software developer and red teamer. He has been pentesting at Black Hills Infosec since 2018.

Artificial Intelligence (AI) has been a hot topic in information technology and information security since before I entered the industry. I had been aware of developments in AI, but I hadn’t really dug into leveraging it as part of my job as a penetration tester. I gave a webcast on penetration testing methodology a while back, and someone asked me afterward how I use AI in my methodology/workflow. At the time, my answer was “I don’t.”
For a long time, I considered AI to be interesting but not particularly useful. However, the technology has improved, and it has become clear that AI has matured to the point where we absolutely can use it to help us with our jobs as penetration testers. So, what does that look like? This blog post is the first in a series of posts where I will describe my initial experiences trying to integrate AI into my penetration testing methodology.
When exploring new technology and incorporating it into your methodology, it’s always a good idea to start by examining what other folks in your space are already doing with that technology. When I initially started going down this path, my BHIS colleague Derek Banks introduced me to a project called burpference. Burpference is a Burp Suite plugin that takes requests and responses to and from in-scope web applications and sends them off to an LLM for inference. In the context of artificial intelligence, inference is taking a trained model, providing it with new information, and asking it to analyze this new information based on its training.
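To make the idea of inference concrete, here is a rough sketch of a single inference call against a local Ollama instance. This is purely illustrative and not part of burpference; the model name, endpoint, and HTTP snippet are placeholders.

```python
import requests

# A captured HTTP exchange we want the model to analyze (placeholder content).
http_exchange = """GET /account/profile HTTP/1.1
Host: app.example.local
Cookie: session=abc123...

HTTP/1.1 200 OK
Content-Type: application/json

{"user": {"id": 1, "role": "admin"}}"""

# Ask a locally hosted model to analyze the exchange via Ollama's generate API.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1:7b",   # whichever model has been pulled locally
        "prompt": "You are a web application penetration tester. Review this HTTP "
                  "request/response pair and point out any security issues:\n\n"
                  + http_exchange,
        "stream": False,             # return one complete JSON response
    },
    timeout=300,
)

print(response.json()["response"])   # the model's analysis text
```

Burpference automates exactly this loop: capture traffic in the proxy, wrap it in a prompt, and send it to the configured model.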
Installing the burpference extension in Burp Suite is a straightforward task. The extension relies on the Jython standalone JAR, so once I downloaded the JAR, I configured the Burp Suite Python environment to point to it. This setting can be found by opening “Extensions settings” in the Extensions tab.

Once the Python environment was configured, I downloaded and unzipped the latest burpference release. Burpference generates log files in the extension directory, so I needed to ensure that Burp Suite had write permissions to that location. Next, I opened the “Installed” page of the Extensions tab, clicked the “Add” button, and selected the burpference.py file from the extension directory.

I checked the Output section of the Burp Suite extension loader to ensure no errors occurred. Once the extension was loaded, I opened the new burpference tab and selected a configuration file that pointed to my LLM. For my initial experimentation with burpference, I set up a small (7 billion parameter) deepseek-r1 model in Ollama on an older gaming PC in my lab.
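Burpference ships example configuration files for the back ends it supports. The snippet below is only meant to illustrate the kind of information such a configuration captures, namely where the model endpoint lives and which model to query; the field names here are hypothetical, so refer to the example configs in the burpference repository for the real schema.

```python
import json

# Illustrative only -- these keys are hypothetical, not burpference's actual schema.
# The real field names come from the example configs shipped with the extension.
ollama_config = {
    "host": "http://192.168.1.50:11434/api/generate",  # lab host running Ollama (placeholder)
    "model": "deepseek-r1:7b",                          # small local model
    "stream": False,
}

with open("deepseek_ollama.json", "w") as f:
    json.dump(ollama_config, f, indent=2)
```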


To test the extension functionality, I installed and ran a local instance of OWASP’s intentionally vulnerable Juice Shop application.

To cut down on noise and unnecessary load on the LLM, burpference only sends in-scope requests and responses. So, I added the Juice Shop application to the project scope in Burp Suite. This can be done from the Target tab by right clicking the application and selecting “Add to scope”.

I encountered two pitfalls that I had to troubleshoot when configuring the extension:
- I was running the model on a physically separate host in my lab. By default, Ollama binds to localhost, and I was initially unable to communicate with the model from the testing host where I was running burpference. I fixed this by setting the OLLAMA_HOST environment variable to 0.0.0.0 on the host running Ollama (a quick reachability check like the one sketched after this list confirms the fix).
- Once I was able to communicate with the model, burpference started logging “General Error: cannot make memory view because object does not have the buffer interface” error messages. This is a known issue with the extension, and I was able to fix it by updating my Jython standalone JAR from version 2.7.3 to version 2.7.4.
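Before loading the extension, a quick sanity check like the following (a rough sketch; the lab host address is a placeholder) confirms that the Ollama API is reachable from the testing host and that the expected model has been pulled:

```python
import requests

# Address of the lab host running Ollama (placeholder -- substitute your own).
OLLAMA_URL = "http://192.168.1.50:11434"

# /api/tags lists the models that have been pulled on the Ollama host.
resp = requests.get(f"{OLLAMA_URL}/api/tags", timeout=10)
resp.raise_for_status()

for model in resp.json().get("models", []):
    print(model["name"])   # e.g. deepseek-r1:7b if the model is installed
```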
With the extension successfully communicating with the model, I began manually browsing the Juice Shop application. As I browsed the application, I noticed that burpference was sending each request and response to the LLM with the following prompt:
You are a web application penetration tester conducting a comprehensive operation on an application in the offensive stage of the engagement and focused on leveraging security flaws.
Your objective is to examine the HTTP requests and responses that are available through the burp suite proxy history from the web application as we test the application.
This analysis will focus on:
- Request and Response Evaluation: Scrutinizing HTTP requests and responses for security misconfigurations, sensitive data exposure, and other vulnerabilities.
- Authentication and Session Management: Assessing the effectiveness of authentication mechanisms and session handling practices.
- Input Validation and Output Encoding: Identifying weaknesses related to input validation that may lead to injection attacks or cross-site scripting (XSS).
Use reasoning and context to find potential flaws in the application by providing example payloads and PoCs that could lead to a successful exploit.
If you deem any vulnerabilities, include the severity of the finding as prepend (case-sensitive) in your response with any of the levels:
"CRITICAL"
"HIGH"
"MEDIUM"
"LOW"
"INFORMATIONAL"
for any informational-level findings or observations, for example of a "secure" flag missing from a cookie.
Not every request and response may have any indicators, be concise yet deterministic and creative in your approach.
The HTTP request and response pair are provided below this line:
[request and response JSON below]
Burpference Prompt (formatted for readability)
The first thing I noticed was that the model responded slowly. This was likely due to the hardware limitations of the host where I was running the model. I decided I would later try the extension with a more powerful remote OpenAI model. However, the extension sends full requests and responses that will almost certainly contain sensitive information like credentials, session tokens, and response data. When performing a penetration test, maintaining the confidentiality of customer data is a high priority, which makes using remote models that you do not fully control a serious concern. So, I wanted to verify the extension’s functionality and evaluate its performance with a local, on-premises model first. After browsing the application for a bit, I took some time to review the inference results in the burpference logging page.

While slow, the extension appeared to be successfully communicating with the model and logging the inference results. I observed that the LLM reviewed the request verb, parameters, headers, cookies, etc., and evaluated what it could tell about the application from a security perspective. Ultimately, it did not report anything that I would not have identified during a manual review of the requests and responses. However, it did identify an interesting cookie called welcomebanner_status that was set to dismiss, and it even brainstormed a possible attack vector!

Even with a small local model running on less-than-stellar hardware, I could already see some value in the extension at the very least functioning as a second set of eyes. I proceeded to reconfigure the extension to use a remote OpenAI gpt-4o-mini model. As you might expect, I saw much better performance with the larger model. In addition to identifying issues related to CORS and security header configurations, it also identified a request parameter it thought was vulnerable to cross-site scripting (XSS). The model even provided a proof-of-concept payload.

I tried the proof-of-concept request in a browser. While the XSS payload did not fire, the application returned an HTTP 500 Internal Server Error.

Looking at this error response through the eyes of an experienced web application tester, I knew I should look for a SQL injection vulnerability here, but what about our AI assistant? I was pleased to find that burpference identified SQL syntax in another, more verbose error message that I had initially overlooked. It determined that this same parameter was likely vulnerable to SQL injection and provided another proof-of-concept exploit.

I tried this PoC in a browser, and the application responded with JSON containing all of the application’s product information. This indicated that the payload was successful and the application was vulnerable to SQL injection.

One thing I noticed while evaluating burpference is that the context for each inference request consists of only a single request and response pair. I think this could be a limiting factor in the usefulness of the extension as it currently exists; the smaller local model’s responses plainly stated that it could tell me more if it were given more context. There is likely an opportunity to extend the extension’s functionality to selectively send a series of requests and responses to the model in a single inference request, giving it more useful context.
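As a rough illustration of what that might look like (a hypothetical sketch, not existing burpference functionality), the extension could buffer several exchanges and submit them to the model in a single prompt:

```python
import requests

OLLAMA_URL = "http://192.168.1.50:11434/api/generate"  # placeholder lab endpoint

# Hypothetical: buffer several request/response pairs instead of sending each alone.
exchange_buffer = []

def log_exchange(request_text, response_text):
    """Collect an HTTP exchange until we have enough context for one inference call."""
    exchange_buffer.append(
        f"--- REQUEST ---\n{request_text}\n--- RESPONSE ---\n{response_text}"
    )

def analyze_buffered(batch_size=5, model="deepseek-r1:7b"):
    """Send the buffered exchanges to the model in a single prompt for richer context."""
    if len(exchange_buffer) < batch_size:
        return None
    prompt = (
        "You are a web application penetration tester. The following related HTTP "
        "exchanges come from the same application. Analyze them together and report "
        "any vulnerabilities that only become apparent across multiple requests:\n\n"
        + "\n\n".join(exchange_buffer)
    )
    exchange_buffer.clear()
    resp = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=600,
    )
    return resp.json()["response"]
```

Batching like this trades some latency for richer context, which seems especially useful for issues that only emerge across multiple requests, such as broken session handling or multi-step workflows.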
Overall, I found the extension useful as a second set of eyes looking over my web traffic, and it successfully put me down the pathway to discovering a valid vulnerability. I liked that it works passively in the background, and I can definitely see myself leveraging this extension with an on-premises model in my web application penetration testing methodology. Specifically, I think it would be useful to have burpference enabled when performing manual enumeration at the beginning of a new web application penetration test.
Read part 2 of this series here: Part 2 – Copilot
Want to keep learning about this topic?
Register now for next week’s webcast taking place Thursday, May 22nd, at 1:00pm EDT:
Using AI to Augment Pentesting Methodologies
