Show HN: JavaFactory – IntelliJ plugin to generate Java code
Hi HN,
I built a code generator plugin for IntelliJ that uses LLMs to create repetitive Java code like implementations, tests, and fixtures — based on custom natural-language patterns and annotation-based references.
Most tools like Copilot or Cursor aim to be general, but they often fail to produce code that actually fits a project's structure or passes its tests.
So I made something more explicit: define patterns + reference scope, and generate code consistently.
In this demo, 400 lines of Java were generated in 20 seconds — and all tests passed: https://www.youtube.com/watch?v=ReBCXKOpW3M
GitHub: https://github.com/JavaFactoryPluginDev/javafactory-plugin
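To make the "patterns + reference scope" idea concrete, here is a simplified sketch; the annotation names below are illustrative placeholders rather than the exact annotations the plugin uses:

    import java.lang.annotation.Retention;
    import java.lang.annotation.RetentionPolicy;
    import java.util.Optional;

    // Placeholder annotations that illustrate the "pattern + reference scope" idea;
    // they are not the plugin's exact annotation names.
    @Retention(RetentionPolicy.SOURCE)
    @interface GeneratePattern { String value(); }

    @Retention(RetentionPolicy.SOURCE)
    @interface ReferenceScope { Class<?>[] value(); }

    record Order(long id, String item) {}

    // The generator reads the natural-language pattern definition plus only the
    // referenced types, then produces the implementation, test, and fixture code.
    @GeneratePattern("repository-impl-with-tests")
    @ReferenceScope({Order.class})
    interface OrderRepository {
        Optional<Order> findById(long id);
        Order save(Order order);
    }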
As a side comment, I have found that configuring a few live templates in IntelliJ lets me write a lot of repetitive code with just a handful of keystrokes, regardless of the language.
Structural refactoring is another amazing feature that is worth knowing.
I've also got some mileage from live templates for repetitive code. However, at some point I built[0] an IntelliJ IDEA plugin to help me generate setters and field assignments, which I felt live templates weren't a good solution for in my case. I don't know if JavaFactory solves this kind of problem; keen to try it out.
[0]: https://github.com/nndi-oss/intellij-gensett
I think IntelliJ is a great tool on its own. Recently, they even added a feature that auto-injects dependencies when you declare them as private final — super convenient.
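If it helps, here is roughly the pattern I mean, with a stand-in dependency so the sketch compiles:

    // Declare the dependency as a private final field...
    class OrderService {
        private final OrderRepository orderRepository;

        // ...and the IDE offers to generate this constructor parameter and
        // assignment for you (classic constructor injection).
        OrderService(OrderRepository orderRepository) {
            this.orderRepository = orderRepository;
        }
    }

    interface OrderRepository {}  // stand-in dependency so the sketch compiles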
I can’t help but wonder if the folks at JetBrains are starting to feel a bit of pressure from tools like Cursor or Windsurf.
Feels very Java like. Factories, repositories, utils, patterns etc. Good stuff.
Thank you. I think this tool still has a lot of room to grow, but the concept of handling each task explicitly is already quite useful.
yoDawgMemesFactory
If the trend continues, a program will look like "JavaFactory("<prompt>").compile().run();".
I've always wondered how long until we reach this. If every PC can run models locally, then with a given seed and prompt it could be the ultimate compression. It's also hilarious.
It would be very lossy compression, though: each invocation will be different, so it will inevitably circle back to "strong-static-LLM" prompts. What? Wait..!
LLMs at their core do produce reproducible results with a given seed. It's all the workflow stuff people do on top that tends to break reproducibility.
This is not the case for LLMs running on GPUs (which is most of them); GPUs are non-deterministic for this use case due to the floating-point math involved. There is no way to get perfectly deterministic output from OpenAI despite the presence of seed and temperature parameters.
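A small Java sketch of the floating-point part of that argument: addition is not associative, and parallel GPU reductions do not guarantee a fixed accumulation order, so bit-exact reproducibility is lost.

    // Summation order changes the result in floating point.
    public class FloatOrder {
        public static void main(String[] args) {
            float a = 1e8f, b = -1e8f, c = 1e-3f;

            float leftToRight = (a + b) + c;  // 0.001
            float rightToLeft = a + (b + c);  // 0.0, because c is absorbed by the huge b

            System.out.println(leftToRight == rightToLeft);  // false
        }
    }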
Thank you — I’ll consider adding that feature.
Actually, I'm currently thinking about creating a small community for sharing pattern definitions.
Do you already have some common templates ready to be used somewhere?
What LLM is it using? Is it something local? Or does it call out? It wasn't obvious from the docs, and I didn't want to dig through all of the code to figure it out. Should probably be clearly stated on the front page.
But the project looks interesting, I have been looking for something similar.
This uses OpenAI's GPT-4o model.
The requests involve relatively long context windows, which require high-quality reasoning. Only recent high-performance models like GPT-4o or Claude Sonnet are capable of reducing the manual workload for this kind of task.
It uses OpenAI.
As a programmer, I feel bad if tests don't fail on the first run... it might mean they aren't really testing anything.
Related to this, consider that when an LLM writes tests for code, it's writing them based on what the code actually does, not what it's supposed to do. This is equally true when the code itself was written by the LLM. Sure the tests pass, but that doesn't prove the code is correct.
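A contrived Java sketch of that failure mode: the "expected" value in the test is copied from what the buggy code returns, so the suite is green while the code is wrong.

    import static org.junit.jupiter.api.Assertions.assertEquals;

    import org.junit.jupiter.api.Test;

    class PriceCalculator {
        // Intended behaviour: apply a 10% discount. Actual bug: divides by 100 twice.
        static double discounted(double price) {
            return price - (price * 10 / 100 / 100);
        }
    }

    class PriceCalculatorTest {
        @Test
        void appliesDiscount() {
            // The "expected" value mirrors what the code currently returns,
            // not the specification (which would expect 90.0).
            assertEquals(99.9, PriceCalculator.discounted(100.0), 1e-9);
        }
    }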
Your point is valid. In real-world work, tests should focus on parts that are difficult to verify, and if everything passes on the first try, it's often a sign that something deserves a closer look.
That said, what I wanted to highlight in the example was a contrast — tools like Cursor and other general-purpose models often fail to even generate simple tests correctly, or can't produce tests that pass. So the goal was to show the difference in reliability.
The guide is a 404.
"404 - page not found The
master branch of
javafactory-plugin does not contain the path
docs/how-to-use.md."
How do I hook it into local models? Does it support Ollama, Continue, that kind of thing? Do you collect telemetry?
1. I'm sorry, it was a typo in the path. I've fixed it, so you should be able to see it now.
2. For now, only GPT-4o is allowed, because the requests involve relatively long context windows that require high-quality reasoning. Only recent high-performance models like GPT-4o or Claude Sonnet are capable of reducing the manual workload for this kind of task.
That said, if users want to use other models, I can add an adapter feature to support various providers.
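Roughly, I'm imagining something like this; the names are only a sketch and not part of the current code:

    // Hypothetical adapter layer: the generation pipeline would depend only on
    // LlmClient, with one adapter per provider.
    interface LlmClient {
        String complete(String systemPrompt, String userPrompt);
    }

    class OpenAiClient implements LlmClient {
        private final String apiKey;
        OpenAiClient(String apiKey) { this.apiKey = apiKey; }
        @Override
        public String complete(String systemPrompt, String userPrompt) {
            // would call OpenAI's chat completions API here
            throw new UnsupportedOperationException("sketch only");
        }
    }

    class OllamaClient implements LlmClient {
        private final String baseUrl;  // e.g. a local Ollama server
        OllamaClient(String baseUrl) { this.baseUrl = baseUrl; }
        @Override
        public String complete(String systemPrompt, String userPrompt) {
            // would call the local model server here
            throw new UnsupportedOperationException("sketch only");
        }
    }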
Thanks.
Right, so it can't be used on proprietary code or in settings where personal data might occur.
That's right. Unfortunately, the system currently forces the use of GPT-4o.
To be honest, I didn’t realize that model selection would be such an important point for users. I believed that choosing a high-quality model with strong reasoning capabilities was part of the service’s value proposition.
But lately, more users — including yourself — have been asking for support for other models like Claude Sonnet or LLaMA.
I’m now seriously considering adding an adapter feature. Thank you for your feedback — I really appreciate it.
I can't speak for other people, but I regularly work with code that is not owned by my organisation, and getting approval to send it out to some remote, largely unaccountable corporation is likely to be impossible under the conditions under which we operate.
Together with the CEO, I've also decided that we do not do this with our own code; it stays on machines we control until someone pays for an artifact we'd like to license.
I'm well aware that many other organisations take a different position and push out basically everything they work on to SaaS LLMs. In my experience they defend it with something about so-called productivity and some contract clause in which the SaaS pinky-promises not to straight up take the code. But nothing stops them from running hidden queries against it with their in-house models in parallel with providing their main service, and sifting out a lot of trade secrets and other goodies.
It's also likely these SaaS corporations can benchmark and otherwise profile individual developers, information that would be very valuable to e.g. recruiting agencies.
And I work for an organization that does everything they can think of to make it virtually impossible for anyone to leak code outside, but is now mandating Copilot use to the point of including it in personal performance goals.
Sounds like it might be a good idea to scout for a new gig. When management is acting incoherently it usually doesn't take long for employee churn to pick up.
If you can tell, how is that Copilot performance measured?