AutoCimKG: Automatic Construction and Incremental Maintenance of Knowledge Graphs

In a world overflowing with data, organizations face the daunting task of organizing and understanding vast amounts of information. Whether it’s tracking employee skills, mapping research expertise, or connecting documents to their authors, making sense of it all can feel overwhelming. Knowledge Graphs (KGs) offer a solution by structuring information into a network of connected entities—think of it as a map that shows how people, skills, and documents relate to one another. But building and updating these graphs manually is time-consuming and impractical, especially as data keeps growing.

That’s where AutoCimKG comes in. This Python module automates the creation and upkeep of Knowledge Graphs, with a focus on tracking experts and their competencies. Developed as part of a master’s thesis at Johannes Kepler University Linz, AutoCimKG uses large language models (LLMs) from OpenAI and the LangChain framework to turn unstructured text, such as reports or resumes, into a structured KG. It’s designed to evolve with new data, which suits dynamic environments like businesses, universities, or government agencies.

In this blog post, we’ll explore what AutoCimKG is, how it works, and why it’s a game-changer for managing knowledge. We’ll also walk through installation steps, show you how to use it, and highlight real-world applications—all explained in a way that’s easy to grasp, whether you’re a student, a professional, or just curious about technology. Let’s dive in!


What Exactly is a Knowledge Graph?

Imagine a giant web where every dot represents something specific—like a person, a skill, or a document—and every line shows how they’re connected. That’s a Knowledge Graph in a nutshell. Each dot (called a node) and each line (called an edge) builds a picture of relationships. For example:

  • Node 1: An expert named Sarah Lee
  • Node 2: A skill like “Web Development”
  • Edge: “possesses” (Sarah Lee possesses Web Development skills)

This setup doesn’t just organize data—it reveals insights, like which experts share skills or how documents link to certain topics. AutoCimKG specializes in creating KGs that focus on four key areas: experts, their competencies, the documents they’ve written, and the organizational units (like departments or teams) they’re part of. It’s like a living directory of knowledge that grows as you feed it more information.
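To make the node-and-edge idea concrete, here’s a minimal Python sketch that stores a graph as a set of (source, relation, target) triples. The Edge type and the nodes helper are illustrative assumptions, not AutoCimKG’s internal format.

```python
# Illustrative only: a tiny in-memory graph of (source, relation, target) triples.
# This is not AutoCimKG's internal representation.
from typing import NamedTuple


class Edge(NamedTuple):
    source: str    # node the edge starts at
    relation: str  # edge label
    target: str    # node the edge points to


edges = {
    Edge("Sarah Lee", "possesses", "Web Development"),
}


def nodes(graph: set[Edge]) -> set[str]:
    """Collect every node that appears at either end of an edge."""
    return {e.source for e in graph} | {e.target for e in graph}


print(sorted(nodes(edges)))
```

Storing only the edges is enough for a toy like this, because every node can be recovered from the edges it takes part in.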


What Makes AutoCimKG Special?

AutoCimKG isn’t just another tech tool—it’s packed with features that make managing knowledge easier, faster, and smarter. Here’s what it can do:

1. Builds Knowledge Graphs Automatically

Got a pile of reports, articles, or resumes? AutoCimKG can read through them and pull out the important stuff:

  • Experts: Who wrote or contributed to the text.
  • Competencies: Skills or knowledge areas mentioned, like “Python” or “Financial Analysis.”
  • Documents: Info about the text itself, like its title or date.
  • Organizational Units: Teams or departments tied to the experts.

Using smart language analysis (powered by OpenAI’s GPT-4o), it figures out how these pieces connect. For example, if a paper mentions “cloud computing” and lists Mark Jones from the IT department as the author, AutoCimKG creates nodes for Mark, the skill “cloud computing,” and the IT department, then links them with edges like “possesses” or “belongs to.”
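The Mark Jones example can be sketched in a few lines of Python. The record below is a hand-written stand-in for what an extraction step might return; its field names, the document title, and the to_edges helper are assumptions for illustration, not AutoCimKG’s actual output or API.

```python
# Hypothetical extraction result for one document. Field names and the
# document title are made up for illustration.
extracted = {
    "document": "Cloud Migration Report",
    "authors": [{"name": "Mark Jones", "unit": "IT"}],
    "competencies": ["cloud computing"],
}


def to_edges(record: dict) -> list[tuple[str, str, str]]:
    """Turn one extraction record into (source, relation, target) edges."""
    result = []
    for author in record["authors"]:
        result.append((author["name"], "authored", record["document"]))
        result.append((author["name"], "belongs to", author["unit"]))
        for skill in record["competencies"]:
            result.append((author["name"], "possesses", skill))
    return result


for edge in to_edges(extracted):
    print(edge)
```

In the real tool the language model does the hard part, which is producing a record like this from free text; the sketch only shows how such a record maps onto nodes and edges.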

2. Updates Without Starting Over

Data changes all the time—new reports get written, employees gain skills, teams shift. AutoCimKG doesn’t make you rebuild the whole graph every time something new comes up. Instead, it adds the fresh info right into the existing structure. This incremental update feature saves time and keeps your KG current, no matter how fast your data grows.
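Here’s a rough sketch of what an incremental update means in practice: new edges are merged into the existing graph, and anything already known is skipped. Plain Python sets stand in for the real graph store, and the merge function is a hypothetical helper, not part of AutoCimKG.

```python
# Sketch of an incremental update: merge new edges into an existing graph
# instead of rebuilding it. Plain sets stand in for the real graph store.
Edge = tuple[str, str, str]


def merge(graph: set[Edge], new_edges: set[Edge]) -> set[Edge]:
    """Add only unseen edges to graph (in place) and return what was added."""
    added = new_edges - graph
    graph |= added
    return added


graph = {("Mark Jones", "possesses", "cloud computing")}
batch = {
    ("Mark Jones", "possesses", "cloud computing"),  # already known
    ("Mark Jones", "possesses", "Python"),           # new
}
added = merge(graph, batch)
print(f"{len(added)} new edge(s), {len(graph)} total")
```

Exact matching like this only catches identical names, which is why entity resolution matters for real data.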

3. Keeps Things Accurate with Entity Resolution

Ever wonder if “M. Jones” and “Mark Jones” are the same person? AutoCimKG uses text embeddings (via OpenAI’s text-embedding-3-large model) to spot likely duplicates by comparing how similar names and terms are. That keeps your graph from filling up with repeats, and it can surface hidden connections, like two experts working on similar topics.
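One common way to compare embeddings is cosine similarity, sketched below. The three-dimensional vectors are toy values standing in for real embeddings, and the 0.95 threshold is an arbitrary assumption; AutoCimKG’s actual matching rules and thresholds may differ.

```python
import math

# Toy 3-dimensional vectors stand in for real embeddings, which would come
# from an embedding model. The vectors and the threshold are made up.
embeddings = {
    "Mark Jones": [0.90, 0.10, 0.30],
    "M. Jones": [0.88, 0.12, 0.31],
    "Sarah Lee": [0.10, 0.95, 0.20],
}


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def likely_same(name_a: str, name_b: str, threshold: float = 0.95) -> bool:
    """Flag two names as probable duplicates if their vectors are close."""
    return cosine(embeddings[name_a], embeddings[name_b]) >= threshold


print(likely_same("Mark Jones", "M. Jones"))
print(likely_same("Mark Jones", "Sarah Lee"))
```

With these toy vectors the first pair clears the threshold and the second does not. In practice the threshold is a trade-off: too low merges different people, too high leaves duplicates behind.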

4. Works with Databases

AutoCimKG pairs with a PostgreSQL/Apache AGE database to store your Knowledge Graph. This lets you save different versions of the graph and run detailed queries. Want to know “Which experts in sales know data visualization?” Apache AGE lets you write that as a Cypher query and run it through SQL.
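As a sketch, the sales question could look like the query below, kept in Python strings so it can be sent through psql or any PostgreSQL driver. The cypher(...) AS (... agtype) wrapper is Apache AGE’s standard query form, but the graph name, node labels, edge types, and property names are all hypothetical; check your own graph’s schema before using it.

```python
# Graph name, labels, edge types, and property names are hypothetical;
# adapt them to the schema your graph actually uses.

# Apache AGE needs to be loaded once per session before cypher() is available.
SETUP_SQL = """
LOAD 'age';
SET search_path = ag_catalog, "$user", public;
"""

# "Which experts in sales know data visualization?"
QUERY_SQL = """
SELECT * FROM cypher('expert_graph', $$
    MATCH (e:Expert)-[:BELONGS_TO]->(u:OrgUnit {name: 'Sales'}),
          (e)-[:POSSESSES]->(c:Competency {name: 'data visualization'})
    RETURN e.name
$$) AS (expert agtype);
"""

print(SETUP_SQL.strip())
print(QUERY_SQL.strip())
```

The Cypher part sits between the $$ markers, and the AS (...) clause tells PostgreSQL which columns to expect back.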

5. Lets You Customize the Structure

Every organization is different, so AutoCimKG lets you tailor the graph to fit your needs with a lightweight ontology. You can:

  • Set Subject Areas: Group skills into categories, like “Programming” covering Python and Java.
  • Adjust Strictness: Decide if you want to stick to specific skills or let the system suggest new ones.
  • Define Relationships: Choose how nodes connect, like “collaborates with” or “authored.”

This flexibility makes sure the KG matches your goals, whether you’re tracking a small team or a huge company.
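A lightweight ontology like this can be pictured as a small configuration object. The sketch below is an assumption about how such settings might be modeled; the Ontology class, its field names, and the accept method are illustrative and not AutoCimKG’s real configuration format.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Hypothetical lightweight ontology; field names are illustrative."""
    subject_areas: dict[str, set[str]] = field(default_factory=dict)
    strict: bool = True  # True: keep only listed skills; False: accept new ones
    relations: set[str] = field(default_factory=set)

    def accept(self, skill: str) -> bool:
        """Decide whether an extracted skill may enter the graph."""
        known = any(skill in skills for skills in self.subject_areas.values())
        return known or not self.strict


ontology = Ontology(
    subject_areas={"Programming": {"Python", "Java"}},
    relations={"possesses", "authored", "collaborates with"},
)
print(ontology.accept("Python"), ontology.accept("Gardening"))
```

Flipping strict to False would let the “Gardening” skill through, which mirrors the choice between sticking to known skills and letting the system suggest new ones.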

6. Tracks All the Details

AutoCimKG doesn’t just build the graph—it keeps a record of everything behind it. This includes metadata like where the data came from, the graph’s version, system logs, and settings. Stored in the database, this info acts like a history log, so you can trust the graph’s accuracy and trace its evolution.
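To give a feel for what such a record might hold, here’s a small sketch of one metadata entry per graph build. The GraphVersion class, its fields, and the file name are assumptions made for illustration; the fields AutoCimKG actually stores may differ.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class GraphVersion:
    """Hypothetical metadata entry describing one build of the graph."""
    version: int
    source_files: tuple[str, ...]  # where the data came from
    model: str                     # language model used for extraction
    created_at: str                # UTC timestamp in ISO 8601 format


def record_run(version: int, source_files: list[str], model: str) -> dict:
    """Create a metadata entry as a plain dict, ready to be stored."""
    entry = GraphVersion(
        version=version,
        source_files=tuple(source_files),
        model=model,
        created_at=datetime.now(timezone.utc).isoformat(),
    )
    return asdict(entry)


print(record_run(2, ["report_a.pdf"], "gpt-4o"))
```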


Where Can You Use AutoCimKG?

AutoCimKG was first built for the Austrian Financial Market Authority to track employee skills, but its uses go way beyond that. Here are some examples:

  • Businesses: Map out employee expertise and project history to assign the right people to the right tasks.
  • Universities: Connect researchers, their papers, and their skills to boost collaboration.
  • Government: Monitor who’s contributing what to guide decisions or planning.

Any group dealing with lots of documents or needing to manage expertise can put AutoCimKG to work. It’s all about turning messy data into something useful.


How to Get Started with AutoCimKG

Ready to give it a try? Here’s everything you need to set up and start using AutoCimKG. Don’t worry—we’ll keep it straightforward.

Installation Steps

  1. Get the Code
    Head to the AutoCimKG GitHub repository, then clone or download it to your computer. Add it to your Python project folder.

  2. Prepare Your Setup
    • Use Python 3.9 (it works best with this version).
    • Install the required tools listed in the requirements.txt file, like LangChain and database connectors. You can do this with a command like pip install -r requirements.txt.

  3. Set Up Language Models
    • Sign up for an API key on OpenAI’s developer site.
    • Configure access to GPT-4o (for understanding text) and text-embedding-3-large (for comparing entities).

  4. Add a Database (Optional)
    • Install PostgreSQL and Apache AGE on your system.
    • Use tools like psql (command line) or pgAdmin (visual interface) to manage it.

  5. Pick Your Tools
    • PyCharm is great for organizing your project.
    • Jupyter Notebook is perfect for testing and experimenting step-by-step.
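Before running anything, a quick sanity check of the setup steps above can save some head-scratching. The sketch below is a hypothetical helper, not something shipped with AutoCimKG. It assumes your API key lives in the OPENAI_API_KEY environment variable, which is where OpenAI’s Python client looks by default.

```python
import os
import sys


def check_setup(version=sys.version_info, env=os.environ) -> list[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if tuple(version[:2]) != (3, 9):
        problems.append(
            f"Python 3.9 is recommended, found {version[0]}.{version[1]}"
        )
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems


issues = check_setup()
if issues:
    for issue in issues:
        print("Warning:", issue)
else:
    print("Setup looks good.")
```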

How to Use It: A Simple Example

AutoCimKG comes with a tutorial to guide you. Here’s the basic flow:

  • Step 1: Load Your Files
    Gather some documents—like employee resumes or research papers—and feed them into AutoCimKG.
  • Step 2: Define Your Rules
    Set up your ontology by listing skill categories (e.g., “Design,” “Analytics”) and how things should connect.
  • Step 3: Create the Graph
    Run the code to build your first Knowledge Graph. It’ll process the text and link everything together.
  • Step 4: Ask Questions
    Use SQL or Cypher queries to explore the database. Try something like, “Show me all engineers with AI skills.”
  • Step 5: Add More Data
    Drop in new files anytime—AutoCimKG will update the graph without missing a beat.

For example, you could process a batch of team reports and see how AutoCimKG maps out who knows what and where they work. It’s a hands-on way to see the tool in action.


The Origins of AutoCimKG

AutoCimKG didn’t start from scratch: it builds on the iText2KG library (version 0.0.7). Gerhard Lerch, a master’s student at Johannes Kepler University Linz, extended it with incremental updates, metadata tracking, and more. His thesis, “Automatic Construction and Incremental Maintenance of Knowledge Graphs: Encoding Employee Competencies in the Case of the Austrian Financial Market Authority,” will be published in 2025. The project is open source under the GNU Lesser General Public License (LGPL-2.1); check the LICENSE.txt file for details.


Why AutoCimKG Matters

So, why should you care about AutoCimKG? It’s all about making life easier and unlocking insights from your data. Here’s what sets it apart:

  • Saves Time: No more manually sorting through documents—it does the heavy lifting for you.
  • Grows with You: Incremental updates mean it can handle more data without starting over.
  • Shows the Big Picture: A clear, queryable graph helps you spot trends, gaps, or opportunities—like finding the perfect expert for a project.

Picture this: You’re a manager needing someone with cybersecurity expertise. With AutoCimKG, you query the graph and discover “Lisa from IT has written three papers on cybersecurity.” That’s info you might’ve missed otherwise, delivered in seconds.


Digging Deeper: How AutoCimKG Works

Let’s peel back the layers a bit (don’t worry, we’ll keep it simple). AutoCimKG combines powerful tech to turn raw text into a structured graph:

  • Large Language Models (LLMs): GPT-4o reads the text and picks out key details—like who’s an expert and what they know.
  • Text Embeddings: The text-embedding-3-large model compares names and terms to avoid duplicates and find links.
  • LangChain Framework: This ties everything together, making the process smooth and efficient.
  • Database Storage: PostgreSQL/Apache AGE keeps the graph safe and searchable.

When you feed it a document, AutoCimKG scans for entities (people, skills, etc.), figures out how they relate, and builds or updates the graph. It’s like having a super-smart assistant organizing your data 24/7.


Real-World Scenarios: AutoCimKG in Action

Let’s see how AutoCimKG tackles real challenges:

  • Scenario 1: Corporate HR
    A company uploads employee profiles and project reports. AutoCimKG maps out who’s skilled in “machine learning” and spots a gap—no one in sales knows it. Time to train someone!
  • Scenario 2: University Research
    A school processes faculty papers. The graph shows Professor Chen and Dr. Patel both work on “renewable energy,” sparking a new collaboration.
  • Scenario 3: Government Planning
    An agency tracks staff contributions to policy reports. AutoCimKG reveals who’s leading on “climate policy,” aiding resource allocation.

These examples show how AutoCimKG turns data into decisions, no matter the field.
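The skills-gap check from Scenario 1 boils down to a simple question over the graph: which units have no member with a given skill? Here’s a minimal sketch over a hand-written edge list. The names, the edge labels, and the units_missing helper are made up for illustration; against a real graph you would run the equivalent database query instead.

```python
# Made-up sample data: (source, relation, target) edges.
edges = [
    ("Ana", "belongs to", "Engineering"),
    ("Ana", "possesses", "machine learning"),
    ("Ben", "belongs to", "Sales"),
    ("Ben", "possesses", "negotiation"),
]


def units_missing(skill: str, edges: list[tuple[str, str, str]]) -> list[str]:
    """List organizational units where no member possesses the skill."""
    members: dict[str, set[str]] = {}  # unit -> experts in that unit
    skilled: set[str] = set()          # experts who possess the skill
    for source, relation, target in edges:
        if relation == "belongs to":
            members.setdefault(target, set()).add(source)
        elif relation == "possesses" and target == skill:
            skilled.add(source)
    return sorted(unit for unit, people in members.items() if not people & skilled)


print(units_missing("machine learning", edges))
```

With this sample data, Sales is the only unit reported, since Ana covers machine learning for Engineering.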


Tips for Success with AutoCimKG

Want to get the most out of it? Here’s some advice:

  • Start Small: Test it with a few documents to get the hang of it before scaling up.
  • Keep Your Ontology Clear: Define your categories and rules upfront to avoid confusion later.
  • Check Your Data: Clean up messy text (like typos) for better results.
  • Explore the Database: Play with queries to uncover insights you didn’t expect.

It’s user-friendly, but a little prep goes a long way.


The Future of Knowledge Management

AutoCimKG is more than a tool—it’s a glimpse into how we’ll handle information moving forward. As data keeps piling up, tools like this will be essential for staying organized and informed. It’s not about replacing people; it’s about empowering them with better ways to see and use what they know.

Whether you’re a business leader, a researcher, or just someone who loves smart tech, AutoCimKG offers a practical way to tame the data chaos. Download it from GitHub, follow the tutorial, and start building your own Knowledge Graph today. You might be surprised at what you discover.