From LinkedIn Profiles to Career Paths: An LLM-Powered Recommendation System

[Figure: System Architecture]

Why Career Path Planning Matters in Data Science

The data science field evolves rapidly, with new technologies and roles emerging daily. Professionals often face critical questions:

  • Do my skills align with industry trends?
  • Should I focus on Python for deep learning or cloud platforms next?
  • What core competencies are needed for a career switch?

We developed an intelligent recommendation system that combines semantic analysis and topic modeling. By analyzing real LinkedIn job postings, it provides tailored career guidance for users at different stages. Below is a detailed breakdown of how this system works.


1. Data Collection: Scraping Real-Time Job Listings from LinkedIn

Technical Approach

We used Selenium together with BeautifulSoup for web scraping rather than an official API, which offers three advantages:

  1. Captures full job descriptions (including HTML-formatted content)
  2. Bypasses certain anti-scraping mechanisms
  3. Handles dynamically loaded content
# Code snippet from GitHub repository
from selenium import webdriver
from bs4 import BeautifulSoup

# Render the page in a real browser, then parse the HTML that Selenium sees
driver = webdriver.Chrome()
driver.get("https://linkedin.com/jobs")
soup = BeautifulSoup(driver.page_source, 'html.parser')
job_descriptions = [div.get_text(strip=True) for div in soup.find_all('div', class_='description__text')]
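Because job cards load dynamically, the scraper needs an explicit wait before reading page_source. A minimal sketch using Selenium's WebDriverWait, targeting the same class as above (the 10-second timeout is an assumption):

# Wait until job description blocks are present before parsing
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CLASS_NAME, "description__text"))
)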

Data Cleaning Pipeline

  1. Basic Sanitization: Remove special characters and standardize capitalization
  2. Term Normalization: Convert abbreviations like “ML” to “machine learning”
  3. Stopword Removal: Filter generic terms like “experience”
  4. Lemmatization: Use spaCy to reduce words to root forms (e.g., “running” → “run”)
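A minimal sketch of this pipeline, assuming spaCy's en_core_web_sm model is installed (the abbreviation map and generic-term list are illustrative, not the full sets used in the project):

# Illustrative cleaning pipeline: sanitize, normalize terms, drop stopwords, lemmatize
import re
import spacy

nlp = spacy.load("en_core_web_sm")
ABBREVIATIONS = {"ml": "machine learning", "dl": "deep learning"}   # term normalization map (partial)
GENERIC_TERMS = {"experience", "work", "team"}                      # extra stopwords (partial)

def clean_job_text(text: str) -> str:
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())                # basic sanitization
    tokens = [ABBREVIATIONS.get(tok, tok) for tok in text.split()]  # expand abbreviations
    doc = nlp(" ".join(tokens))
    lemmas = [t.lemma_ for t in doc
              if not t.is_stop and t.lemma_ not in GENERIC_TERMS]   # stopword removal + lemmatization
    return " ".join(lemmas)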

2. Topic Modeling: Uncovering Career Patterns with BERTopic

Why BERTopic Outperforms LDA

Traditional LDA models have limitations:

  • They require a predefined number of topics
  • They perform poorly on short texts

BERTopic addresses these through:

  1. Sentence-BERT embeddings for semantic understanding
  2. UMAP for dimensionality reduction
  3. HDBSCAN for density-based clustering

Key Configuration Parameters

Parameter        Value    Purpose
n_grams          (1, 2)   Capture phrases like “machine learning”
min_topic_size   15       Ensure meaningful cluster sizes
diversity        0.7      Balance keyword uniqueness
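A hedged sketch of how these components fit together (the embedding model name and the UMAP/HDBSCAN settings are assumptions; keyword diversity is set directly in older BERTopic versions and via a MaximalMarginalRelevance representation model in newer ones):

# Illustrative BERTopic setup; cleaned_job_descriptions comes from the cleaning step above
from bertopic import BERTopic
from sentence_transformers import SentenceTransformer
from umap import UMAP
from hdbscan import HDBSCAN

embedding_model = SentenceTransformer("all-MiniLM-L6-v2")            # Sentence-BERT embeddings
umap_model = UMAP(n_neighbors=15, n_components=5, random_state=42)   # dimensionality reduction
hdbscan_model = HDBSCAN(min_cluster_size=15, prediction_data=True)   # density-based clustering

topic_model = BERTopic(
    embedding_model=embedding_model,
    umap_model=umap_model,
    hdbscan_model=hdbscan_model,
    n_gram_range=(1, 2),       # capture phrases like "machine learning"
    min_topic_size=15,         # ensure meaningful cluster sizes
)
topics, probs = topic_model.fit_transform(cleaned_job_descriptions)
topic_model.visualize_topics()  # interactive topic map like the one shown below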

[Figure: Topic Visualization]

3. Intelligent Recommendations: Leveraging Google’s Gemini Model

System Workflow

  1. User inputs career background (e.g., “3 years of Python development with Pandas experience”)
  2. Gemini generates semantic embeddings
  3. Matches to relevant BERTopic clusters
  4. Outputs personalized advice:

    • Recommended job roles
    • Skill development priorities
    • Learning resources
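A hedged sketch of this flow. The post describes Gemini producing the embeddings; as a simpler stand-in, this version reuses BERTopic's find_topics for cluster matching and calls Gemini only to phrase the advice (API key handling, model name, and prompt wording are assumptions):

# Illustrative recommendation step built on the topic_model from the previous section
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

def recommend(user_background: str, topic_model, top_n: int = 3) -> str:
    # 1) Match the user's background to the closest BERTopic clusters
    topic_ids, _ = topic_model.find_topics(user_background, top_n=top_n)
    matched_keywords = [[word for word, _ in topic_model.get_topic(t)][:5]
                        for t in topic_ids if t != -1]

    # 2) Ask Gemini to turn the matched clusters into personalized advice
    prompt = (f"User background: {user_background}\n"
              f"Matched job-market topic keywords: {matched_keywords}\n"
              "Suggest target roles, skill development priorities, and learning resources.")
    model = genai.GenerativeModel("gemini-1.5-flash")
    return model.generate_content(prompt).text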

Real-World Example

Input:

“Currently analyzing sales data using Excel and SQL. Seeking transition to data engineering.”

System Output:

  1. Target Role: Junior Data Engineer
  2. Learning Path:

    • Phase 1: Master Python fundamentals (2 weeks)
    • Phase 2: Learn Airflow workflow management (1 month)
    • Phase 3: Obtain cloud platform certification (AWS/Azure)
  3. Recommended Course: DataCamp’s Python for Data Engineering

4. Deployment: Building an Interactive Interface with Streamlit

Core Functionality

graph TD
    A[User Input] --> B(Semantic Analysis)
    B --> C{Topic Matching}
    C -->|Match Found| D[Generate Recommendations]
    C -->|No Match| E[Expand Topic Library]
    D --> F[Visualize Results]
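
A minimal Streamlit front end wired to the recommend helper sketched above (the deployed app includes more, such as the word-cloud explanations; widget labels here are illustrative):

# Minimal Streamlit interface: single text input, advice rendered on demand
import streamlit as st

st.title("Career Path Recommender")
background = st.text_area("Describe your career background (200-500 words):")

if st.button("Get recommendations") and background:
    with st.spinner("Matching your profile against job-market topics..."):
        advice = recommend(background, topic_model)   # helpers from the earlier sketches
    st.subheader("Personalized advice")
    st.write(advice)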

UI Design Principles

  1. Simplicity: Single text input with <3s response time
  2. Progressive Disclosure: Show core suggestions first
  3. Transparency: Display matching rationale via word clouds

[Figure: Interface Demo]

5. Frequently Asked Questions (FAQ)

Q1: How detailed should my career summary be?

Provide 200-500 words covering:

  • Primary responsibilities
  • Technical tools used
  • Key projects undertaken

Q2: Who benefits most from this system?

  • Graduates: Clarify career directions
  • Professionals: Plan promotions
  • Career switchers: Identify skill gaps

Q3: How is my data protected?

All inputs are processed in real-time without storage. Compliant with ISO 27001 security standards.


6. Try It Now


Technical Deep Dive: Evolution of Topic Modeling

From LSA to BERTopic, topic modeling has consistently followed one principle: uncovering latent semantic structures through mathematical methods. The same technology also powers news categorization and academic research.


Future Enhancements

  1. Multilingual Support: Add Chinese and other languages
  2. Real-Time Updates: Daily automated data scraping
  3. Personalization: Enable user feedback loops