How to follow this process

  1. Set up the development environment
    Install Python and use an IDE such as PyCharm or VS Code.

  2. Create a project folder
    Organize the project with folders for source documents, extracted JSON, chunks, and final test files.

  3. Add source documents
    Place sample Word, PDF, Excel, text, Markdown, or CSV files in the source documents folder created in step 2.

  4. Run the Python pipeline
    The pipeline scans the source documents, extracts the text from each file, and splits it into chunks that are ready for search.
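
Steps 2 to 4 above could be sketched roughly as follows. This is a minimal illustration, not the actual pipeline: the folder names, the function names (create_project, extract_text, chunk_text, run_pipeline), and the fixed 500-character chunk size are all assumptions made for the example. It uses only the Python standard library, so it handles text, Markdown, and CSV files and skips Word, PDF, and Excel files, which would need third-party libraries.

```python
from __future__ import annotations

import csv
import json
from pathlib import Path

# Hypothetical folder names; the original process does not specify them.
FOLDERS = ("dataset", "extracted_json", "chunks", "test_files")
PLAIN_TEXT_SUFFIXES = {".txt", ".md"}


def create_project(root: Path) -> None:
    """Step 2: create the project folders if they do not exist yet."""
    for name in FOLDERS:
        (root / name).mkdir(parents=True, exist_ok=True)


def extract_text(path: Path) -> str | None:
    """Return the text of a supported file, or None for unsupported types."""
    suffix = path.suffix.lower()
    if suffix in PLAIN_TEXT_SUFFIXES:
        return path.read_text(encoding="utf-8")
    if suffix == ".csv":
        with path.open(newline="", encoding="utf-8") as handle:
            # Join the cells of each row with spaces, one row per line.
            return "\n".join(" ".join(row) for row in csv.reader(handle))
    # Word, PDF, and Excel files are skipped in this stdlib-only sketch.
    return None


def chunk_text(text: str, size: int = 500) -> list[str]:
    """Split text into consecutive pieces of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def run_pipeline(root: Path) -> list[str]:
    """Step 4: scan the dataset folder, extract text, and write chunks."""
    create_project(root)
    processed = []
    for path in sorted((root / "dataset").iterdir()):
        if not path.is_file():
            continue
        text = extract_text(path)
        if text is None:
            continue
        # Save the extracted text as JSON, then the chunks alongside it.
        record = {"source": path.name, "text": text}
        (root / "extracted_json" / f"{path.name}.json").write_text(
            json.dumps(record, ensure_ascii=False, indent=2), encoding="utf-8"
        )
        (root / "chunks" / f"{path.name}.chunks.json").write_text(
            json.dumps(chunk_text(text), ensure_ascii=False, indent=2),
            encoding="utf-8",
        )
        processed.append(path.name)
    return processed
```

Calling run_pipeline(Path("my_project")) creates the folders if needed and returns the names of the files it processed. Fixed-size character chunks are the simplest choice for a sketch; splitting on paragraphs or sentences may suit search better.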