To extract the content of a PDF in a React.js app, you can use the pdfjs-dist library, which provides functionality for working with PDF files. Here's an example of how you can achieve this:
- Install the required packages:
Start by installing the pdfjs-dist package using npm or yarn. The component below also renders the pages with react-pdf, so install that package as well:
npm install pdfjs-dist react-pdf
 
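- Import the required modules in your component:
react-pdf re-exports the pdfjs object from pdfjs-dist, so importing it from react-pdf keeps the API and worker versions in sync with the Document and Page components:
import { Document, Page, pdfjs } from 'react-pdf';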
 
- Configure the PDF.js library:
Before loading the PDF file, you need to configure the pdfjs library by setting the correct path to the worker file. The worker version must match the library version, which is why the URL is built from pdfjs.version. You can do this in the component where you'll be working with PDF files:
pdfjs.GlobalWorkerOptions.workerSrc = `//cdnjs.cloudflare.com/ajax/libs/pdf.js/${pdfjs.version}/pdf.worker.js`;
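The worker file name is not the same across PDF.js releases. As far as I know, builds up to 3.x ship pdf.worker.min.js, while 4.x and later ship ES modules (pdf.worker.min.mjs). A minimal sketch, assuming that naming scheme and a hypothetical helper called workerSrcFor, builds the CDN URL from the version string:

```javascript
// Hypothetical helper (not part of pdfjs-dist): build the worker URL for a
// given PDF.js version string such as "3.11.174".
// Assumption: releases from 4.0 onward ship the worker as an ES module (.mjs).
function workerSrcFor(version) {
  const major = Number.parseInt(version.split('.')[0], 10);
  const fileName = major >= 4 ? 'pdf.worker.min.mjs' : 'pdf.worker.min.js';
  return `https://cdnjs.cloudflare.com/ajax/libs/pdf.js/${version}/${fileName}`;
}

// Usage inside the app, with pdfjs imported as in the previous step:
// pdfjs.GlobalWorkerOptions.workerSrc = workerSrcFor(pdfjs.version);
```

If the CDN does not host the file for your version, serving the worker from your own bundle is the usual fallback.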
 
- Load and extract content from the PDF:
In your component, you can load the PDF file and extract its content. Here's an example using a function component and hooks:
import React, { useState } from 'react';
const PdfExtractor = () => {
  const [numPages, setNumPages] = useState(null);
  const [pdfText, setPdfText] = useState('');
  // react-pdf passes the loaded document (a PDFDocumentProxy) to onLoadSuccess,
  // so there is no need to call getDocument again for every page
  const onDocumentLoadSuccess = (pdf) => {
    setNumPages(pdf.numPages);
    // Extract text from each page (page numbers start at 1)
    const textPromises = [];
    for (let i = 1; i <= pdf.numPages; i++) {
      textPromises.push(
        pdf
          .getPage(i)
          .then((page) => page.getTextContent())
          .then((textContent) =>
            // Each text item carries its string in `str`
            textContent.items.map((item) => item.str).join(' ')
          )
      );
    }
    Promise.all(textPromises)
      .then((pageTexts) => {
        const extractedText = pageTexts.join(' ');
        setPdfText(extractedText);
      })
      .catch((error) => console.error('Failed to extract PDF text:', error));
  };
  return (
    <div>
      <Document
        file="path/to/pdf/file.pdf"
        onLoadSuccess={onDocumentLoadSuccess}
      >
        {Array.from(new Array(numPages), (el, index) => (
          <Page key={`page_${index + 1}`} pageNumber={index + 1} />
        ))}
      </Document>
      <div>{pdfText}</div>
    </div>
  );
};
export default PdfExtractor;
 - In the above example, replace 'path/to/pdf/file.pdf' with the actual path or URL of your PDF file.
 - The onDocumentLoadSuccess function is called when the PDF is successfully loaded. It extracts the text content from each page of the PDF and joins the results together.
 - The extracted text is stored in the pdfText state variable, which can be rendered within the component or used as needed.
 - The Document component from react-pdf is used to render the PDF pages, and the Page component represents each individual page.
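The per-page extraction described above can also be pulled out of the component into a plain function, which makes it easier to test. The following is a minimal sketch: extractAllPages and fakePdf are hypothetical names, and the function only assumes an object shaped like a PDF.js document (numPages, getPage, and pages with getTextContent).

```javascript
// Hypothetical helper: collect the text of every page of a loaded document.
// `pdf` is anything shaped like a PDF.js document proxy: it exposes `numPages`
// and `getPage(n)` (1-based), and each page exposes `getTextContent()`.
async function extractAllPages(pdf) {
  const pageNumbers = Array.from({ length: pdf.numPages }, (_, i) => i + 1);
  // Pages are requested in parallel; Promise.all keeps results in page order.
  return Promise.all(
    pageNumbers.map(async (n) => {
      const page = await pdf.getPage(n);
      const textContent = await page.getTextContent();
      return textContent.items.map((item) => item.str).join(' ');
    })
  );
}

// Stand-in document used only to exercise the helper without a real PDF.
const fakePdf = {
  numPages: 2,
  getPage: async (n) => ({
    getTextContent: async () => ({
      items: [{ str: `page${n}` }, { str: 'text' }],
    }),
  }),
};

extractAllPages(fakePdf).then((pages) => console.log(pages));
// logs [ 'page1 text', 'page2 text' ]
```

In the component, the same function could be called with the document object that react-pdf hands to onLoadSuccess, and the resulting array joined before it is stored in state.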
 
By following these steps, you can extract the content of a PDF in a React.js app using the pdfjs-dist library.
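Joining every text item with a space flattens the page into a single line. If line breaks matter, one possible approach is to group items by their vertical position. The sketch below is an assumption-laden illustration, not the library's own method: itemsToLines is a hypothetical helper, it assumes each item carries a six-number transform whose last entry is the y coordinate, and it assumes the items arrive in reading order.

```javascript
// Hypothetical helper: rebuild lines from PDF.js text items.
// Assumption: each item has `str` and a 6-number `transform` whose last
// entry is the y coordinate of the text baseline.
function itemsToLines(items, tolerance = 2) {
  const lines = [];
  for (const item of items) {
    const y = item.transform[5];
    const last = lines[lines.length - 1];
    if (last && Math.abs(last.y - y) <= tolerance) {
      last.parts.push(item.str); // same baseline: continue the current line
    } else {
      lines.push({ y, parts: [item.str] }); // new baseline: start a new line
    }
  }
  return lines.map((line) => line.parts.join(' ')).join('\n');
}

// Sample items shaped like entries of textContent.items
const sample = [
  { str: 'Hello', transform: [1, 0, 0, 1, 72, 700] },
  { str: 'world', transform: [1, 0, 0, 1, 110, 700] },
  { str: 'Second line', transform: [1, 0, 0, 1, 72, 686] },
];
console.log(itemsToLines(sample));
// Hello world
// Second line
```

The tolerance value is arbitrary here; documents with small fonts or superscripts may need a different threshold.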
UPDATE:
To let the user pick a PDF with an <input type="file"> element, read the selected file as an ArrayBuffer and pass its bytes to pdfjs-dist:
import { useState } from 'react';
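import * as pdfjs from 'pdfjs-dist';
// Same worker setup as in the configuration step above
pdfjs.GlobalWorkerOptions.workerSrc = `//cdnjs.cloudflare.com/ajax/libs/pdf.js/${pdfjs.version}/pdf.worker.js`;
function YourComponent() {
  const [pdfContent, setPdfContent] = useState('');
  const handleFileChange = async (event) => {
    const file = event.target.files[0];
    if (!file) return; // the user cancelled the file dialog
    const reader = new FileReader();
    reader.onload = async (e) => {
      const contents = e.target.result;
      // getDocument returns a loading task; its `promise` resolves to the document
      const pdf = await pdfjs.getDocument({ data: contents }).promise;
      let extractedText = '';
      // Pages are numbered from 1 and fetched one at a time
      for (let i = 1; i <= pdf.numPages; i++) {
        const page = await pdf.getPage(i);
        const textContent = await page.getTextContent();
        const pageText = textContent.items.map((item) => item.str).join(' ');
        extractedText += pageText + ' ';
      }
      setPdfContent(extractedText.trim());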
    };
    reader.readAsArrayBuffer(file);
  };
  return (
    <div>
      <input type="file" onChange={handleFileChange} />
      <div>{pdfContent}</div>
    </div>
  );
}
export default YourComponent;
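Since a file input accepts any file, it can help to check the bytes before handing them to PDF.js. The sketch below uses a hypothetical helper, looksLikePdf, that tests for the "%PDF-" signature at the start of the buffer. This is a strict check: some PDFs carry a few stray bytes before the header and would be rejected by it.

```javascript
// Hypothetical helper: check whether a buffer starts with the "%PDF-" signature.
function looksLikePdf(arrayBuffer) {
  const signature = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"
  const length = Math.min(signature.length, arrayBuffer.byteLength);
  const bytes = new Uint8Array(arrayBuffer, 0, length);
  return (
    bytes.length === signature.length &&
    signature.every((byte, i) => bytes[i] === byte)
  );
}

// Buffers built by hand, standing in for the result of readAsArrayBuffer
const pdfLike = new TextEncoder().encode('%PDF-1.7\n').buffer;
const notPdf = new TextEncoder().encode('hello').buffer;
console.log(looksLikePdf(pdfLike)); // true
console.log(looksLikePdf(notPdf)); // false
```

Inside reader.onload, the handler could return early (or show a message) when looksLikePdf(e.target.result) is false.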