The size to chunk the sitemap URLs into for scraping.
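The chunking described above can be sketched as follows. This is a minimal illustration, not the library's implementation: `chunkUrls` and the example URLs are hypothetical names introduced here.

```typescript
// Hypothetical helper: split a list of URLs into batches of at most `size`.
function chunkUrls(urls: string[], size: number): string[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const chunks: string[][] = [];
  for (let i = 0; i < urls.length; i += size) {
    // slice() clamps at the end of the array, so the last batch may be shorter.
    chunks.push(urls.slice(i, i + size));
  }
  return chunks;
}

// Placeholder URLs, for illustration only.
const sitemapUrls = [
  "https://example.com/a",
  "https://example.com/b",
  "https://example.com/c",
];
// Two batches: the first holds two URLs, the second holds the remaining one.
const urlBatches = chunkUrls(sitemapUrls, 2);
```

Each batch can then be scraped as a unit, which bounds how many requests are in flight at once.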
Optional headers
The headers to use in the fetch request.
Optional selector
The selector to use to extract the text from the document. Defaults to "body".
Optional textDecoder
The text decoder to use to decode the response. Defaults to UTF-8.
The timeout in milliseconds for the fetch request. Defaults to 10 seconds (10000 ms).
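One common way to apply headers and a millisecond timeout to a request is an `AbortController` paired with a timer. The sketch below assumes that approach and is not taken from the library's source. `FetchText`, `fetchWithTimeout`, and `DEFAULT_TIMEOUT_MS` are hypothetical names, and the fetch function is passed in so the example runs without a network.

```typescript
// Hypothetical fetch-like signature so the sketch can run without a network.
type FetchText = (
  url: string,
  init: { headers?: Record<string, string>; signal: AbortSignal },
) => Promise<string>;

// Assumed constant mirroring the documented default of 10 s (10000 ms).
const DEFAULT_TIMEOUT_MS = 10_000;

async function fetchWithTimeout(
  fetchText: FetchText,
  url: string,
  headers: Record<string, string> = {},
  timeoutMs: number = DEFAULT_TIMEOUT_MS,
): Promise<string> {
  const controller = new AbortController();
  // Abort the request once the timeout elapses.
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await fetchText(url, { headers, signal: controller.signal });
  } finally {
    // Always clear the timer so it cannot fire after the request settles.
    clearTimeout(timer);
  }
}
```

With the built-in `fetch`, the first argument could be `(u, init) => fetch(u, init).then((r) => r.text())`.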
Extracts the text content from the loaded document using the selector and creates a Document instance with the extracted text and metadata. It returns an array of Document instances.
A Promise that resolves to an array of Document instances.
Optional splitter: BaseDocumentTransformer<DocumentInterface<Record<string, any>>[], DocumentInterface<Record<string, any>>[]>
A Promise that resolves with an array of Document instances, each split according to the provided TextSplitter.
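To illustrate what splitting loaded documents means, here is a small sketch with a fixed-size splitter. `SimpleDocument` and `splitBySize` are hypothetical stand-ins; the library's own Document and TextSplitter types are richer, and its splitters normally cut on separators, not at fixed character offsets.

```typescript
// Hypothetical minimal document shape, for illustration only.
interface SimpleDocument {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Hypothetical splitter: cuts each document's text into pieces of at most
// `maxChars` characters and carries the original metadata along.
function splitBySize(docs: SimpleDocument[], maxChars: number): SimpleDocument[] {
  if (!Number.isInteger(maxChars) || maxChars <= 0) {
    throw new RangeError("maxChars must be a positive integer");
  }
  const pieces: SimpleDocument[] = [];
  for (const doc of docs) {
    for (let start = 0; start < doc.pageContent.length; start += maxChars) {
      pieces.push({
        pageContent: doc.pageContent.slice(start, start + maxChars),
        // Copy the metadata so every piece still records where it came from,
        // and add the character offset of the piece within the source text.
        metadata: { ...doc.metadata, start },
      });
    }
  }
  return pieces;
}
```

The result is still a flat array of documents, which matches the shape the description above gives for the resolved value.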
Static imports
A static method that dynamically imports the Cheerio library and returns the load function. If the import fails, it throws an error.
A Promise that resolves to an object containing the load function from the Cheerio library.
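The dynamic-import pattern described above might look roughly like the following. This is a sketch under assumptions, not the library's code: `importOrThrow` and its `installHint` parameter are hypothetical.

```typescript
// Hypothetical helper: import a module at runtime and, if that fails,
// throw an error that tells the user what to do about it.
async function importOrThrow(
  moduleName: string,
  installHint: string,
): Promise<unknown> {
  try {
    // The module is only resolved when this line runs, so the dependency
    // stays optional for callers who never reach this code path.
    return await import(moduleName);
  } catch {
    throw new Error(`Could not load "${moduleName}". ${installHint}`);
  }
}
```

For Cheerio, a caller would request the `"cheerio"` module and read its `load` export from the resolved namespace object.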
Static scrape
Fetches web documents from the given array of URLs and loads them using Cheerio. It returns an array of CheerioAPI instances.
An array of URLs to fetch and load.
Optional textDecoder: TextDecoder
Optional options: CheerioOptions & {
A Promise that resolves to an array of CheerioAPI instances.
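Fetching and loading several URLs into one array can be sketched with `Promise.all`. The names `LoadPage` and `scrapeAllSketch` are hypothetical, and the page loader is injected so the example does not depend on Cheerio or a network. Whether the library loads pages concurrently in this way is an assumption.

```typescript
// Hypothetical page loader: turns a URL into some parsed representation
// (in the real loader this would be a CheerioAPI instance).
type LoadPage<T> = (url: string) => Promise<T>;

async function scrapeAllSketch<T>(
  urls: string[],
  loadPage: LoadPage<T>,
): Promise<T[]> {
  // Promise.all keeps results in input order and rejects as soon as
  // any single load fails.
  return Promise.all(urls.map((url) => loadPage(url)));
}
```

Because results come back in input order, the i-th entry of the resolved array corresponds to the i-th URL.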
SitemapLoaderParams
Interface representing the parameters for initializing a SitemapLoader.