Large file slicing and stitching

Published: May 23, 2024

Time to read: 12 min

These days there is little need for this technique, since large file uploads are usually handled by blob storage services like Amazon S3, Google Cloud Storage, and Azure Blob Storage. Those services are designed for large files and already provide their own mechanisms for uploading them.

With that being said, I wanted to try handling large file uploads locally on my own backend. Again, not the smartest idea, but I wanted the challenge of breaking a large file down and piecing it back together.

Tech stack

Frontend

  • TS React
  • Vite
  • Axios

Backend

For this example I was using Hono on top of Bun, but any JavaScript runtime should work.
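
A minimal sketch of the server wiring, assuming the endpoints live at /upload and /upload-large and the upload logic lives in a FileService class (both names are my own, not the exact code):

code block is ts
import { Hono } from 'hono';
// Assumed module path; FileService is the class whose writeFiles method
// is shown later in this post.
import { FileService } from './file-service';

const app = new Hono();
const fileService = new FileService();

// Small files arrive as a single multipart request ...
app.post('/upload', (c) => fileService.writeFiles(c));
// ... while large files arrive as a series of chunked requests.
app.post('/upload-large', (c) => fileService.writeFiles(c, true));

// Bun uses the default export as the HTTP server entry point.
export default app;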

The Plan

  1. Break down the file on the frontend
  2. Send the file in chunks to the backend
  3. Save the chunks to the server

The Breakdown

The first step is to break the file down on the frontend. This is done by slicing the File object into chunks with Blob.slice. The chunk size is determined by the application and can be adjusted to fit its needs. In my case I set MAX_UPLOAD_SIZE to 100MB.

code block is ts
private readonly MAX_UPLOAD_SIZE = 100 * 1024 * 1024; // 100MB 
code block is ts
public startUploading = async (files: FileList | File[]) => {
    if (files) {
      return [...files].map((file) =>
        file.size <= this.MAX_UPLOAD_SIZE
          ? this.uploadFile(file)
          : this.uploadLargeFile(file),
      );
    }
  };
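
On the React side, startUploading might be called from a file input's change handler. A minimal sketch, assuming the class above is called FileUploadService (my own name for illustration):

code block is ts
import type { ChangeEvent } from 'react';
// Assumed module path and class name for the upload service shown above.
import { FileUploadService } from './file-upload-service';

const uploader = new FileUploadService();

const handleFileInputChange = (e: ChangeEvent<HTMLInputElement>) => {
  if (!e.target.files) return;
  // One upload per selected file: anything over MAX_UPLOAD_SIZE takes the
  // chunked path, everything else is sent in a single request.
  void uploader.startUploading(e.target.files);
};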

For files up to 100MB, the file is uploaded as a single chunk.

  1. Check if the file already exists using the file system API
  2. Create a form data object and append the file to it
  3. Send the file to the server
code block is ts
private uploadFile = async (file: File, fileName: string = file.name) => {
    if (await this.doesFileExist(file.name)) return;
    const formData = new FormData();
    formData.append(fileName, file);

    try {
      const controller = new AbortController();
      const res = await axios.postForm(
        CLIENT_UPLOAD_ENDPOINT,
        formData,
        this.getConfig(file, controller),
      );
      console.log(res.data);
      return file;
    } catch (err) {
      console.error(err);
    }
  };

getConfig is a helper function that returns the request headers, the upload progress handler, and the abort signal used to cancel the request.

code block is ts
private getConfig = (file: File, controller: AbortController) => ({
    headers: {
      'Content-Type': 'multipart/form-data',
    },
    ...this.handleProgress(file.name, file.size),
    signal: controller.signal,
  });
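
handleProgress isn't shown here; a minimal sketch of what it could look like, assuming it returns the Axios onUploadProgress callback and tracks per-file progress in a map (the progressPerFile field and the class wrapper are my own naming):

code block is ts
import type { AxiosProgressEvent } from 'axios';

// Sketch only: in the real code these sit on the same class as getConfig.
class UploadProgressSketch {
  // Upload progress per file name, as a percentage.
  private progressPerFile = new Map<string, number>();

  // Returns the slice of Axios config that reports upload progress.
  private handleProgress = (fileName: string, fileSize: number) => ({
    onUploadProgress: (e: AxiosProgressEvent) => {
      // e.total can be undefined, so fall back to the known file size.
      const total = e.total ?? fileSize;
      this.progressPerFile.set(fileName, Math.round((e.loaded / total) * 100));
    },
  });
}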

For files larger than 100MB, the file is broken down into chunks and uploaded to the server.

  1. Check if the file already exists using the file system API
  2. Create a chunk size based on the file size and the max upload size.
  3. Create a starting point to use in a while loop.
  4. For each iteration of the loop,
    • Create a new form data object
    • Append the chunk sliced from start to end under the key file, along with the original file name
    • Append the start and end points to the form data
    • Append the file name to the form data
    • Append a boolean value named fileIsLastChunk to the form data to indicate whether this is the last chunk
  5. In a try-catch block, send the form data to the server
  6. If the server responds with 'Uploaded large files', return the file
  7. If the server responds with an error, check if the retry count is less than the maximum retry count and retry the upload
  8. Update the uploadedBytesPerFile map to keep track of the bytes uploaded for each file
  9. Exit the loop once start reaches the file size
code block is ts
private uploadLargeFile = async (
    file: File,
    retryCount = 0,
    controller = new AbortController(),
  ) => {
    const MAX_RETRY_COUNT = 3; // Define your maximum retry count

    if (file) {
      if (await this.doesFileExist(file.name)) return;
      const chunkSize = Math.max(
        this.MAX_UPLOAD_SIZE,
        Math.ceil(file.size / 1000),
      );

      let start = 0;

      while (start < file.size) {
        const end = Math.min(start + chunkSize, file.size);
        const formData = new FormData();
        formData.append('file', file.slice(start, end), file.name);
        formData.append('start', start.toString());
        formData.append('end', end.toString());
        formData.append('fileName', file.name);
        formData.append(
          'fileIsLastChunk',
          end === file.size ? 'true' : 'false',
        );

        try {
          const res = await axios.postForm(
            `${CLIENT_UPLOAD_ENDPOINT}-large`,
            formData,
            this.getConfig(file, controller),
          );

          if (res.data === 'Uploaded large files') {
            console.log(res.data);
            return file;
          }
        } catch (err) {
          if (retryCount < MAX_RETRY_COUNT) {
            console.log(`Retry count: ${retryCount + 1}. Retrying...`);
            await this.uploadLargeFile(file, retryCount + 1, controller);
          } else {
            throw err;
          }
        }
        const uploadedBytes = this.uploadedBytesPerFile.get(file.name) || 0;
        this.uploadedBytesPerFile.set(file.name, uploadedBytes + end - start);
        start = end;
      }
    }
  };
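
Two pieces referenced above aren't shown: the uploadedBytesPerFile map and the frontend doesFileExist check. A rough sketch of what they might look like, assuming the server exposes an endpoint that reports whether a file name already exists (CLIENT_EXISTS_ENDPOINT and the class wrapper are my own):

code block is ts
import axios from 'axios';

// Assumed endpoint that reports whether a file was already uploaded;
// not from the original post.
const CLIENT_EXISTS_ENDPOINT = '/api/upload/exists';

// Sketch only: in the real code these sit on the same upload service class.
class UploadStateSketch {
  // Bytes sent so far for each large file, keyed by file name.
  private uploadedBytesPerFile = new Map<string, number>();

  // Asks the backend whether the file is already on disk.
  private doesFileExist = async (fileName: string): Promise<boolean> => {
    try {
      const res = await axios.get(CLIENT_EXISTS_ENDPOINT, {
        params: { fileName },
      });
      return res.data === true;
    } catch {
      // If the check fails, assume the file is missing and upload it.
      return false;
    }
  };
}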

The Backend

This method writes files to the server and handles both large and small uploads. It first checks whether the file already exists on the server; if it does, the file is removed from filesArray and a message is returned indicating that the file already exists. If the file does not exist and is a large-file chunk, the chunk's data is converted to a Uint8Array and appended to the file on disk. If it is a small file, the data is written directly. After all files have been processed, a message is returned indicating that the files have been uploaded.

  • param c: CustomContext - The context object, which includes the request and response objects.
  • param isLargeFile?: boolean - A flag indicating whether the file is a large file.
  • returns Promise<Response> - A promise that resolves to a response whose message indicates the result of the file upload operation.
code block is ts
  public writeFiles = async <P extends string>(
    c: CustomContext<P>,
    isLargeFile?: boolean,
  ): Promise<Response> => {
    const uploadPath = Bun.env.UPLOAD_PATH || API_UPLOAD_PATH;

    const filesArray = this.getFilesArray(await c.req.parseBody());
    const doesFileExist = async (filePath: string) =>
      await Bun.file(filePath).exists();

    if (isLargeFile) {
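      // Large uploads arrive as chunks; each chunk is appended to the
      // target file on disk in the order it arrives.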
      for await (const file of filesArray) {
        const filePath = `${uploadPath}/${file.name}`;

        if (await doesFileExist(filePath)) {
          filesArray.splice(filesArray.indexOf(file), 1);
          break;
        } else if (file.data instanceof File) {
          const byteArray = new Uint8Array(await file.data.arrayBuffer());
          const filePath = `${API_UPLOAD_PATH}/${file.fileName}`;
          const dirExist = (path: string) =>
            !!Array.from(new Bun.Glob(path).scanSync({ onlyFiles: false }))[0];

          if (!dirExist(filePath)) {
            const dir = path.dirname(filePath);
            await mkdir(dir, { recursive: true });
          }
          await appendFile(filePath, byteArray);
        }
      }
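      // The client marks the final chunk with a fileIsLastChunk form field.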
      const lastChunk = filesArray.filter(
        (file) => file.name === "fileIsLastChunk" && file?.data === "true",
      )[0];

      if (lastChunk?.data === "true") {
        return c.text("Uploaded large files");
      } else {
        return c.text("Uploading ...");
      }
    } else {
      for await (const file of filesArray) {
        const filePath = `${API_UPLOAD_PATH}/${file.fileName}`;

        if (await doesFileExist(filePath)) {
          filesArray.splice(filesArray.indexOf(file), 1);
          break;
        } else {
          await Bun.write(filePath, file.data);
        }
      }
      return c.text("Uploaded files");
    }
  };
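
The method above also leans on a few imports and definitions that aren't shown. Roughly, with the caveat that the upload directory value and the CustomContext alias are my guesses:

code block is ts
// Assumed supporting imports and definitions for writeFiles.
import { mkdir, appendFile } from 'node:fs/promises';
import path from 'node:path';
import type { Context, Env } from 'hono';

// Assumed default upload directory; the real value isn't shown in the post.
const API_UPLOAD_PATH = './uploads';

// A thin alias over Hono's Context, parameterised by the route path.
type CustomContext<P extends string> = Context<Env, P>;

One thing worth noting: because the frontend awaits each chunk before sending the next one, appendFile can rely on the chunks arriving in order.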

getFilesArray is a helper function that converts the files object to an array of CustomFileType objects. Each object includes the file name, data, and type.

code block is ts
private getFilesArray = (files: BodyData): CustomFileType[] =>
    Object.keys(files).map((fileName) => {
      const file = files[fileName];

      if (file instanceof File) {
        return {
          name: fileName,
          data: file,
          type: getFileFormat(file.type),
          fileName: file.name,
        };
      }

      return {
        name: fileName,
        data: file,
        type: "unknown",
      };
    });
}
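
CustomFileType and getFileFormat aren't defined in the post either (BodyData is the return type of Hono's parseBody). Here is a plausible sketch, with the shapes inferred from how writeFiles and getFilesArray use them; the MIME-prefix mapping in getFileFormat is my own simplification:

code block is ts
// Inferred from usage above; the exact shapes in the original may differ.
type CustomFileType = {
  name: string;        // form field key, e.g. 'file' or 'fileIsLastChunk'
  data: File | string; // the chunk itself, or a plain string field
  type: string;        // coarse format derived from the MIME type
  fileName?: string;   // original file name, only set for File entries
};

// A naive mapping from a MIME type to a coarse format label.
const getFileFormat = (mimeType: string): string =>
  mimeType.split('/')[0] || 'unknown';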