I have a system that reads a large file from an FTP server, stores it in a database, and sends the data to an API for processing. A file can contain hundreds of thousands of records, which take a long time to process, so the data has to be split into chunks. The processing is done in jobs, and I batch those jobs so I can tell when one file is finished and move on to the next.
The key point is that I was asked to automate the check for new files, so the scheduler performs that check and kicks off the lengthy process.
I scheduled the task to run every 5 minutes, but the previous job takes longer than that, and the scheduler won't wait for the earlier run to finish, so it didn't work. Is there a way to achieve this?
It sounds like you are using a task scheduler to check for new files and start the data processing jobs. However, the previous job may still be running when the scheduler fires again, leading to overlapping runs and potential data processing errors.
One solution is to use a file lock to prevent the scheduler from starting a new job while the previous one is still running. When a processing job starts, it acquires an exclusive lock on a well-known lock file; each time the scheduler fires, it first checks whether that lock is still held. If it is, the scheduler skips this run (or waits until the lock is released) before starting the next job.
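As a minimal sketch of the lock idea, assuming a POSIX system where `fcntl.flock` is available (the lock path `/tmp/file_import.lock` and the function names are placeholders, not part of your system):

```python
import fcntl

LOCK_PATH = "/tmp/file_import.lock"  # hypothetical lock file path

def try_acquire_lock(path=LOCK_PATH):
    """Return an open handle holding an exclusive lock, or None if another
    run already holds it (the scheduled entry point should then just exit)."""
    fh = open(path, "w")
    try:
        # LOCK_NB makes this non-blocking: fail immediately instead of waiting
        fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return fh
    except BlockingIOError:
        fh.close()
        return None

def release_lock(fh):
    """Release the lock and close the handle when the job finishes."""
    fcntl.flock(fh, fcntl.LOCK_UN)
    fh.close()
```

The scheduled entry point would call `try_acquire_lock()` first and exit quietly if it returns `None`; because the lock is released automatically by the OS when the process dies, a crashed job cannot leave the system permanently blocked, which is an advantage over checking for the mere existence of a marker file.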
Another solution is to use a queue system to manage the data processing jobs. Instead of starting the jobs directly from the scheduler, you enqueue them into a message queue such as RabbitMQ or Apache Kafka. Worker processes then pick the jobs up one at a time, and you can scale the workers up or down as needed to handle the volume. This way the scheduler only ever enqueues work, jobs are processed in the order they were received, and there is no risk of overlapping runs.
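To illustrate the decoupling, here is a small in-process stand-in for a real broker, using Python's standard `queue` and `threading` modules (in production you would use RabbitMQ or Kafka instead; `process_file` is a placeholder for your chunked processing):

```python
import queue
import threading

file_queue = queue.Queue()
results = []

def process_file(filename):
    # placeholder for the real chunked import/API processing of one file
    return f"processed {filename}"

def worker():
    """Drain the queue one file at a time, in arrival order."""
    while True:
        filename = file_queue.get()
        if filename is None:          # sentinel: shut the worker down
            file_queue.task_done()
            break
        results.append(process_file(filename))
        file_queue.task_done()

t = threading.Thread(target=worker, daemon=True)
t.start()

# The "scheduler" side only enqueues; it returns immediately even if
# processing is slow, so runs can never overlap.
for f in ["fileA.csv", "fileB.csv"]:
    file_queue.put(f)
file_queue.put(None)
file_queue.join()  # wait until every enqueued item is processed
```

The scheduler's 5-minute tick becomes cheap (just a `put`), and sequencing is enforced by having a single worker drain the queue.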
Overall, either a file lock or a queue system lets you automate the check for new files and kick off the processing jobs while ensuring they run one at a time, in a controlled and safe manner.
Answered By – Derek
Answer Checked By – David Goodson (Easybugfix Volunteer)