I need to import functions from other Python scripts, which will be used inside the preprocessing.py file. I could not find a way to pass the dependent files to the SKLearnProcessor object, so I am getting a ModuleNotFoundError.
Code:
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.processing import ProcessingInput, ProcessingOutput
sklearn_processor = SKLearnProcessor(framework_version='0.20.0',
                                     role=role,
                                     instance_type='ml.m5.xlarge',
                                     instance_count=1)
sklearn_processor.run(code='preprocessing.py',
                      inputs=[ProcessingInput(
                        source=input_data,
                        destination='/opt/ml/processing/input')],
                      outputs=[ProcessingOutput(output_name='train_data',
                                                source='/opt/ml/processing/train'),
                               ProcessingOutput(output_name='test_data',
                                                source='/opt/ml/processing/test')],
                      arguments=['--train-test-split-ratio', '0.2']
                     )
I would like to pass
dependent_files = ['file1.py', 'file2.py', 'requirements.txt'] so that preprocessing.py has access to all the dependent modules.
I also need to install the libraries listed in the requirements.txt file.
Can you share a workaround or the right way to do this?
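To make the requirement concrete, here is a minimal sketch of one possible workaround, not a confirmed or recommended approach. It assumes the dependent files are uploaded with an additional ProcessingInput whose destination is the hypothetical folder /opt/ml/processing/input/deps. The helper names load_dependencies and pip_install_command are made up for illustration. preprocessing.py would then install the requirements and extend sys.path before importing the helper modules:

```python
import importlib
import subprocess
import sys
from pathlib import Path

# Hypothetical container path: assumes file1.py, file2.py and requirements.txt
# were uploaded through an extra ProcessingInput with this destination.
DEPS_DIR = "/opt/ml/processing/input/deps"


def pip_install_command(requirements_path):
    """Build the pip command that installs everything listed in requirements.txt."""
    return [sys.executable, "-m", "pip", "install", "-r", str(requirements_path)]


def load_dependencies(deps_dir=DEPS_DIR, install=True):
    """Install requirements (if present) and make the helper modules importable."""
    deps = Path(deps_dir)
    requirements = deps / "requirements.txt"
    # Only call pip when asked to and when a requirements file was actually shipped.
    if install and requirements.is_file():
        subprocess.check_call(pip_install_command(requirements))
    # Put the folder at the front of sys.path so `import file1` resolves to it.
    if str(deps) not in sys.path:
        sys.path.insert(0, str(deps))
    # Ensure the import system notices files that appeared after start-up.
    importlib.invalidate_caches()
```

With this in place, preprocessing.py would call load_dependencies() at the top and only then run import file1 and import file2. Installing packages at run time requires network access from the processing container, which is also an assumption here.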
Update-25-11-2021:
Q1. (Answered, but I am still looking to solve this using FrameworkProcessor)
Here, the get_run_args function handles the dependencies, source_dir and code parameters by using FrameworkProcessor. Is there any way to set these parameters from ScriptProcessor, SKLearnProcessor or any other Processor?
Q2.
Can you also point to a reference for using our Processor in a sagemaker.workflow.steps.ProcessingStep and then using that step in a sagemaker.workflow.pipeline.Pipeline?
To have a Pipeline, is a SageMaker Project mandatory, or can we create a Pipeline directly without one?