I have the following dictionary:
ip_dict = 
{
    "doc_1" : {
                "img_1" : ("FP","some long text"),
                "img_2" : ("LP", "another long text"),
                "img_3" : ("Others", "long text"),
                "img_4" : ("Others", "some loong text"),
                "img_5" : ("FP", "one more text"),
                "img_6" : ("FP", "another one"),
                "img_7" : ("LP", "ANOTHER ONE"),
                "img_8" : ("Others", "some text"),
                "img_9" : ("Others", "some moretext"),
                "img_10" : ("FP", "more text"),
                "img_11" : ("Others", "whatever"),
                "img_12" : ("Others", "more whatever"),
                "img_13" : ("LP", "SoMe TeXt"),
                "img_14" : ("Others", "some moretext"),
                "img_15" : ("FP", "whatever"),
                "img_16" : ("Others", "whatever"),
                "img_17" : ("LP", "whateverrr")
            },
    "doc_2" : {
                "img_1" : ("FP", "text"),
                "img_2" : ("FP", "more text"),
                "img_3" : ("LP", "more more text"),
                "img_4" : ("Others", "some more"),
                "img_5" : ("Others", "text text"),
                "img_6" : ("FP", "more more text"),
                "img_7" : ("Others", "lot of text"),
                "img_8" : ("LP", "still more text")
            }
}
Here FP represents the first page and LP the last page. For all the docs I only want to extract the FP and LP. For the Others, if they lie between FP and LP only then extract them, as they represent the pages between FP and LP. If they lie outside FP and LP then ignore them. Also for FP which are not followed by a LP, treat them as a single page and extract them. So my output dictionary would look like:
op_dict = 
{
    "doc_1" : [
                {
                "img_1" : ("FP","some long text"),
                "img_2" : ("LP", "another long text")
                },
                {
                    "img_5" : ("FP", "one more text")
                },
                {
                    "img_6" : ("FP", "another one"),
                    "img_7" : ("LP", "ANOTHER ONE")
                },
                {
                    "img_10" : ("FP", "more text"),
                    "img_11" : ("Others", "whatever"),
                    "img_12" : ("Others", "more whatever"),
                    "img_13" : ("LP", "SoMe TeXt"),
                },
                {
                    "img_15" : ("FP", "whatever"),
                    "img_16" : ("Others", "whatever"),
                    "img_17" : ("LP", "whateverrr"),
                }
            ],
    "doc_2" : [
                {
                "img_1" : ("FP", "text")
                },
                {        
                "img_2" : ("FP", "more text"),
                "img_3" : ("LP", "more more text")
                },        
                {
                "img_6" : ("FP", "more more text"),
                "img_7" : ("Others", "lot of text"),
                "img_8" : ("LP", "still more text")
                },
            ]
}
As you can see, all the FP and LP have been extracted, but also those Others which are in between FP and LP have also been extracted and stored in a dictionary. Also those FP which are not followed by a LP have also been extracted.
PS:
ip_dict = 
{
    "doc_1" : {
                "img_1" : ("LP","some long text"),
                "img_2" : ("Others", "another long text"),
                "img_3" : ("Others", "long text"),
                "img_4" : ("FP", "long text"),
                "img_5" : ("Others", "long text"),
                "img_6" : ("LP", "long text")
            }
}
op_dict =     {
        "doc_1" : [{
                    "img_1" : ("LP","some long text")
                },
                    {
                    "img_4" : ("FP", "long text"),
                    "img_5" : ("Others", "long text"),
                    "img_6" : ("LP", "long text")
                    }
                  ]
    
              }
Any help is appreciated!
 
     
     
     
    