14

How could I sort email senders (distinct "from" addresses) by the total amount of space each sender's cumulative messages (including bodies and attachments) are occupying in my Gmail account? I'm thinking something like:

  1. Download all my messages to a local client such as Thunderbird via IMAP
  2. Somehow convert the email data into a database that can be queried
  3. Use something like SQL to query the database and get total message size per sender

Can anyone help me flesh this out a bit?

3 Answers3

13

You can do it in Thunderbird without external software.

Step 1: Get the data

Either go into Google Takeout (https://takeout.google.com/) and download a complete copy of all your Gmail messages. You'll get everything in MBOX format which you can then import into the client.

OR, configure the client to directly download the messages using IMAP.

Step 2: Group the Emails

Go to your Inbox, select in the upper right corner the icon for Message Display List Options and select Table View

Select the same icon again, and in the Sort By option, select From

Finally, Select the same icon again, and in the Sort By option, select Grouped by Sort

Step 3: Review the Data

Click on each group. On the message preview side, at the bottom, you'll get a prompt with the accumulated size of that group of emails.

For example, it will say something like "These messages take up 4.7 KB."

3

I use a script to do basically that. While it's not directly shareable on superuser, here are the basic steps it performs:

  • Connect to the server through IMAP¹
  • SELECT a given folder²
  • SEARCH emails from a given range ((SINCE "<date>") (NOT DELETED))
  • FETCH for each email³ the fields (RFC822.SIZE BODY.PEEK[HEADER.FIELDS (FROM)])
  • Group the results by source, summing the sizes
  • Sort and print the result

For a one-off run you could even do that manually on a single session (that you are copying to a file for later parsing). Remember that IMAP requires you to prefix every command with an (arbitrary) tag,⁴ and that the reply will have that same tag, but there may be intermixed untagged responses as well.

The session may look like this:

* OK Gimap ready for requests
x1 LOGIN myusername mypassword
x1 OK LOGIN completed.
x2 SELECT INBOX
* 1234 EXISTS
* 11 RECENT
* FLAGS (\Seen \Answered \Flagged \Deleted \Draft)
* OK [PERMANENTFLAGS (\Seen \Answered \Flagged \Deleted \Draft)] Permanent flags
* OK [UNSEEN 1203] Is the first unseen message
* OK [UIDVALIDITY 14] UIDVALIDITY value
* OK [UIDNEXT 65530] The next unique identifier value
x3 OK [READ-WRITE] SELECT completed.
x4 UID FETCH 1:65530 (RFC822.SIZE BODY.PEEK[HEADER.FIELDS (FROM)])
* 1 FETCH (RFC822.SIZE 9999 BODY[HEADER.FIELDS (FROM)] {43}
From: "Jon Postel" <jpostel@example.com>

UID 3522)

  • 2 FETCH (RFC822.SIZE 9989 BODY[HEADER.FIELDS (FROM)] {76}

From: "Super user question" <superuserquestion1826542@invalid-email.com>

UID 7476)

  • 3 FETCH (RFC822.SIZE 9889 BODY[HEADER.FIELDS (FROM)] {70}

From: "Stack Exchange" <question1826542superuser@invalid-email.com>

UID 8084) ... x5 LOGOUT

  • BYE Logout Requested

Note: The email sizes above are 9999, 9989 and 9889. The UID FETCH on the range 1:65530 is based on UIDNEXT being 65530.

¹ Easiest way in Gmail today would be to use an application password. That's currently only available with 2FA enabled.

² In your case you would be interested in the special "All mails" “folder” of Gmail. Use one with just a few items for testing.

³ Actually in batches of 100 emails.

⁴ x1, x2, x3, x4, x5 on the sample

Ángel
  • 1,393
0

The way I did it was via Outlook. I have my Gmail account connected to outlook, so I ran the VBA code below, and it created an excel sheet with all email messages, which I then ran a pivot table to find all sizes and counts:

Sub ExportEmailDetailsToExcel()
    Dim accountName As String
    Dim folderName As String
    Dim olNs As Outlook.NameSpace
    Dim targetAccountFolder As Outlook.Folder
    Dim targetFolder As Outlook.Folder
    Dim itm As Object
    Dim mail As Outlook.MailItem
    Dim senderAddress As String
    Dim emailDate As Variant
    Dim emailSize As Long
    Dim totalItems As Long
    Dim i As Long
    Dim rowIndex As Long
' Prompt the user for the account name
accountName = InputBox(&quot;Enter the name of the account you want to scan:&quot;, &quot;Select Account&quot;)
If accountName = &quot;&quot; Then
    MsgBox &quot;No account name was entered. Exiting.&quot;, vbExclamation
    Exit Sub
End If

' Prompt the user for the folder name (default is Inbox)
folderName = InputBox(&quot;Enter the folder name you want to scan (e.g., Inbox, Sent Items):&quot;, &quot;Select Folder&quot;, &quot;Inbox&quot;)
If folderName = &quot;&quot; Then
    MsgBox &quot;No folder name was entered. Exiting.&quot;, vbExclamation
    Exit Sub
End If

' Get the MAPI namespace
Set olNs = Application.GetNamespace(&quot;MAPI&quot;)

' Attempt to get the account folder by its name
On Error Resume Next
Set targetAccountFolder = olNs.Folders(accountName)
On Error GoTo 0
If targetAccountFolder Is Nothing Then
    MsgBox &quot;Account '&quot; &amp; accountName &amp; &quot;' not found.&quot;, vbExclamation
    Exit Sub
End If

' Attempt to get the specified folder under the chosen account
On Error Resume Next
Set targetFolder = targetAccountFolder.Folders(folderName)
On Error GoTo 0
If targetFolder Is Nothing Then
    MsgBox &quot;Folder '&quot; &amp; folderName &amp; &quot;' not found under account '&quot; &amp; accountName &amp; &quot;'.&quot;, vbExclamation
    Exit Sub
End If

' Confirm total number of items
totalItems = targetFolder.Items.Count
MsgBox &quot;Total items in folder: &quot; &amp; totalItems, vbInformation

' Create an instance of Excel
Dim xlApp As Object, xlWB As Object, xlSheet As Object
On Error Resume Next
Set xlApp = CreateObject(&quot;Excel.Application&quot;)
If xlApp Is Nothing Then
    MsgBox &quot;Excel is not available.&quot;, vbExclamation
    Exit Sub
End If
On Error GoTo 0
xlApp.Visible = True
Set xlWB = xlApp.Workbooks.Add
Set xlSheet = xlWB.Worksheets(1)

' Write headers to the Excel worksheet
With xlSheet
    .Cells(1, 1).Value = &quot;Sender Email Address&quot;
    .Cells(1, 2).Value = &quot;Email Date&quot;
    .Cells(1, 3).Value = &quot;Email Size (Bytes)&quot;
End With

rowIndex = 2
' Loop backwards through all items in the target folder for reliability
For i = totalItems To 1 Step -1
    Set itm = targetFolder.Items(i)
    ' Process only mail items
    If itm.Class = olMail Then
        Set mail = itm
        ' Get the sender email address, email date, and size
        senderAddress = mail.SenderEmailAddress
        emailDate = mail.ReceivedTime
        emailSize = mail.Size

        ' Write the details to Excel
        With xlSheet
            .Cells(rowIndex, 1).Value = senderAddress
            .Cells(rowIndex, 2).Value = emailDate
            .Cells(rowIndex, 3).Value = emailSize
        End With
        rowIndex = rowIndex + 1
    End If
Next i

' Auto-fit columns for better readability
xlSheet.Columns(&quot;A:C&quot;).EntireColumn.AutoFit

MsgBox &quot;Export complete! Check the new Excel workbook.&quot;, vbInformation

End Sub

I hope this helps.

Fouad
  • 1