I have a list, say hostlist = ['199.72.81.55', 'burger.letters.com'], and I want to get the matching rows from my DataFrame. For example, when I search for burger.letters.com, the DataFrame should return the host and timestamps for burger.letters.com.
I tried calling df.ix[host] for each host in the list. However, since I have 0.4 billion rows, just running a for loop over df.ix[host] takes more than 30 minutes, and the full code at the end of this post takes forever to run.
Below is what my dataframe looks like:
     host                  timestamp
0    199.72.81.55          01/Jul/1995:00:00:01
2    199.72.81.55          01/Jul/1995:00:00:09
3    burger.letters.com    01/Jul/1995:00:00:11
4    199.72.81.55          01/Jul/1995:00:00:12
5    199.72.81.55          01/Jul/1995:00:00:13
6    199.72.81.55          01/Jul/1995:00:00:14
8    burger.letters.com    01/Jul/1995:00:00:15
9    199.72.81.55          01/Jul/1995:00:00:15
This is how I currently get the output I want:
for host in hostlist:
    df.ix[host]
This returns the output below, but it is too slow with 0.4 billion rows, and I want to optimize it.
df.ix['burger.letters.com']
       host                  timestamp
    3    burger.letters.com     01/Jul/1995:00:00:11
    8    burger.letters.com     01/Jul/1995:00:00:15
df.ix['199.72.81.55']
       host                  timestamp
    0    199.72.81.55             01/Jul/1995:00:00:01
    2    199.72.81.55             01/Jul/1995:00:00:09
    4    199.72.81.55             01/Jul/1995:00:00:12
    5    199.72.81.55             01/Jul/1995:00:00:13
    6    199.72.81.55             01/Jul/1995:00:00:14
    9    199.72.81.55             01/Jul/1995:00:00:15
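A minimal sketch of one possible way to get the same per-host frames without a separate lookup per host. It assumes that host is an ordinary column (not the index) and rebuilds the sample rows above as a small DataFrame; the names `subset` and `frames` are only illustrative. The idea is to filter once with `isin` and split once with `groupby`, so the data is scanned a fixed number of times rather than once per host. How this behaves at 0.4 billion rows is untested here.

```python
import pandas as pd

# Sample rows copied from the question; "host" is assumed to be a regular column.
df = pd.DataFrame(
    {
        "host": [
            "199.72.81.55", "199.72.81.55", "burger.letters.com", "199.72.81.55",
            "199.72.81.55", "199.72.81.55", "burger.letters.com", "199.72.81.55",
        ],
        "timestamp": [
            "01/Jul/1995:00:00:01", "01/Jul/1995:00:00:09", "01/Jul/1995:00:00:11",
            "01/Jul/1995:00:00:12", "01/Jul/1995:00:00:13", "01/Jul/1995:00:00:14",
            "01/Jul/1995:00:00:15", "01/Jul/1995:00:00:15",
        ],
    },
    index=[0, 2, 3, 4, 5, 6, 8, 9],
)
hostlist = ["199.72.81.55", "burger.letters.com"]

# Keep only the rows whose host is in the list (a single vectorized pass).
subset = df[df["host"].isin(hostlist)]

# Split the filtered rows into one sub-frame per host (a single grouping pass).
frames = {host: group for host, group in subset.groupby("host", sort=False)}

print(frames["burger.letters.com"])
print(frames["199.72.81.55"])
```

Each value in `frames` plays the role of one `df.ix[host]` result from the loop above.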
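Below is my code, which takes more than 30 minutes:
    def block(host):
        # Label lookup: all rows for this host
        temp_df = failedIP_df.ix[host]
        if len(temp_df) > 3:
            time_values = temp_df.set_index(keys='index')['timestamp']
            # Was the third row at most 20 seconds after the first?
            if (return_seconds(time_values[2:3].values[0]) - return_seconds(time_values[0:1].values[0])) <= 20:
                # Record the index labels of every row after the third
                blocked_host.append(time_values[3:].index.tolist())

    # Call block once per host; it must be defined before this line runs
    list(map(block, failedIP_list))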
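The per-host check above could perhaps be expressed with grouped, vectorized operations instead of one Python call per host. The following is only a sketch under assumptions: host is an ordinary column, the rows are already in time order within each host, the timestamps parse with the format `%d/%b/%Y:%H:%M:%S`, and the sample rows stand in for failedIP_df. Unlike the original, it produces one flat list of index labels rather than one list per host.

```python
import pandas as pd

# Sample rows from the question, standing in for failedIP_df (an assumption).
df = pd.DataFrame(
    {
        "host": [
            "199.72.81.55", "199.72.81.55", "burger.letters.com", "199.72.81.55",
            "199.72.81.55", "199.72.81.55", "burger.letters.com", "199.72.81.55",
        ],
        "timestamp": [
            "01/Jul/1995:00:00:01", "01/Jul/1995:00:00:09", "01/Jul/1995:00:00:11",
            "01/Jul/1995:00:00:12", "01/Jul/1995:00:00:13", "01/Jul/1995:00:00:14",
            "01/Jul/1995:00:00:15", "01/Jul/1995:00:00:15",
        ],
    },
    index=[0, 2, 3, 4, 5, 6, 8, 9],
)

# Parse the timestamps once instead of converting them inside a loop.
ts = pd.to_datetime(df["timestamp"], format="%d/%b/%Y:%H:%M:%S")

# Position of each row within its host (0, 1, 2, ...) and the size of each host group.
pos = df.groupby("host").cumcount()
size = df.groupby("host")["host"].transform("size")

# Time of the first row per host, broadcast to every row of that host.
first = ts.groupby(df["host"]).transform("first")

# Time of the third row per host: blank out everything except position 2,
# then take the first non-null value per host (NaT if there is no third row).
third = ts.where(pos == 2).groupby(df["host"]).transform("first")

# Same rule as block(): more than 3 rows, third row within 20 seconds
# of the first, and keep every row after the third.
mask = (size > 3) & (pos >= 3) & ((third - first) <= pd.Timedelta(seconds=20))
blocked = df.index[mask].tolist()
print(blocked)
```

With the sample rows, only 199.72.81.55 has more than three rows, so only its rows can be selected.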
I would really appreciate it if anyone can help.
 
    