If you need the speed, consider embedding a type-identifying number in each object and using a switch statement to select the type-specific code.  This can avoid function call overhead completely: dispatch becomes a local jump, and the type-specific code can be inlined.  You're unlikely to get faster than that.  The cost (in terms of maintainability, recompilation dependencies etc.) is that the type-specific functionality has to be localised in the switch.
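As a minimal sketch of the idea, using a literal switch, dispatch on an embedded type number might look like the following.  All names here (Kind, Tagged, TaggedA, TaggedB, call_f) are hypothetical and chosen only for illustration.

```cpp
// Hypothetical type tags; these names are illustrative only.
enum Kind { KIND_A = 1, KIND_B = 2 };

struct Tagged
{
    explicit Tagged(Kind kind) : kind_(kind) { }
    Kind kind_;   // the embedded "type number"
};

struct TaggedA : Tagged
{
    TaggedA() : Tagged(KIND_A) { }
    int f() const { return 1; }
};

struct TaggedB : Tagged
{
    TaggedB() : Tagged(KIND_B) { }
    int f() const { return 2; }
};

// Every type-specific call is selected in this one switch, which is
// where the maintenance cost comes from: each new type needs a new case.
inline int call_f(const Tagged& t)
{
    switch (t.kind_)
    {
        case KIND_A: return static_cast<const TaggedA&>(t).f();
        case KIND_B: return static_cast<const TaggedB&>(t).f();
    }
    return 0;   // not reached for valid tags
}
```

Because f() is non-virtual and visible at the call site, the compiler is free to inline each case, which is not generally possible through a virtual call.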
IMPLEMENTATION
#include <iostream>
#include <vector>
#include <time.h>   // clock_gettime, CLOCK_MONOTONIC (POSIX)
// virtual dispatch model...
struct Base
{
    virtual int f() const { return 1; }
};
struct Derived : Base
{
    virtual int f() const { return 2; }
};
// alternative: member variable encodes runtime type...
struct Type
{
    Type(int type) : type_(type) { }
    int type_;
};
struct A : Type
{
    A() : Type(1) { }
    int f() const { return 1; }
};
struct B : Type
{
    B() : Type(2) { }
    int f() const { return 2; }
};
struct Timer
{
    Timer() { clock_gettime(CLOCK_MONOTONIC, &from); }
    struct timespec from;
    double elapsed() const
    {
        struct timespec to;
        clock_gettime(CLOCK_MONOTONIC, &to);
        return to.tv_sec - from.tv_sec + 1E-9 * (to.tv_nsec - from.tv_nsec);
    }
};
int main()
{
  for (int j = 0; j < 3; ++j)
  {
    typedef std::vector<Base*> V;
    V v;
    for (int i = 0; i < 1000; ++i)
        v.push_back(i % 2 ? new Base : (Base*)new Derived);
    int total = 0;
    Timer tv;
    for (int pass = 0; pass < 100000; ++pass)
        for (V::const_iterator i = v.begin(); i != v.end(); ++i)
            total += (*i)->f();
    double tve = tv.elapsed();
    std::cout << "virtual dispatch: " << total << ' ' << tve << '\n';
    // ----------------------------
    typedef std::vector<Type*> W;
    W w;
    for (int i = 0; i < 1000; ++i)
        w.push_back(i % 2 ? (Type*)new A : (Type*)new B);
    total = 0;
    Timer tw;
    for (int pass = 0; pass < 100000; ++pass)
        for (W::const_iterator i = w.begin(); i != w.end(); ++i)
        {
            if ((*i)->type_ == 1)
                total += ((A*)(*i))->f();
            else
                total += ((B*)(*i))->f();
        }
    double twe = tw.elapsed();
    std::cout << "switched: " << total << ' ' << twe << '\n';
    // ----------------------------
    total = 0;
    Timer tw2;
    for (int pass = 0; pass < 100000; ++pass)
        for (W::const_iterator i = w.begin(); i != w.end(); ++i)
            total += (*i)->type_;
    double tw2e = tw2.elapsed();
    std::cout << "overhead: " << total << ' ' << tw2e << '\n';
  }
}
PERFORMANCE RESULTS
On my Linux system:
~/dev  g++ -O2 -o vdt vdt.cc -lrt
~/dev  ./vdt                     
virtual dispatch: 150000000 1.28025
switched: 150000000 0.344314
overhead: 150000000 0.229018
virtual dispatch: 150000000 1.285
switched: 150000000 0.345367
overhead: 150000000 0.231051
virtual dispatch: 150000000 1.28969
switched: 150000000 0.345876
overhead: 150000000 0.230726
After subtracting the loop overhead from both timings, this suggests an inline type-number-switched approach is about (1.28 - 0.23) / (0.344 - 0.23) = 9.2 times as fast.  Of course, that's specific to the exact system tested, the compiler flags and version etc., but it is generally indicative.
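The same arithmetic can be turned into a per-call cost.  Each timed loop above makes 100000 x 1000 = 1e8 calls, so the net time divided by 1e8 gives the cost of one dispatch.  The helper names below (net_ns_per_call, speedup) are hypothetical, and the figures fed to them are just the rounded timings quoted above.

```cpp
// Net cost of one call in nanoseconds, after subtracting loop overhead.
inline double net_ns_per_call(double elapsed_s, double overhead_s, double calls)
{
    return (elapsed_s - overhead_s) / calls * 1e9;
}

// Ratio of the net (overhead-subtracted) timings of two approaches.
inline double speedup(double slow_s, double fast_s, double overhead_s)
{
    return (slow_s - overhead_s) / (fast_s - overhead_s);
}
```

With the rounded timings, net_ns_per_call(1.28, 0.23, 1e8) comes to roughly 10.5 ns per virtual call and net_ns_per_call(0.344, 0.23, 1e8) to roughly 1.14 ns per switched call, on this one system.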
COMMENTS RE VIRTUAL DISPATCH
It must be said, though, that virtual function call overhead is rarely significant, and then only for frequently called trivial functions (like getters and setters).  Even then, you might be able to provide a single function to get and set a whole lot of things at once, minimising the cost.  People worry about virtual dispatch way too much, so do profile before reaching for awkward alternatives.  The main issue with virtual functions is that they perform an out-of-line function call, though they also delocalise the code executed, which changes the cache utilisation patterns (for better or, more often, worse).
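As a rough illustration of the "get a whole lot of things at once" idea, the sketch below contrasts three fine-grained virtual getters with one coarse-grained virtual call that returns everything.  The Shape, Point and Position types are hypothetical, invented only for this example.

```cpp
// Hypothetical value type bundling the fields a caller usually wants together.
struct Position { double x, y, z; };

struct Shape
{
    virtual ~Shape() { }
    // Fine-grained interface: three virtual calls to read one position.
    virtual double x() const = 0;
    virtual double y() const = 0;
    virtual double z() const = 0;
    // Coarse-grained interface: one virtual call returns all three fields.
    virtual Position position() const = 0;
};

struct Point : Shape
{
    Point(double x, double y, double z) : x_(x), y_(y), z_(z) { }
    double x() const { return x_; }
    double y() const { return y_; }
    double z() const { return z_; }
    Position position() const
    {
        Position p = { x_, y_, z_ };
        return p;
    }
  private:
    double x_, y_, z_;
};
```

A caller holding a const Shape& pays for one dispatch with position() instead of three with x(), y() and z().  Whether that matters in practice is something only profiling will tell you.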