Let's have a couple of examples of how foreach gets compiled.
First, on a regular IEnumerator-returning GetEnumerator:
public class A : IEnumerable
{
    public IEnumerator GetEnumerator()
    {
        throw new NotImplementedException();
    }
}
foreach(object o in new A())
{
}
  .locals init (class [mscorlib]System.Collections.IEnumerator V_0,
           class [mscorlib]System.IDisposable V_1)
  IL_0000:  newobj     instance void Testing.Program/A::.ctor()
  IL_0005:  call       instance class [mscorlib]System.Collections.IEnumerator Testing.Program/A::GetEnumerator()
  IL_000a:  stloc.0
  .try
  {
    IL_000b:  br.s       IL_0014
    IL_000d:  ldloc.0
    IL_000e:  callvirt   instance object [mscorlib]System.Collections.IEnumerator::get_Current()
    IL_0013:  pop
    IL_0014:  ldloc.0
    IL_0015:  callvirt   instance bool [mscorlib]System.Collections.IEnumerator::MoveNext()
    IL_001a:  brtrue.s   IL_000d
    IL_001c:  leave.s    IL_002f
  }  // end .try
  finally
  {
    IL_001e:  ldloc.0
    IL_001f:  isinst     [mscorlib]System.IDisposable
    IL_0024:  stloc.1
    IL_0025:  ldloc.1
    IL_0026:  brfalse.s  IL_002e
    IL_0028:  ldloc.1
    IL_0029:  callvirt   instance void [mscorlib]System.IDisposable::Dispose()
    IL_002e:  endfinally
  }  // end handler
Nothing wondrous here, just calling the methods on IEnumerator. Notice that it also implements IDisposable, hence it uses its pattern.
Next, let's have a value type-returning GetEnumerator:
public class B
{
    public BE GetEnumerator()
    {
        return new BE();
    }
    public struct BE
    {
        public object Current {
            get {
                throw new NotImplementedException();
            }
        }
        public bool MoveNext()
        {
            throw new NotImplementedException();
        }
        public void Reset()
        {
            throw new NotImplementedException();
        }
    }
}
  .locals init (class [mscorlib]System.Collections.IEnumerator V_0,
           class [mscorlib]System.IDisposable V_1,
           valuetype Testing.Program/B/BE V_2,
           object[] V_3,
           int32 V_4)
  IL_002f:  newobj     instance void Testing.Program/B::.ctor()
  IL_0034:  call       instance valuetype Testing.Program/B/BE Testing.Program/B::GetEnumerator()
  IL_0039:  stloc.2
  IL_003a:  br.s       IL_0044
  IL_003c:  ldloca.s   V_2
  IL_003e:  call       instance object Testing.Program/B/BE::get_Current()
  IL_0043:  pop
  IL_0044:  ldloca.s   V_2
  IL_0046:  call       instance bool Testing.Program/B/BE::MoveNext()
  IL_004b:  brtrue.s   IL_003c
Notice that nothing here has to implement IEnumerable/IEnumerator interfaces. This method is sometimes called duck-typing. This implementation internally stores the enumerator in a variable and then calls the methods on its address (ldloca). One value type copying happens here, when the enumerator is returned from GetEnumerator.
A third example is almost a completely different thing, foreach on an array:
foreach(object o in new object[0])
{
}
  .locals init (class [mscorlib]System.Collections.IEnumerator V_0,
           class [mscorlib]System.IDisposable V_1,
           valuetype Testing.Program/B/BE V_2,
           object[] V_3,
           int32 V_4)  IL_004d:  ldc.i4.0
  IL_004e:  newarr     [mscorlib]System.Object
  IL_0053:  stloc.3
  IL_0054:  ldc.i4.0
  IL_0055:  stloc.s    V_4
  IL_0057:  br.s       IL_0064
  IL_0059:  ldloc.3
  IL_005a:  ldloc.s    V_4
  IL_005c:  ldelem.ref
  IL_005d:  pop
  IL_005e:  ldloc.s    V_4
  IL_0060:  ldc.i4.1
  IL_0061:  add
  IL_0062:  stloc.s    V_4
  IL_0064:  ldloc.s    V_4
  IL_0066:  ldloc.3
  IL_0067:  ldlen
  IL_0068:  conv.i4
  IL_0069:  blt.s      IL_0059
This uses no GetEnumerator, and simply traverses the array in the old-fashioned index way (higher performance).
You cannot use this exact optimization in CIL, because CIL has no duck-typing; you have to write all the signatures and method calls yourself.
However, if you need to have this optimization for any type, and you can modify the types you want to use, you can use generic interfaces in a code similar to this:
public class B : IStructEnumerable<object, BE>
{
    public BE GetEnumerator()
    {
        return new BE();
    }
}
public struct BE : IStructEnumerator<object>
{
    public object Current {
        get {
            throw new NotImplementedException();
        }
    }
    public bool MoveNext()
    {
        throw new NotImplementedException();
    }
    public void Reset()
    {
        throw new NotImplementedException();
    }
}
public interface IStructEnumerable<TItem, TEnumerator> where TEnumerator : struct, IStructEnumerator<TItem>
{
    TEnumerator GetEnumerator();
}
public interface IStructEnumerator<TItem>
{
    TItem Current {get;}
    bool MoveNext();
    void Reset();
}
public static void TestEnumerator<TEnumerable, TEnumerator>(TEnumerable b) where TEnumerable : IStructEnumerable<object, TEnumerator> where TEnumerator : struct, IStructEnumerator<object>
{
    foreach(object obj in b)
    {
    }
}