**@fcalderan's answer solves the issue, and all credit goes to them.**
This obviously breaks the square shape, but if you are using any text, a small adjustment will work in your favor. You can use the ::after pseudo-element instead of ::before so that it does not push down or split the content. Changing it to display: block also removes the need for vertical-align: top, as far as I know.
To further preserve the aspect ratio when using text, I'd make the text position: absolute.
The snippet below compares ::before and ::after to illustrate my point.
.container,
.container2 {
  display: grid;
  grid-template-columns: 1fr 1fr 1fr 1fr;
  grid-gap: 5px;
}
.container div {
  background-color: red;
}
.container div::before {
  content: "";
  padding-bottom: 100%;
  display: inline-block;
  vertical-align: top;
}
.container2 div::after {
  content: "";
  padding-bottom: 100%;
  display: block;
}
.container2 .text {
  position: absolute;  
}
.container2 div {
  background-color: green;
  position: relative;
  overflow: hidden;
}
<div class="container">
  <div>
    <div class="text">Here is some text.</div>
  </div>
  <div>
    <div class="text">Here is some more text.</div>
  </div>
  <div>
    <div class="text">Here is some longer text that will break how this looks.</div>
  </div>
</div>
  
<hr>
<div class="container2">
  <div>
    <div class="text">Here is some text.</div>
  </div>
  <div>
    <div class="text">Here is some more text.</div>
  </div>
  <div>
    <div class="text">Here is some longer text that will break how this looks.</div>
  </div>
</div>
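
If the text should cover the whole square instead of sitting at its top-left corner, the absolutely positioned element can be stretched over the cell. This is only a minimal sketch that assumes the same .container2 markup as above. The 5px padding is an arbitrary value chosen for illustration, and long text is still clipped by the overflow: hidden on the cell.

```css
/* Stretch the text box over the whole cell.
   Relies on .container2 div having position: relative. */
.container2 .text {
  position: absolute;
  top: 0;
  right: 0;
  bottom: 0;
  left: 0;
  padding: 5px; /* arbitrary value, assumed for illustration */
}
```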